- Wey, Alex;
- Hemmatian, Babak;
- Avram, Rachel;
- Feucht, Sheridan;
- Spitalnic, Kate;
- Garg, Muskaan;
- Eickhoff, Carsten;
- Pavlick, Ellie;
- Sandstede, Björn;
- Sloman, Steven
Text-generation algorithms like GPT-2 (Radford et al., 2019) and GPT-3 (Brown et al., 2020) produce documents that resemble coherent human writing. Yet no study has compared the discourse-linguistic features of such artificial text with those of comparable human content. We used a sample of Reddit and news discourse as prompts to generate artificial text with a fine-tuned GPT-2 model (Grover; Zellers et al., 2019). Blind annotators identified clause-level discourse features (e.g., states and events; Smith, 2003) and coherence relations (e.g., contrast; Wolf and Gibson, 2005) in the prompts and the generated text. A comparison of the more than 20,000 annotated clauses shows that Grover recreates human word co-occurrence patterns and clause types across discourse modes. However, its coherence relations are shorter and of lower quality, with many nonsensical instances. As a result, annotators could perfectly guess the human or algorithmic source of documents. Using a corresponding GPT-3 sample, we discuss aspects of generation that have and have not improved since Grover.