Zipf’s law asserts that word frequencies follow a power-law distribution: a word’s frequency is inversely proportional to its frequency rank. Relatively recent work in cognitive and usage-based linguistics argues that speech differs structurally from writing. Yet, except for a few older analyses performed on tiny corpora, studies of Zipf’s law prior to 2021 were carried out on written corpora and relied on informal methods to determine Zipfianness.
We argue that recent work indicating that transcribed speech follows a Zipfian distribution can be extended to the speech of typically developing children. Further, we show that the transcribed speech of children with a clinical diagnosis of autism spectrum disorder is non-Zipfian. These judgements are made using the formal statistical techniques developed in Clauset et al. (2009), which include the Kolmogorov-Smirnov statistic to assess goodness of fit and likelihood ratio tests to rule out alternative distributions.
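As a rough illustration of the kind of test involved, the following Python sketch estimates a power-law exponent from word-frequency counts and computes the Kolmogorov-Smirnov distance between the data and the fitted model. It is not the analysis pipeline used in this work. It assumes the approximate discrete maximum-likelihood estimator given in Clauset et al. (2009), with discrete counts treated as rounded continuous values, and a fixed lower cutoff. The function name `fit_power_law`, the parameter `xmin`, and the toy sentence are hypothetical. Selecting the cutoff, the bootstrap p-value, and the likelihood ratio comparisons are all omitted.

```python
import math
from collections import Counter


def fit_power_law(values, xmin=1):
    """Return (alpha, ks) for the observations in `values` that are >= xmin.

    alpha: approximate discrete MLE of the power-law exponent,
           alpha = 1 + n / sum(ln(x_i / (xmin - 0.5))).
    ks:    largest absolute gap between the empirical CDF and the CDF
           of the fitted model, evaluated at each observed value.
    """
    tail = [v for v in values if v >= xmin]
    n = len(tail)
    if n == 0:
        raise ValueError("no observations at or above xmin")

    # Every term is positive because v >= xmin > xmin - 0.5.
    log_sum = sum(math.log(v / (xmin - 0.5)) for v in tail)
    alpha = 1.0 + n / log_sum

    counts = Counter(tail)
    seen = 0
    ks = 0.0
    for x in sorted(counts):
        seen += counts[x]
        empirical_cdf = seen / n  # P(X <= x) in the data
        # Under the rounding approximation, P(X <= x) is the mass
        # of the continuous power law below x + 0.5.
        model_cdf = 1.0 - ((x + 0.5) / (xmin - 0.5)) ** (1.0 - alpha)
        ks = max(ks, abs(empirical_cdf - model_cdf))
    return alpha, ks


# Toy usage: the observations are the frequencies of each distinct word.
tokens = "the cat sat on the mat and the dog sat by the door".split()
frequencies = list(Counter(tokens).values())
alpha, ks = fit_power_law(frequencies, xmin=1)
print(f"alpha = {alpha:.3f}, KS distance = {ks:.3f}")
```

A smaller KS distance indicates a closer fit, but on its own it does not establish Zipfianness. In the procedure of Clauset et al. (2009), the observed distance is compared against distances obtained from synthetic samples drawn from the fitted model.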
Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-Law Distributions in Empirical Data. SIAM Review, 51(4), 661–703.