Zipf's Law of Abbreviation

G. K. Zipf's groundbreaking research on the relationship between word frequency and other word characteristics led to the formulation of several linguistic laws. The best known is Zipf's law for word frequencies. We focus here on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous article, we tested the robustness of these Zipfian laws for English, measuring word length roughly as the number of characters and distinguishing between adult and child language. In this article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: length in syllables and length in phonemes. Our correlation analysis shows that both the meaning-frequency law and the law of abbreviation hold overall in all the languages analyzed.

Under the principle of least effort, language users are more likely to use shorter words than longer words (Piantadosi et al., 2011; Zipf, 1949), a regularity often referred to as Zipf's law of abbreviation. Zipf (1949) explained this phenomenon from an evolutionary point of view, arguing that more frequently used words benefit more from a reduction in size, which leads to more efficient communication. Speakers economize their effort through a force of unification, using only a few different words, each of them frequently.

Listeners, on the other hand, economize their effort through a force of diversification and prefer many different words in communication, each of which is relatively rare. In linguistics, the law of abbreviation (also called Zipf's law of abbreviation) is a linguistic law that states, qualitatively, that the more frequently a word is used, the shorter it tends to be, and vice versa: the less frequently a word is used, the longer it tends to be. [1] It is a statistical regularity found in natural languages and other natural systems, and it is claimed to be a general rule.

Our proposed cognitive effort hypothesis corresponds well to Zipf's principle of least effort. But even if the forces of unification and diversification that Zipf identified as causes of conformity to Zipf's law can be clearly defined in terms of cognitive effort, this is not necessarily true for all parts of the principle. For example, Zipf's law of abbreviation refers, at least in part, to physical effort (Zipf, 1949). This optimization of physical effort could be due to various processes (language development, for example) and seems to be more static. But as we have seen, the linguistic unit that is probably closest to Zipf's law of abbreviation, the length of the utterance, has shown patterns consistent with our cognitive effort hypothesis. This is an important finding, as it means that at least some of the economic optimization that led to Zipf's law is dynamic. It also implies that speakers and listeners do not actively choose to reduce their effort; rather, the limited cognitive resources available lead to a reduction in cognitive effort. Thus, the economy of linguistic contributions is due at least in part to automatic cognitive processes that are not under the cognitive control of the language user.

Williams et al. (2015) proposed that Zipf's law occurs at the sentence level. Because spoken dialogue is very dynamic, sentence and utterance segments are usually short (Levinson & Torreira, 2015). Because these segments are short and subject to repetition, we expected the frequency distribution over utterances to be uneven. Therefore, we computed the frequency distribution over utterances, treating each distinct utterance as a single countable unit. Frequency distributions over word length follow Zipf's law of abbreviation, with shorter words being more frequent (Piantadosi et al., 2011; Zipf, 1949). The tendency to reduce word length can be explained by the principle of least effort (Ferrer-i-Cancho et al., 2022; Zipf, 1949). Since this tendency probably operates at several levels of language and communication, we hypothesized that Zipf's law of abbreviation also applies at the level of utterance length, with an inverse relationship between the length of an utterance, measured in number of words, and its frequency. In fact, we have found that this is the case on a small scale in dialogues between a dialogue system and a user (Linders & Louwerse, 2020).
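The inverse relationship between utterance length and frequency described above can be illustrated with a rank correlation. The following is only a minimal sketch under stated assumptions, not the procedure used in the cited studies: the toy `utterances` list, the whitespace tokenization, and all function names (`average_ranks`, `pearson`, `spearman`, `length_frequency_correlation`) are hypothetical and chosen for illustration. Spearman's rank correlation is assumed here as the measure of association; a negative value is what the law of abbreviation would predict.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def length_frequency_correlation(items, length_fn):
    """Correlate the length of each distinct item with its frequency."""
    counts = Counter(items)          # each distinct item is one type
    types = list(counts)
    lengths = [length_fn(t) for t in types]
    freqs = [counts[t] for t in types]
    return spearman(lengths, freqs)


# Hypothetical toy dialogue data, invented for illustration only.
utterances = [
    "yes", "yes", "yes", "yes",
    "no", "no", "no",
    "thank you", "thank you",
    "I do not know",
]

# Utterance length is measured in words (whitespace tokenization assumed).
rho = length_frequency_correlation(utterances, lambda u: len(u.split()))
print(f"Spearman rho (utterance length vs. frequency): {rho:.3f}")
```

The same function could be applied at the word level by passing a list of word tokens and `len` as `length_fn`, which measures length in characters; syllable or phoneme counts would require a language-specific length function that is not sketched here.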