Why Does Language Complexity Resist Measurement?

Insofar as linguists operate with a conception of languages as closed and self-contained systems, there should be no obstacle to comparing those systems in terms of simplicity and complexity. Even if complexity ‘trade-offs’ between sub-systems of phonology, morphology and syntax are considered, it ought to be relatively straightforward to quantify constitutive elements and rules, and assign each language system its place on a complexity scale. In practice, however, such attempts have turned up a series of problems and paradoxes, which can be seen in work by Peter Trudgill and Johanna Nichols; the latter has proposed an alternative means of measuring complexity which presents new problems of its own. This paper makes the case that overcoming the difficulty of measuring simplicity and complexity requires confronting the normative and interpretative judgments that enter into how language systems are conceived, identified and analysed.


INTRODUCTION
Linguistic simplicity and complexity have been the site of such profound scepticism over such a long period that one has to admire the defiant persistence of those who pursue its investigation. Their work generally shows a keen awareness of the conceptual and methodological difficulties which the question represents, and a determination to get on with their research despite the various ways in which language complexity resists measurement.
Sometimes, intentionally or not, these researchers subtly signal their own scepticism. A case in point is when Nichols (2019) examines how the presence of grammatical gender in a language apparently correlates with a high overall level of systemic complexity. Although Nichols applies the commonly used method of 'inventory complexity', based on the number of elements in the system and of rules applied to them, she cautions that this is not a very accurate or satisfactory measure of complexity, not least because it does not measure non-transparency, which is the kind of complexity that has been shown to be shaped by sociolinguistics (Trudgill, 2011); but it is straightforward to calculate (though data gathering can be laborious), and appears to correlate reasonably well with other, better measures. (Nichols, 2019: 64) By 'non-transparency' Nichols means the degree to which an element falls short of an idealised situation (transparency) in which one form maps to one and only one meaning, and vice-versa. Two languages with the same number of elements and rules will be assessed as having identical inventory complexity, when in fact, if one has more transparency than the other, it is less complex.
Trudgill repeatedly cites the Latin ablative plural inflection -ibus (as in hominibus 'from the men') as lacking transparency, because it cannot be divided into a plural morpheme and an ablative morpheme, and moreover it is identical to the dative plural form (hominibus 'to the men'). This makes it more complex than its Turkish equivalent adamlardan 'from the men', which segments transparently into adam 'man', lar (plural) and dan (ablative) (Trudgill, 2011: 92). Yet an inventory complexity analysis would say that Turkish is the more complex language, having more inflectional morphemes. Inventory analysis is focussed on forms rather than meaning, and misses the complexity which inheres in the form-meaning relationship.
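The gap between counting forms and assessing the form-meaning relationship can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the data layout, the names `inventory_count` and `non_transparent`, and the decision to call a suffix transparent when it has exactly one reading consisting of a single grammatical feature are all hypothetical, and are not measures proposed by Trudgill or Nichols. The fragment covers only the suffixes in the two example words above.

```python
# Hypothetical encoding of the two example word forms discussed above.
# Each suffix maps to the list of readings it can have; a reading is the
# set of grammatical features that the suffix expresses at once.
LATIN_FRAGMENT = {
    "-ibus": [
        frozenset({"plural", "ablative"}),  # hominibus 'from the men'
        frozenset({"plural", "dative"}),    # hominibus 'to the men'
    ],
}
TURKISH_FRAGMENT = {
    "-lar": [frozenset({"plural"})],
    "-dan": [frozenset({"ablative"})],
}


def inventory_count(fragment):
    """Inventory-style measure: simply count the distinct suffixes."""
    return len(fragment)


def non_transparent(fragment):
    """Return suffixes that depart from one form, one meaning.

    Assumption for this sketch: a suffix is transparent only if it has a
    single reading and that reading carries a single feature.
    """
    return [
        suffix
        for suffix, readings in fragment.items()
        if not (len(readings) == 1 and len(readings[0]) == 1)
    ]


for name, fragment in (("Latin", LATIN_FRAGMENT), ("Turkish", TURKISH_FRAGMENT)):
    print(name, inventory_count(fragment), non_transparent(fragment))
```

On this toy fragment the inventory count ranks Turkish above Latin, while the transparency check flags only the Latin suffix, which is the divergence the paragraph describes.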
Nichols however continues to use inventory analysis for practical reasons ('straightforward to calculate'), supplementing it with what she calls 'descriptive complexity', the amount of information required to describe a system. 1 This has the reverse strengths and weaknesses of the inventory measure, being more resistant to quantification but able to accommodate a much wider range of complexifying factors. As an example, her inventory complexity analysis of Mongolian and Russian singular core grammatical cases is:

             Declensions   Genders
Mongolian    1             0
Russian      5             3, plus animacy
The descriptive complexity analysis is based on what is required for 'a descriptively and theoretically adequate synchronic grammar', and is as follows:
Mongolian noun paradigms: Display 1 paradigm, plus 1 extended; Access phonological information.
By the inventory measure it would appear that Russian noun paradigms are 8 times more complex than Mongolian ones. Nichols does not venture a numerical figure for the descriptive measure, the point of which may simply be to reassure anyone sceptical about inventory measurement that both methods show Russian to be considerably more complex. Nichols's inclusion of descriptive complexity implies, or at least implicitly acknowledges, scepticism about the more standard inventory complexity, whilst offering evidence that its flaws do not cancel or outweigh what it reveals. It is simultaneously a critical and a defensive moment.
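The arithmetic behind the 'eight times' figure can be sketched as follows. This is a minimal illustration, assuming that the ratio is obtained by summing the declension and gender counts for each language and dividing one total by the other; that reading, the class name `NounInventory` and the optional `extra` field are assumptions made here, not a procedure stated by Nichols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounInventory:
    """Hypothetical container for the counts in the inventory analysis."""
    declensions: int
    genders: int
    extra: int = 0  # further distinctions an analyst chooses to count

    def total(self) -> int:
        return self.declensions + self.genders + self.extra


def inventory_ratio(a: NounInventory, b: NounInventory) -> float:
    """How many times larger the inventory of a is than that of b."""
    return a.total() / b.total()


MONGOLIAN = NounInventory(declensions=1, genders=0)
RUSSIAN = NounInventory(declensions=5, genders=3)  # animacy left uncounted
# Assumption: counting animacy as one further distinction.
RUSSIAN_WITH_ANIMACY = NounInventory(declensions=5, genders=3, extra=1)

print(inventory_ratio(RUSSIAN, MONGOLIAN))
print(inventory_ratio(RUSSIAN_WITH_ANIMACY, MONGOLIAN))
```

The second call shows that the ratio moves from 8 to 9 as soon as 'plus animacy' is counted as an element, so the headline figure already reflects a decision about what belongs in the inventory.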
Such moments, when analysts raise a criticism of their own methodology, then proceed to dismantle or contain the criticism, offer valuable insight into what the practitioners understand their analysis to be doing. Because it is themselves and not colleagues who are under their critical gaze, they omit the usual gestures of courteous deference and get straight to the point; they lower their guard, relax the authoritative scientific voice and let us hear echoes of the debate transpiring in their own mind. I have opened with this look at Nichols's alternative form of measurement in order to establish that I am not launching a critique, but looking at how the people directly invested in this scientific enterprise are struggling with basic matters of how it is conducted and what it purports to show. I want to suggest that it is worth considering whether the issues may be linked to developments in other areas of linguistics, historical and contemporary.
Within research on linguistic complexity we find a continuum with, at one end, work of a deeply quantitative nature, aimed at developing a precise scale of complexity; in the centre, work that is quantitative but cautious about precise measures because of the obstacles to obtaining them; and at the other end, work that does not try to establish numerical measures, only descriptive ones. 2 I shall focus on the latter two, as represented at their best in the work of Nichols and Trudgill respectively. Nichols, despite her caution, takes on the quantitative burden sufficiently that whatever methodological conclusions I may deduce from her work can be taken to apply a fortiori to more gung-ho quantitative researchers.
I shall also look at what precisely the quantitative measures are weighing up, which are never raw production data, but the generalised results of analysis, in the form of phonologies, grammars, lexicons and other sub-systems, and ultimately the language system as a whole. This involves selecting certain manifestations of the language for examination, and leaving others aside; and then applying certain ways of analysing a language system, whilst again ignoring others. There is no universally accepted analytical format, and indeed we sometimes find the same linguists applying different types of analysis at different stages of their career. It will become clear that analysis involves normative judgments at numerous levels on the linguist's part, judgments which the field's methodological doctrine requires to be hidden and denied. It is this covert normative content that, I shall argue, keeps the complexity of languages from being readily measured and compared.

EXPERIENCING COMPLEXITY
Structurally, there is no reason in principle why one language should not be more complex than another, in whole or in part. The existence or non-existence of a feature such as gender inflection of inanimate nouns and the adjectives which modify them seems like a clear-cut example of relative complexity and simplicity, as do the larger or smaller phonological inventory of a language, the number of its morphophonological rules, verb tenses and moods, obligatory syntactic permutations and so on.
1 The two types are implicit in Miestamo's (2008: 26) statement that 'complexity should be defined, to put it in the most general terms, as the number of parts in a system or the length of its description'.
2 Different points on this continuum are occupied by studies that compare a small number of languages and those that use larger and more 'ecological' datasets, and by those incorporating measures based on entropy or patterns of co-occurrences along with feature counts. All of these raise significant issues and in some cases call for mitigations which the limited scope of this article regrettably demands that I leave aside.
On the other hand, what people grow up doing becomes second nature to them. Whatever language they are accustomed to is simpler for them than ones they are unaccustomed to, so for individual speakers of two different languages, their mother tongues present no degrees of complexity for them on a psychological or practical level. No language has been shown to be harder than another for children to learn as their first language; if children simplify some structures, for instance regularising irregular verbs in English (I goed rather than I went), it is from the adult point of view that this constitutes a simplification, implicitly an oversimplification, when I goed is classed as not 'correct' English, or just not English. 3 This calls into question what, if anything, the apparent differences in structural complexity really mean. They might be mere artefacts of our structural analysis, except that the simplicity or complexity of languages is part of the everyday experience of multilingual people, including students of a second language (Pallotti, 2014 includes a good summary of work on 'outsider' or 'relative' complexity). Multilinguals are not some rare exception that can be ignored, but 'make up a significant proportion of the population' (Bialystok et al., 2012: 240). 4 As a learner of Arabic and various European languages, I find the gender inflection of nouns and adjectives to be a complexity relative to the absence of such gender inflection in English. Cantonese has no inflections, making it seem to me, as a learner, to offer an altogether simpler structure, though I have found its system of tones difficult to master. These reactions are not at all unique to me, but are shared by other learners.
Even monolinguals regularly encounter complexity within their one language, complexity which has to be 'translated' into a simpler form in order to be understood. This is what linguists analyse in terms of 'register', where it can be unclear whether the simplicity or complexity is located within what Saussure termed langue, the system, or parole, use of the system. As Hiltunen (2012: 41) states, 'Legal syntax is distinctly idiosyncratic in terms of both the structure and arrangement of the principal sentence elements'. If you can speak and understand Legal English perfectly, and I can manage only bits of it, it is not evident that the register in question is the same langue as I possess, just put to use differently. And whether we are dealing with one langue or two, the systematic divergences which I perceive as complexities in 'legalese', a lawyer might argue exist in order to eliminate ambiguity and imprecision, and hence represent greater simplicity. 5 Neither of us is likely to assert that they are equally simple.
Being part of everyday experience is prima facie evidence that something is not an illusion or an analytical artefact, but real. The idea of simpler and more complex language structure has both logic and common experience on its side. The lack of evidence that any particular language is harder than any other for mother-tongue speakers to learn does not prove the equal structural complexity of languages. It is when linguists set out to measure simplicity and complexity that problems arise on the conceptual and methodological levels. The problems are exacerbated by attempts to explain, elucidate and interpret them in more general mental and cultural terms, which leads us into the putative psychology of peoples. Although Joseph and Newmeyer (2012) conclude that once such interpretations are set aside it should be possible to conduct sound investigations of linguistic complexity, that does not eliminate basic methodological and conceptual obstacles to its measurement.

SYSTEM AND STRUCTURE
Trudgill's (2011) 'sociolinguistic typology' aims to establish correlations between linguistic complexity and how the language community is constituted in terms of size, stability, amount of contact with outsiders, density of social networks and amounts of communally-shared information. The conditions which permit complexity to develop are when the community is relatively small and stable, has little contact with adult outsiders, and has networks and information contained enough to produce a 'society of intimates'. The reverse conditions favour simplification. Trudgill uses pronoun systems as an example (2011: 175-8), noting that 'the small-group indigenous languages of Australia typically have at least 11 personal pronouns, involving first, second, and third persons; singular, dual, and plural numbers; and inclusive and exclusive "we"' (174), whilst the South African language !Ora has a '31-pronoun system, which distinguishes between male and female [and additionally has a 'common gender'] in the first and second as well as third persons, which has dual number, and which contrasts exclusive and inclusive "we"' (175). 6 He remarks that
This contrasts dramatically with, say, the simple 8-pronoun system of French: je nous tu vous il ils elle elles or the 7-pronoun system of Standard English:
It is noteworthy that he specifies 'Standard English', as he gives the most minimal inventory of pronouns in order to make the contrast with !Ora as stark as possible. In much of the English-speaking world there is a second-person plural form: you all or y'all, you lot, you guys or yous guys, you ones or you'uns or yinz, and still other variants, all understandable to speakers of English including those who do not use them, and therefore part of their 'grammar'. That is also the case with thou and ye, used by vast numbers of English speakers in specific contexts, and by smaller numbers in particular dialects. If y'all is classified as a plural, it is hard to justify not labelling you two a dual, and in fact both we and you can be followed by a specifying numeral without limit. Linguists would not generally classify these as distinct forms; but why not? Whatever answer might be given to that question, for example that they are not morphological forms but syntactic combinations, involves an analytical judgment resting on where one sees morphology ending and syntax beginning, when some linguists deny that any boundary exists between them.
3 The difference between not correct English and not English is more problematic than linguists generally take it to be, and will be discussed in Constitutive and Regulative Rules. Children sometimes ask 'Why I went, and not I goed?', to which the typical parent will reply 'That's just how it is', whilst expecting that a professional linguist could provide a better answer. In fact a linguist will say the equivalent of 'That's just how it is', but at greater length and in a different register, evoking for instance causal mechanisms or evolutionary trajectories or simply the term 'suppletion'. Naming the phenomenon provides a sense that it is under our control, and can even be taken as the equivalent of explaining it, by linguists and non-linguists alike. In general, though, the lack of a detailed explanation for a phenomenon such as suppletion is exceptional. Linguists find it disturbing, and may trot out the observation that suppletion tends to occur with high frequency words. This is not exactly an explanation either, but at least points to something the average non-linguist might not notice.
4 Bialystok et al. actually say this about 'bilinguals', who are sometimes taken as a separate category from 'multilinguals', though I am using 'multilingual' to mean anyone who is not monolingual. Bialystok et al. begin their article by saying that 'It is generally believed that more than half of the world's population is bilingual', citing Grosjean (2010) as their authority. There is in fact no reliable measure for or against the general belief, which additionally depends on the vexed matter of what gets counted as the same or different languages.
5 A parallel argument is put forward in Morris (1938: 26), the founding document of pragmatics, with regard to the 'special and restricted languages' of the sciences and the arts, as opposed to 'universal' languages ('English, French, German, etc.' as used in non-specialist contexts). In the latter, 'it is often very difficult to know within which dimension a certain sign is predominantly functioning, and the various levels of symbolic reference are not clearly indicated. Such languages are therefore ambiguous and give rise to explicit contradictions'.
Trudgill ignores the impersonal pronoun one, and more problematically, its French counterpart on, since in the French case it cannot be claimed that its use is limited to 'high' registers. In fact it is the most common first-person plural form in spoken French, and is also used for first-person singular reference (as in English), and sometimes for second-person. The French of many regions has an exclusive form nous autres 'we (others)' (where it is the person addressed who is excluded), alongside the inclusive nous 'we'; and a form vous autres 'you (others)' for the second-person plural, alongside vous as the singular polite form 'you'. These forms are left out of Trudgill's chart entirely, and including them would make the '8-pronoun system' less simple. 7
With the third-person pronouns the 'simple' system is in the throes of complexification. English speakers are experiencing a grammatical evolution that has been transpiring over several decades within the third-person singular pronoun system. In an earlier phase of the language, the masculine singular was also the generic form; and with reference to a specific person, the choice of masculine or feminine was made by the speaker, based on the perceived physical gender of the person referred to. The evolution has resulted in an augmentation of this system, with several new, nonbinary pronominal forms having developed, in addition to use of the plural, sometimes with a singular verb, or of both the masculine and feminine; and with, in many contexts, speakers expected to use the preferred pronouns specified by the person referred to.
Earlier system: he/him/his (masc. & gen. sg.); she/her/hers (fem. sg.); it/its (neut. sg.)
New system: he/him/his (masc. sg.); she/her/hers (fem. sg.); it/its (neut. sg.); they/them/theirs, zie/zim/zis, sie/sie/hirs, ey/em/eirs, ve/ver/vers, tey/ter/ters, e/em/ers (all non-binary, with choice specified by person referred to)
By both of Nichols's (2019) measures, inventory complexity and descriptive complexity, the new system is considerably more complex than the older one. Less clear is whether linguists would accept that the new system should be taken into consideration in an assessment of the complexity of English pronouns. The division of labour in linguistics is such that the new system is considered the business of a sociopolitical discourse world separate from the structural analysis which is the basis of complexity measures. Even Trudgill's sociolinguistic typology does not try to break down the wall between them: he takes the systems to be what the grammars say they are, and assesses complexity on the basis of that alone; then uses social characteristics of the language community to explain why they are simple or complex. That is consistent with the dominant view within linguistics: when linguists see individuals discussing a question of language form such as the use of non-binary pronouns, that seems ipso facto to disqualify it from the sort of 'natural' development they associate with language structure, and to make it instead a matter of how the structure is used.
Linguists take language structure to be unconscious. For so long as speakers other than professional linguists are talking about a structure, it is suspect: it figures in parole, but not (yet) in langue. The discourse about non-binary pronouns is not part of natural unconscious language structure; moreover, if it is not exactly prescriptivist, it comes close enough, and the first creed of modern linguistics is that it deals with description rather than prescription. Only when the pronouns stop being talked about, and are just used, will they be treated as real by linguists who work with language structure alone, rather than the social or political dimensions of language. And only then can measurement commence, at which point further conceptual and methodological difficulties arise.
To call the new English pronoun system more complex is not a value-free description. There is no more powerful philosophical and scientific dictum than Occam's razor. 8 Other things being equal, simplicity is preferable to complexity. Linguists working in this area (e.g. Miestamo, 2008; Hawkins, 2009) have sometimes stressed that what is complex in one perspective may be simple in another. To someone fighting a long-term battle against being boxed into a gender they reject, the evolution of the English pronoun system may well seem like a simplification: it allows non-binariness to be expressed with the same structural ease as binary divisions are. To the eyes of a linguist like Trudgill, if he were to accept it as part of the language system, it would appear as a complexification of the system; and it would run counter to his prediction that complexification will not occur in today's post-intimate societies. What the prediction leaves out is that social intimacy can take new forms, including the online social 'bubbles' within which the most recent demands for changes to the pronoun system have developed and spread.
6 !Ora, a Khoe-Kwadi language, is called 'extinct' by Trudgill. According to Vossen (2013: 10), !Ora (also known as !Gora, !ora, Korana) is still 'said to be spoken by just a handful of persons in South Africa. For a long time it was believed to be extinct'.
7 In Quebec French nous autres means just 'we', with no exclusivity implied. The fact that nous autres and vous autres are written as two words (unlike their Spanish equivalents nosotros and vosotros) likely plays a part in their 'invisibility' as distinct pronominal forms. They frequently appear as a single word in dialect writing.
8 The history of Occam's razor is ironically complex, but William of Occam did write 'Frustra fit per plura quod potest fieri per pauciora' (It is futile to do through more things what can be done through fewer, Summa totius logicae i.12). Ball (2016) offers an interesting perspective on 'the tyranny of simple explanations' in the history of science.
The changes reduce transparency, inasmuch as several forms have been introduced for the same meaning of non-binariness, but at least as important is the fact that non-binariness is itself a meaning that previously was not represented in the system. It was already there, as a meaning, for large numbers of people, and was denied linguistic expression by the majority, for whom it was too complex to have to deal with, even though, conceptually, the unity of non-binariness is simpler than division into genders. 9
Trudgill does not attempt to quantify complexity, and his statement that !Ora pronouns are more complex than French or English ones may well stand (one would want to know more about the contexts of use for all the forms before making a definitive judgment) even after we have drawn aside the curtain and revealed the Wizard of Norwich pulling levers to make the European-African contrast appear as 'dramatic' as possible. If however Nichols were to turn Trudgill's statement about !Ora into a calculation that its pronoun system has 3.88 times the Inventory Complexity of French and 4.43 times that of English (figures that might even be increased if reckoned by Descriptive Complexity, since Trudgill gives no scope for any factor other than person and number for the French and English pronouns), it should be clear how the numbers depend directly on the choices made in the analysis.
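The dependence of such ratios on analytical choices can be sketched numerically. The Python below is an illustration only: the 'expanded' inventories are hypothetical, built simply by adding to Trudgill's minimal counts the forms named earlier in this section (on, nous autres and vous autres for French; one, y'all, thou and ye for English), and treating each as a single additional pronoun is an assumption made for the sake of the example, not an analysis anyone has proposed.

```python
ORA_COUNT = 31  # Trudgill's '31-pronoun system'

# Minimal inventories as Trudgill presents them.
FRENCH_MINIMAL = ["je", "nous", "tu", "vous", "il", "ils", "elle", "elles"]
ENGLISH_MINIMAL_COUNT = 7  # only the count is used; the list is not reproduced

# Hypothetical additions, drawn from forms mentioned in the discussion above.
FRENCH_ADDED = ["on", "nous autres", "vous autres"]
ENGLISH_ADDED = ["one", "y'all", "thou", "ye"]


def ratios(french_count, english_count):
    """Ratio of the !Ora inventory to each comparison inventory."""
    return {
        "French": ORA_COUNT / french_count,
        "English": ORA_COUNT / english_count,
    }


# 31/8 = 3.875 and 31/7 is roughly 4.43, the figures cited in the text.
minimal = ratios(len(FRENCH_MINIMAL), ENGLISH_MINIMAL_COUNT)
expanded = ratios(
    len(FRENCH_MINIMAL) + len(FRENCH_ADDED),
    ENGLISH_MINIMAL_COUNT + len(ENGLISH_ADDED),
)

for label, result in (("minimal", minimal), ("expanded", expanded)):
    for language, value in result.items():
        print(f"{label:<10}{language:<9}{value:.2f}")
```

Under these assumed additions both ratios fall below 3, which illustrates how far the apparent contrast rests on which forms the analyst admits into the inventory.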

INVENTORY COMPLEXITY: MEASUREMENT AND REDUCTION
The levels and categories of linguistic analysis were created with the aim of identifying order, rather than measuring it. In a sense, identifying order within a language is a way of simplifying it for purposes of analysis and understanding: when a set of hundreds of Latin words is reduced to one root verb and half a dozen morphological categories (person, number, tense, aspect, mood etc., which in combination take hundreds of inflectional endings to express them), that certainly simplifies the picture for the analyst, who may then assume that this was the mental system of every ancient Roman speaker of Latin. That is a deductive leap. As Sapir (1921: 39) famously wrote, 'All grammars leak', which is a way of saying that a grammar can never be more than an approximative account, and never definitive. When the complexity of grammatical categories is being compared in two or more languages, the measurements are taken from two or more approximative accounts, usually made by different analysts.
For purposes of comparison, the same categories (consonant, gender, passive, definite etc.) need to be applied in analysing the two or more languages. In the best of circumstances, the grammatical accounts being used will have been drawn up after investigation of whether the categories are actually the same across the languages, and not assumed to be the same because of partial overlap and use of the same English grammatical category (or whatever language the analysis is written in) to translate them. This is not always the case, and some of the serious consequences are laid out by Haspelmath (2018) (for an alternative perspective, see Spike, 2020). Most of the principal analytical categories that linguists make use of, starting with noun, verb, adjective, adverb, preposition, sentence, case, tense, mood, number, person, voice, conjunction, subordination, originated in the analysis of Latin, and the question is whether they can be applied to any language, barring compelling evidence to the contrary in specific cases. Already within Latin, there is ample inscriptional evidence that all Romans did not speak alike, and that the earliest grammars were not intended to capture how all Romans spoke, but to devise a systematic schema for producing and comprehending a somewhat idealised form of the language, more regular and logical than what one heard in the streets or read on latrine walls.
When measuring and comparing complexity in Latin and some other language which has been analysed following the tradition ultimately deriving from Latin grammars, what is being compared are usually these somewhat idealised forms. That would be less problematic if one could ascertain that the idealisations were reached in the same way, or indeed that a category such as verb means exactly the same thing in, say, Latin and Chinese. The particular difficulty in this instance is that in Latin a verb can usually be identified by its morphology, whereas in Chinese it cannot, so Chinese verbs are those words which translate what are identifiable as verbs in languages with distinct verbal morphology. Every linguistic category presents this problem between any two languages, and not just unrelated ones, though perhaps especially with them. Do the categories really mean the same, do they do the same functional work? Are the functions of language universal, or culture-specific, or more specific still?
Identifying categories functionally for purposes of measuring complexity presents further difficulties, and not just with regard to language. A simple hammer can be made by joining a head to a handle; it can be complexified by adding a claw, a neck, a grip, or even, as Homer Simpson discovered, electric power. How would one measure the degree of complexity which each of these additions represents? Not by what it can do that a simple hammer cannot, such as extricating a nail using the claw: that would be some sort of efficiency measure, not one of complexity. Perhaps by the amount of additional time it takes to produce the more complex hammer, under identical conditions. That seems reasonable and methodologically feasible: assemble a group of hammersmiths, give them the necessary materials and time their production of hammers of various types. Yet in reality nearly all hammer heads are made by casting steel, and producing one with a claw or neck will take the same amount of time and effort once the cast is made. A rubber grip, on the other hand, requires a direct expense of manufacturing time and material, but does not make the hammer more functionally complex.
When it comes to languages, it does not seem to be the case that some of them have the structural equivalent of a hammer claw, making it possible to extract nails from wood, whilst others do not. In functional terms, whatever can be done using one language can also be done using any other, even if by different structural means; more precisely, in scientific terms, it has never been shown that a particular utterance in language x cannot be translated into language y. The utterance and its translation may differ in perceived efficiency of expression or in aesthetic effect, but on the level of meaning, of 'message', what is conveyed in x can be rendered, expressed, explained in y. This is subject to the proviso that, even within a single language, meaning is not a matter of a message being transmitted directly from a speaker's mind to a hearer's; it has to be interpreted by the hearer, which is to say that the meaning of the utterance is reconstructed, co-constructed. Long-standing views about the 'impossibility of translation' (see Joseph, 1998) have been dependent on an idealised conception of meaning transmission, characterised by Reddy (1979) as the 'conduit metaphor'. In any case, such views have not tended to differentiate between translation into a closely related, structurally similar language on the one hand, and a language perceived as being at a different level of structural complexity on the other.
Moving from particular structural levels to global assessment of the comparative simplicity and complexity of languages, we encounter the notion of 'complexity trade-offs', whereby for example a smaller phonemic inventory might be compensated for by greater word length. This fits in with the approach which treats the functions fulfilled by languages as universal: the function being invariable, the complexity of the linguistic means by which it is carried out should also be invariable in its totality, but may vary in its component parts. This was crucial to the doctrine of equal linguistic complexity which was asserted in a strong form starting in the 1950s, in part as a reaction against claims of the superiority of some cultures over others (see Joseph and Newmeyer, 2012). Since at least Gabelentz (1891) it has been recognised as well that perceived simplicity in the system for language production (Bequemlichkeit) does not equate with simplicity of understanding and interpretation (Deutlichkeit). On the contrary, they seem in at least some instances to be directly opposed to one another.
These difficulties have led some to reject global assessment of a language's complexity in favour of level-specific assessment. Nevertheless, the conception of the language system which figures in complexity research is of a closed system (apart from lexicon and other levels discussed in Constitutive and Regulative Rules below), and a closed system should in principle be measurable in terms of how simple or complex it is relative to another closed system.

DESCRIPTIVE COMPLEXITY
Replacing or complementing inventory complexity with descriptive complexity has many advantages, as Nichols (2019) points out, though she also acknowledges that it is more resistant to precise quantification. Comparing descriptive complexity across languages obviously requires that their structures be described in the same way, or as similarly as possible. Ideally the linguists doing the comparing would be the ones who collected and analysed the data and wrote up the initial descriptions; in practice, the linguists doing the comparing tend to work at least partly with descriptions drawn up by others. Differences in methodological handling of the data, from collection to analysis to description, are seldom recoverable, and even when they are, any attempt to incorporate them into a new description being created for measurement of descriptive complexity could only be approximative and might well introduce as much distortion as it eliminates.
It is an old debate within linguistics whether descriptive practice should aim for observational objectivity or should take account of how speakers themselves understand (or 'feel') how the language is structured: this is the 'etic-emic' debate, a locus classicus for which is Sapir (1933). It rarely surfaces in work on linguistic complexity, where the starting point is the completed grammatical analysis. Scepticism about inventory complexity is based in part on concerns about the mapping of form and meaning, where something of the emic critique of etic analysis comes through. Descriptive complexity alleviates some of these concerns, but by no means all of them.
Differences in descriptive practice hark back to the earliest known linguistic analyses. The Aṣṭādhyāyī ('Eight chapters') of Pāṇini is a reduction of the Sanskrit language to the simplest possible form, in a logical sense. It consists of 3,959 sutras covering the whole of Sanskrit phonology and grammar. The sutras are written in an extremely compact style, perhaps to aid memorisation and repetition. This gives them the character of mathematical formulas, which start from an abstract base form, then use complex rules to derive the actually occurring forms from it. A sense of its character comes through from considering just the first two sutras:
1.1.1 vṛddhir ādaic
1.1.2 adeṄ guṇaḥ
The first sutra says, in effect: vṛddhi = ā or aic. The word vṛddhi, meaning growth or increase, is used to indicate a 'strengthening' of the vowel /a/ under certain conditions. The sutra specifies that, under vṛddhi, /a/ can be doubled in length to /ā/, or else can become 'aic', the formula for the set consisting of the two diphthongs /ai/ and /au/. Such a set, called a paribasa, is something one has to know separately. Knowledge of it is assumed by the sutra.
The second sutra says: a or eṄ = guṇa. This defines a lesser grade of strengthening of /a/ which is termed guṇa. The sutra specifies that, under guṇa, /a/ can either remain as /a/ or else can become 'eṄ', the formula for another paribasa, consisting of the long vowels /ē/ and /ō/ (classed with diphthongs in Sanskrit grammar). Thus the first sutra can be translated in an expanded form as 'The term vṛddhi covers the sounds /ā ai au/', and the second sutra as 'The term guṇa covers the sounds /a ē ō/'. Economy has so driven the structure of the text as to make it extraordinarily difficult to follow, indeed impossible except to adepts. The fact that symbols are used before they are explained is only one part of this difficulty. With many of the sutras, how they are to be expanded is a vexed question, which is why a long tradition of commentaries on Pāṇini arose.
How does the descriptive complexity of these two sutras compare? They are of approximately identical length; each requires additional knowledge which is signalled but not spelled out. They cover the same number of sounds. On the other hand, the first sutra describes what for a modern linguist are familiar processes of lengthening and diphthongisation, which can be taken as straightforward and requiring no explanation, just specification of the circumstances under which it applies. The second sutra describes what to modern eyes is a complex and intricate relationship of a set of vowels and diphthongs, calling for elucidation and explanation, in addition to specification of circumstances; linguists with Indo-Europeanist training will also want to be told how it relates to the development of these vowels from Proto-Indo-European to Sanskrit, but that goes beyond the bounds of 'description', or does it?
Within the context for which these descriptions were created, the grammatical tradition of the language described, vṛddhi and guṇa exhibit equal complexity. Taken out of this 'native' context and translated into descriptions which answer the questions a modern non-Sanskritist expects to have answered, guṇa is of greater descriptive complexity. This is in part a version of the etic-emic debate, and in part an example of the potential disjuncture between the complexity of the description and that of the phenomenon described.
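The count-based side of this comparison can be made concrete in a small sketch. It encodes the two sutras in the expanded reading given above, with the abbreviations they presuppose held in a separate lookup; the data structures, the function name `expand` and the transliterations used as keys are assumptions made for illustration, not a representation drawn from the grammatical tradition.

```python
# Abbreviations the sutras presuppose; each must be known separately.
ABBREVIATIONS = {
    "aic": ("ai", "au"),
    "eṄ": ("ē", "ō"),
}

# Each sutra as (term defined, items listed in the compact formula).
SUTRAS = {
    "1.1.1": ("vṛddhi", ("ā", "aic")),
    "1.1.2": ("guṇa", ("a", "eṄ")),
}


def expand(sutra_id):
    """Return (term, sounds), replacing each abbreviation by its members."""
    term, items = SUTRAS[sutra_id]
    sounds = []
    for item in items:
        # An item that is not an abbreviation stands for itself.
        sounds.extend(ABBREVIATIONS.get(item, (item,)))
    return term, sounds


for sutra_id, (term, items) in SUTRAS.items():
    _, sounds = expand(sutra_id)
    print(sutra_id, term, "compact items:", len(items), "sounds covered:", len(sounds))
```

On any such count the two sutras come out identical: two compact items, one lookup, three sounds each. What the count cannot register is the additional explanation a modern reader requires for guṇa, which is precisely where the two descriptions diverge in complexity.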
Even when we consider just modern linguistics, we encounter cases of the same linguist analysing the same structure in simpler and more complex ways (see further Bulté and Housen, 2012). Mazziotta (2019) and Joseph (forthcoming) examine Lucien Tesnière's analysis of the same sentence in 1934 and again two decades later.[10] The earlier version treats the sentence as a 'solar system' with a verb at its centre; every word apart from that key verb is joined to one other word, by a single or double arrow. In his later analysis of this sentence, what we find is considerably more elaborate, with no arrows but single, double and dotted lines, straight or curved, sometimes multiple and with other symbols added indicating types of relationships. The reason for the changes is not given and is not easily deduced. The later work is aimed at explaining the syntactic structure of a range of languages; this in itself would not have required giving up the solar model, but the shift of purpose away from the syntax of French alone was a complexification that coincided with the complexifying of Tesnière's linguistic description. This was happening not long before Noam Chomsky was independently developing his own version of syntactic trees, which have certain features in common with both of Tesnière's models; notably, Chomsky is closer to Tesnière (1934) in not depicting different types of syntactic relationships using graphically different lines. The evolution of a given linguist's analysis and description over time does not necessarily represent progress, such that his or her last work must be treated as definitive.
In addition to the etic-emic debate, linguistics in the mid-20th century featured another controversy, treated memorably by Householder (1952), between the 'God's truth' and 'hocus-pocus' positions.[11] Essentially the question was whether linguists discover linguistic structure or invent it. Most linguists want to position what they do as science, and their work as discovery, which raises epistemological issues that are sometimes confronted, but more often ignored on the grounds that taking them seriously would make any practical work impossible. Indeed, in every science, epistemological questions are acknowledged but kept to the margins, so that 'normal science', in Kuhn's (1962) term, can be pursued. Yet with some scientific endeavours it is particularly difficult to keep such questions at bay, and language complexity is one of those endeavours. It involves multiple levels of analysis, at each of which difficult issues have been set aside and a form of idealisation produced. When one starts comparing these idealisations for the purpose of measuring their relative simplicity and complexity, what has been repressed tends to return in the form of seepage through the cracks, whether it has to do with how the data were gathered, how the analysis was conducted, how the description was composed, or how simplicity and complexity are conceived in terms of language form and function.

CONSTITUTIVE AND REGULATIVE RULES
The birth pangs of modern academic linguistics in the mid-19th century included a debate as to whether it was a natural or historical science. This can be understood as one version of what Bruno Latour (1991) has characterised as the 'constitution' of modern thought, based on a polarisation of Nature and Subject/Society. In the subsequent decades the debate over linguistics was settled on the side of Nature, and it has been toward that pole that linguists have striven to locate their work; but as Latour argues, the polarisation is not actually possible, and modern thought, however much it may strive for a purified existence at one or the other pole, always ends up being located somewhere in the intermediate space of 'hybrids' (see Joseph, 2018). With the study of language that is not difficult to show, since, as an aspect of human behaviour, there must be some space left for the individual and social dimension if it is studied as a natural phenomenon; and some space for the natural dimension if framing it as a phenomenon of Subject and Society.
Fundamental to modern linguistics is the concept of the language system, with its sub-systems of at least phonology, morphology and syntax, which are 'closed' systems, along with lexicon and perhaps other systems (semantics, pragmatics, higher discourse levels) which are 'open' in the sense that they are not expected to have a relatively small number of elements or to be resistant to taking on new ones. The language system is understood as being shared by those who speak the language as their mother tongue; second-language speakers and multilinguals pose problems that are left to a specialised sub-field, and not generally called into evidence in analysing the language. Because of the readily observable fact that even mother-tongue speakers of 'the same language' differ in how they speak, there needs to be some means of accounting for this, such as positing a domain of 'speech' that is individual, and that represents what is produced using the shared language system, much as the same violin will produce different sounds depending on who is playing it. This is a flawed analogy, obviously, because the violin is a physical object, the sameness of which is directly observable as it is passed from player to player, whereas the language system is not directly observable: its shape has to be deduced from the observable speech of individuals, in a process that requires distinguishing which features are idiosyncratic from those that are generally shared. This is an inherently normative process, in the sense that it involves deciding what is normal, and so can be ascribed to the language system, and what is individual, whether it is a regular feature of a given speaker's idiosyncratic usage or a one-off use in a particular context. All such individual features will be analysed as aspects of speech, parole, as opposed to being built into langue, the socially-shared language system.
Linguists, however, are resistant to accepting that there is a normative dimension to this process. In the first year of studying linguistics, one is presented with the doctrine that linguistics is descriptive, in contrast to the prescriptive approaches to language which are dominant outside linguistics. Prescriptive judgments (such as that 'he don't' is wrong and 'he doesn't' is right, despite the former's great frequency) are clearly normative. Being on the descriptive side of the dichotomy, and rejecting prescriptive judgments as anti-scientific, leads linguists to assume that we are immune to any normative judgment, and not just to the particularly egregious normativity represented by prescriptions of what is good and bad usage.
This resistance by linguists has not always been unanimous. Garvin (1954: 81-82) points out that when Hjelmslev (1953) introduces his distinction between obligatory and facultative dominance, he 'avoids giving a "real" definition which for "concepts like facultative and obligatory would necessarily presuppose a concept of sociological norm, which proves [in Hjelmslev's view] to be dispensable throughout linguistic theory"'. Garvin contests the supposed dispensability: 'Most American linguists have accepted as one of their basic assumptions the statement that language is part of culture;[12] this implies some assumption of a "sociological norm" - "cultural" would probably be the preferred adjective - determining the habit pattern which constitutes or underlies speech behavior'. Referring to Garvin (1953), he argues that 'linguistic structure can be considered a set of "social norms" in the sense in which the social psychologists use the term; as far as I can see, H form [Garvin's formula for 'form understood in Hjelmslev's sense'] is quite analogous to "structure" in this sense, and hence the equation H form = "social norm" is not impossible'.
Determining what is normal and systematic is not normative in the same way as is maintaining what is good and bad; the value judgment is of a different order. But identifying the normal is still a normative value judgment, and it runs throughout the conception of a language system. It is a tenet of generative linguistics that 'ungrammatical' sentences are ones which native speakers of English do not produce (unless as a performance error) and which they reject as not English when they hear them; as opposed to 'ungrammatical' sentences in the prescriptive sense, which are ones that native speakers do produce regularly, but which violate rules laid down in grammars of English as to what is correct and incorrect. And yet some of the utterances which were declared ungrammatical in early generative work are accepted as grammatical in later work, even by the same linguist (see Joseph, 2020 on Chomsky's treatment of 'performing leisure' in Hill, 1962; Chomsky, 2008), with no claim made that the language system has changed in the interim.
The intractability of assessing simplicity and complexity points to a problematic reductivism in how linguists dichotomise the way a language system is constituted and functions. In his restatement of Kant's distinction between constitutive and regulative rules, Searle (1969: 55) writes: 'Regulative rules regulate activities whose existence is independent of the rules; constitutive rules constitute (and also regulate) forms of activity whose existence is logically dependent on the rules'. Pullum (2006) applies Searle's distinction to the one made by linguists between prescriptivism (which Pullum classes as regulative) and descriptivism (which he classes as constitutive): 'I begin by taking it for granted that there are conditions we might call correctness conditions for natural languages. [. . .] They are constitutive, not regulative'. In saying this Pullum captures an insight that is by no means peculiar to him, but characterises modern linguistics generally: that the grammar of a language consists of rules that determine what is and is not a grammatical utterance in the language, where grammatical is not a value judgment (which would make it regulative) but an observation of a quasi-natural constitutive fact about what the language does and does not allow.
The reductivism lies in the erasure of what Searle recognises in inserting the parenthesis '(and also regulate)', viz. that the distinction between constitutive/descriptive rules on the one hand, and regulative/prescriptive rules on the other, is not the absolute one which linguists take it to be, but is deceptively weak. This has knock-on effects for research into complexity: the systems being compared are unavailable for direct examination; they are inferred from language use, based on a distinction of grammatical and ungrammatical utterances which is asserted dogmatically to be purely constitutive/descriptive, but where judgments of grammaticality are, as suggested by Searle, also regulative, and where prescriptive rules are not necessarily unconstitutive.
If complexity is being measured in terms of what is required to produce grammatical utterances in the judgment of native speakers, that is very different from what is required to produce comprehensible utterances. Linguists are rarely interested in totally incomprehensible utterances; even in the case of a neurolinguistic analysis of aphasic speech done with therapeutic aims, there needs to be some comprehension of what the patient is 'trying to say', in order to work out what is making an utterance ungrammatical. Anyone who interacts regularly with non-native speakers of a language will have experienced linguistic features which make an utterance 'non-native' without necessarily making it incomprehensible. There is a gap between 'how we say it' and what we can understand, at every level from phonetics to discourse. Linguistics conceives of each speaker's mental grammar as being the system which generates that speaker's production of language, enables their comprehension of the language, and also enables them to recognise what is 'deviant', to use a term from an earlier phase of Chomskyan analysis. It is recognised that speakers' mental grammars vary from one another, when it comes to production and recognition of deviance, but less attention has gone to the implications of comprehension. If my mental grammar, my knowledge of a language, is what enables me to understand utterances in that language, it must be expansive enough to account for all the forms that I can comprehend, even if I never produce them.
The measurement of linguistic complexity follows linguistics generally in conceiving of grammar in that narrower way which is based on production, plus recognition of deviance, rather than the full range of what speakers can comprehend. The purposes for which linguistic analysis has traditionally been undertaken probably demand this narrow conception, although the spread of machine comprehension may be changing this. Work in this area has moved toward Bayesian analysis of large-scale production corpora, incorporating 'feature engineering' and 'deep learning' to extract grammatical structure, still based on production, though significantly less subject to normative reduction. We are in the early stages of understanding whether such research will revolutionise the measurement of complexity, or render it meaningless, or simply fail to apply to it. The narrow conception of grammar has survived decades of onslaught from various directions; part of its appeal, and hence of its strength, is its seemingly direct applicability to areas of language research that desperately want grammar to be systematic in a relatively simple form.
Bayesian analysis does not help us to understand what it is that we should measure when, for example, we want to quantify the complexity of number-noun gender agreement in Arabic. Numbers and nouns are both inflected for gender. With the numbers one and two, the number and noun match in gender. With the numbers three to ten there is 'reverse agreement': if the noun is masculine, the feminine form of the number is used, and if the noun is feminine, the masculine form of the number is used; and in either case the number is followed by the noun in its indefinite genitive plural form. From 11 to 19 the numbers have the form one-ten, two-ten, three-ten etc.; for 11 (one-ten) and 12 (two-ten), both the first element (one/two) and the second (ten) agree in gender with the noun.[13] But from 13 (three-ten) to 19 (nine-ten), the second element agrees in gender with the noun, but the first element has reverse agreement. 20, 30, 40, 50, 60, 70, 80 and 90 have the same form regardless of the gender of the noun. In 21 (one-twenty) and 22 (two-twenty), 31 and 32 etc. the first element agrees in gender with the noun, whilst the second element remains invariable. With 23 (three-twenty) to 29, 33 to 39 etc., the first element has reverse agreement with the noun, and the second element is again invariable. This is the sort of structure that disperses adult learners of Arabic as a second language into a wide gamut of abilities, from those who never get it wrong to those who totally ignore it, yet are generally understood and so perhaps see no need to learn it. For many mother tongue speakers of Arabic, correct grammar is a cultural, even a religious duty. In both cases, normative judgments will be passed upon those who speak and write the language. There is no clear dividing line except for what is laid down in the rules of Classical Arabic grammar. So if you want to gauge the complexity of Arabic morphology, what is it that you will measure? The grammar of an educated native speaker? An average native speaker? A competent speaker? An understood speaker? Whether the analysis is arrived at by a linguist or a computer programmed for deep learning, these questions, these normative questions, have to be answered, and actually a good linguist will be better at that than the most powerful computer would be.
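The agreement pattern summarised above can be written out as a short rule table, which makes visible how much has to be decided before anything is counted. The sketch below covers only 1 to 99 and only gender marking; it encodes the prescriptive rule as paraphrased in this section, omitting the actual forms and their case. The function names and the returned labels are assumptions introduced for illustration.

```python
def opposite(gender):
    """Return the other grammatical gender ('m' <-> 'f')."""
    return "f" if gender == "m" else "m"


def numeral_agreement(n, noun_gender):
    """Gender marking on a numeral's elements for 1-99, as summarised above.

    'first' is the unit element (or the whole numeral when it is simple);
    'second' is the ten/tens element, or None when there is none.
    """
    if noun_gender not in ("m", "f"):
        raise ValueError("noun_gender must be 'm' or 'f'")
    if not 1 <= n <= 99:
        raise ValueError("only 1-99 are covered by this sketch")

    tens, units = divmod(n, 10)
    if n <= 10:
        # 1-2 agree with the noun; 3-10 show reverse agreement.
        first = noun_gender if n <= 2 else opposite(noun_gender)
        return {"first": first, "second": None}
    if units == 0:
        # 20, 30, ... 90 have one form for both genders.
        return {"first": "invariable", "second": None}
    # Compound numerals: the unit agrees for 1-2, reverses for 3-9.
    first = noun_gender if units <= 2 else opposite(noun_gender)
    # 'Ten' agrees in 11-19; the tens in 21-99 are invariable.
    second = noun_gender if tens == 1 else "invariable"
    return {"first": first, "second": second}


for n in (2, 5, 12, 15, 20, 21, 25):
    print(n, numeral_agreement(n, "f"))
```

Counting the branches of such a function would yield a number, but only for this particular rule set. A speaker who ignores reverse agreement and is nonetheless understood would be modelled by a much shorter function, and choosing between the two is the normative decision at issue.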

CONCLUSION
If we follow Jakobson's (1959) dictum that 'the true difference between languages is not in what may or may not be expressed but in what must or must not be conveyed by the speakers', it comes down to how to determine the must. Must, or else what? If the answer isn't 'or else incomprehension by the hearer', then it lies somewhere in the realm of the normative. Not in a clearly defined normative location either, but something like a blurred and shifting field of vision where one eye is gazing through the normativity of the language community, and the other eye through the normativity of the linguistics community in its analytical choices. The two eyes rarely if ever focus on the same object. To make matters worse, linguists are in denial about their normativity: in Peircean semiotic terms, the language systems which we attempt to measure for complexity are icons, representations by human hands, which we pretend are indices, reproducing their objects through direct, natural means. Linguistic analysis is run through with interpretation, but to say that threatens the image of linguistics as an objective science.[14] Ultimately, that image is the obstacle to measuring the complexity of languages, because it prevents linguists from doing what is needed to make the measurement solid and meaningful: confronting the normative, interpretative dimension of both what we want to measure and how we want to measure it. Only by understanding that dimension can we hope to bring it under control in a way that would allow for its elimination as a variable in the comparative analysis of language systems, which systems would themselves need to be reconceived in a way that embodies a consistency that would make genuine comparison possible, and the measurement of linguistic complexity less intractable.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
JEJ is the sole author of this article.

ACKNOWLEDGMENTS
I am grateful to James McElvenny for his helpful comments on an early draft and to the Editor and Reviewers for their constructive advice.
[13] I omit other details concerning the precise forms used, including their case.
[14] In Joseph (2010) I propose the term hermeneiaphobia for this fear-repulsion-denial of interpretation that characterises linguistics; and Joseph (2012) notes how Welby (1896), recognising interpretation as the key to understanding the mental side of linguistic and semiotic phenomena, diagnosed a similar condition in the philosophers and psychologists of her time.