Grammar Measures


Linguistic Measures

The Linguistics Analysis Tools and the Speech Analysis Tool provide the following measures:

3. Morphological Measures

The morphological measures describe the distribution of parts of speech (POS) in the text; certain aphasia types, for example, differ in their POS distribution.

  • Adjective
  • Adposition
  • Adverb
  • Auxiliary
  • Coordinating Conjunction
  • Determiner
  • Interjection
  • Noun
  • Numeral
  • Particle
  • Pronoun
  • Proper Noun
  • Punctuation
  • Subordinating Conjunction
  • Symbol
  • Verb
  • Other
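As a rough illustration (not Open Brain AI's implementation), a POS distribution can be computed from POS-tagged tokens. The function name, the tag abbreviations, and the toy input below are assumptions for the sketch:

```python
from collections import Counter

def pos_distribution(tagged_tokens):
    """Return {tag: (count, ratio)}; ratio = count / total tokens."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    return {tag: (n, n / total) for tag, n in counts.items()}

# Hypothetical POS-tagged input as (word, tag) pairs.
tokens = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
          ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
dist = pos_distribution(tokens)
# e.g. dist["NOUN"] == (2, 2/6): two nouns, one third of all tokens
```

Reporting both the raw count and the ratio mirrors the two-column output described in the notes at the end of this section.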
4. Syntactic Measures

Syntactic complexity involves the structural relations between words. The syntactic measures report the distribution of the following dependency relations:

  • Clausal modifier of noun
  • Adjectival complement
  • Adverbial clause modifier
  • Adverbial modifier
  • Agent
  • Adjectival modifier
  • Appositional modifier
  • Attribute
  • Auxiliary
  • Auxiliary (passive)
  • Case marker
  • Coordinating conjunction
  • Clausal complement
  • Compound modifier
  • Conjunct
  • Clausal subject
  • Clausal subject (passive)
  • Dative
  • Unclassified dependent
  • Determiner
  • Direct object
  • Expletive
  • Interjection
  • Marker
  • Meta modifier
  • Negation modifier
  • Modifier of nominal
  • Noun phrase as adverbial modifier
  • Nominal subject
  • Nominal subject (passive)
  • Number modifier
  • Object predicate
  • Parataxis
  • Complement of preposition
  • Object of preposition
  • Possession modifier
  • Pre-correlative conjunction
  • Pre-determiner
  • Prepositional modifier
  • Particle
  • Punctuation
  • Modifier of quantifier
  • Relative clause modifier
  • Root
  • Open clausal complement
5. Semantic Measures

Semantic measures offer information about the distribution of semantic entities in the text. For semantic scoring, you can also check the semantic scoring application.

  • Cardinal number
  • Date
  • Event
  • Facility
  • Geopolitical entity
  • Language
  • Law
  • Location
  • Monetary value
  • Nationalities or religious or political groups
  • Ordinal number
  • Organization
  • Percentage
  • Person
  • Product
  • Quantity
  • Time
  • Work of art
6. Phonological Measures

We currently offer a measure of the number of syllables per word. The syllabification accuracy of this module is higher for English and Spanish. Please also check the Transcription to IPA application for more details on phoneme distribution.
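A crude way to approximate syllables per word is to count groups of consecutive vowel letters. This heuristic is only a sketch, not the module's actual syllabifier, and it is far less accurate than language-specific syllabification:

```python
import re

def count_syllables(word):
    """Heuristic: one syllable per group of consecutive vowel letters.
    Always returns at least 1. Real syllabifiers are language-specific."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def syllables_per_word(words):
    """Average syllable count over a list of words."""
    return sum(count_syllables(w) for w in words) / len(words)

# count_syllables("banana") -> 3; count_syllables("strength") -> 1
```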

7. Lexical Measures

Lexical measures are used to analyze a text based on its vocabulary. They help in understanding various aspects of the text such as its complexity, diversity, and richness. Open Brain AI offers the following lexical measures:

Function Words:

These are words that have little lexical meaning but serve to connect other words or express grammatical relationships. Examples include prepositions, pronouns, articles, conjunctions, etc.

Specifically, this measure counts all words that belong to the following categories:

  • Adposition
  • Auxiliary
  • Coordinating conjunction
  • Determiner
  • Interjection
  • Particle
  • Pronoun
  • Subordinating conjunction

The remaining parts of speech are calculated among the Content Words.

Content Words:

These are words that carry most of the meaning in a sentence. Examples include nouns, verbs, adjectives, and adverbs.

Specifically, this measure counts all words that belong to the following categories:

  • Adjective
  • Adverb
  • Noun
  • Numeral
  • Proper noun
  • Verb
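The function/content split above can be sketched with Universal POS tag abbreviations (ADP, AUX, CCONJ, and so on); the tag sets and function names below are illustrative, not the tool's code:

```python
# Universal POS abbreviations for the categories listed above.
FUNCTION_POS = {"ADP", "AUX", "CCONJ", "DET", "INTJ", "PART", "PRON", "SCONJ"}
CONTENT_POS = {"ADJ", "ADV", "NOUN", "NUM", "PROPN", "VERB"}

def function_content_ratios(tagged_tokens):
    """Return (function-word ratio, content-word ratio) over all tokens."""
    total = len(tagged_tokens)
    func = sum(1 for _, tag in tagged_tokens if tag in FUNCTION_POS)
    cont = sum(1 for _, tag in tagged_tokens if tag in CONTENT_POS)
    return func / total, cont / total

# Hypothetical tagged input.
tokens = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP")]
# function_content_ratios(tokens) -> (0.5, 0.5)
```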


Character Count:

This refers to the number of characters in a text, including letters, numbers, punctuation marks, and spaces.

Word Count Tokenized:

This is the number of words in a text after tokenization, which is the process of breaking a text into pieces, called tokens.

Word Density:

This is the ratio of a specific word to the total number of words in a text.
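The counts and the density measure above can be sketched with a naive tokenizer. The regular expression and function names are illustrative assumptions; real tokenizers handle contractions, hyphens, and other edge cases:

```python
import re

def tokenize(text):
    """Naive tokenizer: lowercase alphanumeric runs (illustration only)."""
    return re.findall(r"\w+", text.lower())

def word_density(word, text):
    """Ratio of a specific word to the total number of tokens."""
    tokens = tokenize(text)
    return tokens.count(word.lower()) / len(tokens)

text = "The cat saw the dog."
# len(text) is the character count, including spaces and punctuation.
# tokenize(text) -> ["the", "cat", "saw", "the", "dog"]  (5 tokens)
# word_density("the", text) -> 2/5
```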

Propositional Idea Density (PID):

The Propositional Idea Density (PID) is a measure designed to quantify the density of ideas or propositions within a piece of writing, offering a deeper look beyond mere word count into the intellectual substance a text holds.

Imagine reading through a paragraph: each sentence unfolds new information, concepts, or actions. PID aims to quantify this unfolding, providing a numerical value that reflects the concentration of ideas per unit of text.

The calculation of PID involves a two-step process:

Counting Ideas: The first step is to identify and count the propositions within the text. A proposition can be thought of as a meaningful unit of information, often centered around verbs and their associated subjects and objects. Each proposition adds to the overall idea count of the text.

Normalization: To ensure that PID is comparable across texts of varying lengths, the total count of propositions is then normalized. This is typically done by dividing the proposition count by the total number of words or sentences in the text, resulting in a ratio that represents the density of ideas.
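The two steps above can be sketched in code. One common approximation (used by tools such as CPIDR) counts verbs, adjectives, adverbs, prepositions, and conjunctions as propositions; this tag set and the function name are assumptions for the sketch, not necessarily how this tool counts ideas:

```python
# Approximation: these POS categories are counted as propositions.
PROPOSITION_POS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}

def idea_density(tagged_tokens):
    """PID: proposition count divided by total word count."""
    props = sum(1 for _, tag in tagged_tokens if tag in PROPOSITION_POS)
    return props / len(tagged_tokens)

# Hypothetical tagged input: "the cat quickly ran".
tokens = [("the", "DET"), ("cat", "NOUN"), ("quickly", "ADV"), ("ran", "VERB")]
# idea_density(tokens) -> 2/4 = 0.5 (two propositions over four words)
```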

Using PID

PID serves as a powerful tool for analyzing and comparing texts. It provides insights into the complexity and depth of writings, from literary works to academic papers. A higher PID indicates a text rich with ideas. Conversely, a lower PID might suggest a text that is easier to digest, possibly suited for broader audiences. Whether you're a writer seeking to enrich your prose, a student analyzing literary works, or a researcher comparing academic texts, PID offers a unique lens through which to view and assess written content.

Sentence Count:

This is the number of sentences in a text after parsing, which is the process of syntactic analysis of a text.

TTR (Type-Token Ratio):

This is the ratio of unique words (types) to the total number of words (tokens) in a text. It is used to measure the lexical diversity of a text.

Corrected TTR:

This is the TTR corrected for text length, as TTR is affected by the length of the text.

Herdan's C:

A measure of vocabulary richness that is less sensitive to text length than TTR. It is calculated as the log of the number of unique words divided by the log of the total number of words.


A measure of vocabulary richness that is based on the sum of the logarithms of the word frequencies.

Maas's TTR:

A measure of vocabulary richness that is less sensitive to text length. It is calculated as (log N - log V) / (log N)^2, where N is the total number of words and V is the number of unique words.

Mean Segmental TTR (MSTTR):

This is the mean of the TTRs of segments of a text, usually segments of 100 words. It is used to measure the lexical diversity of a text while controlling for text length.
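The diversity measures above follow standard formulas (TTR = V/N; corrected TTR = V/sqrt(2N); Herdan's C = log V / log N; Maas = (log N - log V)/(log N)^2; MSTTR = mean TTR over fixed-length segments). A minimal sketch, with illustrative function names:

```python
import math

def ttr(tokens):
    """Type-Token Ratio: unique words / total words."""
    return len(set(tokens)) / len(tokens)

def corrected_ttr(tokens):
    """Corrected TTR: V / sqrt(2N), reducing sensitivity to length."""
    return len(set(tokens)) / math.sqrt(2 * len(tokens))

def herdan_c(tokens):
    """Herdan's C: log V / log N."""
    return math.log(len(set(tokens))) / math.log(len(tokens))

def maas(tokens):
    """Maas index: (log N - log V) / (log N)^2; lower = richer."""
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2

def msttr(tokens, segment=100):
    """Mean of TTRs over consecutive full segments (needs >= 1 segment)."""
    segs = [tokens[i:i + segment]
            for i in range(0, len(tokens) - segment + 1, segment)]
    return sum(ttr(s) for s in segs) / len(segs)

# ttr(["a", "b", "a", "c"]) -> 3/4 = 0.75
```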

Mean Length of Utterance (MLU):

The Mean Length of Utterance (MLU) is a linguistic measure used to analyze the average length of sentences or utterances within a text. It is a valuable metric for assessing language complexity and development. MLU is often employed in linguistics and speech-language pathology to gain insights into language production.
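As a sketch, a word-based MLU simply averages utterance lengths; clinical MLU is usually counted in morphemes, which requires morphological segmentation beyond this illustration:

```python
def mlu(utterances):
    """Mean Length of Utterance in words.
    (Morpheme-based MLU would require morphological segmentation.)"""
    lengths = [len(u.split()) for u in utterances]
    return sum(lengths) / len(lengths)

# mlu(["the cat sat", "dogs bark"]) -> (3 + 2) / 2 = 2.5
```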

8. Detailed Analysis Measures

The previous measures offered counts and the ratio of each count to the total number of words in the provided text. The following provides the annotated text in the form of a table, for researchers interested in a detailed description of the output (see above for the details on grammar).

Notes on the output

The Part-Of-Speech (POS) measures and the like typically involve two columns: one representing the count of a specific category in the text, and another representing that count divided by the total number of words, serving as a standardized measure. This standardization is crucial because the raw count of a POS category is naturally influenced by the text's length: longer texts tend to have more occurrences of a particular POS category, making the ratio the more meaningful, comparable measure.

The Type-Token Ratio (TTR) and similar measures yield a single score, which can be found in the right column. It is safe to ignore any 'NAs' that may appear in the left column in these cases.