Documentation

Welcome to the Open Brain AI Online Platform

Welcome to the Open Brain AI Online Platform, a comprehensive suite of applications designed to assist you in various linguistic and acoustic analyses. This guide will walk you through the applications available on the platform and how to use them.


Getting started

Accessing any of the pages and tools of the assessment platform requires registration. This platform does not retain any data other than those provided during sign-up (for more information on the data protection policy, see the disclaimers).


Automatic Translation

Use this tool to translate from one language to another by selecting the appropriate source and target languages from the options menu. More languages will be supported soon.


1. Linguistics Analysis Tool - Written Speech Assessment

This tool comprises two main components:
a. Linguistic Analysis Module: Analyze text and elicit linguistic measures on phonology, morphology, syntax, semantics, lexicon, and readability.
b. AI Discourse Analysis Module: Analyze discourse and provide suggestions on errors, macrostructure and microstructure of discourse, and assess whether the text was produced by a patient or a healthy individual.

The user must provide a text in the form and select one of the language options and the type of analysis. After a few moments, the application will provide a report with the linguistic measures.


2. Linguistics Analysis Tool - Processing Documents

This tool takes one or more Plain Text (*.txt) or Microsoft Word (*.docx) documents and returns an Excel file or a CSV file with the NLP measures. It comprises two main components:
a. Linguistic Analysis Module: Analyze text and elicit linguistic measures on phonology, morphology, syntax, semantics, lexicon, and readability.
b. AI Discourse Analysis Module: Analyze discourse and provide suggestions on errors, macrostructure and microstructure of discourse, and assess whether the text was produced by a patient or a healthy individual.

The user must provide one or more files and select one of the language options, the type of analysis, and the format of the output file.


3. Linguistic Domains Analysis

There is an option to perform specific analyses on morphology, syntax, and semantics. To this end, the user needs to select "Click here for a morphological, syntactic, and semantic analysis of texts". These tools comprise one main component:
Linguistic Analysis Module: Analyze text and elicit linguistic measures on phonology, morphology, syntax, semantics, lexicon, and readability.

The user must provide a text in the form and select one of the language options. After a few moments, the application will provide a report with the linguistic measures.


Language Support for Written Assessment

  • Catalan
  • Chinese
  • Croatian
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Italian
  • Lithuanian
  • Macedonian
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Slovenian
  • Spanish
  • Swedish
  • Ukrainian


Speech Analysis Tool

This tool has three main components:

a. Transcription Module: Transcribes sound in English, Greek, Italian, Swedish, and Norwegian.
b. Linguistic Analysis from Speech Module: Analyze text and elicit linguistic measures as mentioned in the Linguistics Analysis Tool.
c. AI Discourse Analysis from Speech Module: Analyze discourse as mentioned in the Linguistics Analysis Tool.

Linguistic Measures

The Linguistics Analysis Tools and the Speech Analysis Tool provide the following measures:

1. Readability Measures

Readability measures are various formulas used to determine the ease with which a person can understand a written text. These measures usually consider factors such as sentence length, syllable count, and word length to provide an estimate of the reading level required to understand a text. Open Brain AI offers the following readability measures (English Language Only):

Flesch Reading Ease:

This measure calculates the ease of reading a text by considering the average sentence length and the average number of syllables per word. A higher score on this index indicates easier readability. It is commonly used in the field of education and content creation.

Example: "The cat sat on the mat." - This sentence would score high on the Flesch Reading Ease scale as it uses simple words and a short sentence length.

Flesch-Kincaid Grade Level:

This is a development of the Flesch Reading Ease measure that estimates the U.S. school grade level required to understand a text. It takes into account the average sentence length and the average number of syllables per word.

Example: "The industrious student diligently completed all assignments." - This sentence would likely correspond to a higher grade level due to the use of more complex words and longer sentence length.

Gunning Fog Index:

This index calculates readability by considering the average sentence length and the percentage of complex words (words with three or more syllables) in a text. A higher score on this index indicates a higher level of difficulty.

Example: "The weather today is sunny and bright." - This sentence would score low on the Gunning Fog Index as it uses simple words and has a short sentence length.

Coleman-Liau Index:

This index estimates the U.S. school grade level required to understand a text by considering the average number of characters per word and the average sentence length.

Example: "Reading books is a good way to expand one's knowledge." - This sentence would likely correspond to a middle school grade level on the Coleman-Liau Index due to the average sentence length and average number of characters per word.

Automated Readability Index:

This index estimates the U.S. school grade level required to understand a text by considering the average number of characters per word and the average sentence length.

Example: "The quick brown fox jumps over the lazy dog." - This sentence would likely correspond to a lower grade level on the Automated Readability Index due to the short sentence length and average number of characters per word.

Understanding the readability of a text is important for various reasons. It helps in tailoring the content to the target audience, making it accessible and engaging. It is especially crucial in educational materials, where the content needs to be appropriate for the students' reading level. Additionally, readability measures are also useful for content creators, marketers, and writers to ensure their message is effectively communicated to their intended audience.
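As an illustration of how these formulas combine sentence length and syllable counts, here is a minimal Python sketch of the Flesch Reading Ease and Gunning Fog calculations. The vowel-group syllable counter is a naive assumption for illustration only; production systems use dictionary-based syllabifiers, so exact scores will differ from the platform's output.

```python
import re

def count_syllables(word):
    # Naive heuristic (an assumption for illustration): count groups of
    # consecutive vowels, with a minimum of one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
    # Higher scores indicate easier readability.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

def gunning_fog(text):
    # Fog = 0.4 * ((words/sentences) + 100 * complex_words/words),
    # where complex words have three or more syllables.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * ((len(words) / sentences) + 100 * complex_words / len(words))

simple = "The cat sat on the mat."
hard = "The industrious student diligently completed all assignments."
print(flesch_reading_ease(simple) > flesch_reading_ease(hard))  # True: simpler text scores higher
```

As the examples in this section suggest, the short, monosyllabic sentence scores far higher on Flesch Reading Ease than the sentence with longer, polysyllabic words.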

3. Morphological Measures

The morphological measures inform about the distribution of parts of speech (POS) in the text; for example, certain aphasia types differ in the distribution of POS.

  • Adjective
  • Adposition
  • Adverb
  • Auxiliary
  • Coordinating Conjunction
  • Determiner
  • Interjection
  • Noun
  • Numeral
  • Particle
  • Pronoun
  • Proper Noun
  • Punctuation
  • Subordinating Conjunction
  • Symbol
  • Verb
  • Other
4. Syntactic Measures

The syntactic measures report the distribution of dependency relations, the components that contribute to a sentence's structure and usage:

  • Clausal modifier of noun
  • Adjectival complement
  • Adverbial clause modifier
  • Adverbial modifier
  • Agent
  • Adjectival modifier
  • Appositional modifier
  • Attribute
  • Auxiliary
  • Auxiliary (passive)
  • Case marker
  • Coordinating conjunction
  • Clausal complement
  • Compound modifier
  • Conjunct
  • Clausal subject
  • Clausal subject (passive)
  • Dative
  • Unclassified dependent
  • Determiner
  • Direct object
  • Expletive
  • Interjection
  • Marker
  • Meta modifier
  • Negation modifier
  • Modifier of nominal
  • Noun phrase as adverbial modifier
  • Nominal subject
  • Nominal subject (passive)
  • Number modifier
  • Object predicate
  • Parataxis
  • Complement of preposition
  • Object of preposition
  • Possession modifier
  • Pre-correlative conjunction
  • Pre-determiner
  • Prepositional modifier
  • Particle
  • Punctuation
  • Modifier of quantifier
  • Relative clause modifier
  • Root
  • Open clausal complement
5. Semantic Measures

Semantic measures offer information about the distribution of semantic entities in the text. For semantics scoring, you can also check the Semantics Scoring Application.

  • Cardinal number
  • Date
  • Event
  • Facility
  • Geopolitical entity
  • Language
  • Law
  • Location
  • Monetary value
  • Nationalities or religious or political groups
  • Ordinal number
  • Organization
  • Percentage
  • Person
  • Product
  • Quantity
  • Time
  • Work of art
6. Phonological Measures

We currently offer a measure of the number of syllables per word. The syllabification accuracy of this module is higher for English and Spanish. Please also check the Transcription to IPA Application for more details on phoneme distribution.

7. Lexical Measures

Lexical measures are used to analyze a text based on its vocabulary. They help in understanding various aspects of the text such as its complexity, diversity, and richness. Open Brain AI offers the following lexical measures:

Function Words:

These are words that have little lexical meaning but serve to connect other words or express the grammar relationships. Examples include prepositions, pronouns, articles, conjunctions, etc.

Specifically, the measure counts all words that belong to the following categories:

  • Adposition
  • Auxiliary
  • Coordinating conjunction
  • Determiner
  • Interjection
  • Particle
  • Pronoun
  • Subordinating conjunction

The remaining parts of speech are calculated among the Content Words.

Content Words:

These are words that carry most of the meaning in a sentence. Examples include nouns, verbs, adjectives, and adverbs.

Specifically, the measure counts all words that belong to the following categories:

  • Adjective
  • Adverb
  • Noun
  • Numeral
  • Proper noun
  • Verb
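
The function/content split above can be computed directly from part-of-speech tags. The sketch below assumes the text has already been tagged with Universal POS labels by an external tagger (the tag abbreviations mirror the categories listed in this section); it is an illustration, not the platform's implementation.

```python
# Function vs. content word classification from Universal POS tags.
# Assumes tokens have already been POS-tagged by an external tagger.
FUNCTION_POS = {"ADP", "AUX", "CCONJ", "DET", "INTJ", "PART", "PRON", "SCONJ"}
CONTENT_POS = {"ADJ", "ADV", "NOUN", "NUM", "PROPN", "VERB"}

def function_content_ratio(tagged_tokens):
    """tagged_tokens: list of (token, upos) pairs; returns (function_ratio, content_ratio)."""
    total = len(tagged_tokens) or 1  # avoid division by zero for empty input
    func = sum(1 for _, pos in tagged_tokens if pos in FUNCTION_POS)
    cont = sum(1 for _, pos in tagged_tokens if pos in CONTENT_POS)
    return func / total, cont / total

tagged = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
          ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
print(function_content_ratio(tagged))  # (0.5, 0.5)
```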

Characters:

This refers to the number of characters in a text, including letters, numbers, punctuation marks, and spaces.

Word Count Tokenized:

This is the number of words in a text after tokenization, which is the process of breaking a text into pieces, called tokens.

Word Density:

This is the ratio of a specific word to the total number of words in a text.

Propositional Idea Density (PID):

The Propositional Idea Density (PID) is a measure designed to quantify the density of ideas or propositions within a piece of writing, offering a deeper look beyond mere word count into the intellectual substance a text holds.

Imagine reading through a paragraph: each sentence unfolds new information, concepts, or actions. PID aims to quantify this unfolding, providing a numerical value that reflects the concentration of ideas per unit of text.

The calculation of PID involves a two-step process:

Counting Ideas: The first step is to identify and count the propositions within the text. A proposition can be thought of as a meaningful unit of information, often centered around verbs and their associated subjects and objects. Each proposition adds to the overall idea count of the text.

Normalization: To ensure that PID is comparable across texts of varying lengths, the total count of propositions is then normalized. This is typically done by dividing the proposition count by the total number of words or sentences in the text, resulting in a ratio that represents the density of ideas.
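The two steps above can be sketched in code. The sketch below uses a simplified, CPIDR-style approximation (an assumption for illustration, not Open Brain AI's exact implementation): propositions are approximated by tokens bearing proposition-carrying parts of speech, then normalized by the total word count.

```python
# Proposition-bearing Universal POS tags (a simplifying assumption:
# verbs, modifiers, prepositions, and conjunctions each contribute an idea).
PROPOSITION_POS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}

def propositional_idea_density(tagged_tokens):
    """tagged_tokens: list of (token, upos) pairs; returns propositions per word."""
    if not tagged_tokens:
        return 0.0
    # Step 1: count the propositions (proposition-bearing tokens).
    propositions = sum(1 for _, pos in tagged_tokens if pos in PROPOSITION_POS)
    # Step 2: normalize by total word count so texts of different lengths compare.
    return propositions / len(tagged_tokens)

tagged = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("quickly", "ADV")]
print(propositional_idea_density(tagged))  # 0.5
```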

Using PID

PID serves as a powerful tool for analyzing and comparing texts. It provides insights into the complexity and depth of writings, from literary works to academic papers. A higher PID indicates a text rich with ideas. Conversely, a lower PID might suggest a text that is easier to digest, possibly suited for broader audiences. Whether you're a writer seeking to enrich your prose, a student analyzing literary works, or a researcher comparing academic texts, PID offers a unique lens through which to view and assess written content.

Sentence Count:

This is the number of sentences in a text after parsing, which is the process of syntactic analysis of a text.

TTR (Type-Token Ratio):

This is the ratio of unique words (types) to the total number of words (tokens) in a text. It is used to measure the lexical diversity of a text.

Corrected TTR:

This is the TTR corrected for text length, as TTR is affected by the length of the text.

Herdan's C:

A measure of vocabulary richness that is less sensitive to text length than TTR. It is calculated as the log of the number of unique words divided by the log of the total number of words.

Summer's Index:

A measure of vocabulary richness that is based on the sum of the logarithms of the word frequencies.

Maas's TTR:

A measure of vocabulary richness that is less sensitive to text length. It is calculated as the log of the total number of words divided by the log of the number of unique words.

Mean Segmental TTR (MSTTR):

This is the mean of the TTRs of segments of a text, usually segments of 100 words. It is used to measure the lexical diversity of a text while controlling for text length.
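The diversity measures above follow directly from their definitions. Here is a minimal sketch of TTR, Herdan's C, and MSTTR under those definitions (the segment-handling details, such as dropping the short final segment, are assumptions for illustration):

```python
import math

def ttr(tokens):
    # Type-Token Ratio: unique words (types) over total words (tokens).
    return len(set(tokens)) / len(tokens)

def herdan_c(tokens):
    # Herdan's C: log of the number of types over the log of the number of tokens.
    return math.log(len(set(tokens))) / math.log(len(tokens))

def msttr(tokens, segment=100):
    # Mean Segmental TTR: mean TTR over non-overlapping fixed-size segments.
    segments = [tokens[i:i + segment] for i in range(0, len(tokens), segment)]
    segments = [s for s in segments if len(s) == segment]  # drop the short tail
    if not segments:
        return ttr(tokens)  # fall back for texts shorter than one segment
    return sum(ttr(s) for s in segments) / len(segments)

tokens = "the cat and the dog".split()
print(ttr(tokens))  # 0.8 (4 types / 5 tokens)
```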

Mean Length of Utterance (MLU):

The Mean Length of Utterance (MLU) is a linguistic measure used to analyze the average length of sentences or utterances within a text. It is a valuable metric for assessing language complexity and development. MLU is often employed in linguistics and speech-language pathology to gain insights into language production.
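As a minimal illustration, MLU can be computed as the average number of units per utterance. Clinical practice often counts morphemes; the sketch below counts words as a simplification:

```python
def mean_length_of_utterance(utterances):
    # MLU: average number of words per utterance. Counting words rather than
    # morphemes is a simplifying assumption for this illustration.
    word_counts = [len(u.split()) for u in utterances]
    return sum(word_counts) / len(word_counts)

print(mean_length_of_utterance(["the cat sat", "dogs bark"]))  # 2.5
```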

8. Detailed Analysis Measures

The previous measures offer counts and the ratio of each count to the total number of words in the provided text. The following provides the annotated text in the form of a table for researchers interested in a detailed description of the output (see above for the details on grammar).

Notes on the output

The Part-Of-Speech (POS) measures and the like typically involve two columns: one representing the count of a specific category in the text, and another representing that count divided by the total number of words, serving as a standardized measure. This standardization is crucial because the raw count of a POS category is naturally influenced by the text's length: longer texts tend to have a greater number of occurrences of a particular POS category, making the ratio a more meaningful and comparative measure.

The Type-Token Ratio (TTR) and similar measures yield a single score, which can be found in the right column. It is safe to ignore any 'NAs' that may appear in the left column in these cases.


Speech Acoustics Application [preview only]

Provides a visual representation of the acoustic signal, including the waveform, spectrogram, and fundamental frequency contour, with automated measures of fundamental frequency and intensity.


Segment a Sound into Words and Find the Speakers
Finds boundaries of words and distinguishes speakers in a speech recording.
Segment a sound into words

  • Word Segmentation
  • Find Pauses from the Automatic Transcription (i.e., pause detection makes use of word segmentation, thus it is more accurate)
Find the Speakers in a soundfile
The diarization model identifies the beginning and end of each speaker's turn using unsupervised machine learning. You have to provide the number of speakers.
Model Output
  • HTML output, displayed as a web page
  • CSV file
  • Praat TextGrid
Language Support
Works with the following languages; more will be added soon.
  • English
  • Greek
  • Italian
  • Swedish


Semantics Scoring Application

Employs Word Embeddings to score semantic distance between two words. Requires a CSV file with "target" and "response" columns. Provides two semantic measures:
a. Semantic Measure 1: Based on pre-trained GloVe vectors (2B tweets, 27B tokens, 1.2M vocabulary, uncased).
b. Semantic Measure 2: Based on vectors we trained on the English Wikipedia.
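
At the core of embedding-based scoring is the cosine similarity between the target's and response's vectors. The sketch below uses toy 3-dimensional vectors for illustration; the application itself uses pre-trained embeddings with hundreds of dimensions, so these numbers are assumptions, not real scores.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors: dot(u, v) / (|u| * |v|).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical values for illustration only).
cat, dog, car = [0.9, 0.1, 0.2], [0.8, 0.2, 0.3], [0.1, 0.9, 0.7]
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # True
```

A semantic *distance* can then be derived from the similarity, e.g., as 1 minus the cosine similarity.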


Spelling Application and Phonology Application
What is the Spelling and Phonology Score?

The Spelling and Phonology Score provides composite measurements that quantify phonological variations between a target word and a response word. These scores are calculated after converting both words into their International Phonetic Alphabet (IPA) representations, which encapsulate the phonological attributes of the words.

Spelling Score vs. Phonology Score

While the Phonology Score is calculated using IPA conversions for both words, the Spelling Score diverges in its approach. Specifically, it converts only non-words into their IPA forms, not actual words. This is predicated on the notion that writers are likely familiar only with the phonological representation of non-words, not their written forms. However, users have the flexibility to modify this behavior by choosing to treat non-words as words, or vice versa, for IPA conversion.

Algorithmic Methodology

The algorithm employs a customized version of the Levenshtein distance metric that accounts for transpositions.

1. **Levenshtein Distance**:

- In essence, the Levenshtein distance measures the minimal number of operations (additions, deletions, substitutions, and transpositions) required to transform one word into another.

2. **Phonetic Comparisons**:

- For non-words, the tool employs their phonetic transcriptions to gauge their similarity to real words.

3. **Spelling Distance for Words and Non-Words**:

- The tool accommodates both real and fabricated words, providing comprehensive metrics for each word pair in a given list.

Updated Features

1. **Levenshtein Score with Operations**:

- This tool now offers a nuanced breakdown, detailing not just the Levenshtein Score but also the types and counts of operations, Transpositions, Deletions, Insertions, and Substitutions, needed to equate two words.

Why Transpositions, Deletions, Insertions, and Substitutions Matter

These operations are noteworthy for various reasons:

- **Typing or Phonological Errors**: often occur when people mistype or mispronounce words.

- **Language Learning**: Learners frequently make mistakes in the sequence of adjacent sounds or letters.

- **Clinical Assessments**: Errors like transpositions can signal specific language impairments or cognitive conditions.

Expanded Explanation of Operations

1. **Insertions**:

- Adding a new letter to a word, e.g., transforming "cat" into "cart" by inserting an 'r'.

2. **Deletions**:

- Removing a letter from a word, e.g., reverting "cart" back to "cat" by deleting the 'r'.

3. **Substitutions**:

- Replacing one letter with another, e.g., changing "cat" to "bat" by substituting 'c' with 'b'.

4. **Transpositions**:

- Swapping adjacent letters, e.g., converting "form" to "from" by transposing 'o' and 'r'.
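
A Levenshtein distance extended with adjacent transpositions (the optimal string alignment variant) can be sketched as follows. This is an illustrative sketch of the technique, not the platform's exact implementation:

```python
def edit_distance(a, b):
    # Optimal string alignment: Levenshtein distance (insertions, deletions,
    # substitutions) extended with adjacent transpositions.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a's prefix
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b's prefix
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(edit_distance("cat", "cart"))  # 1 (one insertion)
print(edit_distance("cat", "bat"))   # 1 (one substitution)
print(edit_distance("form", "from")) # 1 (one transposition)
```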

Target Audience

This tool is geared towards researchers, linguists, and educators focusing on language and spelling. It has applications in:

- **Spelling Research**: To analyze frequent types of misspellings.

- **Language Learning**: To assess the proximity of a learner's pronunciation to the target word.

- **Healthcare**: To monitor language capabilities in fields like speech therapy.


Scoring Phonological Errors

Scores phonological distance. Requires a CSV file with "target" and "response" columns. The application will create a new column with phonemic scores. Provides a score of phonological errors on a scale of 0.0 to 1.0, where 1.0 indicates the highest degree of error. Supported languages are listed below.


Scoring Spelling Responses

Scores spelling errors. Requires a CSV file with "target", "response", and "type" columns. The application will create a new column with spelling scores. Provides a score of spelling errors on a scale of 0.0 to 1.0, where 1.0 indicates the highest degree of error. Supported languages are listed below. See the Preparing CSV Files section on how to prepare the CSV file for this application.
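
One plausible way to map string differences onto a 0.0 to 1.0 error scale, sketched here purely for illustration (the platform's exact normalization is not documented in this guide), is to subtract a similarity ratio from 1:

```python
import difflib

def error_score(target, response):
    # Maps string similarity onto a 0.0-1.0 error scale:
    # 0.0 = identical, values near 1.0 = maximally different.
    # difflib's ratio is a stand-in (an assumption) for the platform's
    # Levenshtein-based scoring; words should be lowercase, no spaces.
    return 1.0 - difflib.SequenceMatcher(None, target, response).ratio()

print(error_score("cat", "cat"))  # 0.0 (no error)
```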


International Phonetic Alphabet (IPA) Transcription Tool

This tool allows the transcription of a text written in standard alphabet into the International Phonetic Alphabet (IPA). In addition, the tool provides measures about the phoneme distribution found in the provided text.


Using the Phonological, Spelling Error Scoring, and IPA Transcription Tools

The applications cover the following languages.

Supported Languages
  • Afrikaans
  • Albanian
  • Aragonese
  • Armenian (West)
  • Armenian
  • Bosnian
  • Bulgarian
  • Catalan
  • Chinese (Cantonese)
  • Chinese (Mandarin)
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English (North UK)
  • English (Received Pronunciation)
  • English (Scottish)
  • English (UK)
  • English (US)
  • English (West Indies)
  • English (West Midlands)
  • Esperanto
  • Estonian
  • Finnish
  • French (Belgium)
  • French
  • Georgian
  • German
  • Greek (Ancient)
  • Greek
  • Hindi
  • Hungarian
  • Icelandic
  • Indonesian
  • Irish
  • Italian
  • Kannada
  • Kurdish
  • Latin
  • Latvian
  • Lingua Franca Nova
  • Lithuanian
  • Macedonian
  • Malay
  • Malayalam
  • Nepali
  • Norwegian
  • Persian (Pinglish)
  • Persian
  • Polish
  • Portuguese (Brazil)
  • Portuguese (Portugal)
  • Punjabi
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Spanish (Latin America)
  • Spanish
  • Swahili
  • Swedish
  • Tamil
  • Turkish
  • Vietnamese (Hue)
  • Vietnamese (Sgn)
  • Vietnamese
  • Welsh
Preparing CSV Files

For the Semantics Scoring Application, Scoring Phonological Errors, and Scoring Spelling Responses, you need to provide a CSV file with specific columns. The IPA transcription works with plain text.

For the Semantics Scoring Application: CSV file should have "target" and "response" columns.

For the Phonological Errors Scoring Application: CSV file should have "target" and "response" columns.

For the Spelling Responses Scoring Application: CSV file should have "target", "response", and "type" columns. "type" column should have values "nonword" or "word".

Make sure to spell words in lowercase and without spaces. If not, you will not get an output. You can keep additional data columns, but the required columns are necessary to generate a score.
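Before uploading, a file can be pre-checked locally. The `validate_spelling_csv` helper below is hypothetical (not part of the platform) and encodes the rules above for the Spelling Responses application: required columns present, words lowercase without spaces, and "type" limited to "word" or "nonword".

```python
import csv
import io

REQUIRED = {"target", "response", "type"}  # for the Spelling Responses application

def validate_spelling_csv(text):
    """Hypothetical local pre-check: required columns present, words are
    lowercase with no spaces, and "type" is "word" or "nonword".
    Extra data columns are allowed."""
    reader = csv.DictReader(io.StringIO(text))
    if not REQUIRED.issubset(reader.fieldnames or []):
        return False
    for row in reader:
        for col in ("target", "response"):
            word = row[col]
            if word != word.lower() or " " in word:
                return False
        if row["type"] not in ("word", "nonword"):
            return False
    return True

good = "target,response,type\ncat,kat,word\nblick,blik,nonword\n"
print(validate_spelling_csv(good))  # True
```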

Also, make sure you have selected the appropriate target language from the dropdown menu.
Processing Time

Note that long files may take longer to process.

Output

After uploading your CSV file, the application will create a new column with the respective scores at the end of your file.

Application Usage

The tools provided can be employed in clinics and classrooms to assess patients' and students' linguistic functioning across various parameters such as semantics, phonology, and spelling.


Picture Description Task (Oral and Written)

These two applications provide assessment of speakers' productions in written or oral picture description tasks.

A picture description task is a common assessment tool used in the evaluation of individuals with aphasia or other language disorders. In a picture description task, the patient is presented with a picture and is asked to describe it in as much detail as possible. The picture typically depicts a scene with multiple elements, actions, and interactions to allow for a range of linguistic constructions and vocabulary. In Open Brain AI, users can select between standard and non-standard pictures to describe.

The task assesses the patient's ability to produce spontaneous speech. It can reveal difficulties in forming grammatically correct sentences, using appropriate vocabulary, or maintaining coherence. Open Brain AI incorporates tools to conduct the picture description task and evaluate the content (what the patient says) and the structure (how they say it). This can provide insights into the type and severity of the aphasia.

Patients with different types of aphasia (e.g., Broca's, Wernicke's, Global) may produce different patterns of errors and difficulties in the picture description task. Repeated assessments over time using Open Brain AI can track a patient's recovery and the effectiveness of therapeutic interventions.


Essay Assessment

This is an application powered by advanced language models. It evaluates Content and Argumentation by examining the thesis statement's clarity and strength, logical argument progression, depth of analysis, and evidence backing claims. The tool reviews the essay's structure, ensuring logical flow, cohesion, and the presence of a clear introduction, body, and conclusion. It checks for Grammar and Mechanics, including punctuation, spelling, sentence construction, and verb tense adherence. The application also analyzes the essay's style and voice for uniqueness, consistency, and appropriateness. Enhanced style is achieved through diverse sentence structures, vocabulary, and rhetorical techniques. The tool provides feedback on essay clarity and precision, flagging ambiguous language or jargon. Lastly, it highlights potential grammatical and stylistic errors.


Disclaimers

Make sure to read our disclaimers before using these tools.


Contact

Please contact us at themistocleous@gmail.com in case you require further assistance.