Welcome to the Open Brain AI Online Platform, a comprehensive suite of applications designed to assist you in various linguistic and acoustic analyses. This guide will walk you through the applications available on the platform and how to use them.
Accessing any of the pages and tools of the assessment platform requires registration. This platform does not retain any data other than the information provided at sign-up (for more information on the data protection policy, see the disclaimers).
Use this tool to translate from one language to another by selecting the appropriate source and target languages from the options menu. More languages will be supported soon.
This tool comprises two main components:
a. Linguistic Analysis Module: Analyze text and elicit linguistic measures on phonology, morphology, syntax, semantics, lexicon, and readability.
b. AI Discourse Analysis Module: Analyze discourse, provide suggestions on errors and on the macrostructure and microstructure of discourse, and assess whether the text was produced by a patient or a healthy individual.
The user must enter a text in the form and select one of the language options and the type of analysis. After a few moments, the application will provide a report with the linguistic measures.
This tool takes one or more Plain Text (*.txt) or Microsoft Word (*.docx) documents and returns an Excel file or a CSV file with the NLP measures. It comprises two main components:
a. Linguistic Analysis Module: Analyze text and elicit linguistic measures on phonology, morphology, syntax, semantics, lexicon, and readability.
b. AI Discourse Analysis Module: Analyze discourse, provide suggestions on errors and on the macrostructure and microstructure of discourse, and assess whether the text was produced by a patient or a healthy individual.
The user must provide one or more files and select one of the language options, the type of analysis, and the format of the output file.
There is an option to perform specific analyses on morphology, syntax, and semantics. To this end, the user needs to select "Click here for a morphological, syntactic, and semantic analysis of texts". These tools comprise one main component:
Linguistic Analysis Module: Analyze text and elicit linguistic measures on phonology, morphology, syntax, semantics, lexicon, and readability.
The user must enter a text in the form and select one of the language options. After a few moments, the application will provide a report with the linguistic measures.
This tool has three main components:
a. Transcription Module: Transcribes sound in English, Greek, Italian, Swedish, and Norwegian.

The Linguistics Analysis Tools and the Speech Analysis Tool provide the following measures:
Readability measures are various formulas used to determine the ease with which a person can understand a written text. These measures usually consider factors such as sentence length, syllable count, and word length to provide an estimate of the reading level required to understand a text. Open Brain AI offers the following readability measures (English Language Only):
Flesch Reading Ease:
This measure calculates the ease of reading a text by considering the average sentence length and the average number of syllables per word. A higher score on this index indicates easier readability. It is commonly used in the field of education and content creation.
Example: "The cat sat on the mat." - This sentence would score high on the Flesch Reading Ease scale as it uses simple words and a short sentence length.
Flesch-Kincaid Grade Level:
This is a development of the Flesch Reading Ease measure that estimates the U.S. school grade level required to understand a text. It takes into account the average sentence length and the average number of syllables per word.
Example: "The industrious student diligently completed all assignments." - This sentence would likely correspond to a higher grade level due to the use of more complex words and longer sentence length.
Gunning Fog Index:
This index calculates readability by considering the average sentence length and the percentage of complex words (words with three or more syllables) in a text. A higher score on this index indicates a higher level of difficulty.
Example: "The weather today is sunny and bright." - This sentence would score low on the Gunning Fog Index as it uses simple words and has a short sentence length.
Coleman-Liau Index:
This index estimates the U.S. school grade level required to understand a text by considering the average number of characters per word and the average sentence length.
Example: "Reading books is a good way to expand one's knowledge." - This sentence would likely correspond to a middle school grade level on the Coleman-Liau Index due to the average sentence length and average number of characters per word.
Automated Readability Index:
This index estimates the U.S. school grade level required to understand a text by considering the average number of characters per word and the average sentence length.
Example: "The quick brown fox jumps over the lazy dog." - This sentence would likely correspond to a lower grade level on the Automated Readability Index due to the short sentence length and average number of characters per word.
Understanding the readability of a text is important for various reasons. It helps in tailoring the content to the target audience, making it accessible and engaging. It is especially crucial in educational materials, where the content needs to be appropriate for the students' reading level. Additionally, readability measures are also useful for content creators, marketers, and writers to ensure their message is effectively communicated to their intended audience.
The morphological measures describe the distribution of parts of speech (POS) in the text; for example, certain aphasia types differ in their POS distributions.
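As an illustration of what a POS distribution looks like, the sketch below counts parts of speech with the spaCy library and its en_core_web_sm model (assumed here for illustration; the platform's own pipeline may differ). Each count is also divided by the total number of words so that texts of different lengths can be compared.

```python
# A sketch of a part-of-speech distribution using spaCy (assumed installed
# together with the en_core_web_sm model). Counts are normalized by the total
# number of tokens so that texts of different lengths are comparable.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat and watched the birds.")

pos_counts = Counter(token.pos_ for token in doc if not token.is_punct)
total = sum(pos_counts.values())
pos_ratios = {pos: count / total for pos, count in pos_counts.items()}

print(pos_counts)   # raw counts per category, e.g. NOUN, VERB, DET
print(pos_ratios)   # each count divided by the total number of words
```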
The complexity of a language involves multiple components that contribute to its structure and usage.
Semantic measures offer information about the distribution of semantic entities in the text. For semantic scoring, you can also check the Semantics Scoring application.
We currently offer measures of the number of syllables per word. The syllabification accuracy of this module is higher for English and Spanish. Please also check the Transcription to IPA application for more details on phoneme distribution.
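As a rough illustration, the sketch below estimates syllables per word with a simple vowel-group heuristic; this is only an approximation, not the platform's language-specific syllabifier.

```python
# A crude vowel-group heuristic for syllables per word (illustrative only;
# the platform's syllabification is language-specific and more accurate).
import re

def count_syllables(word: str) -> int:
    # Count runs of vowels as syllable nuclei; never return fewer than 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

words = "the weather today is sunny and bright".split()
syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
print(round(syllables_per_word, 2))
```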
Lexical measures are used to analyze a text based on its vocabulary. They help in understanding various aspects of the text such as its complexity, diversity, and richness. Open Brain AI offers the following lexical measures:
Function Words:
These are words that have little lexical meaning but serve to connect other words or express grammatical relationships. Examples include prepositions, pronouns, articles, conjunctions, etc.
Specifically, the tool counts all function words that belong to the following categories:
The remaining parts of speech are calculated among the Content Words.
Content Words:
These are words that carry most of the meaning in a sentence. Examples include nouns, verbs, adjectives, and adverbs.
Specifically, the tool counts all content words that belong to the following categories:
Characters:
This refers to the number of characters in a text, including letters, numbers, punctuation marks, and spaces.
Word Count Tokenized:
This is the number of words in a text after tokenization, which is the process of breaking a text into pieces, called tokens.
Word Density:
This is the ratio of a specific word to the total number of words in a text.
Propositional Idea Density (PID):
The Propositional Idea Density (PID) is a measure designed to quantify the density of ideas or propositions within a piece of writing, offering a deeper look beyond mere word count into the intellectual substance a text holds.
Imagine reading through a paragraph: each sentence unfolds new information, concepts, or actions. PID aims to quantify this unfolding, providing a numerical value that reflects the concentration of ideas per unit of text.
The calculation of PID involves a two-step process:
Counting Ideas: The first step is to identify and count the propositions within the text. A proposition can be thought of as a meaningful unit of information, often centered around verbs and their associated subjects and objects. Each proposition adds to the overall idea count of the text.
Normalization: To ensure that PID is comparable across texts of varying lengths, the total count of propositions is then normalized. This is typically done by dividing the proposition count by the total number of words or sentences in the text, resulting in a ratio that represents the density of ideas.
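A minimal sketch of this calculation is shown below. It approximates propositions as verbs, adjectives, adverbs, prepositions, and conjunctions identified with spaCy, which is a common approximation assumed here for illustration rather than the platform's exact algorithm.

```python
# A hedged sketch of the PID calculation: propositions are approximated as
# verbs, adjectives, adverbs, prepositions, and conjunctions, then the count
# is normalized by the total number of words.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The tired traveler finally reached the small village before sunset.")

PROPOSITION_POS = {"VERB", "AUX", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}
words = [t for t in doc if not t.is_punct]
propositions = [t for t in words if t.pos_ in PROPOSITION_POS]

pid = len(propositions) / len(words)   # ideas per word
print(round(pid, 2))
```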
Using PID
PID serves as a powerful tool for analyzing and comparing texts. It provides insights into the complexity and depth of writings, from literary works to academic papers. A higher PID indicates a text rich with ideas. Conversely, a lower PID might suggest a text that is easier to digest, possibly suited for broader audiences. Whether you're a writer seeking to enrich your prose, a student analyzing literary works, or a researcher comparing academic texts, PID offers a unique lens through which to view and assess written content.
Sentence Count:
This is the number of sentences in a text after parsing, which is the process of syntactic analysis of a text.
TTR (Type-Token Ratio):
This is the ratio of unique words (types) to the total number of words (tokens) in a text. It is used to measure the lexical diversity of a text.
Corrected TTR:
This is the TTR corrected for text length, as TTR is affected by the length of the text.
Herdan's C:
A measure of vocabulary richness that is less sensitive to text length than TTR. It is calculated as the log of the number of unique words divided by the log of the total number of words.
Summer:
A measure of vocabulary richness that is based on the sum of the logarithms of the word frequencies.
Maas's TTR:
A measure of vocabulary richness that is less sensitive to text length. It is calculated as the log of the total number of words divided by the log of the number of unique words.
Mean Segmental TTR (MSTTR):
This is the mean of the TTRs of segments of a text, usually segments of 100 words. It is used to measure the lexical diversity of a text while controlling for text length.
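The sketch below computes TTR, corrected TTR, Herdan's C, and MSTTR using their commonly cited formulas. It follows the descriptions above and is not necessarily the platform's exact implementation.

```python
# A minimal sketch of common lexical diversity measures (commonly cited
# formulas; not necessarily the platform's exact implementation).
import math

def lexical_diversity(tokens, segment_size=100):
    n, v = len(tokens), len(set(tokens))
    ttr = v / n                               # type-token ratio
    cttr = v / math.sqrt(2 * n)               # corrected TTR (Carroll)
    herdan_c = math.log(v) / math.log(n)      # Herdan's C
    # MSTTR: mean TTR over consecutive full segments of `segment_size` words
    full = [tokens[i:i + segment_size]
            for i in range(0, n - segment_size + 1, segment_size)]
    msttr = sum(len(set(s)) / len(s) for s in full) / len(full) if full else ttr
    return {"TTR": ttr, "CTTR": cttr, "HerdanC": herdan_c, "MSTTR": msttr}

tokens = "the cat sat on the mat and the dog sat on the rug".split()
print(lexical_diversity(tokens, segment_size=5))
```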
Mean Length of Utterance (MLU):
The Mean Length of Utterance (MLU) is a linguistic measure used to analyze the average length of sentences or utterances within a text. It is a valuable metric for assessing language complexity and development. MLU is often employed in linguistics and speech-language pathology to gain insights into language production.
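A simple word-based approximation of MLU is sketched below; note that clinical MLU is often computed over morphemes rather than words.

```python
# A word-based approximation of MLU: total words divided by the number of
# utterances (clinical practice often counts morphemes instead of words).
utterances = [
    "the boy is kicking the ball",
    "he wants to score",
    "the dog watches",
]
total_words = sum(len(u.split()) for u in utterances)
mlu = total_words / len(utterances)
print(round(mlu, 2))   # average words per utterance
```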
The previous measures offer counts and the ratio of each count to the total number of words in the provided text. The following provides the annotated text in the form of a table for researchers interested in a detailed description of the output (see above for the details on grammar).
The Part-Of-Speech (POS) measures and the like typically involve two columns: one representing the count of a specific category in the text, and another representing that count divided by the total number of words, serving as a standardized measure. This standardization is crucial because the raw count of a POS category is naturally influenced by the text's length: longer texts tend to have more occurrences of a particular POS category, so the ratio is the more meaningful, comparable measure.
The Type-Token Ratio (TTR) and similar measures yield a single score, which can be found in the right column. It is safe to ignore any 'NAs' that may appear in the left column in these cases.
Provides a visual representation of the acoustic signal, including the waveform, spectrogram, and fundamental frequency contour, along with automated measures of fundamental frequency and intensity.
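For readers who want to reproduce similar measures offline, the sketch below extracts a fundamental frequency contour and an intensity proxy with the librosa library from a hypothetical recording, sample.wav. It illustrates the kind of output reported and is not the platform's own pipeline.

```python
# A hedged sketch of basic acoustic measures with librosa (assumed installed);
# "sample.wav" is a hypothetical recording, not a file provided by the platform.
import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=None)          # waveform at native sampling rate
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)                                                    # fundamental frequency contour
rms = librosa.feature.rms(y=y)[0]                    # intensity proxy (RMS energy)

print("mean F0 (Hz):", np.nanmean(f0))               # ignores unvoiced frames (NaN)
print("mean RMS energy:", rms.mean())
```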
Employs word embeddings to score the semantic distance between two words. Requires a CSV file with "target" and "response" columns. Provides two semantic measures:
a. Semantic Measure 1: Based on pre-trained GloVe vectors (2B tweets, 27B tokens, 1.2M vocabulary, uncased).
b. Semantic Measure 2: Based on vectors that we trained on the English Wikipedia.
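The sketch below shows how such a score can be computed with pre-trained GloVe Twitter vectors loaded through gensim's downloader and a hypothetical responses.csv file with "target" and "response" columns; the platform's own vectors and pipeline may differ.

```python
# A sketch of scoring semantic similarity between "target" and "response"
# words with pre-trained GloVe Twitter vectors via gensim's downloader.
# "responses.csv" is a hypothetical input file following this guide's columns.
import gensim.downloader as api
import pandas as pd

vectors = api.load("glove-twitter-100")       # pre-trained word embeddings

df = pd.read_csv("responses.csv")
df["semantic_similarity"] = [
    vectors.similarity(t, r) if t in vectors and r in vectors else None
    for t, r in zip(df["target"], df["response"])
]
df.to_csv("responses_scored.csv", index=False)
```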
The Spelling and Phonology Score provides composite measurements that quantify phonological variations between a target word and a response word. These scores are calculated after converting both words into their International Phonetic Alphabet (IPA) representations, which encapsulate the phonological attributes of the words.
While the Phonology Score is calculated using IPA conversions for both words, the Spelling Score diverges in its approach. Specifically, it converts only non-words into their IPA forms, not actual words. This is predicated on the notion that writers are likely familiar only with the phonological representation of non-words, not their written forms. However, users have the flexibility to modify this behavior by choosing to treat non-words as words, or vice versa, for IPA conversion.
The algorithm employs a customized version of the Levenshtein distance metric that accounts for transpositions. In essence, the Levenshtein distance measures the minimal number of operations (additions, deletions, substitutions, and transpositions) required to transform one word into another.
1. **Phonetic Comparisons**:
- For non-words, the tool employs their phonetic transcriptions to gauge their similarity to real words.
2. **Spelling Distance for Words and Non-Words**:
- The tool accommodates both real and fabricated words, providing comprehensive metrics for each word pair in a given list.
3. **Levenshtein Score with Operations**:
- The tool offers a nuanced breakdown, detailing not just the Levenshtein score but also the types and counts of the operations (transpositions, deletions, insertions, and substitutions) needed to equate two words.
These operations are noteworthy for various reasons:
- **Typing or Phonological Errors**: These operations often occur when people mistype or mispronounce words.
- **Language Learning**: Learners frequently make mistakes in the sequence of adjacent sounds or letters.
- **Clinical Assessments**: Errors like transpositions can signal specific language impairments or cognitive conditions.
1. **Insertions**:
- Adding a new letter to a word, e.g., transforming "cat" into "cart" by inserting an 'r'.
2. **Deletions**:
- Removing a letter from a word, e.g., reverting "cart" back to "cat" by deleting the 'r'.
3. **Substitutions**:
- Replacing one letter with another, e.g., changing "cat" to "bat" by substituting 'c' with 'b'.
4. **Transpositions**:
- Swapping adjacent letters, e.g., converting "form" to "from" by transposing 'o' and 'r'.
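The sketch below implements a compact version of this edit distance with adjacent transpositions (the "optimal string alignment" variant). Unlike the platform's output, it returns only the total distance, not the per-operation counts.

```python
# A compact sketch of a Levenshtein distance extended with adjacent
# transpositions. The platform additionally reports per-operation counts;
# this sketch returns the total edit distance only.
def edit_distance(a: str, b: str) -> int:
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                                   # deletions
    for j in range(len(b) + 1):
        d[0][j] = j                                   # insertions
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)   # transposition
    return d[len(a)][len(b)]

print(edit_distance("cat", "cart"))   # 1: one insertion
print(edit_distance("form", "from"))  # 1: one transposition
```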
This tool is geared towards researchers, linguists, and educators focusing on language and spelling. It has applications in:
- **Spelling Research**: To analyze frequent types of misspellings.
- **Language Learning**: To assess the proximity of a learner's pronunciation to the target word.
- **Healthcare**: To monitor language capabilities in fields like speech therapy.
Scores phonological distance. Requires a CSV file with "target" and "response" columns. The application will create a new column with phonemic scores. It provides a score of phonological errors on a scale of 0.0 to 1.0, where 1.0 indicates the highest degree of error. Supported languages are listed below.
Scores spelling errors. Requires a CSV file with "target", "response", and "type" columns. The application will create a new column with spelling scores. It provides a score of spelling errors on a scale of 0.0 to 1.0, where 1.0 indicates the highest degree of error. Supported languages are listed below. See the section on preparing the CSV file for this application.
This tool allows the transcription of a text written in standard alphabet into the International Phonetic Alphabet (IPA). In addition, the tool provides measures about the phoneme distribution found in the provided text.
The applications cover the following languages.
For the Semantics Scoring Application, Scoring Phonological Errors, and Scoring Spelling Responses, you need to provide a CSV file with specific columns. The IPA transcription works with plain text.
For the Semantics Scoring Application: CSV file should have "target" and "response" columns.
For the Phonological Errors Scoring Application: CSV file should have "target" and "response" columns.
For the Spelling Responses Scoring Application: CSV file should have "target", "response", and "type" columns. "type" column should have values "nonword" or "word".
Make sure to spell words in lowercase and without spaces. If not, you will not get an output. You can keep additional data columns, but the required columns are necessary to generate a score.
Also, make sure you have selected the appropriate target language from the dropdown menu. Note that long files may take longer to process.
After uploading your CSV file, the application will create a new column with the respective scores at the end of your file.
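For example, a minimal input file for the spelling application could be prepared with pandas as sketched below; the file name spelling_responses.csv and the example rows are hypothetical.

```python
# A minimal example of preparing an input CSV for the spelling scoring tool.
# Column names follow this guide; the file name and rows are hypothetical.
# Words are lowercase, with no spaces.
import pandas as pd

rows = [
    {"target": "elephant", "response": "elefant", "type": "word"},
    {"target": "blick",    "response": "blik",    "type": "nonword"},
]
pd.DataFrame(rows).to_csv("spelling_responses.csv", index=False)
```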
The tools provided can be employed in clinics and classrooms to assess patients' and students' linguistic functioning across various parameters such as semantics, phonology, and spelling.
These two applications provide an assessment of speakers' productions in written or oral picture description tasks.
A picture description task is a common assessment tool used in the evaluation of individuals with aphasia or other language disorders. In a picture description task, the patient is presented with a picture and is asked to describe it in as much detail as possible. The picture typically depicts a scene with multiple elements, actions, and interactions to allow for a range of linguistic constructions and vocabulary. In Open Brain AI users can select between standard and non-standard pictures to describe. The task assesses the patient's ability to produce spontaneous speech. It can reveal difficulties in forming grammatically correct sentences, using appropriate vocabulary, or maintaining coherence. Open Brain AI incorporates tools to conduct the picture description task and evaluate the content (what the patient says) and the structure (how they say it). This can provide insights into the type and severity of the aphasia. Patients with different types of aphasia (e.g., Broca's, Wernicke's, Global) may produce different patterns of errors and difficulties in the picture description task. Repeated assessments over time using Open Brain AI can track a patient's recovery and the effectiveness of therapeutic interventions.
This is an application powered by advanced language models. It evaluates Content and Argumentation by examining the thesis statement's clarity and strength, logical argument progression, depth of analysis, and evidence backing claims. The tool reviews the essay's structure, ensuring logical flow, cohesion, and the presence of a clear introduction, body, and conclusion. It checks for Grammar and Mechanics, including punctuation, spelling, sentence construction, and verb tense adherence. The application also analyzes the essay's style and voice for uniqueness, consistency, and appropriateness. Enhanced style is achieved through diverse sentence structures, vocabulary, and rhetorical techniques. The tool provides feedback on essay clarity and precision, flagging ambiguous language or jargon. Lastly, it highlights potential grammatical and stylistic errors.
Make sure to read our disclaimers before using these tools.
Please contact us at themistocleous@gmail.com in case you require further assistance.