The table below lists the features available in BlaBla. For each feature, the table provides the following columns:

feature name - The name of the feature
feature description - A short description of what the feature measures
feature key - The key to pass to the compute_features method to request the feature
core (#langs) - The base framework that BlaBla uses for feature extraction (Stanza or CoreNLP), with the number of supported languages in parentheses
requires Stanza - (Yes/No) Whether this feature requires Stanza
requires CoreNLP - (Yes/No) Whether this feature requires CoreNLP
input format - The input format the feature supports: String (free text), JSON, or both
default param name - A few features let the user override a default value, such as pause_duration between words. This column gives the additional key you can provide to the compute_features method for the feature
default param value - The value used for that parameter when no override is provided
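As a rough illustration of how these columns fit together, the sketch below checks a set of feature keys against a few rows transcribed from the table. It is not part of BlaBla: the `FEATURES` dict layout, the `plan_request` helper, and the lowercase `"string"`/`"json"` labels are all hypothetical names chosen for this example.

```python
# A few rows transcribed from the table; the dict layout is illustrative only.
FEATURES = {
    "num_pauses": {"stanza": False, "corenlp": False,
                   "inputs": {"json"}, "param": ("pause_duration", 0.35)},
    "maximum_speech_rate": {"stanza": False, "corenlp": False,
                            "inputs": {"json"}, "param": ("num_rapid_sentences", 10)},
    "noun_rate": {"stanza": True, "corenlp": False,
                  "inputs": {"string", "json"}, "param": None},
    "num_clauses": {"stanza": True, "corenlp": True,
                    "inputs": {"string", "json"}, "param": None},
}


def plan_request(keys, input_format, **overrides):
    """Validate feature keys against the rows above (hypothetical helper).

    Returns (backends, params): the frameworks that must be available and
    the parameter values after applying any overrides to the defaults.
    """
    backends, params = set(), {}
    for key in keys:
        if key not in FEATURES:
            raise KeyError(f"unknown feature key: {key}")
        row = FEATURES[key]
        if input_format not in row["inputs"]:
            raise ValueError(f"{key} does not support {input_format} input")
        if row["stanza"]:
            backends.add("Stanza")
        if row["corenlp"]:
            backends.add("CoreNLP")
        if row["param"] is not None:
            name, default = row["param"]
            params[name] = default
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"parameters not used by these features: {sorted(unknown)}")
    params.update(overrides)
    return backends, params


backends, params = plan_request(["num_pauses", "noun_rate"], "json", pause_duration=0.5)
print(sorted(backends), params)
```

The returned `params` would then presumably be supplied alongside the feature keys when calling compute_features; the exact call signature is an assumption here and should be checked against the BlaBla documentation.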

feature name	feature description	feature key	core (#langs)	requires Stanza	requires CoreNLP	input format	default param name	default param value
number of pauses	The total number of pauses between words greater than a threshold	num_pauses	Stanza(66)	No	No	JSON	pause_duration	0.35
total pause time	The total duration of pauses between words greater than a threshold	total_pause_time	Stanza(66)	No	No	JSON	pause_duration	0.35
mean pause duration	The average duration of all pauses between words greater than a threshold	mean_pause_duration	Stanza(66)	No	No	JSON	pause_duration	0.35
between utterance pause duration	The proportion of the total duration of pauses that falls between utterances	between_utterance_pause_duration	Stanza(66)	No	No	JSON	pause_between_utterance_duration	0.035
hesitation ratio	The average gap between sentences	hesitation_ratio	Stanza(66)	No	No	JSON	pause_duration_for_hesitation	0.030
speech rate	The number of words per minute	speech_rate	Stanza(66)	No	No	JSON	-	-
maximum speech rate	The average number of words per minute across the top N (set to 10 by default) rapid sentences	maximum_speech_rate	Stanza(66)	No	No	JSON	num_rapid_sentences	10
total phonation time	Total time duration of all words across all sentences	total_phonation_time	Stanza(66)	No	No	JSON	-	-
standardized phonation time	The total number of words divided by the total phonation time	standardized_phonation_time	Stanza(66)	No	No	JSON	-	-
total locution time	The total amount of time in speech that contains both speech and pauses	total_locution_time	Stanza(66)	No	No	JSON	-	-
noun rate	The rate of nouns across sentences	noun_rate	Stanza(66)	Yes	No	String or JSON	-	-
verb rate	The rate of verbs across sentences	verb_rate	Stanza(66)	Yes	No	String or JSON	-	-
demonstrative rate	The rate of demonstratives across sentences	demonstrative_rate	Stanza(66)	Yes	No	String or JSON	-	-
adjective rate	The rate of adjectives across sentences	adjective_rate	Stanza(66)	Yes	No	String or JSON	-	-
adposition rate	The rate of adpositions across sentences	adposition_rate	Stanza(66)	Yes	No	String or JSON	-	-
adverb rate	The rate of adverbs across sentences	adverb_rate	Stanza(66)	Yes	No	String or JSON	-	-
auxiliary rate	The rate of auxiliaries across sentences	auxiliary_rate	Stanza(66)	Yes	No	String or JSON	-	-
conjunction rate	The rate of conjunctions across sentences	conjunction_rate	Stanza(66)	Yes	No	String or JSON	-	-
determiner rate	The rate of determiners across sentences	determiner_rate	Stanza(66)	Yes	No	String or JSON	-	-
interjection rate	The rate of interjections across sentences	interjection_rate	Stanza(66)	Yes	No	String or JSON	-	-
numeral rate	The rate of numerals across sentences	numeral_rate	Stanza(66)	Yes	No	String or JSON	-	-
particle rate	The rate of particles across sentences	particle_rate	Stanza(66)	Yes	No	String or JSON	-	-
pronoun rate	The rate of pronouns across sentences	pronoun_rate	Stanza(66)	Yes	No	String or JSON	-	-
proper noun rate	The rate of proper nouns across sentences	proper_noun_rate	Stanza(66)	Yes	No	String or JSON	-	-
punctuation rate	The rate of punctuation marks across sentences	punctuation_rate	Stanza(66)	Yes	No	String or JSON	-	-
subordinating conjunction rate	The rate of subordinating conjunctions across sentences	subordinating_conjunction_rate	Stanza(66)	Yes	No	String or JSON	-	-
symbol rate	The rate of symbols across sentences	symbol_rate	Stanza(66)	Yes	No	String or JSON	-	-
possessive rate	The rate of possessive words across sentences	possessive_rate	Stanza(66)	Yes	No	String or JSON	-	-
noun verb ratio	The ratio of nouns to verbs across sentences	noun_verb_ratio	Stanza(66)	Yes	No	String or JSON	-	-
noun ratio	The ratio of nouns to the sum of nouns and verbs across sentences	noun_ratio	Stanza(66)	Yes	No	String or JSON	-	-
pronoun noun ratio	The ratio of pronouns to nouns across sentences	pronoun_noun_ratio	Stanza(66)	Yes	No	String or JSON	-	-
closed-class word rate	The proportion of determiners, pronouns, conjunctions and prepositions among all words across sentences	closed_class_word_rate	Stanza(66)	Yes	No	String or JSON	-	-
open-class word rate	The proportion of nouns, verbs, adjectives and adverbs among all words across sentences	open_class_word_rate	Stanza(66)	Yes	No	String or JSON	-	-
total dependency distance	The total distance of all dependencies across sentences	total_dependency_distance	Stanza(66)	Yes	No	String or JSON	-	-
average dependency distance	The average distance of all dependencies across sentences	average_dependency_distance	Stanza(66)	Yes	No	String or JSON	-	-
total dependencies	The total number of unique dependencies across sentences	total_dependencies	Stanza(66)	Yes	No	String or JSON	-	-
average dependencies	The average number of unique dependencies across sentences	average_dependencies	Stanza(66)	Yes	No	String or JSON	-	-
content density	The ratio of the number of open-class words to the number of closed-class words	content_density	Stanza(66)	Yes	No	String or JSON	-	-
idea density	The proportion of verbs, adjectives, adverbs, prepositions and conjunctions among all words across sentences	idea_density	Stanza(66)	Yes	No	String or JSON	-	-
honore's statistic	Calculated as R = (100*log(N))/(1-(V1)/(V)), where V is the number of unique words, V1 is the number of words in the vocabulary spoken only once, and N is the overall text length (number of words).	honore_statistic	Stanza(66)	Yes	No	String or JSON	-	-
brunet's index	Calculated as N^(V^(-0.165)), where V is the number of unique words and N is the overall text length (number of words). A measure of lexical richness; a text-length-insensitive version of TTR.	brunet_index	Stanza(66)	Yes	No	String or JSON	-	-
type-token-ratio	The number of word types divided by the number of word tokens	type_token_ratio	Stanza(66)	Yes	No	String or JSON	-	-
word length	The mean length of words across the corpus	word_length	Stanza(66)	Yes	No	String or JSON	-	-
proportion of inflected verbs	The ratio of the number of inflected verbs to the number of verbs	prop_inflected_verbs	Stanza(66)	Yes	No	String or JSON	-	-
proportion of auxiliary verbs	The ratio of the number of auxiliary verbs to the number of verbs	prop_auxiliary_verbs	Stanza(66)	Yes	No	String or JSON	-	-
proportion of gerund verbs	The ratio of the number of gerund verbs to the number of verbs	prop_gerund_verbs	Stanza(66)	Yes	No	String or JSON	-	-
proportion of participles	The ratio of the number of participle verbs to the number of verbs	prop_participles	Stanza(66)	Yes	No	String or JSON	-	-
number of clauses	The number of clauses across the corpus	num_clauses	CoreNLP(6)	Yes	Yes	String or JSON	-	-
clause rate	The number of clauses per sentence across the corpus	clause_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
number of dependent clauses	The number of Dependent Clauses	num_dependent_clauses	CoreNLP(6)	Yes	Yes	String or JSON	-	-
dependent clauses rate	The rate of Dependent Clauses	dependent_clause_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
proportion of nouns with determiners	The proportion of nouns associated with a determiner	prop_nouns_with_det	Stanza(66)	Yes	No	String or JSON	-	-
proportion of nouns with adjectives	The proportion of nouns associated with an adjective	prop_nouns_with_adj	Stanza(66)	Yes	No	String or JSON	-	-
number of noun phrases	The number of Noun Phrases	num_noun_phrases	CoreNLP(6)	Yes	Yes	String or JSON	-	-
noun phrase rate	The rate of Noun Phrases	noun_phrase_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
number of verb phrases	The number of Verb Phrases	num_verb_phrases	CoreNLP(6)	Yes	Yes	String or JSON	-	-
verb phrase rate	The rate of Verb Phrases	verb_phrase_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
number of infinitive phrases	The number of Infinitive Phrases	num_infinitive_phrases	CoreNLP(6)	Yes	Yes	String or JSON	-	-
infinitive phrase rate	The rate of Infinitive Phrases	infinitive_phrase_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
number of prepositional phrases	The number of Prepositional Phrases	num_prepositional_phrases	CoreNLP(6)	Yes	Yes	String or JSON	-	-
prepositional phrase rate	The rate of Prepositional Phrases	prepositional_phrase_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
max yngve depth	The maximum Yngve Depth of each parse tree averaged over all sentences	max_yngve_depth	CoreNLP(6)	Yes	Yes	String or JSON	-	-
mean yngve depth	The mean Yngve Depth of all nodes in a parse tree averaged over all sentences	mean_yngve_depth	CoreNLP(6)	Yes	Yes	String or JSON	-	-
total yngve depth	The total Yngve Depth of all nodes in a parse tree averaged over all sentences	total_yngve_depth	CoreNLP(6)	Yes	Yes	String or JSON	-	-
parse tree height	The height of parse tree averaged over all sentences	parse_tree_height	CoreNLP(6)	Yes	Yes	String or JSON	-	-
number of discourse markers	The number of total discourse markers	num_discourse_markers	CoreNLP(6)	Yes	Yes	String or JSON	-	-
discourse marker rate	The rate of discourse markers across all sentences	discourse_marker_rate	CoreNLP(6)	Yes	Yes	String or JSON	-	-
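The lexical richness formulas in the table (type-token-ratio, honore's statistic, brunet's index) can be worked through by hand. The following is a minimal sketch that applies them to a pre-tokenized word list; it is not BlaBla's implementation. The `lexical_richness` function name, the lowercasing step, the use of the natural logarithm for `log`, and the handling of the case V1 = V are all assumptions made for this example.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Apply the table's formulas to a list of word tokens (illustrative)."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(t.lower() for t in tokens)  # assumption: case-insensitive
    n = len(tokens)                              # N: text length in words
    v = len(counts)                              # V: number of unique words
    v1 = sum(1 for c in counts.values() if c == 1)  # V1: words used only once
    # Honore's statistic divides by zero when every word occurs once (V1 == V);
    # returning infinity in that case is a choice made for this sketch.
    honore = 100 * math.log(n) / (1 - v1 / v) if v1 != v else math.inf
    return {
        "type_token_ratio": v / n,
        "honore_statistic": honore,
        "brunet_index": n ** (v ** -0.165),
    }


tokens = "the cat sat on the mat and the dog sat too".split()
for name, value in lexical_richness(tokens).items():
    print(f"{name}: {value:.3f}")
```

Here N = 11, V = 8 and V1 = 6, so the type-token-ratio is 8/11 and the denominator of Honore's statistic is 1 - 6/8 = 0.25.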

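To make the role of the `pause_duration` threshold concrete, here is a small sketch of the pause-related rows (number of pauses, total pause time, mean pause duration, total phonation time) computed from word timings. It is not BlaBla's code, and it does not use BlaBla's JSON schema: the `(start, end)` tuple layout, the unit of seconds, and the `pause_features` name are assumptions for illustration.

```python
def pause_features(words, pause_duration=0.35):
    """Compute pause features from (start, end) word times in spoken order.

    The times are assumed to be in seconds; the 0.35 default mirrors the
    default param value listed in the table.
    """
    # Gap between the end of each word and the start of the next one.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:])]
    # Only gaps longer than the threshold count as pauses.
    pauses = [g for g in gaps if g > pause_duration]
    total = sum(pauses)
    return {
        "num_pauses": len(pauses),
        "total_pause_time": total,
        "mean_pause_duration": total / len(pauses) if pauses else 0.0,
        "total_phonation_time": sum(end - start for start, end in words),
    }


words = [(0.0, 0.4), (0.5, 0.9), (1.5, 2.0), (2.1, 2.5), (3.0, 3.4)]
print(pause_features(words))
print(pause_features(words, pause_duration=0.55))
```

With the default threshold, the gaps of roughly 0.6 and 0.5 count as pauses while the two gaps of roughly 0.1 do not; raising the threshold to 0.55 leaves only the 0.6 gap.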