The Correlation Engine

The correlation engine is a hobby project for computing correlation metrics between documents. It uses TF-IDF to compute similarity between term vectors. The idea is to build a database in which any type of document can be correlated against a set of documents. It currently uses Spring as its web frontend. Possible use cases include finding similar attacks, correlating CVEs, log correlation and SIEM.
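
The README does not state the exact TF-IDF variant or the similarity measure. The sketch below assumes a common combination: each term is weighted as tf * ln(N / df), and two documents are compared by the cosine similarity of their weighted sparse vectors. The class `TfIdfSimilarity` and its methods are hypothetical illustrations, not the project's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not the project's API): TF-IDF weighting plus cosine similarity.
final class TfIdfSimilarity {

    /** Raw term counts (tf) for one tokenized document. */
    static Map<String, Integer> termCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Weights each term as tf * ln(N / df), where N is the number of indexed documents
     * and df is the number of documents containing the term (assumed variant).
     */
    static Map<String, Double> tfIdf(Map<String, Integer> counts, Map<String, Integer> docFreq, int totalDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) continue; // term never indexed: no IDF available, skip it
            double w = e.getValue() * Math.log((double) totalDocs / df);
            if (w != 0.0) weights.put(e.getKey(), w); // keep only non-zero entries (sparse)
        }
        return weights;
    }

    /** Cosine similarity; the dot product iterates over the smaller vector. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = small == a ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) return 0.0; // empty vector: define similarity as 0
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }
}
```

A term that appears in every indexed document gets an IDF of ln(1) = 0. It is therefore dropped from the sparse vector and contributes nothing to the similarity.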

| Term | Description |
|---|---|
| Document | A datatype with a unique identifier and a set of fields containing text. |
| Vector Extractor | Extracts terms from the text of a single document field. It tokenizes the field's text, extracts terms (possibly new ones) and builds a sparse vector. |
| Analyzer | A list of vector extractors. |
| SparseVector | A vector in which only the non-zero entries are stored. |
| Dictionary | A list of terms, the number of indexed documents and the frequency of each term in the indexed documents. |
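
The definitions above can be sketched as plain Java types. This is a hypothetical illustration only. The class names mirror the glossary, but the fields, the tokenizer (split on anything that is not a letter or digit), the sequential term-id scheme and the reading of "frequency" as document frequency are all assumptions. The extractor tokenizes one field, registers any new terms in the dictionary, and returns a sparse vector of term counts keyed by term id.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Document: a unique identifier plus named text fields (hypothetical shape).
record Document(String id, Map<String, String> fields) {}

// SparseVector: only non-zero entries are stored (term id -> weight).
final class SparseVector {
    final TreeMap<Integer, Double> entries = new TreeMap<>();

    void add(int index, double value) {
        double v = entries.getOrDefault(index, 0.0) + value;
        if (v == 0.0) entries.remove(index); else entries.put(index, v);
    }
}

// Dictionary: known terms, number of indexed documents, per-term document frequency.
final class Dictionary {
    final Map<String, Integer> termIds = new HashMap<>();
    final Map<Integer, Integer> docFrequency = new HashMap<>();
    int indexedDocuments = 0;

    /** Returns the id of a term, assigning the next sequential id if the term is new. */
    int idOf(String term) {
        return termIds.computeIfAbsent(term, t -> termIds.size());
    }

    /** Records one indexed document's distinct term ids (assumed to happen at index time). */
    void index(Set<Integer> termIdsInDoc) {
        indexedDocuments++;
        for (int id : termIdsInDoc) docFrequency.merge(id, 1, Integer::sum);
    }
}

// VectorExtractor: tokenizes one document field and builds a sparse term-count vector.
final class VectorExtractor {
    final String sourceField;

    VectorExtractor(String sourceField) { this.sourceField = sourceField; }

    SparseVector extract(Document doc, Dictionary dict) {
        SparseVector vector = new SparseVector();
        String text = doc.fields().getOrDefault(sourceField, "");
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            vector.add(dict.idOf(token), 1.0); // new terms receive fresh ids here
        }
        return vector;
    }
}
```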

Example of an analyzer

```
{
	"name": "uniq",
	"extractors": [
		{
			"name": "uniq_words",
			"sourceField": "description"
		},
		{
			"name": "uniq_words",
			"sourceField": "name"
		}
	]
}
```
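
The following is a minimal sketch of how an analyzer like `uniq` might be applied. It rests on two assumptions. First, `uniq_words` is taken to mean "the set of distinct lower-cased words in the source field". Second, the outputs of all extractors are taken to be merged into one term set per document. The README confirms neither. The names `ExtractorConfig`, `AnalyzerConfig`, `EXTRACTORS` and `apply` are also hypothetical. JSON parsing is left out to keep the example dependency-free.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical Java mirror of one entry in "extractors".
record ExtractorConfig(String name, String sourceField) {}

// Hypothetical Java mirror of the analyzer JSON.
record AnalyzerConfig(String name, List<ExtractorConfig> extractors) {

    // Registry of extractor implementations by name; only "uniq_words" is sketched.
    static final Map<String, Function<String, TreeSet<String>>> EXTRACTORS =
            Map.of("uniq_words", AnalyzerConfig::uniqueWords);

    /** Assumed meaning of "uniq_words": the distinct lower-cased words of a field. */
    static TreeSet<String> uniqueWords(String text) {
        TreeSet<String> words = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) words.add(token);
        }
        return words;
    }

    /** Runs each extractor on its source field and merges the resulting term sets. */
    TreeSet<String> apply(Map<String, String> documentFields) {
        TreeSet<String> terms = new TreeSet<>();
        for (ExtractorConfig ex : extractors) {
            Function<String, TreeSet<String>> fn = EXTRACTORS.get(ex.name());
            if (fn == null) throw new IllegalArgumentException("unknown extractor: " + ex.name());
            terms.addAll(fn.apply(documentFields.getOrDefault(ex.sourceField(), "")));
        }
        return terms;
    }
}
```

Take a document whose `name` is "Heap overflow" and whose `description` is "A heap buffer overflow in the parser". Under these assumptions, it yields the terms a, buffer, heap, in, overflow, parser and the. Words shared by both fields appear only once.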
