The Correlation Engine

The correlation engine is a hobby project.
Its purpose is to compute correlation metrics between documents.
It uses TF-IDF to measure similarity between term vectors.
The idea is to create a database in which any type of document can be correlated against a set of documents.
It currently uses Spring for its web frontend.
Possible use cases include finding similar attacks, correlating CVEs, log correlation, and SIEM.
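
This README does not say which TF-IDF variant or similarity measure the engine uses. As an illustration only, one common formulation weights each term with a smoothed inverse document frequency and compares two documents by cosine similarity:

$$
w_{t,d} = \mathrm{tf}_{t,d} \cdot \left( \ln\frac{1 + N}{1 + \mathrm{df}_t} + 1 \right),
\qquad
\mathrm{sim}(a, b) = \frac{\sum_t w_{t,a}\, w_{t,b}}{\sqrt{\sum_t w_{t,a}^2}\,\sqrt{\sum_t w_{t,b}^2}}
$$

Here $\mathrm{tf}_{t,d}$ is the number of times term $t$ occurs in document $d$, $N$ is the number of indexed documents, and $\mathrm{df}_t$ is the number of indexed documents that contain $t$. The smoothing terms are an assumption of this sketch; they keep the weight finite for terms that no indexed document contains.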

Document: A datatype with a unique identifier and a set of fields containing text.
Vector Extractor: Extracts terms from the text of a document field. It tokenizes the input, extracts (possibly new) terms, and builds a sparse vector.
Analyzer: A list of vector extractors.
SparseVector: A vector in which only the non-zero entries are stored.
Dictionary: A list of terms, the number of indexed documents, and the frequency of each term in the indexed documents.
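
To show how these pieces could fit together, here is a minimal Java sketch. It is not the project's actual code: the class `TfIdfSketch`, its nested `Dictionary`, and the methods `extract`, `weigh`, and `cosine` are hypothetical names, a `Map<Integer, Double>` stands in for SparseVector, and the smoothed IDF and cosine similarity are assumptions about how the weighting might work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TfIdfSketch {

    /** Terms, the number of indexed documents, and per-term document frequency. */
    static class Dictionary {
        private final Map<String, Integer> termIds = new HashMap<>();
        private final List<Integer> documentFrequency = new ArrayList<>();
        private int documentCount = 0;

        /** Returns the id of a term, registering the term if it is new. */
        int idOf(String term) {
            Integer id = termIds.get(term);
            if (id == null) {
                id = termIds.size();
                termIds.put(term, id);
                documentFrequency.add(0);
            }
            return id;
        }

        /** Records one indexed document given the distinct term ids it contains. */
        void index(Set<Integer> termIdsInDocument) {
            documentCount++;
            for (int id : termIdsInDocument) {
                documentFrequency.set(id, documentFrequency.get(id) + 1);
            }
        }

        /** Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1. */
        double idf(int termId) {
            return Math.log((1.0 + documentCount) / (1.0 + documentFrequency.get(termId))) + 1.0;
        }
    }

    /** Tokenizes text and builds a sparse term-count vector (term id -> count). */
    static Map<Integer, Double> extract(String text, Dictionary dictionary) {
        Map<Integer, Double> vector = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                vector.merge(dictionary.idOf(token), 1.0, Double::sum);
            }
        }
        return vector;
    }

    /** Scales each term count by the IDF of its term. */
    static Map<Integer, Double> weigh(Map<Integer, Double> counts, Dictionary dictionary) {
        Map<Integer, Double> weighted = new HashMap<>();
        for (Map.Entry<Integer, Double> entry : counts.entrySet()) {
            weighted.put(entry.getKey(), entry.getValue() * dictionary.idf(entry.getKey()));
        }
        return weighted;
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<Integer, Double> vector) {
        double sum = 0;
        for (double value : vector.values()) {
            sum += value * value;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity; only entries present in both vectors contribute to the dot product. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0;
        for (Map.Entry<Integer, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double lengths = norm(a) * norm(b);
        return lengths == 0 ? 0 : dot / lengths;
    }

    public static void main(String[] args) {
        Dictionary dictionary = new Dictionary();
        List<Map<Integer, Double>> counts = new ArrayList<>();
        for (String text : List.of("sql injection in login form",
                                   "sql injection in search form",
                                   "buffer overflow in parser")) {
            Map<Integer, Double> vector = extract(text, dictionary);
            dictionary.index(vector.keySet());
            counts.add(vector);
        }
        // Compare the first document against every indexed document.
        Map<Integer, Double> query = weigh(counts.get(0), dictionary);
        for (Map<Integer, Double> other : counts) {
            System.out.println(cosine(query, weigh(other, dictionary)));
        }
    }
}
```

Storing only the terms that occur in a document keeps each vector small even when the dictionary grows large, which is the point of a sparse representation.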

Example of an analyzer

```
{
	"name": "uniq",
	"extractors": [
		{
			"name": "uniq_words",
			"sourceField": "description"
		},
		{
			"name": "uniq_words",
			"sourceField": "name"
		}
	]
}
```
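
One way to read this configuration is that each entry pairs an extractor with the document field it consumes. The Java sketch below (records need Java 16 or later) illustrates that reading under stated assumptions: `AnalyzerSketch`, `ExtractorConfig`, `AnalyzerConfig`, `Document`, and `inputsFor` are hypothetical names, not the project's classes, and treating a missing field as empty text is a choice made only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AnalyzerSketch {

    /** One entry of the "extractors" array: which extractor runs on which field. */
    record ExtractorConfig(String name, String sourceField) {}

    /** The analyzer: a name plus an ordered list of extractor entries. */
    record AnalyzerConfig(String name, List<ExtractorConfig> extractors) {}

    /** Hypothetical document: a unique identifier and a set of text fields. */
    record Document(String id, Map<String, String> fields) {}

    /** The "uniq" analyzer from the JSON example, built by hand. */
    static AnalyzerConfig uniq() {
        return new AnalyzerConfig("uniq", List.of(
                new ExtractorConfig("uniq_words", "description"),
                new ExtractorConfig("uniq_words", "name")));
    }

    /** Collects, per extractor entry, the field text that extractor would receive. */
    static List<String> inputsFor(AnalyzerConfig analyzer, Document document) {
        List<String> inputs = new ArrayList<>();
        for (ExtractorConfig extractor : analyzer.extractors()) {
            // A missing field is treated as empty text (an assumption of this sketch).
            inputs.add(document.fields().getOrDefault(extractor.sourceField(), ""));
        }
        return inputs;
    }

    public static void main(String[] args) {
        Document document = new Document("doc-1",
                Map.of("name", "Example", "description", "An example entry"));
        System.out.println(inputsFor(uniq(), document));
    }
}
```

With this layout, the same extractor can be applied to several fields, as the example does with `uniq_words` on both `description` and `name`.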

