Mesi



Mesi is a tool for measuring the many-to-many similarity of long-form documents, such as Python source code or technical writing: every document is compared against every other document. The output can help determine which files in a collection are the most similar to each other.
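The following minimal sketch shows what many-to-many comparison means. Every document is compared with every other document once, which gives n(n-1)/2 pairs for n files. The `pairwise_distances` and `ratio_distance` names are hypothetical. The standard library's `difflib` similarity ratio is swapped in for Mesi's actual algorithms only so the example runs without extra packages.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, Tuple


def ratio_distance(a: str, b: str) -> float:
    """Stand-in distance (not Mesi's): 0.0 for identical text, up to 1.0 for nothing shared."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def pairwise_distances(
    docs: Dict[str, str],
    distance: Callable[[str, str], float] = ratio_distance,
) -> Dict[Tuple[str, str], float]:
    """Compare every document with every other document exactly once."""
    # combinations() yields each unordered pair once and never pairs a file with itself.
    return {(a, b): distance(docs[a], docs[b]) for a, b in combinations(sorted(docs), 2)}


docs = {
    "one.py": "print('hello')",
    "two.py": "print('hello!')",
    "three.py": "import os",
}
results = pairwise_distances(docs)
# Lowest distance first, i.e. the most similar pair first.
for pair, dist in sorted(results.items(), key=lambda item: item[1]):
    print(pair, round(dist, 3))
```

Passing a different `distance` callable changes the metric without changing the all-pairs structure.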

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

projects
├── project-one
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── project-two
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each project's deliverables/python_program.py file, run the command:

mesi projects/*/deliverables/python_program.py

A lower distance in the produced table equates to a higher degree of similarity.
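To make "lower distance" concrete, here is a minimal pure-Python sketch of the textbook dynamic-programming Levenshtein distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. It is for illustration only. Mesi delegates the real computation to polyleven or TextDistance, and the `levenshtein` function below is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("same", "same"))       # 0
```

Identical files score 0, and each additional edit raises the distance by one, so smaller numbers mean more similar files.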

See the help menu (mesi --help) for additional options and configuration.
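In a POSIX shell such as bash, the unquoted pattern in the usage example is expanded by the shell before Mesi runs, so Mesi receives one path per matching project. The sketch below reproduces that expansion in Python and loads each file. The `read_documents` helper is hypothetical and not part of Mesi.

```python
import glob
from pathlib import Path
from typing import Dict


def read_documents(pattern: str) -> Dict[str, str]:
    """Expand a shell-style glob pattern and read each matching file as UTF-8 text."""
    # glob.glob returns matches in arbitrary order; sort for a stable result.
    return {path: Path(path).read_text(encoding="utf-8") for path in sorted(glob.glob(pattern))}


# Example, run from the directory that contains "projects":
# docs = read_documents("projects/*/deliverables/python_program.py")
```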

Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi supports all the algorithms provided by TextDistance. In general, Levenshtein distance is never a bad choice, which is why it is the default.

Table Formats

Mesi uses tabulate for table formatting. The table format can be set with the --table-format option to any of the formats listed in tabulate's documentation.
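A minimal sketch of rendering pairwise distances with tabulate. It assumes the --table-format value is forwarded to tabulate's `tablefmt` argument, which is an assumption about Mesi's internals. The `render_table` helper and its column headers are hypothetical.

```python
from typing import Dict, Tuple


def render_table(distances: Dict[Tuple[str, str], float], table_format: str = "github") -> str:
    """Render pairwise distances as a table, most similar pair first."""
    from tabulate import tabulate  # third-party: pip install tabulate

    rows = [(a, b, d) for (a, b), d in sorted(distances.items(), key=lambda item: item[1])]
    return tabulate(rows, headers=["File A", "File B", "Distance"], tablefmt=table_format)


# Example (requires tabulate):
# print(render_table({("a.py", "b.py"): 2, ("b.py", "c.py"): 5}, table_format="grid"))
```

The names accepted for `tablefmt` can be listed with `tabulate.tabulate_formats`.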

Dependencies

Mesi uses two primary dependencies for text similarity calculation: polyleven and TextDistance. Polyleven is the default, because its dedicated Levenshtein implementation is typically faster. If a different algorithm is requested, TextDistance's implementation is used instead.
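A minimal sketch of the dispatch described above. It assumes polyleven exposes `levenshtein(a, b)` and that TextDistance exposes each algorithm as a module-level callable such as `textdistance.hamming`. The `get_distance` helper, and its fall-through to TextDistance when polyleven is not installed, are illustrative choices and not Mesi's actual code.

```python
import importlib
from typing import Callable

DistanceFn = Callable[[str, str], float]


def get_distance(algorithm: str = "levenshtein") -> DistanceFn:
    """Return a distance callable: polyleven for Levenshtein, TextDistance otherwise."""
    if algorithm == "levenshtein":
        try:
            from polyleven import levenshtein  # fast, Levenshtein-only implementation
            return levenshtein
        except ImportError:
            pass  # polyleven missing: fall through to TextDistance's levenshtein
    textdistance = importlib.import_module("textdistance")
    try:
        return getattr(textdistance, algorithm)
    except AttributeError:
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None


# Example (requires polyleven and/or textdistance):
# get_distance()("kitten", "sitting")       -> 3
# get_distance("hamming")("test", "text")   -> 1
```

Importing inside the function means TextDistance is only loaded when a non-default algorithm (or the fallback) is actually needed.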

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

License

Distributed under the terms of the GPL v3 license, Mesi is free and open source software.