Commit

Merge pull request #1 from atomflunder/v0.5.0
v0.5.0
atomflunder authored Apr 27, 2022
2 parents 9111c80 + 7153742 commit 0d55b57
Showing 10 changed files with 237 additions and 243 deletions.
18 changes: 18 additions & 0 deletions .github/CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
# Contributing to stringmatch

First off, thanks for being interested in contributing to stringmatch! Every contribution is appreciated a lot. The following are some guidelines to get you started. They are *guidelines* and not strict rules.

If you just want to ask a question, go ahead and visit the [GitHub Discussions Tab](https://github.com/atomflunder/stringmatch/discussions).

## Bug reports

When submitting a bug report, make sure to follow the template and clearly describe how to reproduce the bug. If you already know how to fix it, either describe the fix in the report or submit a pull request directly.

## Pull requests

Submitting a pull request is just as straightforward as submitting a bug report. Follow the template and you will be fine.
If you change the functionality of the code, please test it beforehand; writing tests is greatly encouraged.
Sticking to the general style of the library is also greatly appreciated, but not strictly required.

Thanks again for your interest in contributing!
If you still have doubts about contributing to this library, I can assure you there is no such thing as a bad contribution.
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -1,4 +1,4 @@
name: build
name: Build

on: [push, pull_request]

16 changes: 14 additions & 2 deletions CHANGELOG.md
@@ -2,14 +2,26 @@

This is a broad overview of the changes that have been made over the lifespan of this library.

## v0.5.0 - 2022-04-27

- Removed scorer argument from functions, added it into `__init__` in both Match() and Ratio()
- Renamed *_with_score functions to *_with_ratio to be consistent with naming
- This affects the three functions added in v0.4.0
- Removed Exceptions
- Returning a score of 0 instead of raising EmptySearchException
- Using "levenshtein" as default instead of raising InvalidScorerException
- Setting no limit instead of raising InvalidLimitException, if a limit less than 1 is set
- Updated docstrings to reflect these changes
- Updated tests to reflect these changes
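
The fallbacks that replace the removed exceptions can be pictured roughly like this. It is only an illustrative sketch under stated assumptions, not the library's code: `KNOWN_SCORERS`, `resolve_scorer`, `resolve_limit`, `score_or_zero` and `apply_limit` are hypothetical names made up for this example.

```python
from typing import Callable, List, Optional

# Hypothetical names for illustration only; none of them are taken from stringmatch.
KNOWN_SCORERS = ("levenshtein", "jaro", "jaro_winkler")


def resolve_scorer(name: str) -> str:
    """Use "levenshtein" instead of raising when the scorer name is unknown."""
    return name if name in KNOWN_SCORERS else "levenshtein"


def resolve_limit(limit: int) -> Optional[int]:
    """Treat any limit below 1 as "no limit" instead of raising."""
    return limit if limit >= 1 else None


def score_or_zero(a: str, b: str, scorer: Callable[[str, str], int]) -> int:
    """Return 0 for an empty string instead of raising."""
    return scorer(a, b) if a and b else 0


def apply_limit(matches: List[str], limit: int) -> List[str]:
    """Slicing with None keeps the whole list, so "no limit" needs no special case."""
    return matches[: resolve_limit(limit)]


print(resolve_scorer("unknown"))                    # levenshtein
print(apply_limit(["a", "b", "c"], 0))              # ['a', 'b', 'c']
print(score_or_zero("", "test", lambda a, b: 100))  # 0
```

Falling back silently keeps call sites free of `try`/`except` blocks, at the cost of hiding typos such as a misspelled scorer name.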

## v0.4.1 - 2022-04-27

- Added proper Python Versions to setup classifiers

## v0.4.0 - 2022-04-27

- Added match_with_score, get_best_match_with_score and get_best_matches_with_score functions
- Added tests for those functions
- Added tests for those functions
- Updated documentation a bit

## v0.3.1 - 2022-04-26
@@ -26,7 +38,7 @@ This is a broad overview of the changes that have been made over the lifespan of
- Made library public and installable via git
- Added multiple scorers
- Added new kwargs to Match functions
- Added tests for those
- Added tests for those
- Improved various functions
- Added exception type
- Some documentation improvements
128 changes: 88 additions & 40 deletions README.md
@@ -10,7 +10,13 @@ Inspired by [seatgeek/thefuzz](https://github.com/seatgeek/thefuzz), which did n
- [Requirements](#requirements)
- [Installation](#installation)
- [Basic Usage](#basic-usage)
- [Additional Arguments](#additional-arguments)
- [Matching](#matching)
- [Ratios](#ratios)
- [Matching & Ratios](#matching--ratios)
- [Strings](#strings)
- [Advanced Usage](#advanced-usage)
- [Keyword Arguments](#keyword-arguments)
- [Scoring Algorithms](#scoring-algorithms)
- [Links](#links)

## Requirements
@@ -32,107 +38,149 @@ pip install -U git+https://github.com/atomflunder/stringmatch

## Basic Usage

### Matching

The match functions allow you to compare 2 strings and check if they are "similar enough" to each other, or get the best match(es) from a list of strings:

```python
from stringmatch import Match, Ratio, Strings
from stringmatch import Match

match = Match()
ratio = Ratio()
strings = Strings()

# Basic usage:
match.match("searchlib", "srchlib") # returns True
match.match("searchlib", "something else") # returns False
# Checks if the strings are similar.
match.match("searchlib", "srchlib") # returns True
match.match("searchlib", "something else") # returns False

# Matching lists:
# Returns the best match(es) found in the list.
searches = ["searchli", "searhli", "search", "lib", "whatever", "s"]
match.get_best_match("searchlib", searches) # returns "searchli"
match.get_best_matches("searchlib", searches) # returns ['searchli', 'searhli', 'search']
match.get_best_match("searchlib", searches) # returns "searchli"
match.get_best_matches("searchlib", searches) # returns ['searchli', 'searhli', 'search']
```

### Ratios

# Ratios:
ratio.ratio("searchlib", "searchlib") # returns 100
ratio.ratio("searchlib", "srechlib") # returns 82
You can get the "ratio of similarity" between strings like this:

```python
from stringmatch import Ratio

ratio = Ratio()

# Getting the ratio between the two strings.
ratio.ratio("searchlib", "searchlib") # returns 100
ratio.ratio("searchlib", "srechlib") # returns 82

# Getting the ratio between the first string and the list of strings at once.
searches = ["searchlib", "srechlib"]
ratio.ratio_list("searchlib", searches) # returns [100, 82]
ratio.ratio_list("searchlib", searches) # returns [100, 82]
```

### Matching & Ratios

You can also get both the match and the ratio together in a tuple using these functions:

# Getting matches and ratios:
match.match_with_score("searchlib", "srechlib") # returns (True, 82)
```python
from stringmatch import Match

match = Match()
searches = ["test", "nope", "tset"]
match.get_best_match_with_score("test", searches) # returns ("test", 100)
match.get_best_matches_with_score("test", searches) # returns [("test", 100), ("tset", 75)]

# Modify strings:
# This is meant for internal use, but you can also use it yourself, if you choose to.
strings.latinise("Héllö, world!") # returns "Hello, world!"
strings.remove_punctuation("wh'at;, ever") # returns "what ever"
strings.only_letters("Héllö, world!") # returns "Hll world"
strings.ignore_case("test test!", lower=False) # returns "TEST TEST!"

match.match_with_ratio("searchlib", "srechlib") # returns (True, 82)
match.get_best_match_with_ratio("test", searches) # returns ("test", 100)
match.get_best_matches_with_ratio("test", searches) # returns [("test", 100), ("tset", 75)]
```
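
To see where numbers such as 82 and 75 can come from, here is a minimal standard-library sketch. It assumes the default score is a normalised insertion/deletion similarity, `2 * LCS / (len(a) + len(b))`, scaled to 0-100; that is an assumption about the scoring backend, although it reproduces the values shown in the examples. The names `indel_ratio`, `match_with_ratio_sketch` and `best_matches_with_ratio_sketch` are hypothetical, and comparing with `>=` against the cutoff is a guess as well.

```python
from typing import List, Tuple


def indel_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100 based on the longest common subsequence (LCS)."""
    if not a or not b:
        return 0
    # Classic dynamic-programming LCS that keeps only the previous row.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return round(100 * 2 * prev[-1] / (len(a) + len(b)))


def match_with_ratio_sketch(a: str, b: str, score: int = 70) -> Tuple[bool, int]:
    """Return whether the ratio reaches the cutoff, together with the ratio."""
    ratio = indel_ratio(a, b)
    return ratio >= score, ratio


def best_matches_with_ratio_sketch(
    query: str, candidates: List[str], score: int = 70, limit: int = 5
) -> List[Tuple[str, int]]:
    """Keep candidates at or above the cutoff, best first, at most `limit` of them."""
    scored = [(c, indel_ratio(query, c)) for c in candidates]
    kept = sorted((p for p in scored if p[1] >= score), key=lambda p: p[1], reverse=True)
    return kept[: limit if limit >= 1 else None]  # a limit below 1 means "no limit"


print(match_with_ratio_sketch("searchlib", "srechlib"))                  # (True, 82)
print(best_matches_with_ratio_sketch("test", ["test", "nope", "tset"]))  # [('test', 100), ('tset', 75)]
```

For "searchlib" and "srechlib" the longest common subsequence has 7 characters, so the ratio is `2 * 7 / 17`, which rounds to 82.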

### Additional Arguments
You can pass in additional arguments for the `Match()` functions to customise your search further:
### Strings

This is primarily meant for internal usage, but you can also use this library to modify strings:

#### `score=int`
```python
from stringmatch import Strings

strings = Strings()

strings.latinise("Héllö, world!") # returns "Hello, world!"
strings.remove_punctuation("wh'at;, ever") # returns "what ever"
strings.only_letters("Héllö, world!") # returns "Hll world"
strings.ignore_case("test test!", lower=False) # returns "TEST TEST!"
```
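
For readers curious how such helpers can work, the following is a rough standard-library approximation, not the library's implementation. The function names ending in `_sketch` are hypothetical, and the Unicode-decomposition approach is an assumption; it only handles characters that decompose into a base letter plus accents.

```python
import string
import unicodedata


def latinise_sketch(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_punctuation_sketch(text: str) -> str:
    """Delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def only_letters_sketch(text: str) -> str:
    """Keep only ASCII letters and spaces."""
    return "".join(ch for ch in text if ch in string.ascii_letters or ch == " ")


print(latinise_sketch("H\u00e9ll\u00f6, world!"))      # Hello, world!
print(remove_punctuation_sketch("wh'at;, ever"))       # what ever
print(only_letters_sketch("H\u00e9ll\u00f6, world!"))  # Hll world
```

Letters without a canonical decomposition, such as `ß` or `ø`, pass through `latinise_sketch` unchanged, so a dedicated transliteration table would be needed for those.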

## Advanced Usage

### Keyword Arguments
You can pass in additional arguments for the `Match()` functions to customise your search further:

**`score=70`**
The score cutoff for matching, by default set to 70.

```python
match.match("searchlib", "srechlib", score=85)  # returns False
match.match("searchlib", "srechlib", score=70)  # returns True
```

#### `limit=int`
---

The limit of how many matches to return. Only available for `Match().get_best_matches()`. By default this is set to `5`.
**`limit=5`**
The limit of how many matches to return. Only available for `Match().get_best_matches()`. To return every match, set this to `0`. By default this is set to `5`.

```python
searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
match.get_best_matches("limit 5", searches, limit=2)  # returns ["limit 5", "limit 4"]
match.get_best_matches("limit 5", searches, limit=1)  # returns ["limit 5"]
```

#### `latinise=bool`
---

**`latinise=False`**
Replaces special unicode characters with their latin alphabet equivalents. By default turned off.

```python
match.match("séärçh", "search", latinise=True)   # returns True
match.match("séärçh", "search", latinise=False)  # returns False
```

#### `ignore_case=bool`
---

**`ignore_case=False`**
Ignores case while searching. By default turned off.

```python
match.match("test", "TEST", ignore_case=True)   # returns True
match.match("test", "TEST", ignore_case=False)  # returns False
```

#### `remove_punctuation=bool`
---

Removes commonly used punctuation symbols from the strings, like `.,;:!?` and so on. Be careful when using this, because if you pass in a string that is only made up of punctuation symbols, you will get an `EmptySearchException`. By default turned off.
**`remove_punctuation=False`**
Removes commonly used punctuation symbols from the strings, like `.,;:!?` and so on. By default turned off.

```python
match.match("test,---....", "test", remove_punctuation=True)   # returns True
match.match("test,---....", "test", remove_punctuation=False)  # returns False
```

#### `only_letters=bool`
---

Removes every character that is not in the latin alphabet, a more extreme version of `remove_punctuation`. The same rules apply here, be careful when you use it or you might get an `EmptySearchException`. By default turned off.
**`only_letters=False`**
Removes every character that is not in the latin alphabet, a more extreme version of `remove_punctuation`. By default turned off.

```python
match.match("»»ᅳtestᅳ►", "test", only_letters=True)   # returns True
match.match("»»ᅳtestᅳ►", "test", only_letters=False)  # returns False
```

#### `scorer=str`
### Scoring Algorithms

The scoring algorithm to use, the available options are: [`"levenshtein"`](https://en.wikipedia.org/wiki/Levenshtein_distance), [`"jaro"`](https://en.wikipedia.org/wiki/Jaro–Winkler_distance#Jaro_similarity), [`"jaro_winkler"`](https://en.wikipedia.org/wiki/Jaro–Winkler_distance#Jaro–Winkler_similarity). Different algorithms will produce different results, obviously. By default set to `"levenshtein"`.
You can pass in different scoring algorithms when initialising the `Match()` and `Ratio()` classes.
The available options are: [`"levenshtein"`](https://en.wikipedia.org/wiki/Levenshtein_distance), [`"jaro"`](https://en.wikipedia.org/wiki/Jaro–Winkler_distance#Jaro_similarity), [`"jaro_winkler"`](https://en.wikipedia.org/wiki/Jaro–Winkler_distance#Jaro–Winkler_similarity).
Different algorithms will produce different results. The default is `"levenshtein"`.

```python
match("test", "th test", scorer="levenshtein") # returns True (score = 73)
match("test", "th test", scorer="jaro_winkler") # returns False (score = 60)
levenshtein_matcher = Match(scorer="levenshtein")
jaro_winkler_matcher = Match(scorer="jaro_winkler")

levenshtein_matcher.match("test", "th test") # returns True (score = 73)
jaro_winkler_matcher.match("test", "th test") # returns False (score = 60)
```
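
To make the difference between the scorers more concrete, here is a compact textbook-style sketch of Jaro and Jaro-Winkler similarity using only the standard library. The function names are illustrative, and the `boost_threshold` of 0.7 (the prefix bonus only applies above it) is the common Winkler convention; that the library's backend applies the same threshold is an assumption, though it is consistent with the score of 60 shown above.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_weight: float = 0.1, boost_threshold: float = 0.7) -> float:
    """Jaro similarity with a bonus for a shared prefix of up to four characters."""
    score = jaro(a, b)
    if score <= boost_threshold:
        return score  # assumed convention: no prefix bonus for weak matches
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(round(100 * jaro_winkler("test", "th test")))  # 60
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))    # 0.9611
```

Only two characters of "test" fall inside the matching window of "th test", which is why this scorer rates the pair lower than the edit-based default does.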


3 changes: 1 addition & 2 deletions stringmatch/__init__.py
@@ -1,8 +1,7 @@
# flake8: noqa
from .exceptions import *
from .match import *
from .ratio import *
from .strings import *

__title__ = "stringmatch"
__version__ = "0.4.1"
__version__ = "0.5.0"
16 changes: 0 additions & 16 deletions stringmatch/exceptions.py

This file was deleted.
