Fuzzy Name-matching

A name-matching algorithm that maps company names to CRSP permnos (U.S. public firms).

Please use matcher.py, which reflects the latest round of disambiguation work.

To help with this project, add name pairs that should be matched to the white list and bad matches to the black list. I will periodically review the problematic ones and further improve the algorithm.

How to use this

Clone this repo with git. If you do not know how to use git, download the repo as a zip file and unzip it.

Make sure you have Python 3.6+ installed, then install the required packages (and any other missing package Python reports when you run the script):

pip install pandas rapidfuzz nltk loguru

Place your name file into the unzipped folder. The name file has to be in the following CSV format:

1,apple inc
2,microsoft corp
3,whatever inc
...

where the first column is an index and the second column is the name to match.
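If your names live in a Python list, a minimal sketch like the following can produce a file in that shape. It assumes the file is headerless (as in the sample rows above) and that a 1-based running index is acceptable; `write_name_file` is a hypothetical helper, not part of this repo. Note that the `csv` module quotes names containing commas, and whether matcher.py accepts quoted fields is not confirmed here.

```python
import csv


def write_name_file(names, path="name.csv"):
    """Write names as headerless `index,name` rows with a 1-based index.

    Hypothetical helper for illustration; not part of this repo.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for i, name in enumerate(names, start=1):
            writer.writerow([i, name])
```

For example, `write_name_file(["apple inc", "microsoft corp"])` writes the rows `1,apple inc` and `2,microsoft corp` to name.csv.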

Once the file is in place, run

./matcher.py name.csv

This produces a result.csv file containing the matched results, like the following:

1,apple inc, 12345, APPLE INC, 100
...

The result columns are: your_index, your_name, permno, name_in_CRSP, matching_score
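To work with the output in Python, a small reader along these lines may help. It is a sketch that assumes result.csv has no header row and uses a space after each comma, as in the sample row above; `read_results` and `COLUMNS` are hypothetical names, and all values are kept as strings.

```python
import csv

# Column names taken from the list above; the file itself is assumed headerless.
COLUMNS = ["your_index", "your_name", "permno", "name_in_CRSP", "matching_score"]


def read_results(path="result.csv"):
    """Return result rows as dicts keyed by COLUMNS (values stay strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        # skipinitialspace drops the blank that follows each comma in the sample row
        reader = csv.reader(f, skipinitialspace=True)
        return [dict(zip(COLUMNS, row)) for row in reader if row]
```

Each returned dict can then be accessed by name, for example `row["permno"]`.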

The output contains only the pairs the program considers good matches. The score measures textual similarity only: a high score means the two names look alike, but a good match can still have a low score when the names are textually dissimilar. The score is provided for further processing if needed; you can safely ignore it.
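To see why a purely textual score can be low for a pair that still refers to the same firm, here is a small illustration. It uses `difflib` from the standard library as a stand-in, so the numbers are not the score matcher.py reports; `text_similarity` and the example names are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale.

    Stand-in built on difflib for illustration only; this is not the
    scoring function used by matcher.py.
    """
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Same text apart from letter case: maximal similarity.
same_text = text_similarity("apple inc", "APPLE INC")

# An abbreviation and its long form share very few characters, so the
# textual score is low even though both could name the same company.
abbreviation = text_similarity("international business machines corp", "ibm")
```

Here `same_text` is 100, while `abbreviation` is far lower, which is the situation the paragraph above describes.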

I pull the latest CRSP stocknames file every once in a while. To match against your own file instead, pass the -b option; your file must have the same layout as stocknames.csv:

./matcher.py name.csv -b your_supplied_file.csv
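Before passing your own file with -b, a rough sanity check such as the one below can catch an obvious layout mismatch. It only compares the number of columns in the first row of each file, which is a necessary but not sufficient condition for "same specs"; `same_column_count` is a hypothetical helper, and the exact columns matcher.py expects are not documented here.

```python
import csv


def first_row(path):
    """Return the first CSV row of a file, or an empty list if the file is empty."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


def same_column_count(reference_path, candidate_path):
    """True when both files have the same number of columns in their first row.

    A coarse check only: it does not verify column order, names, or types.
    """
    return len(first_row(reference_path)) == len(first_row(candidate_path))
```

For example, `same_column_count("stocknames.csv", "your_supplied_file.csv")` returning False means the supplied file almost certainly needs to be reshaped first.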


Repository: leoliu0/name_matching
