We are very pleased that you would like to use our tool. If you want to skip the initial setup, which can take around an hour, we recommend visiting the ready-to-use version at csgender.org. There you can start analyzing the data right away, and we do our best to update the data used there every 3 months.
- Install python virtualenv:
pip3.9 install virtualenv
- Create a virtual environment:
python3.9 -m virtualenv gap_env
- Activate the virtual environment:
source gap_env/bin/activate
- Install the dependencies:
pip3.9 install -r requirements.txt
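To confirm that the steps above took effect, a small check like the following can help. It is only a sketch and not part of the tool: the function names `in_virtualenv` and `describe_environment` are made up for illustration. It relies on the fact that an activated environment points `sys.prefix` at the environment directory (`gap_env` here), while `sys.base_prefix` keeps pointing at the base installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix instead of base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and current virtualenv releases leave base_prefix at the base install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_environment() -> str:
    """Summarize the interpreter version and whether an environment is active."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    state = "active" if in_virtualenv() else "not active"
    return f"Python {version}, virtual environment {state} ({sys.prefix})"


print(describe_environment())
```

If the output reports that the environment is not active, run the activation command again before installing the dependencies.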
This component parses all level-1 elements of dblp.xml into csv files. To do so, perform the following steps:
-
- Prepare the Environment if not already done.
- Activate the virtual environment:
source gap_env/bin/activate
if not already done in (i).
- Create a directory for the dblp dump:
mkdir dblp
- Download the dblp.xml and the relevant dblp-20xx-xx-xx.dtd file from the dblp xml dump.
- Store the downloaded files in the 'dblp' directory.
- Run the parser to generate the csv files, which are stored in csv/:
python3.9 dblp_parser.py
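To illustrate what "level-1 elements" means, here is a minimal sketch that streams the direct children of the root element and groups them into one csv table per tag. It is not the code of dblp_parser.py. The sample data, the function names `level1_records` and `to_csv`, and the `|` separator for repeated fields are assumptions made for this example. It uses the standard library parser on a tiny in-memory sample, and the sample deliberately avoids the named character entities that the real dump defines in its .dtd file.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for dblp.xml (hypothetical records, no DTD entities).
SAMPLE = b"""<dblp>
<article key="journals/x/A1" mdate="2020-01-01">
  <author>Ada Example</author>
  <author>Bob Example</author>
  <title>A Title</title>
  <year>2020</year>
</article>
<inproceedings key="conf/y/B2" mdate="2021-02-02">
  <author>Cy Example</author>
  <title>Another Title</title>
  <year>2021</year>
</inproceedings>
</dblp>"""


def level1_records(source):
    """Yield (tag, record) for every direct child of the root element."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root has just been closed
            fields = defaultdict(list)
            for child in elem:
                fields[child.tag].append("".join(child.itertext()).strip())
            record = {"key": elem.get("key", "")}
            # Repeated fields such as <author> are joined with "|" (assumption).
            record.update({name: "|".join(vals) for name, vals in fields.items()})
            yield elem.tag, record
            elem.clear()  # drop the children of the processed element


def to_csv(records):
    """Group records by tag and render one csv text per tag."""
    grouped = defaultdict(list)
    for tag, record in records:
        grouped[tag].append(record)
    tables = {}
    for tag, rows in grouped.items():
        columns = sorted({name for row in rows for name in row})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        tables[tag] = buffer.getvalue()
    return tables


tables = to_csv(level1_records(io.BytesIO(SAMPLE)))
print(tables["article"])
```

Processing one element at a time keeps memory use low, which matters for a file as large as the full dblp dump.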
-
- Prepare the Environment if not already done.
- Activate the virtual environment:
source gap_env/bin/activate
if not already done in (i).
- Parse the dblp xml if not already done.
- If you already have gender-annotated first names from the GenderAPI, put them under
csv/GenderAPI/
- Run the database script to fill the database and also save the tables as readable csv files under csv/db/:
python3.9 database.py
A csv file with all unknown first names can be found under csv/GenderAPI/unprocessed/. It contains first names that were unknown to the GenderAPI in the past (this may change over time!) as well as names that we have not yet requested from the GenderAPI. Pass it to the GenderAPI and start again with the first step to increase the gender determination rate.
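The idea behind the unprocessed file can be sketched as a set difference between all first names and the names that already carry a gender annotation. This is an illustration only, not the logic of database.py. The column name `first_name`, the case-insensitive matching, and the helper names are assumptions. It covers only names that were never requested, not names the GenderAPI returned as unknown.

```python
import csv
import io

# Hypothetical inputs: every author first name, and the annotated subset.
ALL_AUTHORS = "first_name\nAda\nBob\nCy\nada\n"
ANNOTATED = "first_name,gender\nAda,female\nBob,male\n"


def first_names(csv_text: str, column: str = "first_name") -> set:
    """Read one column of a csv text as a set of normalized names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Lowercasing is an assumption so that "Ada" and "ada" count once.
    return {row[column].strip().lower() for row in reader if row[column].strip()}


def unprocessed(all_csv: str, annotated_csv: str) -> list:
    """Names that still need to be sent to the gender annotation service."""
    return sorted(first_names(all_csv) - first_names(annotated_csv))


def determination_rate(all_csv: str, annotated_csv: str) -> float:
    """Share of distinct first names that already have an annotation."""
    names = first_names(all_csv)
    if not names:
        return 0.0
    return len(names & first_names(annotated_csv)) / len(names)


print(unprocessed(ALL_AUTHORS, ANNOTATED))
print(f"{determination_rate(ALL_AUTHORS, ANNOTATED):.2f}")
```

Each round of annotating the remaining names and rerunning the database script should shrink the first list and raise the rate.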
-
- Prepare the Environment if not already done.
- Activate the virtual environment:
source gap_env/bin/activate
if not already done in (i).
- Parse the dblp xml if not already done.
- Propagate data to the database if not already done.
- Install Streamlit, if it was not already installed with the dependencies:
pip3.9 install streamlit
- Run the website with:
streamlit run prototype.py
A new browser tab will open with the app.