Gimie (GIt Meta Information Extractor) is a Python library and command line tool to extract structured metadata from git repositories.
Scientific code repositories contain valuable metadata which can be used to enrich existing catalogues, platforms or databases. This tool aims to easily extract structured metadata from a generic git repository. It can extract metadata from the Git provider (GitHub or GitLab) or from the git index itself.
Using Gimie: easy peasy, it's a 3-step process.
To install the stable version on PyPI:
pip install gimie
To install the dev version from GitHub:
pip install git+https://github.com/sdsc-ordes/gimie.git@main#egg=gimie
Gimie is also available as a Docker container hosted on the GitHub container registry:
docker pull ghcr.io/sdsc-ordes/gimie:latest
# The access token can be provided as an environment variable
docker run -e GITHUB_TOKEN=$GITHUB_TOKEN ghcr.io/sdsc-ordes/gimie:latest gimie data <repo>
In order to access the GitHub API, you need to provide a GitHub token with the read:org scope.
New to access tokens? Or don't know how to get your GitHub / GitLab token?
Have no fear, see here for GitHub tokens and here for GitLab tokens. (Note: tokens are as precious as passwords! Treat them as such.)
Gimie will use your access tokens to gather information for you. If you want info about a GitHub repo, Gimie needs your GitHub token; if you want info about a GitLab project, then Gimie needs your GitLab token.
Add your tokens one by one in your terminal. Your GitHub token:
export GITHUB_TOKEN=
and/or your GitLab token:
export GITLAB_TOKEN=
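The rule above (GitHub URLs need GITHUB_TOKEN, GitLab URLs need GITLAB_TOKEN) can be checked before calling Gimie. The following Python sketch is only an illustration: the helpers `required_token_var` and `token_is_set` are hypothetical and not part of Gimie, and the host matching is a simplifying assumption (a self-hosted GitLab instance whose host name does not contain "gitlab" would not be recognised).

```python
import os
from typing import Optional
from urllib.parse import urlparse


def required_token_var(url: str) -> Optional[str]:
    """Return the environment variable expected for this URL's provider.

    Hypothetical helper: the host matching below is an assumption,
    not Gimie's own provider detection.
    """
    host = (urlparse(url).hostname or "").lower()
    if host == "github.com" or host.endswith(".github.com"):
        return "GITHUB_TOKEN"
    if "gitlab" in host:
        return "GITLAB_TOKEN"
    return None  # unknown provider


def token_is_set(url: str, env=os.environ) -> bool:
    """True if the token variable needed for this URL is set and non-empty."""
    var = required_token_var(url)
    return var is not None and bool(env.get(var))
```

Calling `token_is_set("https://github.com/numpy/numpy")` before running Gimie gives an early hint that an `export` step was skipped.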
gimie data https://github.com/numpy/numpy
(Want a GitLab project instead? Just replace the URL in the command line.)
from gimie.project import Project
proj = Project("https://github.com/numpy/numpy")
# To retrieve the rdflib.Graph object
g = proj.extract()
# To retrieve the serialized graph
g_in_ttl = g.serialize(format='ttl')
print(g_in_ttl)
For more advanced use see the documentation.
The default output is Turtle, a textual syntax for the RDF data model. We follow the schema recommended by codemeta.
Supported formats are turtle, json-ld and n-triples (by specifying the --format argument in your call, e.g. gimie data https://github.com/numpy/numpy --format 'ttl').
With no further options, Gimie will print results in the terminal. Want to save Gimie output to a file? Redirect the output to your file path at the end of the command: gimie data https://github.com/numpy/numpy > path_to_output/gimie_output.ttl
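Saving the output can also be scripted from Python. The sketch below only assembles the `gimie data <url> --format <fmt>` call described above and writes whatever Gimie prints to a file. `build_gimie_command` and `save_metadata` are hypothetical helper names, and the format string is passed through unchecked because only the 'ttl' spelling is confirmed by the text.

```python
import subprocess
from pathlib import Path
from typing import List


def build_gimie_command(url: str, fmt: str = "ttl") -> List[str]:
    """Assemble the `gimie data` call as an argument list (no shell involved)."""
    return ["gimie", "data", url, "--format", fmt]


def save_metadata(url: str, out_path: str, fmt: str = "ttl") -> Path:
    """Run gimie and write its standard output to out_path.

    Assumes the `gimie` executable is on PATH and that the required
    access token is already exported in the environment.
    """
    result = subprocess.run(
        build_gimie_command(url, fmt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gimie exits with an error
    )
    path = Path(out_path)
    path.write_text(result.stdout, encoding="utf-8")
    return path
```

Passing the command as a list avoids shell quoting issues with URLs, at the cost of not supporting shell redirection directly, which is why the output is written explicitly.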
All contributions are welcome. New functions and classes should have associated tests and docstrings following the numpy style guide.
The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use pytest as our testing framework. This project uses pyproject.toml to define package information, requirements and tooling configuration.
activate a conda or virtual environment with Python 3.8 or higher
git clone https://github.com/sdsc-ordes/gimie && cd gimie
make install
run tests:
make test
run checks:
make check
for easier use of the GitHub/GitLab APIs, place your access tokens in the .env file (and don't worry, the .gitignore will ignore them when you push to GitHub):
cp .env.dist .env
build documentation:
make doc
Releases are done via GitHub release.
- A release will trigger a GitHub workflow to publish the package on PyPI.
- Make sure to update to a new version in pyproject.toml and conf.py before making the release.
- It is possible to test the publishing on TestPyPI by running a manual workflow: go to GitHub Actions and run the workflow 'Publish on Pypi Test'.
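Because the version has to be updated in two files, a quick consistency check before creating the release can catch a forgotten bump. This is a sketch under assumptions, not part of the project's tooling: it assumes pyproject.toml contains a `version = "..."` line and that conf.py sets a Sphinx-style `release = "..."` variable. The function names are hypothetical.

```python
import re
from typing import Optional, Pattern

# Assumed layouts: `version = "x.y.z"` in pyproject.toml and
# `release = "x.y.z"` in conf.py, each at the start of a line.
PYPROJECT_VERSION = re.compile(r'^version\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)
CONF_RELEASE = re.compile(r'^release\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def find_version(text: str, pattern: Pattern[str]) -> Optional[str]:
    """Return the first version string matched by pattern, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def versions_match(pyproject_text: str, conf_text: str) -> bool:
    """True only if both files declare a version and the two are identical."""
    pyproject_version = find_version(pyproject_text, PYPROJECT_VERSION)
    conf_version = find_version(conf_text, CONF_RELEASE)
    return pyproject_version is not None and pyproject_version == conf_version
```

The functions take file contents rather than paths, so the location of conf.py (not stated here) is left to the caller.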