docs: update README
bolinocroustibat committed Aug 27, 2024
1 parent ca01254 commit aea9d47
Showing 1 changed file with 33 additions and 14 deletions.
47 changes: 33 additions & 14 deletions README.md
@@ -22,26 +22,44 @@ The hydra crawler is one of the components of the architecture. It will check if

## Dependencies

+### System
+
This project uses `libmagic`, which needs to be installed on your system, eg:
+`brew install libmagic` on MacOS, or `sudo apt-get install libmagic-dev` on Linux.
+
+### Python
+
+This project uses Python 3.9.

-`brew install libmagic` on MacOS, or `sudo apt-get install libmagic-dev` on linux.
+Project dependencies are listed in `pyproject.toml`, while dependencies are locked in `requirements.txt` (for production only deps) and in `requirements-dev.txt` files.
+
+To install the exact same environment locally including the dev dependencies, use the lock file: `pip install -r requirements-dev.txt`, or `pip-sync requirements-dev.txt` with [pip-tools](https://pip-tools.readthedocs.io/en/stable/).
+
+To update the lock files, you can use any modern Python package manager (except Poetry) like [pip-tools](https://pip-tools.readthedocs.io/en/stable/), [PDM](https://pdm.fming.dev/) or [uv](https://uv.readthedocs.io/en/latest/), while defining `requirement.txt` and `requirement-dev.txt` as the output lock files.
+With [pip-tools](https://pip-tools.readthedocs.io/en/stable/), the command is:
+`pip-compile requirements.txt && pip-compile requirements-dev.txt`.

## CLI

### Create database structure

-Install udata-hydra dependencies and cli.
-`poetry install`
+Create a Python 3.9 virtual environment and activate it:
+`python3 -m venv .venv && source .venv/bin/activate`
+
+Install udata-hydra dependencies and cli:
+`pip install -r requirements.txt && pip install -r requirements-dev.txt`
+...or with `pip-sync` from [pip-tools](https://pip-tools.readthedocs.io/en/stable/):
+`pip-sync requirements.txt && pip-sync requirements-dev.txt`

-`poetry run udata-hydra migrate`
+`python3 udata-hydra migrate`

### Load (UPSERT) latest catalog version from data.gouv.fr

-`poetry run udata-hydra load-catalog`
+`python3 udata-hydra load-catalog`

## Crawler

-`poetry run udata-hydra-crawl`
+`python3 udata-hydra-crawl`

It will crawl (forever) the catalog according to config set in `udata_hydra/config.toml`, with a default config in `udata_hydra/config_default.toml`.

@@ -57,11 +75,11 @@ If an URL matches one of the `EXCLUDED_PATTERNS`, it will never be checked.

A job queuing system is used to process long-running tasks. Launch the worker with the following command:

-`poetry run rq worker -c udata_hydra.worker`
+`python3 rq worker -c udata_hydra.worker`

Monitor worker status:

-`poetry run rq info -c udata_hydra.worker --interval 1`
+`python3 rq info -c udata_hydra.worker --interval 1`

## CSV conversion to database

@@ -71,13 +89,13 @@ Converted CSV tables will be stored in the database specified via `config.DATABA

To run the tests, you need to launch the database, the test database, and the Redis broker with `docker compose -f docker-compose.yml -f docker-compose.test.yml -f docker-compose.broker.yml up -d`.

-Then you can run the tests with `poetry run pytest`.
+Then you can run the tests with `pytest`.

-To run a specific test file, you can pass the path to the file to pytest, like this: `poetry run pytest tests/test_app.py`.
+To run a specific test file, you can pass the path to the file to pytest, like this: `pytest tests/test_app.py`.

-To run a specific test function, you can pass the path to the file and the name of the function to pytest, like this: `poetry run pytest tests/test_app.py::test_get_latest_check`.
+To run a specific test function, you can pass the path to the file and the name of the function to pytest, like this: `pytest tests/test_app.py::test_get_latest_check`.

-If you would like to see print statements as they are executed, you can pass the -s flag to pytest (`poetry run pytest -s`). However, note that this can sometimes be difficult to parse.
+If you would like to see print statements as they are executed, you can pass the -s flag to pytest (`pytest -s`). However, note that this can sometimes be difficult to parse.

### Tests coverage

@@ -105,8 +123,9 @@ RESOURCES_ANALYSER_API_KEY = "api_key_to_change"
### Run

```bash
-poetry install
-poetry run adev runserver udata_hydra/app.py
+python3 -m venv .venv && source .venv/bin/activate
+pip install .
+python3 adev runserver udata_hydra/app.py
```

### Routes/endpoints
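
Reviewer note on the replacement commands: several lines in this commit take the form `python3 udata-hydra migrate`, `python3 rq worker ...` or `python3 adev runserver ...`. `python3 NAME` asks the interpreter to run a file called `NAME` from the current directory, so these would likely fail unless such files exist. The earlier `poetry run udata-hydra ...` form suggests the commands are console scripts, which are normally placed in `.venv/bin` and called by name once the virtual environment is active. That is an assumption; this diff does not show how the entry points are declared. A minimal sketch for checking which commands resolve on `PATH` (the helper name `check_cmd` is hypothetical, not part of the project):

```bash
# check_cmd NAME: report whether NAME resolves to an executable on PATH.
# Hypothetical helper for illustration; not part of udata-hydra.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Command names taken from the README lines in this diff.
# Run this with the virtual environment activated.
for cmd in udata-hydra udata-hydra-crawl rq adev pytest; do
  check_cmd "$cmd"
done
```

If a name is reported as found, it can presumably be run directly, for example `udata-hydra migrate` or `rq worker -c udata_hydra.worker`, without the `python3` prefix.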