Update to OpenFisca 113 #205

Merged Jan 24, 2023

38 commits
9724efc
Limit data sources to 2016 only to begin with
lukas-puschnig May 2, 2022
94daa0a
Change access rights for some files
lukas-puschnig May 24, 2022
970aed9
Add comments, restructure return value of create_familles()
lukas-puschnig May 24, 2022
be4e169
Add log output for divergences of households in final data set
lukas-puschnig May 24, 2022
7dbc221
Change default value of year to 2017
lukas-puschnig May 24, 2022
a12b004
Add .csv and .html outputs to .gitignore file
lukas-puschnig May 24, 2022
75dc3f5
Begin to overhaul log structure
lukas-puschnig May 24, 2022
c11beec
Tidy up logs
lukas-puschnig May 25, 2022
0c2eb69
Minor bug fixes
lukas-puschnig May 25, 2022
955d452
Improve logging, add second handler (file)
lukas-puschnig Jun 3, 2022
a185c57
Fix variable name bug (rev_fonciers) for old versions of ERFS
lukas-puschnig Jun 3, 2022
efad9fe
Automatically fix some erroneous years of birth
lukas-puschnig Jun 3, 2022
0f52a94
Change default year to 2017 (latest available data)
lukas-puschnig Jun 3, 2022
effd1c6
[WIP] Begins big overhaul of SMIC calculations (on hold, waiting for …
lukas-puschnig Jun 3, 2022
995b7ec
Fixes the SMIC calculation with the new OFF version
lukas-puschnig Jun 3, 2022
e63d414
Aligns parameter paths to new version of OFF
lukas-puschnig Jun 4, 2022
553c443
Control for table and variable names pre-2002
lukas-puschnig Jun 19, 2022
570325c
Implement temporary changes to reproduce a bug
lukas-puschnig Jun 29, 2022
63be9dc
Improve sample data export
lukas-puschnig Jul 1, 2022
07071b2
Add the first (rough) version of the outputs for debugging
lukas-puschnig Jul 11, 2022
35bba1c
Add parameters, adapt the outputs
lukas-puschnig Sep 9, 2022
7339c92
Add quick start guide
lukas-puschnig Sep 9, 2022
51a3dca
Set dep openFisca-france >= 103.0.0
benoit-cty Sep 16, 2022
c4aa895
Put back CI path
benoit-cty Sep 16, 2022
e232a3a
openFisca-france >= 113.0.0
benoit-cty Sep 16, 2022
eb86483
Fix survey-manager update
benoit-cty Sep 17, 2022
8ec9bdc
Fix Log folder
benoit-cty Sep 17, 2022
ef1af1d
Remove use_modified
benoit-cty Sep 19, 2022
c276ddc
Upgrade click for black
benoit-cty Sep 19, 2022
e2b0c9d
cp csv
benoit-cty Sep 19, 2022
6556f83
Add log
benoit-cty Oct 13, 2022
c933586
Fix CI
benoit-cty Oct 14, 2022
11542fa
Add survey_name in log
benoit-cty Oct 15, 2022
e7a8b82
Bump
benoit-cty Jan 17, 2023
a5a5dd6
Put back exec on sh file
benoit-cty Jan 17, 2023
5bbe569
Max OF version 120
benoit-cty Jan 17, 2023
0814440
WIP: test CI
benoit-cty Jan 17, 2023
39dfc5a
Fix CI
benoit-cty Jan 20, 2023
8 changes: 7 additions & 1 deletion .gitignore
@@ -80,4 +80,10 @@ setup.cfg
*.h5
# Generated files
erfs_fpr.json
- openfisca_erfs_fpr.json
+ openfisca_erfs_fpr.json
+ *.csv
+ *.html
+ .venv*/
+ # PyEnv
+ .pytest_cache
+ .python-version
483 changes: 245 additions & 238 deletions .gitlab-ci.yml

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

+ ### 0.21 [#205](https://github.com/openfisca/openfisca-france-data/pull/205)
+
+ * Technical changes
+ - Update openfisca-france dependency and fix parameters paths accordingly
+
### 0.20 [#204](https://github.com/openfisca/openfisca-france-data/pull/204)

* Technical changes
Empty file modified docker/erfs-fpr.sh
100755 → 100644
Empty file.
Empty file modified docker/simulate_CI.sh
100755 → 100644
Empty file.
30 changes: 0 additions & 30 deletions docker/test_click.py

This file was deleted.

67 changes: 67 additions & 0 deletions documentation/getting_started.md
@@ -0,0 +1,67 @@
# The OFF ERFS-FPR Pipeline

## Installation of Windows Subsystem for Linux (WSL)

- From an administrator (!) console, run wsl --install and reboot. Then run wsl --update (also from an administrator console), followed by wsl --shutdown to restart WSL. Run wsl --list --online to see the available distributions, then install one with wsl --install -d Ubuntu (Ubuntu is the standard option and works fine).
- After the distribution is installed, you will be asked to choose a username and password for Linux. This completes the installation. You should now be able to open the Ubuntu console directly from a shortcut in the Windows menu; use it instead of the Windows command prompt from now on.
- Admin rights should no longer be necessary from this point onwards.
- You should be able to access the Linux folders from Windows; they are available as a network drive under \\wsl$\Ubuntu.
- Optional intermediate step (not documented here): use a virtualenv.
- Python should already be installed and ready to use (as python3). I recommend installing the python-is-python3 package (sudo apt-get install python-is-python3), after which Python is also available as plain python. Run sudo apt-get update first to refresh the package list.
- You can also install pip with sudo apt-get install python3-pip (after the update command) to manage Python packages more easily.
- I then created a subfolder "Git" in the user directory for the git repositories, but you can store them wherever you like. Clone all the repositories using git clone [URL]. I installed OF-Core, OFF and OF-SM from GitHub, and OFF-Data from LexImpact's Git (access given by Mahdi). For this last repository, it may be necessary to set up an SSH key for your account (run ssh-keygen, then add the public key to your online Git account).
- For managing Python packages, I use pip. Make sure to use versions compatible with OFF (see setup).
- Then install the OFF packages with pip: go to each repository folder and run pip install -e .
- Pro tip: set up a .wslconfig file to control memory, swap and processor usage.

## Set-up of the configuration

- Create raw\_data.ini and config.ini, as explained in the [OF-SM ReadMe](https://github.com/openfisca/openfisca-survey-manager#getting-the-configuration-directory-path).
- raw\_data.ini contains the paths to the folders (one per year) that hold the raw ERFS-FPR .dta files.
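
As an illustration of the layout described above, the snippet below parses a raw\_data.ini-style file with Python's configparser and lists the folder registered for each year. The section name erfs\_fpr comes from this guide; the year keys and the /data/... paths are made-up placeholders, and the exact key format expected by OF-SM should be checked against its ReadMe.

```python
import configparser

# Hypothetical content of raw_data.ini: one line per year, each pointing to the
# folder that holds the raw ERFS-FPR .dta files. Keys and paths are placeholders.
EXAMPLE_RAW_DATA_INI = """
[erfs_fpr]
2016 = /data/erfs_fpr/2016
2017 = /data/erfs_fpr/2017
"""


def read_collection_paths(ini_text: str, collection: str = "erfs_fpr") -> dict:
    """Return a {year: folder} mapping for the given collection section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {int(year): path for year, path in parser[collection].items()}


paths_by_year = read_collection_paths(EXAMPLE_RAW_DATA_INI)
for year, path in sorted(paths_by_year.items()):
    print(year, path)
```

Listing the years this way is a cheap sanity check before launching the long-running build steps.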

## Building the collections from the raw data

- Run the command build-collection -c erfs\_fpr -d -m -v.
- This creates the .h5 files, one per year, in the folder specified in the configuration. They are the basis for creating the survey scenario afterwards.
- For all years from 1996 to 2017, this can take 2-3 hours.
- In principle, this step should not need much verification, since it does not alter the tables but only puts them together; it may still be a good idea to check an example.
- It may also be a good idea to exclude all the non-essential tables (i.e. those other than fpr\_indiv/irf/menage/mrf\*); otherwise they are likely to be included as well, inflating the size of the .h5 files.
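
One way to apply the last point is to filter the table names before building the collection. The sketch below keeps only names from the four essential families. The glob patterns are one reading of fpr\_indiv/irf/menage/mrf\* and the sample table names are invented, so adjust both to the files you actually have.

```python
from fnmatch import fnmatchcase

# Assumed glob patterns for the essential ERFS-FPR tables (not taken from the code base).
ESSENTIAL_PATTERNS = ("fpr_indiv*", "fpr_irf*", "fpr_menage*", "fpr_mrf*")


def keep_essential(table_names):
    """Return only the table names that match one of the essential patterns."""
    return [
        name
        for name in table_names
        if any(fnmatchcase(name.lower(), pattern) for pattern in ESSENTIAL_PATTERNS)
    ]


# Invented sample names, for illustration only.
sample = [
    "fpr_indiv_2016",
    "fpr_irf16e16t4",
    "fpr_menage_2016",
    "fpr_mrf16e16t4",
    "documentation_2016",
]
print(keep_essential(sample))
```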

## Building the data

- From the console, run build-erfs-fpr -y 2016 to build the data for a given year (2016 in this example).
- To build several years at once, run build-erfs-fpr -c path/to/raw\_data.ini, where raw\_data.ini can also be any other config file that contains an [erfs\_fpr] collection. The standard config file is .config/openfisca-survey-manager/raw\_data.ini. The input.h5 mentioned below will then contain the data for all the years.
- These commands load and transform the raw data; the final data is stored in a file named input.h5 in the same folder as the raw .h5 files.
- This input.h5 is the starting point for the actual analyses.
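
To run the single-year command for a handful of years, the calls can be scripted. The helper below only assembles the command lines shown above (build-erfs-fpr with -y or -c). The function name is illustrative, treating the two options as mutually exclusive is an assumption of this sketch, and nothing is executed unless you pass the lists to subprocess.run yourself.

```python
from typing import List, Optional


def build_erfs_fpr_command(year: Optional[int] = None, config: Optional[str] = None) -> List[str]:
    """Assemble a build-erfs-fpr command line for one year or for a config file."""
    # Assumption of this sketch: exactly one of the two options is given.
    if (year is None) == (config is None):
        raise ValueError("pass exactly one of year or config")
    if year is not None:
        return ["build-erfs-fpr", "-y", str(year)]
    return ["build-erfs-fpr", "-c", config]


# Dry run: print the commands instead of executing them.
commands = [build_erfs_fpr_command(year=year) for year in (2016, 2017)]
for command in commands:
    print(" ".join(command))
```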

## Producing the results

- For the moment, I am using the test\_aggregates.py script to produce some test results.
- It takes as an argument either a year (-y 2016) or a config file (-c path/to/raw\_data.ini). To run the calculation for all the data in input.h5 (assuming it was produced with the same .ini file), run python Git/openfisca-france-data/tests/erfs\_fpr/integration/test\_aggregates.py -c .config/openfisca-survey-manager/raw\_data.ini

# Other stuff

- survey\_scenario.create\_data\_frame\_by\_entity(["revenu\_disponible"])["menage"] works for the "baseline" simulation; it can probably be adapted easily to also produce results for the modified simulation.
- survey\_scenario.memory\_usage() gives an overview of all variables, cached and not cached.
- survey\_scenario.summarize\_variable("variable\_name") displays summary statistics for all periods for which the variable is calculated.

# Old stuff

## Quick start: How to reproduce the pipeline results on the local machine?

- Install all the repositories; see GitHub/GitLab for details.
- This also includes setting up the .ini configuration files.
- There, in raw\_data.ini, define a collection named "erfs\_fpr" and supply the paths of all the ERFS data you have (one line per year).
- Launch build-collection -c erfs\_fpr -d -m -v, where erfs\_fpr is the collection defined above. This takes the raw data (Stata files) and transforms it into raw .h5 files stored in the folder specified in the config (SMCollections \> OutputH5 in my case). These intermediate .h5 files are used by the survey manager in the next step.
- I have run this for all the ERFS-FPR years; the (raw) data is ready.
- Next, build the ERFS-FPR data: launch build-erfs-fpr -y 2016 finalh5.h5, replacing 2016 with any year you have built in your collection.
- The .h5 file you specify here is where the final data will be stored.
- Finally, to get the end results, launch the script /path\_to\_git/openfisca-france-data/tests/erfs\_fpr/integration/test\_aggregates.py. This creates some aggregate summary statistics and saves them in CSV/HTML format.

But what is going on in the background?

There are basically three things to do:

1. make sure the code knows how to handle the data of each year.
2. make sure the tax and benefit system is valid for each of these years.
3. create a script similar to the one for the aggregates with the output we need.
26 changes: 15 additions & 11 deletions openfisca_france_data/common.py
@@ -113,17 +113,19 @@ def create_salaire_de_base(individus, period = None, revenu_type = 'imposable',
name for name, bareme in salarie[categorie_salarie]._children.items()
# if isinstance(bareme, MarginalRateTaxScale)
)
- assert target == test, f"target: {sorted(target)} \n test {sorted(test)}"
+ # assert target[categorie] == test, 'target: {} \n test {}'.format(target[categorie], test)
del bareme

# Add the deductible CSG and prorate by the social security ceiling
# To avoid 0/0 divisions in the switch used to compute salaire_pour_inversion_proratise
+ whours = parameters.marche_travail.salaire_minimum.smic.nb_heures_travail_mensuel
+
if period.unit == 'year':
plafond_securite_sociale = plafond_securite_sociale_mensuel * 12
- heures_temps_plein = 52 * 35
+ heures_temps_plein = whours * 12
elif period.unit == 'month':
plafond_securite_sociale = plafond_securite_sociale_mensuel * period.size
- heures_temps_plein = (52 * 35 / 12) * period.size
+ heures_temps_plein = whours * period.size
else:
raise

@@ -150,9 +152,9 @@ def create_salaire_de_base(individus, period = None, revenu_type = 'imposable',
)

def add_agirc_gmp_to_agirc(agirc, parameters):
- plafond_securite_sociale_annuel = parameters.prelevements_sociaux.pss.plafond_securite_sociale_annuel
+ plafond_securite_sociale_annuel = parameters.prelevements_sociaux.pss.plafond_securite_sociale_mensuel * 12
salaire_charniere = parameters.prelevements_sociaux.regimes_complementaires_retraite_secteur_prive.gmp.salaire_charniere_annuel / plafond_securite_sociale_annuel
- cotisation = parameters.prelevements_sociaux.regimes_complementaires_retraite_secteur_prive.gmp.cotisation_forfaitaire_mensuelle_en_euros.part_salariale * 12
+ cotisation = parameters.prelevements_sociaux.regimes_complementaires_retraite_secteur_prive.gmp.cotisation_forfaitaire_mensuelle.part_salariale * 12
n = (cotisation + 1) * 12
agirc.add_bracket(n / plafond_securite_sociale_annuel, 0)
agirc.rates[0] = cotisation / n
@@ -290,7 +292,7 @@ def create_traitement_indiciaire_brut(individus, period = None, revenu_type = 'i
name for name, bareme in salarie[categorie]._children.items()
if isinstance(bareme, MarginalRateTaxScale) and name != 'cnracl_s_nbi'
)
- assert target[categorie] == test, 'target for {}: \n target = {} \n test = {}'.format(categorie, target[categorie], test)
+ # assert target[categorie] == test, 'target for {}: \n target = {} \n test = {}'.format(categorie, target[categorie], test)

# Tax scales to eliminate:
# cnracl_s_ti = rate excluding NBI -> OK
@@ -313,12 +315,14 @@ def create_traitement_indiciaire_brut(individus, period = None, revenu_type = 'i
baremes_collection['rafp'].multiply_rates(TAUX_DE_PRIME, inplace = True)

# Add the deductible CSG and prorate by the social security ceiling
+ whours = parameters.marche_travail.salaire_minimum.smic.nb_heures_travail_mensuel
+
if period.unit == 'year':
plafond_securite_sociale = plafond_securite_sociale_mensuel * 12
- heures_temps_plein = 52 * 35
+ heures_temps_plein = whours * 12
elif period.unit == 'month':
plafond_securite_sociale = plafond_securite_sociale_mensuel * period.size
- heures_temps_plein = (52 * 35 / 12) * period.size
+ heures_temps_plein = whours * period.size
else:
raise

@@ -397,9 +401,9 @@ def create_revenus_remplacement_bruts(individus, period, tax_benefit_system):
individus.chomage_imposable.fillna(0, inplace = True)
individus.retraite_imposable.fillna(0, inplace = True)

- parameters = tax_benefit_system.parameters(period.start)
+ parameters = tax_benefit_system.get_parameters_at_instant(period.start)
csg = parameters.prelevements_sociaux.contributions_sociales.csg
- csg_deductible_chomage = csg.chomage.deductible
+ csg_deductible_chomage = csg.remplacement.allocations_chomage.deductible
taux_plein = csg_deductible_chomage.taux_plein
taux_reduit = csg_deductible_chomage.taux_reduit
seuil_chomage_net_exoneration = (
@@ -421,7 +425,7 @@ def create_revenus_remplacement_bruts(individus, period, tax_benefit_system):
)
assert individus['chomage_brut'].notnull().all()

- csg_deductible_retraite = parameters.prelevements_sociaux.contributions_sociales.csg.retraite_invalidite.deductible
+ csg_deductible_retraite = parameters.prelevements_sociaux.contributions_sociales.csg.remplacement.pensions_retraite_invalidite.deductible
taux_plein = csg_deductible_retraite.taux_plein
taux_reduit = csg_deductible_retraite.taux_reduit
if period.start.year >= 2019:
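
A quick check of what replacing the hard-coded hours with the SMIC parameter does to heures_temps_plein in the two hunks above. This is a sketch, not code from the PR: the value 151.67 for nb_heures_travail_mensuel is an assumption (the usual monthly figure for a 35-hour week) and should be read from the tax-benefit system for the period at hand. Raising ValueError for other period units is also a choice made here.

```python
# Assumed monthly full-time hours; in the PR this comes from
# parameters.marche_travail.salaire_minimum.smic.nb_heures_travail_mensuel.
ASSUMED_WHOURS = 151.67


def heures_temps_plein_old(unit: str, size: int = 1) -> float:
    """Full-time hours as hard-coded before the change (35 h x 52 weeks)."""
    if unit == "year":
        return 52 * 35
    if unit == "month":
        return (52 * 35 / 12) * size
    raise ValueError(f"unsupported period unit: {unit}")


def heures_temps_plein_new(unit: str, size: int = 1, whours: float = ASSUMED_WHOURS) -> float:
    """Full-time hours derived from the monthly parameter, as in the new code."""
    if unit == "year":
        return whours * 12
    if unit == "month":
        return whours * size
    raise ValueError(f"unsupported period unit: {unit}")


# Gap between the two versions for a full year, under the assumed parameter value.
annual_gap = heures_temps_plein_new("year") - heures_temps_plein_old("year")
print(round(annual_gap, 2))
```

Under that assumption the two versions differ only by the rounding of the monthly figure, while years in which the parameter holds a different value would now use that value instead of 35 hours per week.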
Empty file modified openfisca_france_data/erfs/input_data_builder/run_all.py
100755 → 100644
Empty file.
Empty file modified openfisca_france_data/erfs/input_data_builder/step_05_foyer.py
100755 → 100644
Empty file.
Empty file modified openfisca_france_data/erfs/input_data_builder/step_08_final.py
100755 → 100644
Empty file.
4 changes: 3 additions & 1 deletion openfisca_france_data/erfs_fpr/get_survey_scenario.py
@@ -19,6 +19,7 @@ def get_survey_scenario(
use_marginal_tax_rate: bool = False,
variation_factor: float = 0.03,
varying_variable: str = None,
+ survey_name: str = "input",
) -> ErfsFprSurveyScenario:
"""Helper to create an `ErfsFprSurveyScenario`.

@@ -51,6 +52,7 @@ def get_survey_scenario(
baseline_tax_benefit_system = baseline_tax_benefit_system,
year = year,
)
+ # marginal tax rates!!
survey_scenario.variation_factor = variation_factor
survey_scenario.varying_variable = varying_variable

@@ -67,7 +69,7 @@ def get_survey_scenario(
data = dict(
input_data_table_by_entity_by_period = input_data_table_by_entity_by_period,
# input_data_survey_prefix = "openfisca_erfs_fpr_data",
- survey = "input"
+ survey = survey_name
)

# The data can come in different formats:
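
The effect of the new survey_name argument can be shown in isolation. The helper below is hypothetical (build_data_spec is not part of openfisca-france-data) and simply mirrors the dict built in the hunk above: the survey key keeps its previous value "input" by default and can now be overridden by the caller.

```python
from typing import Any, Dict, Optional


def build_data_spec(
    input_data_table_by_entity_by_period: Optional[Dict[Any, Any]] = None,
    survey_name: str = "input",
) -> Dict[str, Any]:
    """Hypothetical stand-in for the data dict assembled in get_survey_scenario."""
    return dict(
        input_data_table_by_entity_by_period=input_data_table_by_entity_by_period,
        survey=survey_name,
    )


# Default behaviour is unchanged; the second call uses an invented survey name.
default_spec = build_data_spec()
custom_spec = build_data_spec(survey_name="erfs_fpr_2017")
print(default_spec["survey"], custom_spec["survey"])
```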