openfisca · benoit-cty · Jan 24, 2023 · May 2, 2022 · May 24, 2022 · May 24, 2022
diff --git a/.gitignore b/.gitignore
@@ -80,4 +80,10 @@ setup.cfg
 *.h5
 # Generated files
 erfs_fpr.json
-openfisca_erfs_fpr.json
+openfisca_erfs_fpr.json
+*.csv
+*.html
+.venv*/
+# PyEnv
+.pytest_cache
+.python-version
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,10 @@
 # Changelog
 
+### 0.21 [#205](https://github.com/openfisca/openfisca-france-data/pull/205)
+
+* Technical changes
+  - Update the openfisca-france dependency and fix parameter paths accordingly
+
 ### 0.20 [#204](https://github.com/openfisca/openfisca-france-data/pull/204)
 
 * Technical changes

diff --git a/docker/erfs-fpr.sh b/docker/erfs-fpr.sh
diff --git a/docker/simulate_CI.sh b/docker/simulate_CI.sh
diff --git a/docker/test_click.py b/docker/test_click.py
diff --git a/documentation/getting_started.md b/documentation/getting_started.md
@@ -0,0 +1,67 @@
+# The OFF ERFS-FPR Pipeline
+
+## Installation of Windows Subsystem for Linux (WSL)
+
+- From an administrator (!) console, run `wsl --install` and reboot. Then run `wsl --update` (also from an administrator console), followed by `wsl --shutdown` to restart WSL. List the available distributions with `wsl --list --online`, and finally install a distribution with `wsl --install -d Ubuntu` (the standard Ubuntu option works fine).
+- After the distribution is installed, you will be asked to choose a Linux username and password. Once this is done, the installation is complete. You can then open the Ubuntu console directly from a shortcut in the Windows Start menu, and you should use it instead of the Windows command prompt from now on.
+- Administrator rights should no longer be necessary from this point onwards.
+- The Linux folders are accessible from Windows as a network drive under `\\wsl$\Ubuntu`.
+- Optional intermediate step (not documented here): use a virtual environment (virtualenv).
+- Python should already be installed and ready to use (as `python3`). I recommend installing the python-is-python3 package (`sudo apt-get install python-is-python3`), after which Python can be called with the `python` command alone. Before installing it, it is best to run `sudo apt-get update` to refresh the package list.
+- You can also install pip to manage Python packages, with `sudo apt-get install python3-pip` (after running the update command).
+- I then created a "Git" subfolder in my home directory for the Git repositories, but you can store them wherever you like. Then clone each repository with `git clone [URL]`. I cloned OF-Core, OFF and OF-SM from GitHub, and OFF-Data from LexImpact's Git (access given by Mahdi). For this particular repository, you may need to set up an SSH key for your account (run `ssh-keygen`, then add the public key to your account on the Git server).
+- I use pip to manage Python packages. Make sure to install versions compatible with OFF (see its setup).
+- Then install each of the cloned packages with pip: go to each repository folder and run `pip install -e .`
+- Pro tip: set up a `.wslconfig` file to control memory/swap and processor usage.
+
+## Set-up of the configuration
+
+- Create raw\_data.ini and config.ini, as explained in the [OF-SM ReadMe](https://github.com/openfisca/openfisca-survey-manager#getting-the-configuration-directory-path).
+- raw\_data.ini contains the paths to the folders (one per year) containing the raw ERFS-FPR .dta files.
+
+## Building the collections from the raw data
+
+- Run `build-collection -c erfs_fpr -d -m -v`.
+- This creates one .h5 file per year in the folder specified in the configuration. These files are the basis for creating the survey scenario afterwards.
+- For all years from 1996 to 2017, this can take 2-3 hours.
+- In principle, this step should not need much verification, since it does not alter the tables but only gathers them. It may still be a good idea to check one example.
+- It may also be a good idea to exclude all non-essential tables (i.e. anything other than fpr\_indiv/irf/menage/mrf\*), since they are likely to be included as well and would inflate the size of the .h5 files.
+
+## Building the data
+
+- From the console, run `build-erfs-fpr -y 2016` to build the data for a given year (2016 in this example).
+- To build several years at once, run `build-erfs-fpr -c path/to/raw_data.ini`, where raw\_data.ini can be any config file that contains an [erfs\_fpr] collection. The standard config file is `.config/openfisca-survey-manager/raw_data.ini`. The input.h5 file mentioned below will then contain the data for all the years.
+- These commands load and transform the raw data. The final data is stored in a file named input.h5 in the same folder as the raw .h5 files.
+- This input.h5 file is the starting point of the actual analyses.
+
+## Producing the results
+
+- For the moment, I am using the test\_aggregates.py script to produce some test results.
+- It also takes either a year (`-y 2016`) or a config file (`-c path/to/raw_data.ini`) as an argument. To run the calculation for all the years in input.h5 (assuming it was produced with the same .ini file), run `python Git/openfisca-france-data/tests/erfs_fpr/integration/test_aggregates.py -c .config/openfisca-survey-manager/raw_data.ini`.
+
+# Other stuff
+
+- `survey_scenario.create_data_frame_by_entity(["revenu_disponible"])["menage"]` works for the "baseline" simulation. It can probably be adapted easily to also produce results for the modified simulation.
+- `survey_scenario.memory_usage()` gives an overview of all variables, cached and not cached.
+- `survey_scenario.summarize_variable("variable_name")` displays summary statistics for every period in which the variable is calculated.
+
+# Old stuff
+
+## Quick start: How to reproduce the pipeline results on the local machine?
+
+- Install all the repositories (see GitHub/GitLab for details).
+  - This also includes setting up the .ini configuration files.
+  - In raw\_data.ini, define a collection named "erfs\_fpr" and supply the paths of all the ERFS data you have (one line per year).
+- Run `build-collection -c erfs_fpr -d -m -v`, where erfs\_fpr is the collection defined above. This takes the raw data (Stata files) and transforms it into raw .h5 files stored in the folder specified in the config (SMCollections \> OutputH5 in my case). The survey manager uses these intermediate .h5 files in the next step.
+  - I have run this for all the ERFS-FPR years. The (raw) data is ready.
+- Next, build the ERFS-FPR data: run `build-erfs-fpr -y 2016 finalh5.h5`, replacing 2016 with any year you have built in your collection.
+  - The .h5 file you specify here is where the final data will be stored.
+- Finally, to get the end results, run the script `/path_to_git/openfisca-france-data/tests/erfs_fpr/integration/test_aggregates.py`. It computes some aggregate summary statistics and saves them in CSV/HTML format.
+
+But what is going on in the background?
+
+There are basically three things to do:
+
+1. Make sure the code knows how to handle the data of each year.
+2. Make sure the tax and benefit system is valid for each of these years.
+3. Create a script, similar to the one for the aggregates, that produces the output we need.
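The guide's "Pro tip" about `.wslconfig` can be made concrete. WSL 2 reads this file from the Windows user profile folder (`%UserProfile%\.wslconfig`). Its `[wsl2]` section accepts `memory`, `processors` and `swap` keys, and changes take effect after `wsl --shutdown`. The sketch below writes such a file with Python's standard `configparser`. `write_wslconfig` is a hypothetical helper, and the values (8GB of memory, 4 processors, 4GB of swap) are placeholder assumptions to adjust to your machine.

```python
import configparser
from pathlib import Path


def write_wslconfig(path: Path, memory: str = "8GB", processors: int = 4, swap: str = "4GB") -> None:
    """Write a minimal .wslconfig limiting WSL 2 resources (hypothetical helper, example values)."""
    config = configparser.ConfigParser()
    config["wsl2"] = {"memory": memory, "processors": str(processors), "swap": swap}
    with path.open("w") as f:
        # Use the key=value form shown in Microsoft's examples (no spaces around "=").
        config.write(f, space_around_delimiters=False)
```

When run with a Windows Python, `write_wslconfig(Path.home() / ".wslconfig")` targets the user profile folder. From inside WSL, `Path.home()` points to the Linux home instead, so pass the Windows path (typically under `/mnt/c/Users/`) explicitly.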
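The configuration step of the guide (an `[erfs_fpr]` collection in raw_data.ini, "one line = path for each year") can be sketched with `configparser`. This is a minimal sketch under assumptions. It writes one `year = folder` line per year and checks that each folder contains at least one raw `.dta` file. The exact key format expected by openfisca-survey-manager should be checked against the OF-SM ReadMe, and both function names are hypothetical.

```python
from __future__ import annotations

import configparser
from pathlib import Path


def write_raw_data_ini(ini_path: Path, folders_by_year: dict[int, Path]) -> None:
    """Write an [erfs_fpr] collection with one `year = folder` line per year (assumed key format)."""
    # interpolation=None so that paths containing "%" are written and read literally.
    config = configparser.ConfigParser(interpolation=None)
    config["erfs_fpr"] = {str(year): str(folder) for year, folder in sorted(folders_by_year.items())}
    with ini_path.open("w") as f:
        config.write(f)


def years_missing_dta(ini_path: Path) -> list[int]:
    """Return the years whose folder is missing or contains no raw .dta file."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path)
    missing = []
    for year, folder in config["erfs_fpr"].items():
        path = Path(folder)
        if not (path.is_dir() and any(path.glob("*.dta"))):
            missing.append(int(year))
    return missing
```

Running `years_missing_dta` before `build-collection` gives a quick way to catch a mistyped folder path without waiting for the build to fail.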
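The guide also suggests keeping only the essential ERFS-FPR tables (fpr\_indiv/irf/menage/mrf\*) when building the collection. A minimal filter with the standard `fnmatch` module is sketched below. It assumes the essential tables' names start with those prefixes. `essential_tables` is a hypothetical helper and the sample table names are illustrative. How such a selection would be wired into `build-collection` is not covered here.

```python
from fnmatch import fnmatch

# Patterns for the essential tables named in the guide (assumed naming scheme).
ESSENTIAL_PATTERNS = ("fpr_indiv*", "fpr_irf*", "fpr_menage*", "fpr_mrf*")


def essential_tables(table_names):
    """Keep only the table names matching one of ESSENTIAL_PATTERNS (case-insensitive)."""
    return [
        name
        for name in table_names
        if any(fnmatch(name.lower(), pattern) for pattern in ESSENTIAL_PATTERNS)
    ]


# Illustrative names only: prints the four fpr_* tables and drops "other_table_2016".
print(essential_tables(["fpr_indiv_2016", "fpr_irf_2016", "fpr_menage_2016", "fpr_mrf_2016", "other_table_2016"]))
```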
diff --git a/openfisca_france_data/common.py b/openfisca_france_data/common.py
@@ -113,17 +113,19 @@ def create_salaire_de_base(individus, period = None, revenu_type = 'imposable',
             name for name, bareme in salarie[categorie_salarie]._children.items()
             # if isinstance(bareme, MarginalRateTaxScale)
             )
-        assert target == test, f"target: {sorted(target)} \n test {sorted(test)}"
+        # assert target[categorie] == test, 'target: {} \n test {}'.format(target[categorie], test)
     del bareme
 
     # On ajoute la CSG deductible et on proratise par le plafond de la sécurité sociale
     # Pour éviter les divisions 0 /0 dans le switch qui sert à calculer le salaire_pour_inversion_proratise
+    whours = parameters.marche_travail.salaire_minimum.smic.nb_heures_travail_mensuel
+
     if period.unit == 'year':
         plafond_securite_sociale = plafond_securite_sociale_mensuel * 12
-        heures_temps_plein = 52 * 35
+        heures_temps_plein = whours * 12
     elif period.unit == 'month':
         plafond_securite_sociale = plafond_securite_sociale_mensuel * period.size
-        heures_temps_plein = (52 * 35 / 12) * period.size
+        heures_temps_plein = whours * period.size
     else:
         raise
 
@@ -150,9 +152,9 @@ def create_salaire_de_base(individus, period = None, revenu_type = 'imposable',
         )
 
     def add_agirc_gmp_to_agirc(agirc, parameters):
-        plafond_securite_sociale_annuel = parameters.prelevements_sociaux.pss.plafond_securite_sociale_annuel
+        plafond_securite_sociale_annuel = parameters.prelevements_sociaux.pss.plafond_securite_sociale_mensuel * 12
         salaire_charniere = parameters.prelevements_sociaux.regimes_complementaires_retraite_secteur_prive.gmp.salaire_charniere_annuel / plafond_securite_sociale_annuel
-        cotisation = parameters.prelevements_sociaux.regimes_complementaires_retraite_secteur_prive.gmp.cotisation_forfaitaire_mensuelle_en_euros.part_salariale * 12
+        cotisation = parameters.prelevements_sociaux.regimes_complementaires_retraite_secteur_prive.gmp.cotisation_forfaitaire_mensuelle.part_salariale * 12
         n = (cotisation + 1) * 12
         agirc.add_bracket(n / plafond_securite_sociale_annuel, 0)
         agirc.rates[0] = cotisation / n
@@ -290,7 +292,7 @@ def create_traitement_indiciaire_brut(individus, period = None, revenu_type = 'i
             name for name, bareme in salarie[categorie]._children.items()
             if isinstance(bareme, MarginalRateTaxScale) and name != 'cnracl_s_nbi'
             )
-        assert target[categorie] == test, 'target for {}: \n  target = {} \n  test = {}'.format(categorie, target[categorie], test)
+        # assert target[categorie] == test, 'target for {}: \n  target = {} \n  test = {}'.format(categorie, target[categorie], test)
 
     # Barèmes à éliminer :
         # cnracl_s_ti = taux hors NBI -> OK
@@ -313,12 +315,14 @@ def create_traitement_indiciaire_brut(individus, period = None, revenu_type = 'i
         baremes_collection['rafp'].multiply_rates(TAUX_DE_PRIME, inplace = True)
 
     # On ajoute la CSG déductible et on proratise par le plafond de la sécurité sociale
+    whours = parameters.marche_travail.salaire_minimum.smic.nb_heures_travail_mensuel
+
     if period.unit == 'year':
         plafond_securite_sociale = plafond_securite_sociale_mensuel * 12
-        heures_temps_plein = 52 * 35
+        heures_temps_plein = whours * 12
     elif period.unit == 'month':
         plafond_securite_sociale = plafond_securite_sociale_mensuel * period.size
-        heures_temps_plein = (52 * 35 / 12) * period.size
+        heures_temps_plein = whours * period.size
     else:
         raise
 
@@ -397,9 +401,9 @@ def create_revenus_remplacement_bruts(individus, period, tax_benefit_system):
     individus.chomage_imposable.fillna(0, inplace = True)
     individus.retraite_imposable.fillna(0, inplace = True)
 
-    parameters = tax_benefit_system.parameters(period.start)
+    parameters = tax_benefit_system.get_parameters_at_instant(period.start)
     csg = parameters.prelevements_sociaux.contributions_sociales.csg
-    csg_deductible_chomage = csg.chomage.deductible
+    csg_deductible_chomage = csg.remplacement.allocations_chomage.deductible
     taux_plein = csg_deductible_chomage.taux_plein
     taux_reduit = csg_deductible_chomage.taux_reduit
     seuil_chomage_net_exoneration = (
@@ -421,7 +425,7 @@ def create_revenus_remplacement_bruts(individus, period, tax_benefit_system):
         )
     assert individus['chomage_brut'].notnull().all()
 
-    csg_deductible_retraite = parameters.prelevements_sociaux.contributions_sociales.csg.retraite_invalidite.deductible
+    csg_deductible_retraite = parameters.prelevements_sociaux.contributions_sociales.csg.remplacement.pensions_retraite_invalidite.deductible
     taux_plein = csg_deductible_retraite.taux_plein
     taux_reduit = csg_deductible_retraite.taux_reduit
     if period.start.year >= 2019:
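The `common.py` hunks above replace the hard-coded full-time hours (`52 * 35` per year, `52 * 35 / 12` per month) with the SMIC monthly-hours parameter `nb_heures_travail_mensuel`. The sketch below reproduces that branch as a standalone function so that the two conventions can be compared. The value 151.67 is an assumption standing in for the parameter (the usual legal monthly figure, 35 × 52 / 12 rounded), and `heures_temps_plein_sketch` is a hypothetical name. The sketch also raises an explicit `ValueError` for unsupported period units, because the bare `raise` in the original `else` branch has no active exception and would fail with `RuntimeError: No active exception to reraise`.

```python
def heures_temps_plein_sketch(unit: str, size: int, whours: float) -> float:
    """Full-time hours for a period, mirroring the new branch in common.py.

    `whours` stands in for
    parameters.marche_travail.salaire_minimum.smic.nb_heures_travail_mensuel.
    """
    if unit == "year":
        return whours * 12
    elif unit == "month":
        return whours * size
    # Explicit error instead of a bare `raise`, which has nothing to re-raise here.
    raise ValueError(f"Unsupported period unit: {unit!r}")


WHOURS = 151.67  # assumed value of nb_heures_travail_mensuel

old_year = 52 * 35  # previous hard-coded yearly value
new_year = heures_temps_plein_sketch("year", 1, WHOURS)
print(old_year, round(new_year, 2))  # 1820 1820.04

old_month = (52 * 35 / 12) * 1  # previous hard-coded monthly value
new_month = heures_temps_plein_sketch("month", 1, WHOURS)
print(round(old_month, 4), new_month)  # 151.6667 151.67
```

With this assumed parameter value, the difference between the two conventions only comes from rounding 151.666… to 151.67.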

diff --git a/openfisca_france_data/erfs/input_data_builder/run_all.py b/openfisca_france_data/erfs/input_data_builder/run_all.py
diff --git a/openfisca_france_data/erfs/input_data_builder/step_04_famille.py b/openfisca_france_data/erfs/input_data_builder/step_04_famille.py
diff --git a/openfisca_france_data/erfs/input_data_builder/step_05_foyer.py b/openfisca_france_data/erfs/input_data_builder/step_05_foyer.py
diff --git a/openfisca_france_data/erfs/input_data_builder/step_06_rebuild.py b/openfisca_france_data/erfs/input_data_builder/step_06_rebuild.py
diff --git a/openfisca_france_data/erfs/input_data_builder/step_07_invalides.py b/openfisca_france_data/erfs/input_data_builder/step_07_invalides.py
diff --git a/openfisca_france_data/erfs/input_data_builder/step_08_final.py b/openfisca_france_data/erfs/input_data_builder/step_08_final.py
diff --git a/openfisca_france_data/erfs_fpr/get_survey_scenario.py b/openfisca_france_data/erfs_fpr/get_survey_scenario.py
@@ -19,6 +19,7 @@ def get_survey_scenario(
         use_marginal_tax_rate: bool = False,
         variation_factor: float = 0.03,
         varying_variable: str = None,
+        survey_name: str = "input",
         ) -> ErfsFprSurveyScenario:
     """Helper pour créer un `ErfsFprSurveyScenario`.
 
@@ -51,6 +52,7 @@ def get_survey_scenario(
             baseline_tax_benefit_system = baseline_tax_benefit_system,
             year = year,
             )
+        # marginal tax rates (taux marginaux)!!
         survey_scenario.variation_factor = variation_factor
         survey_scenario.varying_variable = varying_variable
 
@@ -67,7 +69,7 @@ def get_survey_scenario(
         data = dict(
             input_data_table_by_entity_by_period = input_data_table_by_entity_by_period,
             # input_data_survey_prefix = "openfisca_erfs_fpr_data",
-            survey = "input"
+            survey = survey_name
             )
 
     # Les données peuvent venir en différents formats :