Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round up numbers #699

Closed
Anna-Livia opened this issue Jul 26, 2018 · 8 comments
Closed

Round up numbers #699

Anna-Livia opened this issue Jul 26, 2018 · 8 comments
Assignees

Comments

@Anna-Livia
Copy link
Contributor

Some formula result have too many decimal numbers.
Impact :

  • it does not make sense to have that many decimal when most elements are money in the end.
  • it makes mandatory adding a margin of error in most tests.
@Anna-Livia
Copy link
Contributor Author

The round branch in country_template contains a failing test because of too many numbers after the decimal point.

Hypotheses :

  1. Results with many numbers after the decimal point are not constant.
  2. There is an existing margin of error in openfisca-run-test
  3. outputs given un yaml tests are transformed to contains enough decimal points as to be compared to the result.

@MattiSG
Copy link
Member

MattiSG commented Aug 1, 2018

it does not make sense to have that many decimal when most elements are money in the end.

It sounds reasonable to me to keep as high precision as possible until exporting values. Rounding errors can be quite problematic considering how many times formulas can be chained. Then again, there are already rounding errors…

What about using the type property to round to an instance-definable number, for example 3 digits for currency, only when returning values in the final answer (as opposed to “on all intermediary steps”)? 🤔

@benjello
Copy link
Member

benjello commented Aug 2, 2018

From my point of view there are two different problems here:

  • what is the right number of decimals we should keep for every variable ?
  • should we use a default number of digits when writing tests to avoid the tidy effort of adding a specific margin every time ?

Tentative answers:

  • The right number of decimals heavily depends on the variable and usually the information when available, is quite difficult to find. I favored using some round command here an there and adding in comment why. But we can use some more formal and cleaner explicit way.
  • The error margin is something the tester may want to control so the option should be kept. We can also add a default per variable but it should be easily available without adding too much potentially confusing information at the variable level. May be using the information from the first point if available could be ok. But then if this information is not available what to use for the default of the default ?

@Anna-Livia
Copy link
Contributor Author

To give some more information :
The following code :

import openfisca_country_template

country = openfisca_country_template
tax_benefit_system = country.CountryTaxBenefitSystem()

scenario = tax_benefit_system.new_scenario().init_from_attributes(
    period='2016-01',
    input_variables={'salary': 1000},
    )

simulation = scenario.new_simulation()

for i in range(100):
    print(simulation.calculate('tax_incentive', period = '2016-01'))

Has the following output :

[333.33334]
[333.33334]
[333.33334]
[333.33334]
...

The folowing test :

- name: Someone making 1000 should get a 333 tax incentive
  period: 2016-01
  input_variables:
    salary: 1000
  output_variables:
    tax_incentive:
      2016-01: 333.3333

has the following output:

AssertionError: b'tax_incentive@2016-01: '[333.33334351] differs from [333.33331299] with an absolute margin [3.05175781e-05] > 0

The following tests pass:

- name: Someone making 1000 should get a 333 tax incentive
  period: 2016-01
  input_variables:
    salary: 1000
  output_variables:
    tax_incentive:
      2016-01: 333.33334

- name: Someone making 1000 should get a 333 tax incentive
  period: 2016-01
  input_variables:
    salary: 1000
  output_variables:
    tax_incentive:
      2016-01: 333.33333

@Anna-Livia
Copy link
Contributor Author

Anna-Livia commented Aug 3, 2018

After some work with @fpagnoux here is what we discovered :

  1. In openfisca, all floats are float32 meaning they only save a few numbers.
    therefore 1000/3 == 333.33334 and 1000/3 == 333.33333
  2. In assert_near() the function used in openfisca-run-test, the results are typecasted as float with .astype(float). This method casts to float64, adding noise to the float32 OpenFisca result.
  3. In the API, we tranform the result array to a list in order to jsonify it. This .tolist()methods typecasts the float32 OpenFisca result into float64 adding noise in the process.

Our proposed solution is to:

  • keep OpenFisca floats as floats32 from beginning to end.
  • document the float32 situation and its limitations

Finally, it seems strange to ask for a margin of error in most tests when the result is often in currency, that has a limited number of decimals. There could be the case of a 'currency' option in tests that rounds result (e.g. 2 decimal numbers for the Euro)
However, as it is not an ask from several country, this is not an avenue we will pursue for now. We advise that each formula rounds its results as needed for now.

@benjello
Copy link
Member

benjello commented Aug 3, 2018

@Anna-Livia @fpagnoux : please keep in mind that we want to allow the use of say float16 etc to save memory when needed.

@fpagnoux
Copy link
Member

fpagnoux commented Aug 3, 2018

@Anna-Livia @fpagnoux : please keep in mind that we want to allow the use of say float16 etc to save memory when needed.

Right now the float precision is hard-coded in the Core to float32. Not sure if it was customizable at some point before... You want to define specific columns as float16, or use float16 for all floats in a simulation?

@benjello
Copy link
Member

benjello commented Aug 5, 2018

@fpagnoux : it was a long time ago (it might be for integers). For now it is not needed, but we may want control at some point in the future the bits used to code variables so I do not want have insert in the code elements that may forbids that.

But it was worthing investigating it and I am definitively ok with the conclusions you (@Anna-Livia
and @fpagnoux) reached !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants