Round up numbers #699

Anna-Livia · 2018-07-26T17:45:31Z

Some formula result have too many decimal numbers.
Impact :

it does not make sense to have that many decimal when most elements are money in the end.
it makes mandatory adding a margin of error in most tests.

Anna-Livia · 2018-07-31T16:50:30Z

The round branch in country_template contains a failing test because of too many numbers after the decimal point.

Hypotheses :

Results with many numbers after the decimal point are not constant.
There is an existing margin of error in openfisca-run-test
outputs given un yaml tests are transformed to contains enough decimal points as to be compared to the result.

MattiSG · 2018-08-01T21:08:30Z

it does not make sense to have that many decimal when most elements are money in the end.

It sounds reasonable to me to keep as high precision as possible until exporting values. Rounding errors can be quite problematic considering how many times formulas can be chained. Then again, there are already rounding errors…

What about using the type property to round to an instance-definable number, for example 3 digits for currency, only when returning values in the final answer (as opposed to “on all intermediary steps”)? 🤔

benjello · 2018-08-02T14:22:49Z

From my point of view there are two different problems here:

what is the right number of decimals we should keep for every variable ?
should we use a default number of digits when writing tests to avoid the tidy effort of adding a specific margin every time ?

Tentative answers:

The right number of decimals heavily depends on the variable and usually the information when available, is quite difficult to find. I favored using some round command here an there and adding in comment why. But we can use some more formal and cleaner explicit way.
The error margin is something the tester may want to control so the option should be kept. We can also add a default per variable but it should be easily available without adding too much potentially confusing information at the variable level. May be using the information from the first point if available could be ok. But then if this information is not available what to use for the default of the default ?

Anna-Livia · 2018-08-03T13:40:46Z

To give some more information :
The following code :

import openfisca_country_template

country = openfisca_country_template
tax_benefit_system = country.CountryTaxBenefitSystem()

scenario = tax_benefit_system.new_scenario().init_from_attributes(
    period='2016-01',
    input_variables={'salary': 1000},
    )

simulation = scenario.new_simulation()

for i in range(100):
    print(simulation.calculate('tax_incentive', period = '2016-01'))

Has the following output :

[333.33334]
[333.33334]
[333.33334]
[333.33334]
...

The folowing test :

- name: Someone making 1000 should get a 333 tax incentive
  period: 2016-01
  input_variables:
    salary: 1000
  output_variables:
    tax_incentive:
      2016-01: 333.3333

has the following output:

AssertionError: b'tax_incentive@2016-01: '[333.33334351] differs from [333.33331299] with an absolute margin [3.05175781e-05] > 0

The following tests pass:

- name: Someone making 1000 should get a 333 tax incentive
  period: 2016-01
  input_variables:
    salary: 1000
  output_variables:
    tax_incentive:
      2016-01: 333.33334

- name: Someone making 1000 should get a 333 tax incentive
  period: 2016-01
  input_variables:
    salary: 1000
  output_variables:
    tax_incentive:
      2016-01: 333.33333

Anna-Livia · 2018-08-03T15:50:43Z

After some work with @fpagnoux here is what we discovered :

In openfisca, all floats are float32 meaning they only save a few numbers.
therefore 1000/3 == 333.33334 and 1000/3 == 333.33333
In assert_near() the function used in openfisca-run-test, the results are typecasted as float with .astype(float). This method casts to float64, adding noise to the float32 OpenFisca result.
In the API, we tranform the result array to a list in order to jsonify it. This .tolist()methods typecasts the float32 OpenFisca result into float64 adding noise in the process.

Our proposed solution is to:

keep OpenFisca floats as floats32 from beginning to end.
document the float32 situation and its limitations

Finally, it seems strange to ask for a margin of error in most tests when the result is often in currency, that has a limited number of decimals. There could be the case of a 'currency' option in tests that rounds result (e.g. 2 decimal numbers for the Euro)
However, as it is not an ask from several country, this is not an avenue we will pursue for now. We advise that each formula rounds its results as needed for now.

benjello · 2018-08-03T16:29:04Z

@Anna-Livia @fpagnoux : please keep in mind that we want to allow the use of say float16 etc to save memory when needed.

fpagnoux · 2018-08-03T17:03:37Z

@Anna-Livia @fpagnoux : please keep in mind that we want to allow the use of say float16 etc to save memory when needed.

Right now the float precision is hard-coded in the Core to float32. Not sure if it was customizable at some point before... You want to define specific columns as float16, or use float16 for all floats in a simulation?

benjello · 2018-08-05T17:47:43Z

@fpagnoux : it was a long time ago (it might be for integers). For now it is not needed, but we may want control at some point in the future the bits used to code variables so I do not want have insert in the code elements that may forbids that.

But it was worthing investigating it and I am definitively ok with the conclusions you (@Anna-Livia
and @fpagnoux) reached !

Anna-Livia added flow:support:doing and removed flow:support:doing labels Jul 26, 2018

Anna-Livia added flow:team:doing and removed flow:team:next labels Aug 6, 2018

Anna-Livia self-assigned this Aug 6, 2018

This was referenced Aug 6, 2018

Keep float as float32 when outputting them #708

Merged

Indicate that floats are stored in float32 openfisca/openfisca-doc#156

Merged

fpagnoux closed this as completed in #708 Aug 8, 2018

fpagnoux removed the flow:team:doing label Aug 8, 2018

fpagnoux reopened this Aug 8, 2018

fpagnoux added the flow:team:doing label Aug 8, 2018

Anna-Livia closed this as completed Aug 10, 2018

Anna-Livia removed the flow:team:doing label Aug 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Round up numbers #699

Round up numbers #699

Anna-Livia commented Jul 26, 2018

Anna-Livia commented Jul 31, 2018

MattiSG commented Aug 1, 2018

benjello commented Aug 2, 2018

Anna-Livia commented Aug 3, 2018

Anna-Livia commented Aug 3, 2018 •

edited

Loading

benjello commented Aug 3, 2018

fpagnoux commented Aug 3, 2018

benjello commented Aug 5, 2018

Round up numbers #699

Round up numbers #699

Comments

Anna-Livia commented Jul 26, 2018

Anna-Livia commented Jul 31, 2018

MattiSG commented Aug 1, 2018

benjello commented Aug 2, 2018

Anna-Livia commented Aug 3, 2018

Anna-Livia commented Aug 3, 2018 • edited Loading

benjello commented Aug 3, 2018

fpagnoux commented Aug 3, 2018

benjello commented Aug 5, 2018

Anna-Livia commented Aug 3, 2018 •

edited

Loading