Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse parameters metadata and expose them in Python #681

Merged
merged 20 commits into from
Jul 16, 2018
Merged

Conversation

fpagnoux
Copy link
Member

@fpagnoux fpagnoux commented Jun 25, 2018

Depends on openfisca/country-template#49
Connected to #677

  • Expose Parameters metadata in the Python API
    • Parameters unit and reference:
      • e.g. parameters.taxes.rate.unit, parameters.taxes.rate.reference
    • Parameter value unit and reference:
      • e.g. parameters.taxes.rate.values_list[0].unit, parameters.taxes.rate.values_list[0].reference
    • Parameter node description and reference:
      • e.g. parameters.taxes.reference, parameters.taxes.description
      • Note: Parameter descriptions (e.g. parameters.taxes.rate.description) were already exposed

@benjello
Copy link
Member

It does the job. Thanks @fpagnoux !

Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested so I can't approve, but the code reads good.

@@ -778,11 +799,17 @@ def _validate_parameter(parameter, data, data_type = None, allowed_keys = None):
)


def _item_to_list(item):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is usually called wrap 🙂
Could it fit in a more generic helpers context?

setup.py Outdated
@@ -7,7 +7,7 @@

setup(
name = 'OpenFisca-Core',
version = '23.1.4',
version = '23.1.5',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would favour a minor bump for such a new feature.

This is not just romance: I could write code that uses this new feature, and it is important for me to know that it needs v23.2, not a patch of v23.1.

@@ -0,0 +1,8 @@
description: Tax rate
unit: /1
reference: http://legifrance.fr/taxes/rate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This URL doesn't exist. As it is just there for testing, https://law.gov.example/taxes/rate would be more explicit to me 🙂

@benjello
Copy link
Member

@fpagnoux shouldn't you enforce the equality of unit and reference in Parameter.__eq__ ?



def test_parameters_metadata():
path = os.path.join(BASE_DIR, 'filesystem_hierarchy')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the country template for these tests ?
e.g.

import openfisca_country_template
tbs = openfisca_country_template.CountryTaxBenefitSystem()
basic_income = tbs.parameters.benefits.basic_income
assert_equals(basic_income.description, 'Amount of the basic income')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

assert_equals(parameters.taxes.rate.values_list[0].reference, ['http://legifrance.fr/taxes/rate/2015-12'])
assert_equals(parameters.taxes.rate.values_list[0].unit, '/1')


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a test for parameter, parameter_node, and parameter_at_instant and not for scale. Is that intentional ?

data.pop('reference', None)
data.pop('description', None)
# We allow to set a reference and a description for a node.
self.reference = _item_to_list(data.pop('reference', None))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it data.pop() here when it is data.get()in the others ?

Copy link
Member Author

@fpagnoux fpagnoux Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in a parameter node, metadata and children are at the same level (see discussion about a children key here).

In the line that follows, we treat data as a dict of children parameters, so we need to "pop" the metadata out of the dict so that they are not interpreted as a child.

@fpagnoux
Copy link
Member Author

fpagnoux commented Jul 5, 2018

@fpagnoux shouldn't you enforce the equality of unit and reference in Parameter.__eq__ ?

It seems that this custom equality is only used for tests, and that other metadata (like the description) were already ignored, so it doesn't seem super-necessary to me.

@fpagnoux fpagnoux force-pushed the parse-metadata branch 2 times, most recently from d2d445d to 12a2de5 Compare July 6, 2018 00:15
@fpagnoux fpagnoux requested review from Anna-Livia and removed request for sandcha July 6, 2018 00:22
@fpagnoux fpagnoux force-pushed the parse-metadata branch 2 times, most recently from 60b2d86 to 10b9bb2 Compare July 6, 2018 00:26
@benjello
Copy link
Member

benjello commented Jul 6, 2018

I am more enclined to use currency ou currnecy/EUR than EUR as a unit as it is more country package independent. See openfisca/openfisca-france#1022 for an explanation.

@benjello
Copy link
Member

@fpagnoux : we should deal with the Scales as mentionend in openfisca/openfisca-france#1019.

Could you take care of that (consensus + implementation) ?
I would like to merge it ASAP to deal with downhill code.

Thanks !

@fpagnoux
Copy link
Member Author

Sorry to come back on an already approved PR, but several discussions and new needs being expressed convinced me that our previous approach was a dead-end.

First, we can't reasonably list all metadata that users may one day want to add to parameters. For instance:

  • Marginal scales don't need a unique unit field, but two fields threshold_unit and rate_unit. See discussion[FR].
  • Users are interested in more advanced parameters.
  • The IPP would like to add the day of publishing of parameters in the "Journal officiel". This is valuable information, but only makes sense for France and should not be coded in Core.

Second, as we add more metadata, we are also increasing the possibility of collisions between them and sub-parameters, which could lead to headaches.

I'm therefore suggesting a different approach, by grouping all metadata in a metadata object, where legislation coders would be able to add custom metadata.

@fpagnoux fpagnoux requested review from MattiSG and Anna-Livia July 11, 2018 19:38
@fpagnoux fpagnoux self-assigned this Jul 11, 2018
@MattiSG
Copy link
Member

MattiSG commented Jul 12, 2018

Assessment

I feel there are two distinct problems:

  1. Normalising standard metadata. These should be thoroughly documented, and can open the way to Core handling such as type validation, presence check (reference…), or behaviour change (how to handle mismatching definition periods…).
  2. Exposing arbitrary metadata without any filter. This could be offered as well, but opens room to divergence and typos (unti would be exposed as metadata.unti rather than raising an error).

Opinion

I am in favour of supporting both.

  • Arbitrary metadata exposure allows innovation to come from any country package, and can be normalised after the fact and moved to Core with top-level exposure.
  • Ensuring fully-specified and normalised metadata can be accessed allows cross-packages interoperability.

In the current case, that means to me proceeding with exposing unit at top-level with solid consensus on the format.

Immediate action suggestion

Since @benjello's need seems to be urgent, it might be beneficial to prioritise delivery of the arbitrary, grouped metadata pass-through. We could learn from actual usage the difficulties involved, and move slowly towards normalisation.

Long-term vision

I see how the Core metadata addition process could evolve to having convergent implementation of arbitrary metadata in two or three independent country packages.

For example, I see how reference is surprisingly not mature yet: while the need is obvious everywhere, the exact format of that property is not.
Aotearoa New Zealand has an interesting taxonomy for them, and France has mixed URLs and identifiers.
I feel the best would be to expose reference under metadata until a piece of work is seriously done on defining and unifying reference management, including providing tools to assess how properly referenced a country package is.

@MattiSG MattiSG removed their request for review July 12, 2018 00:26
Before:
```YAML
description: Age of retirement
reference: https://wikipedia.org/wiki/retirement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be interesting to give an example with several references ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would such an example illustrate?

Right now the metadata are just a pass-trough, and core has no opinion about the content of this reference, so I'm not sure about the value of making the changelog more complex by adding an other slightly different example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asking because it was the first thing that came into my mind when I saw the singular reference and that there was only one reference :)
It might just be a worry I have :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we should keep that in mind.

if data.get('values'):
_validate_parameter(self, data, allowed_keys = set(['values', 'description', 'unit', 'reference']))
# 'unit' and 'reference' are only listed here for backward compatibility
_validate_parameter(self, data, allowed_keys = set(['values', 'description', 'metadata', 'unit', 'reference']))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't descriptionconsidered a metadata in the new schema ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from above

The idea is to do a minimal change to the actual situation.
And description can be thought as standard/generic and informative enough
to be placed outside metadata which is more for custom/non-vital metadata.
But @fpagnoux may have more details

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually hesitated while implementing.

The main argument that push to let description at the top level is that in Matti's comment, description seem to perfectly fit the definition of a "fully-specified and normalised metadata":

  • We have no interrogation about it, and we use it in many places, including on our parameters/ API route.
  • parameter.description has always been exposed

Now, if description is at top level, why not reference?

It's arguable, but if everyone is convinced that references are needed, it seems like we are not fully sure about what a reference should be exactly. And whenever we get a better idea about that, moving reference from metadata to top level is not even a breaking change for Core.

Copy link
Contributor

@Anna-Livia Anna-Livia Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main issue is constistency between where the description is and where the references are.
Would it be a big issue to break core and change parameter.description to parameter.metadata.description ?

Copy link
Member Author

@fpagnoux fpagnoux Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the argument, but if we want to have both "normalized and standard" and "free custom" metadata, it's hard to have both of them in the same place:
parameter.metadata['description'] would not look very normalized, would not protect us against typos in normalized fields, and would not make any distinction between standard and custom fileds .

Do you disagree with the general dichotomy of normalized vs custom?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree with the paradigm, I think it is interesting to leave room for growth.
However, I think we need to create some kind of guidelines in the documentation for new comers and a timeline on how and when things should be normalized so that we can count on some formating standard to build cool tools that every country can use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have normalisation usage-based over time-based 🙂


# If metadata have been provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this comment, as I read the following line as If values have been provided

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this was very confusing. Adding more context :)

@fpagnoux fpagnoux requested a review from Anna-Livia July 13, 2018 15:17
@fpagnoux fpagnoux merged commit 363de93 into master Jul 16, 2018
@fpagnoux fpagnoux deleted the parse-metadata branch July 16, 2018 19:20
@bonjourmauko bonjourmauko added the kind:feat A feature request, a feature deprecation label Jul 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feat A feature request, a feature deprecation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants