Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Variables and DatedVariables #522

Merged
merged 48 commits into from
Jun 12, 2017
Merged

Merge Variables and DatedVariables #522

merged 48 commits into from
Jun 12, 2017

Conversation

sandcha
Copy link
Collaborator

@sandcha sandcha commented Jun 1, 2017

Breaking changes

  • In variables:
    • Remove SimpleVariable and DatedVariable. Any Variable acts like old DatedVariable
    • Remove start_date attribute
    • Rename stop_date attribute to end
  • In formulas:
    • Remove SimpleFormula and DatedFormula. Any Formula acts like old DatedFormula
    • Remove @dated_function: start goes to formula name suffix, stop is deduced from other formulas start dates and variable end attribute

New features

  • Introduce formula_YYYY[_MM[_DD]]
    • Allows for formula start instant definition
  • Introduce end = 'YYYY-MM-DD'
    • Allows for variable end date definition in string format
  • Change ETERNITY period effect
    • Allows for dated Variable definition on ETERNITY period

Technical changes

  • In variables:
    • Remove Abstract class
  • In formulas:
    • Remove Abstract classes
  • In tests:
    • Add test_variable

Fixes #371.

@sandcha sandcha requested a review from fpagnoux June 1, 2017 08:20
@MattiSG MattiSG changed the title Dated variable #371 Merge Variables and DatedVariables Jun 1, 2017
@fpagnoux
Copy link
Member

fpagnoux commented Jun 1, 2017

Tests are passing in france 🎉

(Red only because the version has not been bumped)

Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice !

I couldn't read the whole code change yet, but here are a few remarks.

@@ -84,7 +84,7 @@ def empty_clone(self):
def is_input_variable(self):
"""Returns true if the column (self) is an input variable."""
from . import formulas
return issubclass(self.formula_class, formulas.SimpleFormula) and self.formula_class.function is None
return issubclass(self.formula_class, formulas.Formula) and self.formula_class.formula is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the first part of the condition is relevant anymore.

Before, we had two kinds of formulas and we wanted to check it was a 'simple' one.
Now, self.formula_class is a Formula by construction.

if holder_or_dated_holder.array is not None:
return holder_or_dated_holder
assert self._array is None # self._array should always be None when dated_holder.array is None.

# Request a computation
formula_dated_holder = self.formula.compute(period = period, **parameters)
dated_holder = self.formula.compute(period = period, **parameters)
assert isinstance(dated_holder, DatedHolder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding assertion trough the code is useful for debugging, but must be avoided in prod.

If the assertion is not respected, an AssertionError will be thrown, without any explanation.

Here, as by construction compute always return a DatedHolder, I think we can just remove the assertion.

][0]['formula'].function
else:
formula = self.formula.function
formula = self.formula.find_function(period)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the output of self.formula.find_function to be named function.

@@ -34,7 +34,7 @@ def get_introspection_data(self, tax_benefit_system):
return comments, source_file_path, source_code, start_line_number

def to_column(self, tax_benefit_system):
formula_class = self.__class__.formula_class
formula_class = self.__class__.formula_class # Formula
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you simplified everything, formula_class is always Formula !

We should:

  • remove the class attribute formula_class = Formula
  • remove formula_class for new_filled_column arguments as it's always the same.

class AbstractVariable(object):
formula_class = None
class Variable(object):
formula_class = Formula
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this class attribute (see comment below).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formula_class removed from Variable but maintained as item of Formula.dated_formulas_class

Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More remarks.

Your attention on quality and testing will be very useful for OpenFisca 😎

However, I think we have too many tests here. The goal of unit testing is not to make sure every combination works (it's not realistic). We can talk about it IRL :)

from openfisca_core.periods import MONTH, ETERNITY
from openfisca_core.columns import IntCol

import openfisca_dummy_country as dummy_country
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dummy country is deprecated, we should use the country template instead.

variable = tax_benefit_system.column_by_name['variable__no_dated_attributes']
assert variable is not None
assert variable.end is None
assert variable.formula_class.dated_formulas_class.__len__() == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the length of an array in Python, we usually use len(variable.formula_class.dated_formulas_class).

__len__ is a magic method. It's usually used only to define what len does on a given object you create.

add_variable_catch_assertion(tax_benefit_system, variable__deprecated_start_date, 'Deprecated "start_date" attribute in definition of variable')


# 371 - end, no function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mentioning the number of the PR here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a lazy way to find history. Removed.

raise

# Check that Error at variable adding prevents it from registration in the taxbenefitsystem.
assert not hasattr(tax_benefit_system.column_by_name, "variable__strange_end_attribute")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what you want to test ;).

column_by_name = {'a' : 1}
hasattr(column_by_name, 'a')
>>> False

hasattr check if tax_benefit_system.column_by_name has an attribute named variable__strange_end_attribute . An attribute is something you get with the column_by_name.a notation. It's not the same thing than an item you get with column_by_name['a'].

You could use :

column_by_name = {'a' : 1}
column_by_name.has_key('a')
>>> True

But the most idiomatic is :

column_by_name = {'a' : 1}
column_by_name.get('a')
>>> 1
column_by_name.get('b')
>>> None

assert not column_by_name.get('b')

There is a slight difference between the 2 solutions: the case where a property as been explicitly set to None, which is not relevant in our case.

entity = Individu
definition_period = MONTH
label = u"Variable with end attribute, no function."
# start_date = datetime.date(1980, 1, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why having this commented line ?

entity = Individu
definition_period = MONTH
label = u"Variable with end attribute, one function without date."
# start_date = datetime.date(1980, 1, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented line

def test_call__dated_attribute__one_formula():
month = '1979-12'
simulation = new_simulation(tax_benefit_system, month)
assert simulation.calculate('dated_attribute__one_formula', month, extra_params=[100]) == 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure using extra_params make this test clearer. Why not just write return vectorize(self, 100) ?

assert variable is not None
assert variable.end == '1989-12-31'

assert variable.formula_class.dated_formulas_class.__len__() == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use len

def test_dates__dated_attribute__one_formula():
variable = tax_benefit_system.column_by_name['dated_attribute__one_formula']
assert variable is not None
assert variable.end == '1989-12-31'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark than above: you already checked that for a variable declared with an end attribute, the end can be retrieved with variable.end. This assertion is redundant.

assert variable.formula_class.dated_formulas_class.__len__() == 1
formula = variable.formula_class.dated_formulas_class[0]
assert formula is not None
assert formula['start_instant'].date == datetime.date(1, 1, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datetime.date(1, 1, 1) should be replaced by datetime.date.min.
If tomorrow datetime is adapted to handle the egyptian era , we don't want to impose ourselves to stop at 0001-01-01 :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for tests. And it will show us the egyptian era case when it comes as datetime.date.min is not applied in formulas.py (to be consistent with formula name specifications).

Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test coverage!

if end is UnboundLocalError:
end = None if reference_column is None else reference_column.end
elif end is not None:
assert isinstance(end, str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was the case before, but we must never add assertion without an error message.
If in my code I write:

class basic_income(Variable):
    column = FloatCol
    entity = Person
    definition_period = MONTH
    label = "Basic income provided to adults"
    url = "https://law.gov.example/basic_income"  # Always use the most official source
    end = date(2015, 1, 2)

I get the error:

assert isinstance(end, str)
AssertionError:

As a formula writer, I have no idea what to do.

+ formula_default_day

else:
match = re.match(r'(formula_)(\d{4}(_\d{2}){0,2})', formula_name) # YYYY or YYYY_MM or YYYY_MM_DD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regexp is not strict enough. It matches formula_2015_toto for instance.

As a formula writer, if I write:

    def formula_2015_toto(person, period, legislation):

I get the error:

ValueError: time data 'formula_2015_toto_01_01' does not match format 'formula_%Y_%m_%d'

which is not as explicit as the one you wrote :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are the parenthesis around formula_ necessary ?

@@ -83,8 +83,7 @@ def empty_clone(self):

def is_input_variable(self):
"""Returns true if the column (self) is an input variable."""
from . import formulas
return issubclass(self.formula_class, formulas.SimpleFormula) and self.formula_class.function is None
return self.formula_class.formula is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work.

type object 'accomodation_size' has no attribute 'formula'

Should probably be

return self.formula_class.dated_formulas_class == []

@@ -69,10 +82,21 @@ def clone(self, holder, keys_to_skip = None):
if key not in keys_to_skip:
new_dict[key] = value

keys_to_skip.add('dated_formulas')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already filtered the values in line 82. Adding dated_formulas at this point doesn't do anything.


def find_function(self, period):
"""
This function finds the last active formula for the time interval [period starting date, variable end attribute].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not necessary, we know it's a function ;).
-->
Finds the last...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (it was used to act like the python official documentation but it's not even consistent there).


# end, one function without date

class dated_attribute__one_simple_formula(Variable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end_attribute...


def test_dates__dated_attribute__one_simple_formula():
variable = tax_benefit_system.column_by_name['dated_attribute__one_simple_formula']
assert variable is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same than above, this cannot be None


assert len(variable.formula_class.dated_formulas_class) == 1
formula = variable.formula_class.dated_formulas_class[0]
assert formula is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this really be None ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Removed in file.


i = 1
formula = variable.formula_class.dated_formulas_class[i]
assert formula is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same than above: can this ever be None ?

column = IntCol
entity = Person
definition_period = MONTH
label = u"Variable, no and attribute, multiple dated functions with different names and no date overlap."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

CHANGELOG.md Outdated
@@ -1,5 +1,35 @@
# Changelog

# 13.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copy-pasted what you wrote in the PR description to start discussing the Changelog :).

The general idea of the Changelog is to be written from the point of view of a OpenFisca Core user (ie someone who is writing their own legislation)

CHANGELOG.md Outdated
#### Breaking changes

- In variables:
- Remove SimpleVariable and DatedVariable. Any Variable acts like old DatedVariable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is currently anything named SimpleVariable.

Suggestion:

  • Merge Variable and DatedVariable
    • Variable can now handle formula evolution over time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for second line and the fanciful SimpleVariable.
In first line, what about AbstractVariable ? It was possible to create its own kind of variable before and we removed this service.

Copy link
Member

@fpagnoux fpagnoux Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AbstractVariable was ever meant to be extended by child classes in country packages 🤔.
I don't think anyone ever did it anyway, so it's ok not to mention it.

CHANGELOG.md Outdated

- In variables:
- Remove SimpleVariable and DatedVariable. Any Variable acts like old DatedVariable
- Remove start_date attribute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove -> Deprecate and remove

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a real deprecated attribute raise a warning and is still managed or does it raise a fatal error ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.
If we just "deprecate" something, I'd assume that it's still available and raise a waring.
If we "deprecate and remove" something, I'd assume it will raise a fatal error.
If we "remove" something, I'm just afraid it's not crystal clear that what we removed is a feature.

@MattiSG @michelbl @Anna-Livia thoughts ?

CHANGELOG.md Outdated
- In variables:
- Remove SimpleVariable and DatedVariable. Any Variable acts like old DatedVariable
- Remove start_date attribute
- Rename stop_date attribute to end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you are quoting code (for instance stop_date), it's better to use backticks (`) to format it properly.

CHANGELOG.md Outdated
- Rename stop_date attribute to end
- In formulas:
- Remove SimpleFormula and DatedFormula. Any Formula acts like old DatedFormula
- Remove @dated_function: start goes to formula name suffix, stop is deduced from other formulas start dates and variable end attribute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'dd add an example for one formula to explicit the change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or one example with variable class definition and formulas to show linked changes?

CHANGELOG.md Outdated
- Remove Abstract class
- In formulas:
- Remove Abstract classes
- In tests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding tests is usually not something the users care about, so usually it's not necessary to mention it in the changelog.

CHANGELOG.md Outdated
#### Technical changes

- In variables:
- Remove Abstract class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information is probably not needed in the Changelog:

  • Users don't interact with these abstract classes.
  • Developers can read in the diff that you removed these classes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that it's more a service that disappears than a technical change. So, should we really silence this information ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AbstractVariable was ever meant to be extended by child classes in country packages 🤔.
I don't think anyone ever did it anyway, so it's ok not to mention it.

CHANGELOG.md Outdated
- Introduce `end = 'YYYY-MM-DD'`
- Allows for variable end date definition in string format
- Change ETERNITY period effect
- Allows for dated Variable definition on ETERNITY period
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dated variables don't exist anymore, so mentioning them in the changelog is not great.

Suggestion :

  • Remove restriction that prevented to have formula changes over time for a variable with ETERNITY as a definition period.

CHANGELOG.md Outdated

#### New features

- Introduce `formula_YYYY[_MM[_DD]]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the former @dated_function decorator, so it should be mentioned near the its deprecation.

As a user, it's better for me to find in the same place what syntax has disappeared, and what has replaced it.

CHANGELOG.md Outdated
- Introduce `formula_YYYY[_MM[_DD]]`
- Allows for formula start instant definition
- Introduce `end = 'YYYY-MM-DD'`
- Allows for variable end date definition in string format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: should be put near Rename stop_date attribute to end

end = '1989-12-31'


check_error_at_add_variable(tax_benefit_system, variable__deprecated_start_date, 'Deprecated "start_date" attribute in definition of variable')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in a test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

@sandcha sandcha requested a review from fpagnoux June 8, 2017 20:42
if end and period.start.date > get_datetime_date(self, end):
return None

i = len(self.dated_formulas)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is not use anywhere anymore ;)

# Current attribute isn't a formula
return None

formula_name_separator = '_'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not declare all these constants at the same place than formula_name_prefix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case it exits with None, we would have unnecessary assignments execution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you worry about unnecessary assignments, make these constants and put them out of this function ;)

formula_default_day = '01'

if formula_name == formula_name_prefix:
formula_name += formula_name_separator + formula_default_year + formula_name_separator\
Copy link
Member

@fpagnoux fpagnoux Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W503 is highly controversial and outdated.

I'm black-listing it right now 😄

formula_name_prefix + formula_name_separator + '%Y_%m_%d').date()


def get_datetime_date(src_name, date_str):
Copy link
Member

@fpagnoux fpagnoux Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature is not natural. Passing an argument just so that it is used in an error message is another sign that the code structure is not optimal.

end = None if reference_column is None else reference_column.end
elif end is not None:
assert isinstance(end, str)
get_datetime_date(name, end) # Raises ValueError on format error.
Copy link
Member

@fpagnoux fpagnoux Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how you would need to call this helper several times if you just parsed the string once and for all.
Careful, here, we are parsing the end attribute every single time we call find_function.

It seems much more natural to parse the input only once. The fact that we end up using a method named get_datetime_date just to raise an error is a clear sign that our code structure is lopsided.

column = IntCol
entity = Person
definition_period = MONTH
label = u"Variable with dated attributes (including deprecated start_date), no formula."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u"Variable with and end attribute and a deprecated start_date, no formula."

CHANGELOG.md Outdated
#### Breaking changes

- In variables:
- Remove SimpleVariable and DatedVariable. Any Variable acts like old DatedVariable
Copy link
Member

@fpagnoux fpagnoux Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AbstractVariable was ever meant to be extended by child classes in country packages 🤔.
I don't think anyone ever did it anyway, so it's ok not to mention it.

Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting there :)

- start definition goes to formula name: `formula_YYYY[_MM[_DD]]`
- stop is deduced from next formula start

Before:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👏

CHANGELOG.md Outdated
#### Technical changes

- In variables:
- Remove Abstract class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AbstractVariable was ever meant to be extended by child classes in country packages 🤔.
I don't think anyone ever did it anyway, so it's ok not to mention it.

assert simulation.calculate('end_attribute__formulas__different_names', month) == 300


def test_clone__end_attribute__formulas__different_names():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm putting this as an optional feedback as we need to move on, but unit tests means than these tests should test one behaviour. This way, if a test does not pass, you know what is broken.

This tests is basically just inspecting the whole structure of the tax benefit system. There is no connection between cloning and checking that column_by_name is indeed storing what it is supposed to store.

Copy link
Member

@fpagnoux fpagnoux Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please try to make your tests more focused.

It's ok do add all sorts of assertions in the code when developing, if it can help you understand the complicated and lopsided structure of OpenFisca.

But keep in mind that every line of code added to master, including tests, will have to be maintained and understood by OpenFisca developers.

Also, we want to change deeply the architecture of core. Over-testing it will have a cost : if our tests tend to check more than they should do, every time we will change something (for instance renamingIntCol), dozens of tests that have nothing to do with what we are doing will break.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's was misleading of me to speak about unit tests. It started like that but they are more validation/system tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is wise to add system tests to OpenFisca in its current state, especially in a file that is already over 500 lines. Please don't rigidify stuff that nobody understands properly. It is welcome to add guarantees to new behaviour, but not very useful to check that we don't change a behaviour that might be flawed.

@sandcha sandcha requested a review from fpagnoux June 9, 2017 17:44
Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs third party mediation

elif end is not None:
assert isinstance(end, str), 'Type error on {}. String expected. Found: {}'.format(name + '.end', type(end))
try:
get_datetime_date(end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MattiSG,

We need another opinion to cut short this discussion.
We're inside the new_filled_column function, which builds an OpenFisca column (=variable) based on what has been coded by the legislation writer (class rsa(Variable)...).

The function get_datetime_date parses a 'YYYY-MM-DD' string and returns the equivalent datetime.date.

The suggested implementation here is to:

  • First call the get_datetime_date function once when building the column, just to check if it returns an error. We do not save the result.
  • Then, a little further, call it for each dated formula to check that its start date is consistent with the variable end date.
  • Finally, we also call it in the find_function method, whose job is to find the right dated function to use in a calculation. This method is called each time the variable is calculated.

I think there is no reason to parse a string 1 + k + n times.
(wherek = numbers of dated formulas, n = numbers of times the variable is calculated). I'm therefore in favour of parsing this string once and for all when we build the variable.

@sandcha thinks otherwise, and I'll let her expose her arguments.

@MattiSG wdyt ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to declare an end attribute of date type to Formula class. There is no desagreement about not parsing a string multiple times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, great 🙂 . Tell me if you need help for this implementation!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been told you reached an agreement orally, glad to have been a good plastic duck 😄

I'm obviously in favour of limiting performance impact, if there is no good reason to multiply this parsing then let's not do it — duh.

I would like to bring your attention to the fact that it is more than time to ship, though. Too much energy and time is going into something that is not a priority. If it is too hard to reach a consensus on the implementation we'll have to abandon that story and learn from it. This is becoming too much of a drain on delivering value.
I'm sure you can get this done quickly. Please don't hesitate to take shortcuts, the quality of this code is already much above average. Better done than perfect.

@sandcha sandcha requested a review from fpagnoux June 12, 2017 16:16
Copy link
Member

@fpagnoux fpagnoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, rebasing and merging

sandcha and others added 13 commits June 12, 2017 20:17
Checks interactions between start/stop_date attributes and decorated
functions (DatedFormula).
Removes last DatedVariable reference.
Catch expected AssertionError
Correct syntax on variable calls
Split all tests into test_call/test_dates.
Vectorize dummy functions output.
Update tests so that they all pass > Represents variable dates
management contract before #371 changes.
Keep only Variable class
Minor correction for tests after DatedVariable/Variable merge
And move cache management to Holder
@fpagnoux
Copy link
Member

fpagnoux commented Jun 23, 2017

Migration guide

The following instructions can help you adapt your OpenFisca country package to the new version of core.

Replacing variables stop_date attribute by end

Goal: Replacing attributes like stop_date = date(2014, 12, 31) by end = '2014-12-31'

Throughout your code, replace using regular expressions:

  • stop_date = date\((\d{4}), (\d{1}), (\d{1})\) by end = '\1-0\2-0\3'
  • stop_date = date\((\d{4}), (\d{2}), (\d{1})\) by end = '\1-\2-0\3'
  • stop_date = date\((\d{4}), (\d{1}), (\d{2})\) by end = '\1-0\2-\3'
  • stop_date = date\((\d{4}), (\d{2}), (\d{2})\) by end = '\1-\2-\3'

Manually removing or migrating formula's stop dates

The concept of a formula stop doesn't exist anymore.

You can identify them using the regular expression , stop = date\(\d{4}, \d{1,2}, \d{1,2}\).
For each of them though, a manual treatment is needed.

  • If the stop of the formula is just the day preceding the start of another formula, then this stop is redundant. The identified block can be removed.

For instance, the stop date is redundant in the following case:

@dated_function(start = date(1990, 1, 1), stop = date(2006, 12, 31))
def formula_1(...)
@dated_function(start = date(2007, 1, 1))
def formula_1(...)
  • If not, the stop of the formula probably indicates a global end date for the variable. In this case, manually transpose the end date as a end attribute of the variable.

Migrating @dated_function(...)

The formulas stop date must have been removed of migrated before starting this step

The goal is to replace code bits such as

    @dated_function(start = date(1990, 1, 1))
    def function_x(...):

by

    def function_1990_01_01(...):

Throughout your code, replace using regular expressions:

Be careful, spaces at the start of the line do matter

    @dated_function\(start = date\((\d{4}), (\d{2}), (\d{2})\)\)
    def (.)*\(

by

    def formula_\1_\2_\3(

and

    @dated_function\(start = date\((\d{4}), (\d{1}), (\d{2})\)\)
    def (.)*\(

by

    def formula_\1_0\2_\3(

and


    @dated_function\(start = date\((\d{4}), (\d{2}), (\d{1})\)\)
    def (.)*\(

by

    def formula_\1_\2_0\3(

and

    @dated_function\(start = date\((\d{4}), (\d{1}), (\d{1})\)\)
    def (.)*\(

by

    def formula_\1_0\2_0\3(

Renaming function by formula

Finally, throughout your code, replace using regular expressions def function( by def formula(

Manual final check

Automated research and replace can save you a significant amount of time, but a quick final human check is still recommended to make sure you didn't do anything wrong 🙂 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merge Variable and DatedVariable
3 participants