Create a simple computation endpoint #520

Closed
MattiSG opened this issue May 31, 2017 · 27 comments

@MattiSG
Member

MattiSG commented May 31, 2017

As a producer,
I can send a situation to api.openfisca.fr and compute variables on it,
So that I don't have to manage my own instance


This currently exists through the simulate and calculate endpoints, but their data formats are very complex and not properly documented. This new endpoint should be reminiscent of the simpler /parameters and /variables endpoints that have already been added to Core.

@MattiSG
Member Author

MattiSG commented Jun 6, 2017

Features

  • Any loaded variable MUST be computable, for any period.
  • Values of any variable MUST be specifiable, for any period.
  • Any kind of entity MUST be supported, without making hypotheses on their relationships, in order to support all country packages.
  • MUST be much simpler (more “intuitive”) than the current API, as evaluated by API clients.

Headers

  • Headers MUST provide information on the loaded country package and its version.
  • Headers COULD provide information on the core version.

Errors

  • Any error in the input format, be it syntactic or semantic, MUST yield a 400.
  • Errors in the computation MUST yield a 500 and expose the error to the user with as much context as possible.
  • Misnamed variables and parameters SHOULD yield a 404, in order to ease legislation upgrades.
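The status codes above can be summarised as a small lookup table. This is only a sketch: the `ErrorKind` categories and the `status_for` helper are hypothetical names introduced here, not part of OpenFisca.

```python
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical error categories, named for this sketch only."""

    INVALID_INPUT = "invalid_input"  # syntactic or semantic error in the input
    UNKNOWN_NAME = "unknown_name"  # misnamed variable or parameter
    COMPUTATION_FAILED = "computation_failed"  # error raised while computing


# Status codes taken from the requirements listed above.
STATUS_BY_KIND = {
    ErrorKind.INVALID_INPUT: 400,
    ErrorKind.UNKNOWN_NAME: 404,
    ErrorKind.COMPUTATION_FAILED: 500,
}


def status_for(kind: ErrorKind) -> int:
    """Return the HTTP status code to send for a given error category."""
    return STATUS_BY_KIND[kind]
```

Keeping the 404 case separate from the generic 400 is what lets a client tell "my payload is malformed" apart from "the legislation renamed this variable".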

Documentation

  • The API MUST be documented in the OpenAPI description.

@MattiSG MattiSG added the policy:needs-consensus Discussion is mandatory label Jun 6, 2017
@fpagnoux
Member

fpagnoux commented Jun 8, 2017

We started to discuss suggestions to represent scenarios (persons and entities) with @Anna-Livia:

❌ Suggestion 1: Persons reference their entities

"persons": {
    "bob": {
        "birth_date": "1972-01-01",
        "family": {
            "dupont": "parent"
        },
        "company": {
            "boulangerie": "owner"
        }
    },
    "bill": {
        "birth_date": "1975-01-01",
        "family": {
            "dupont": "parent"
        },
        "company": {
            "boulangerie": "employee"
        }
    },
    "janet": {
        "birth_date": "1990-01-01",
        "company": {
            "electrician": "employee"
        },
        "family": {
            "dupont": "child"
        }
    }
},
"families": {
    "dupont": {
        "zipcode": "90210"
    }
},
"companies": {
    "boulangerie": {
        "revenue": 20000
    },
    "electrician": {
        "revenue": 40000
    }
}

Details:

  • Every person references the entities they belong to, and the role they have in each.
  • Entities only define their specific variables

Advantages:

  • All persons are defined in the same place
  • Adding or editing a person is really easy (no need to edit the families and companies)

Constraints:

  • Defining an entity (such as family) is a two-step process: defining the individual and their role in the entity, and then defining the entity itself.
  • There is no consolidated vision of an entity.
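To illustrate the last constraint: with this format, a consolidated view of an entity has to be rebuilt by scanning every person. The sketch below does that; `members_by_entity` is a hypothetical helper, and the `family` and `company` keys are the ones used in the example above.

```python
from collections import defaultdict


def members_by_entity(persons: dict, entity_keys: tuple) -> dict:
    """Group person ids by entity type, entity id and role."""
    grouped = {key: defaultdict(lambda: defaultdict(list)) for key in entity_keys}
    for person_id, attributes in persons.items():
        for key in entity_keys:
            for entity_id, role in attributes.get(key, {}).items():
                grouped[key][entity_id][role].append(person_id)
    # Convert the nested defaultdicts into plain dicts for easier comparison.
    return {
        key: {entity_id: dict(roles) for entity_id, roles in entities.items()}
        for key, entities in grouped.items()
    }


persons = {
    "bob": {"family": {"dupont": "parent"}, "company": {"boulangerie": "owner"}},
    "bill": {"family": {"dupont": "parent"}, "company": {"boulangerie": "employee"}},
    "janet": {"family": {"dupont": "child"}, "company": {"electrician": "employee"}},
}
view = members_by_entity(persons, ("family", "company"))
```

Here `view["family"]["dupont"]` lists the parents and children of the family, which is the information Suggestion 1 never states in one place.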

@fpagnoux
Member

fpagnoux commented Jun 8, 2017

✅ Suggestion 2: Entities reference their members through their ID

"persons": {
    "bob": {
        "birth_date": "1972-01-01"
    },
    "bill": {
        "birth_date": "1975-01-01"
    },
    "janet": {
        "birth_date": "1990-01-01"
    }
},
"familles": {
    "dupont": {
        "parents": ["bob", "bill"],
        "children": ["janet"],
        "zip_code": "90210"
    }
},
"companies": {
    "boulangerie": {
        "owner": ["bob"],
        "employee": ["bill"],
        "revenue": 20000
    },
    "electrician": {
        "employee": ["janet"],
        "revenue": 40000
    }
}

Details:

  • Entities define their specific variables, and the list of their members, with their roles
  • Persons only define their specific variables (birth date, salary...)

Advantages:

  • All persons are defined in the same place
  • There is a consolidated vision of every entity.

Constraints:

  • Defining an entity (such as family) is a two-step process: defining the individual and their role in the entity, and then defining the entity itself.
  • Adding a person (or changing which entity they belong to) requires editing several parts of the JSON

Note: this is a slightly simplified version of the current API
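Since entities reference persons by id in this format, a server would probably want to check that every referenced id is declared. A minimal sketch, assuming that any list value inside an entity is a role listing person ids; `undeclared_members` is a hypothetical helper, not an existing OpenFisca function.

```python
def undeclared_members(scenario: dict, persons_key: str = "persons") -> list:
    """Return (entity_type, entity_id, role, person_id) for each dangling reference."""
    declared = set(scenario.get(persons_key, {}))
    problems = []
    for entity_type, entities in scenario.items():
        if entity_type == persons_key:
            continue
        for entity_id, content in entities.items():
            for key, value in content.items():
                # Assumption of this sketch: list values are roles holding person ids.
                if isinstance(value, list):
                    for person_id in value:
                        if person_id not in declared:
                            problems.append((entity_type, entity_id, key, person_id))
    return problems


scenario = {
    "persons": {"bob": {}, "bill": {}, "janet": {}},
    "familles": {"dupont": {"parents": ["bob", "bill"], "children": ["jane"]}},
}
dangling = undeclared_members(scenario)
```

In this example the family references `jane`, which was never declared, so the check reports it with the path that leads to it.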

@fpagnoux
Member

fpagnoux commented Jun 8, 2017

❌ Suggestion 3 - Persons can be defined inside entities

3a - Using unique ids for persons

"families": {
    "dupont": {
        "parents": {
            "bob": {"birth_date": "1972-01-01"},
            "bill": {"birth_date": "1975-01-01"}
        },
        "children": {
            "janet": {"birth_date": "1990-01-01"}
        }
    }
},
"companies": {
    "boulangerie": {
        "owner": ["bob"],
        "employees": ["bill"],
        "revenue": 20000
    },
    "electrician": {
        "employees": ["janet"],
        "revenue": 40000
    }
}

This is strictly equivalent to:

"families": {
    "dupont": {
        "parents": ["bob", "bill"],
        "children": ["janet"],
        "zipcode": "90210"
    }
},
"companies": {
    "boulangerie": {
        "owner": {
            "bob": {"birth_date": "1972-01-01"}
        },
        "employees": {
            "bill": {"birth_date": "1975-01-01"}
        },
        "revenue": 20000
    },
    "electrician": {
        "employees": {
            "janet": {"birth_date": "1990-01-01"}
        },
        "revenue": 40000
    }
}

Details:

  • Entities define their specific variables, and the list of their members, with their roles
  • Persons can be defined inside any entity they belong to.

Advantages:

  • There is a consolidated vision of every entity.
  • Defining an entity is a one-step process: entity-specific variables and entity members (and their roles) are all defined in one fell swoop. This is convenient for clients such as mes-aides, which mainly care about one entity.

Constraints:

  • All persons must be defined in one "key" entity, but the choice of this key entity is left to the API client.
  • There is no single place where persons are defined.
  • Changing the entity someone belongs to may imply moving a lot of code.
  • Person ids must be unique
  • There are several ways of defining the same scenario
  • We use objects to define persons, and arrays to reference them. For instance in the 1st example, families.dupont.parents is an object, while companies.boulangerie.owner is an array.

For clients who are not interested in any group entity, this would be allowed as well:

    "persons": {
         "bill": {"birth_date": "1972-01-01",  "salary": "2000"},
         "bob": {"birth_date": "1972-01-01",  "salary": "1500"}
    }
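To make the uniqueness constraint of 3a concrete, here is a sketch that flattens inline person definitions into a single persons map and rejects an id defined twice. It assumes, as in the 3a examples, that entity variables are plain values, so any object found under an entity key is a role holding inline persons. `flatten_inline_persons` is a hypothetical name.

```python
def flatten_inline_persons(scenario: dict) -> dict:
    """Move inline person definitions to a top-level "persons" map."""
    persons = {}
    flattened = {}
    for entity_type, entities in scenario.items():
        flattened[entity_type] = {}
        for entity_id, content in entities.items():
            new_content = {}
            for key, value in content.items():
                if isinstance(value, dict):
                    # A role with inline definitions, e.g. {"bob": {...}}.
                    for person_id, variables in value.items():
                        if person_id in persons:
                            raise ValueError(f"person '{person_id}' is defined more than once")
                        persons[person_id] = variables
                    new_content[key] = list(value)  # keep only the ids as references
                else:
                    new_content[key] = value
            flattened[entity_type][entity_id] = new_content
    return {"persons": persons, **flattened}


scenario_3a = {
    "families": {
        "dupont": {
            "parents": {"bob": {"birth_date": "1972-01-01"}, "bill": {"birth_date": "1975-01-01"}},
            "children": {"janet": {"birth_date": "1990-01-01"}},
        }
    },
    "companies": {
        "boulangerie": {"owner": ["bob"], "employees": ["bill"], "revenue": 20000},
        "electrician": {"employees": ["janet"], "revenue": 40000},
    },
}
flat = flatten_inline_persons(scenario_3a)
```

The result has the shape of Suggestion 2, which shows that 3a is mostly a different way of writing the same data, at the cost of the duplicate-id check above.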

@fpagnoux
Member

fpagnoux commented Jun 8, 2017

❌ 3b - Using JSON path

The first example of 3a becomes:

"families": {
    "dupont": {
        "parents": {
            "bob": {"birth_date": "1972-01-01"},
            "bill": {"birth_date": "1975-01-01"}
        },
        "children": {
            "janet": {"birth_date": "1990-01-01"}
        }
    }
},
"companies": {
    "boulangerie": {
        "owner": ["$.families.dupont.parents.bob"],
        "employees": ["$.families.dupont.parents.bill"],
        "revenue": 20000
    },
    "electrician": {
        "employees": ["$.families.dupont.children.janet"],
        "revenue": 40000
    }
}

Compared to 3a:

Advantages:

  • No need for a unique id for each person

Constraints:

  • More complex
  • When moving a person from one entity to another, we need to update all references to that person.
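Resolving such references could look like the sketch below. It only supports the dotted child access used in the example (`$.a.b.c`), not the full JSONPath syntax, and `resolve_reference` is a hypothetical helper.

```python
def resolve_reference(document: dict, reference: str):
    """Resolve a '$.a.b.c' reference by walking the document key by key."""
    if not reference.startswith("$."):
        raise ValueError(f"not a path reference: {reference!r}")
    node = document
    for key in reference[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"'{key}' not found while resolving {reference!r}")
        node = node[key]
    return node


document = {
    "families": {
        "dupont": {
            "parents": {"bob": {"birth_date": "1972-01-01"}},
            "children": {"janet": {"birth_date": "1990-01-01"}},
        }
    }
}
bob = resolve_reference(document, "$.families.dupont.parents.bob")
```

If `bob` is later moved under another role or entity, the same reference raises a `KeyError`, which is the maintenance cost mentioned in the constraints.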

@fpagnoux
Member

fpagnoux commented Jun 8, 2017

❌ 3c - Using arrays for all person lists

The first example of 3a becomes:

"families": {
    "dupont": {
        "parents": [
            {"bob": {"birth_date": "1972-01-01"}},
            {"bill": {"birth_date": "1975-01-01"}}
        ],
        "children": [
            {"janet": {"birth_date": "1990-01-01"}}
        ]
    }
},
"companies": {
    "boulangerie": {
        "owner": ["bob"],
        "employees": ["bill"],
        "revenue": 20000
    },
    "electrician": {
        "employees": ["janet"],
        "revenue": 40000
    }
}

Compared to 3a:

Details:

  • We always use arrays to define a list of persons. If we are defining the person, we do it within a JSON object. If we are referencing an existing person, we just write their unique id.

Advantages:

  • The data structure is more homogeneous.

Constraints:

  • Slightly more complex, and harder to navigate (we can't do families.dupont.children.janet).
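With 3c, a reader of the payload has to tell definitions and references apart by type. A small sketch of that dispatch, assuming a definition is an object with exactly one key (the person id) and a reference is a string; `split_members` is a hypothetical name.

```python
def split_members(members: list) -> tuple:
    """Return (definitions, references) from a 3c-style member list."""
    definitions = {}
    references = []
    for item in members:
        if isinstance(item, str):
            references.append(item)  # reference to a person defined elsewhere
        elif isinstance(item, dict) and len(item) == 1:
            definitions.update(item)  # inline definition: {"id": {...variables}}
        else:
            raise ValueError(f"unexpected member entry: {item!r}")
    return definitions, references


parents = [{"bob": {"birth_date": "1972-01-01"}}, "bill"]
definitions, references = split_members(parents)
```

The extra type check on every entry is the "slightly more complex" part: the list is homogeneous in shape, but not in what its items mean.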

@fpagnoux
Member

fpagnoux commented Jun 8, 2017

@MattiSG @sandcha @benjello @cbenz @guillett @michelbl : thoughts ?

@fpagnoux
Member

fpagnoux commented Jun 9, 2017

Another topic to discuss is how the client expresses which variable they want to calculate.

❌ Suggestion I : in the URL

Ia: with a common period for all requested variables

Following the model of the current formula API, we can encode the requested variables in the URL:

/calculate/2010-01/rsa/

or

/calculate/2010-01/rsa+salaire_net

to compute several variables.

Advantages:

  • This makes the body of the request much simpler, as we just need to define the scenario there.

Constraints:

  • All the requested variables must be computed for the same period. This would be fine for mes-aides or embauche, but may be a big limitation for other uses.
  • The URL may become long when computing a lot of variables.
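As an illustration of Suggestion I, the path could be decoded as follows. The `/calculate/<period>/<variables>` layout and the `+` separator come from the examples above; `parse_calculate_path` is a hypothetical helper and performs no validation of the period itself.

```python
def parse_calculate_path(path: str) -> tuple:
    """Split '/calculate/<period>/<var1>+<var2>' into (period, [variables])."""
    parts = [part for part in path.split("/") if part]  # tolerate a trailing slash
    if len(parts) != 3 or parts[0] != "calculate":
        raise ValueError(f"unexpected path: {path!r}")
    period = parts[1]
    variables = parts[2].split("+")
    if not all(variables):
        raise ValueError(f"empty variable name in path: {path!r}")
    return period, variables
```

For example, `parse_calculate_path("/calculate/2010-01/rsa+salaire_net")` yields the period `2010-01` and the two variable names. The single period slot in the path is exactly what forces all variables onto the same period.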

@fpagnoux
Member

fpagnoux commented Jun 9, 2017

❌ Suggestion II : in the request body, separated from the scenario

For instance:

"scenario": {
    "familles": {"...": "..."}
},
"outputs": {
    "2010-01": ["rsa", "salaire_net"]
}

Advantages:

  • Flexible: I can request any variable for any period

Constraints:

  • Makes the whole request body more complex
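On the server side, the `outputs` object of Suggestion II could be turned into a flat list of computations to run. A minimal sketch; `requested_pairs` is a hypothetical name and the body layout is the one from the example above.

```python
def requested_pairs(body: dict) -> list:
    """List the (variable, period) pairs requested in the "outputs" object."""
    pairs = []
    for period, variables in body.get("outputs", {}).items():
        for variable in variables:
            pairs.append((variable, period))
    return pairs


body = {
    "scenario": {},
    "outputs": {"2010-01": ["rsa", "salaire_net"], "2010": ["rsa"]},
}
pairs = requested_pairs(body)
```

Unlike the URL variant, nothing prevents different variables from being requested for different periods here.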

@fpagnoux
Member

fpagnoux commented Jun 9, 2017

✅ Suggestion III: in the request body, integrated in the scenario

"persons": {
    "bob": {
        "birth_date": "1972-01-01",
        "salaire_brut": {"2010-01": 2000 },
        "salaire_net": {"2010-01": null }
    },
    "bill": {
        "birth_date": "1975-01-01"
    },
    "janet": {
        "birth_date": "1990-01-01"
    }
},
"familles": {
    "dupont": {
        "parents": ["bob", "bill"],
        "children": ["janet"],
        "zip_code": "90210"
    }
},
"companies": {
    "boulangerie": {
        "owner": ["bob"],
        "employee": ["bill"],
        "revenue": 20000
    },
    "electrician": {
        "employee": ["janet"],
        "revenue": 40000
    }
}

Details:

  • The placeholder (null in the example above) could of course be replaced by any magic word ($calculate, ?, $calculate())
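With this format, the server finds what to compute by looking for placeholders in the scenario. The sketch below assumes the placeholder is a JSON `null` (Python `None`) inside a `{period: value}` object, as in the example; `find_requested` is a hypothetical name.

```python
def find_requested(scenario: dict) -> list:
    """Return (entity_type, entity_id, variable, period) for each null placeholder."""
    requested = []
    for entity_type, entities in scenario.items():
        for entity_id, content in entities.items():
            for variable, value in content.items():
                if not isinstance(value, dict):
                    continue  # role lists and plain values are inputs, not requests
                for period, period_value in value.items():
                    if period_value is None:
                        requested.append((entity_type, entity_id, variable, period))
    return requested


scenario = {
    "persons": {
        "bob": {
            "birth_date": "1972-01-01",
            "salaire_brut": {"2010-01": 2000},
            "salaire_net": {"2010-01": None},
        },
        "bill": {"birth_date": "1975-01-01"},
    },
    "familles": {"dupont": {"parents": ["bob", "bill"], "zip_code": "90210"}},
}
requested = find_requested(scenario)
```

Each request carries the entity it is attached to, so a client can ask for `salaire_net` for `bob` only.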

Advantages:

  • Flexible: I can request any variable for any period
  • We have a consolidated vision of our input and our requested variables, which makes sense, as salaire_net and salaire_brut are similar concepts, except one is known and we want to calculate the other.
  • This is a little futuristic, but "magic keywords" open opportunities to resolve some old issues with the API. For instance, if today I want to say that janet has been a student from 2010-09 to 2013-06, I need to state it explicitly for every single month, or rely on the inferences of OpenFisca, which we already concluded was a bad practice. Instead, I could have something like:
"janet": {
    "student": {"$from": "2010-09", "$to": "2013-06", "$value": true}
}
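A sketch of how such a range could be expanded into the month-by-month form that has to be written by hand today. The `$from`, `$to` and `$value` keys are the ones from the example above; `expand_range` is a hypothetical helper and assumes both bounds are `YYYY-MM` months, inclusive.

```python
def expand_range(spec: dict) -> dict:
    """Expand {"$from", "$to", "$value"} into one {month: value} entry per month."""
    start_year, start_month = map(int, spec["$from"].split("-"))
    end_year, end_month = map(int, spec["$to"].split("-"))
    values = {}
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        values[f"{year:04d}-{month:02d}"] = spec["$value"]
        month += 1
        if month > 12:  # roll over to January of the next year
            year, month = year + 1, 1
    return values


student = expand_range({"$from": "2010-09", "$to": "2013-06", "$value": True})
```

For janet, this produces 34 monthly entries from a single three-key object.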

Constraints:

  • May be a little exotic.

@fpagnoux
Member

fpagnoux commented Jun 9, 2017

@Anna-Livia
Contributor

About the calculation :

I like having the calculation variable inside the URL:

  • Once I have a fixed scenario, I don't have to manipulate it anymore.
  • Separating the two notions of "entities description" and "calculation" is a simpler model.

About the periods, I think that one API call should concern one period. There should be a way to alert the user and/or handle the calculation when the requested period does not suit the variable.

Suggestion IA for calculations seems to me to be the best suited :)

@benjello
Member

@fpagnoux : my preference goes to suggestion 2 and suggestion II (the latter for clarity but I cannot estimate the burden induced by a complex/complete header).

@openfisca openfisca deleted a comment from fpagnoux Jun 12, 2017
@MattiSG
Member Author

MattiSG commented Jun 12, 2017

Thanks a lot @fpagnoux & @Anna-Livia for this work! 👏

Entities

Beliefs

  • 3a and 3c are out of the way for me. The person ID is ambiguous, and we definitely don't want to deal with that ambiguity. How hard will it be to debug when I have two bob persons declared in two different entities? Super hard.
  • 3b is unambiguous, but makes JSON references mandatory, which is definitely not the nicest way to walk through data paths.
  • 1 and 2 mean that entities will also have to declare their pluralised key. This is acceptable.
  • 2 means less depth compared to 1.

Opinion

I like the readability of 1, but I prefer the writability and consolidation of 2. Each entity type defines its specific variables, which means it is easier to have a common data model between OpenFisca and the client code.

Conclusion

I'm in favour of 2.

Computation request

Beliefs

  • URL encoding does not scale. I already encountered this issue with Ludwig, it is not merely theoretical. Actually:

URLs over 2,000 characters will not work in the most popular web browser. (source)
…thus, I think suggestion I cannot be the only way to request computations.

  • It would be possible to mix I and II. There is no reason not to support both URL and payload computation requests, as long as we are consistent in the way we treat it (either merge both or send a 400 if both are requested — I would favour the latter).
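The stricter of the two options (a 400 when variables are requested both ways) could be sketched as below. `BadRequest` and `select_request` are hypothetical names, and the request lists are assumed to have been extracted from the URL and the body beforehand.

```python
class BadRequest(Exception):
    """Hypothetical exception that a web layer would translate into a 400."""

    status = 400


def select_request(url_request: list, body_request: list) -> list:
    """Return the requested variables, refusing requests made in both places."""
    if url_request and body_request:
        raise BadRequest("variables were requested both in the URL and in the body")
    return url_request or body_request or []
```

Failing loudly avoids having to define merge rules, for example what should happen when the same variable is requested for different periods in the URL and in the payload.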

Opinion

  • III allows for an unexpected but very interesting use case: the input and output data formats can be exactly the same, with the input merely having placeholders, thus making it very easy for the client to detect whether the computation has still to happen or not, and equally easy to invalidate a computation result.

Open questions

  • I and II do not attach requested variables to entities. How can III work in a similar fashion when it provides much more information?

@fpagnoux
Member

fpagnoux commented Jun 13, 2017

Great, it seems that we have our first consensus: entities will be described according to suggestion 2.

If anyone disagrees, let them speak now or forever hold their peace.

@fpagnoux
Member

fpagnoux commented Jun 13, 2017

I and II do not attach requested variables to entities. How can III work in a similar fashion when it provides much more information?

With I and II, if the client asks to calculate salaire_net, it will actually be calculated (and returned) for all individuals. The client cannot specify for which person they want the computation to be done.

This is a small advantage for III: if I'm only interested in the output for one person, I may be confused by the results I'm getting for all the family members.

@fpagnoux
Member

fpagnoux commented Jun 13, 2017

Another topic to discuss, which may influence our choice between II and III, is the output format.

IMO, the output format should be similar to the input one, as in both cases we are describing the values of some variables for some entities. So, if we are computing rsa and salaire_net for 2010-01, the minimal output would be:

❌ Suggestion A

"persons": {
    "bob": {
        "salaire_net": {"2010-01": 1500}
    },
    "bill": {
        "salaire_net": {"2010-01": 0}
    },
    "janet": {
        "salaire_net": {"2010-01": 0}
    }
},
"familles": {
    "dupont": {
        "rsa": {"2010-01": 500}
    }
}

Details:

  • We only return the requested variables
  • We don't repeat the entities structure
  • We don't mention the entities for which we requested no computation

Advantages:

  • Small and light

Constraints:

  • You need to know the input to understand the output
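A sketch of how a response in the shape of Suggestion A could be assembled. Both the request tuples `(entity_type, entity_id, variable, period)` and the `compute` callable are assumptions made for this example; the real computation would be delegated to OpenFisca.

```python
def minimal_output(requested: list, compute) -> dict:
    """Build a response containing only the requested variables."""
    output = {}
    for entity_type, entity_id, variable, period in requested:
        entity = output.setdefault(entity_type, {}).setdefault(entity_id, {})
        entity.setdefault(variable, {})[period] = compute(entity_id, variable, period)
    return output


def fake_compute(entity_id: str, variable: str, period: str) -> int:
    """Stand-in for the real computation, returning fixed values."""
    return {"bob": 1500, "dupont": 500}.get(entity_id, 0)


requested = [
    ("persons", "bob", "salaire_net", "2010-01"),
    ("persons", "bill", "salaire_net", "2010-01"),
    ("familles", "dupont", "rsa", "2010-01"),
]
response = minimal_output(requested, fake_compute)
```

Entities with no requested variable never appear in the result, which is why the input is needed to interpret it.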

@fpagnoux
Member

fpagnoux commented Jun 13, 2017

The opposite solution is to repeat the whole input:

✅ Suggestion B

"persons": {
    "bob": {
        "birth_date": {"ETERNITY": "1972-01-01" },
        "salaire_net": {"2010-01": 1500}
    },
    "bill": {
        "birth_date": {"ETERNITY": "1975-01-01" },
        "salaire_net": {"2010-01": 0}
    },
    "janet": {
        "birth_date": {"ETERNITY": "1990-01-01" },
        "salaire_net": {"2010-01": 0}
    }
},
"families": {
    "dupont": {
        "parents": ["bob", "bill"],
        "children": ["janet"],
        "zip_code": {"2010-01": "90210"},
        "rsa": {"2010-01": 500}
    }
},
"companies": {
    "boulangerie": {
        "owner": ["bob"],
        "employee": ["bill"],
        "revenue": {"2010-01": 20000}
    },
    "electrician": {
        "employee": ["janet"],
        "revenue": {"2010-01": 40000}
    }
}

Details:

  • We return the input, with the requested variables inserted into it

Advantages:

  • The output is free-standing. I can understand it without knowing the input.

Constraints:

  • Bigger output

This approach seems interesting if combined with III.

@fpagnoux
Member

@MattiSG
Member Author

MattiSG commented Jun 13, 2017

Additional constraint on B : I need the input to know what was computed vs what was provided as input.

@MattiSG
Member Author

MattiSG commented Jun 13, 2017

This feels like a performance vs usability case, with no information on the impact on either performance or usability. I don't think we can have a definite decision in pure theory.

  • Performance impact of A: possible deep merge to do on each client call.
  • Performance impact of B: systematically bigger payload, more bytes to send, transfer and receive.

I'd go with IIIA and observe implementors' usage, because A is forward-compatible with B while B isn't with A.
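The client-side deep merge mentioned for A could look like this sketch, which rebuilds a B-style document from the original input and an A-style response. `deep_merge` is a hypothetical helper; it leaves both arguments untouched.

```python
import copy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a copy of base where values from update are merged recursively."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


request = {
    "persons": {
        "bob": {"birth_date": "1972-01-01", "salaire_net": {"2010-01": None}},
    }
}
response_a = {"persons": {"bob": {"salaire_net": {"2010-01": 1500}}}}
full = deep_merge(request, response_a)
```

Because the merge only adds or overwrites leaf values, a client written against A keeps working if the server later starts returning the whole input as in B.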

@guillett
Member

guillett commented Jun 13, 2017

Just a recap (to amend if incorrect):

  • Suggestions 1,2,3 to decide the format of the input scenario
  • Suggestions I,II,III to decide the format of the variables to be computed
  • Suggestions A,B to decide the format of the output

Suggestion Ab

Closer to the current response (https://mes-aides.gouv.fr/api/situations/58d3eeddd1eb3ef92014ad64/openfisca-response) (replace value by computations).

{
   "computations": {
      /*Results from suggestion A*/
      "familles": {"...": "..."},
      "persons": {"...": "..."}
   }
}

Optionally, inputs can be included as-is (response = Object.assign({ computations: results }, input)).

Thinking on paper:

A Suggestion IIb could be

{
   /* ... */
   request: {
        variables: { "2010-01": ["rsa", "salaire_net"] },
        include_intermediate_variables: true,
        /* other parameters */
   }
}


I will try to discuss with @fpagnoux IRL today and provide more details here if it is relevant.

@fpagnoux
Member

fpagnoux commented Jun 13, 2017

OK, so if I understand correctly, you suggest to:

  • In solution A, wrap the output into a computations property to be able to return metadata about the result.
  • In solution II, wrap the requested variables into a variables property to be able to add metadata about the requested computation.

include_intermediate_variables is certainly a legitimate need. I don't have any example of result metadata though.

@fpagnoux
Member

I'd go with IIIA and observe implementors usage, because A is forward-compatible with B while B isn't with A.

@MattiSG To be sure: you mean using the placeholders $calculate, right ?

@MattiSG
Member Author

MattiSG commented Jun 13, 2017

you mean using the placeholders $calculate, right ?

Yes. I am not sure $calculate is the best name, maybe an empty object is better, but I do back the idea of using a placeholder.

@fpagnoux
Member

fpagnoux commented Jun 14, 2017

Errors

Variable doesn't exist

If I send the flawed request:

"persons": {
    "bob": {
        "birth_date": "1972-01-01",
        "variable_that_doesnt_exist": {"2010-01": 2000 },
        "salaire_net": {"2010-01": null }
    },
    "bill": {
        "birth_date": "1975-01-01"
    },
    "janet": {
        "birth_date": "1990-01-01"
    }
},
"familles": {
    "dupont": {
        "parents": ["bob", "bill"],
        "children": ["janet"],
        "zip_code": "90210"
    }
},
"companies": {
    "boulangerie": {
        "owner": ["bob"],
        "employee": ["bill"],
        "revenue": 20000
    },
    "electrician": {
        "employee": ["janet"],
        "revenue": 40000
    }
}

The response has the status code 404.
The content of the response is:

"persons": {
    "bob": {
        "variable_that_doesnt_exist": "You tried to calculate or to set a value for variable 'variable_that_doesnt_exist', but it was not found in the loaded tax and benefit system (openfisca-france@18.5.1). Are you sure you spelled 'variable_that_doesnt_exist' correctly? If this code used to work and suddenly does not, this is most probably linked to an update of the tax and benefit system. Look at its changelog to learn about renames and removals and update your code. If it is an official package, it is probably available on <https://github.com/openfisca/openfisca-france/blob/master/CHANGELOG.md>."
    }
}
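The idea of placing the message at the same path as the faulty variable can be sketched as follows. `known_names` is an assumption of this example: a set holding every variable and role name of the loaded tax and benefit system. `unknown_variable_errors` is a hypothetical helper and uses a shortened message.

```python
def unknown_variable_errors(scenario: dict, known_names: set) -> dict:
    """Mirror the scenario structure, keeping only unknown names with a message."""
    errors = {}
    for entity_type, entities in scenario.items():
        for entity_id, content in entities.items():
            for name in content:
                if name not in known_names:
                    entity_errors = errors.setdefault(entity_type, {}).setdefault(entity_id, {})
                    entity_errors[name] = (
                        f"Variable '{name}' was not found in the loaded tax and benefit system."
                    )
    return errors


scenario = {
    "persons": {
        "bob": {
            "birth_date": "1972-01-01",
            "variable_that_doesnt_exist": {"2010-01": 2000},
            "salaire_net": {"2010-01": None},
        }
    }
}
errors = unknown_variable_errors(scenario, {"birth_date", "salaire_net"})
status = 404 if errors else None  # None: no unknown name, carry on with the request
```

Since the error body has the same shape as the request, a client can locate the faulty field with the same path it used to build the payload.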

@fpagnoux
Member

fpagnoux commented Jun 14, 2017

Other cases:

  • Invalid JSON: return 400, with a JSON containing details about the error.
  • Invalid entity: return 400, with an error message attached to the misnamed entity.
  • Invalid structure: return 400, with the data path leading to the structural error.
  • Invalid type for a variable value: return 400, with an error message attached to the variable
  • OpenFisca crashed: 500, with a JSON containing details about the error.
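For the first case, Python's standard `json` module already exposes the position of a syntax error, which could be passed on to the client. A minimal sketch; `parse_body` and the keys of the error object are hypothetical choices, not a defined response format.

```python
import json


def parse_body(raw: str) -> tuple:
    """Return (None, data) on success, or (400, details) on invalid JSON."""
    try:
        return None, json.loads(raw)
    except json.JSONDecodeError as error:
        details = {"error": error.msg, "line": error.lineno, "column": error.colno}
        return 400, details


# A trailing comma, as can easily slip into hand-written payloads, is rejected.
status, payload = parse_body('{"persons": {"bob": {"birth_date": "1972-01-01",}}}')
```

Here `status` is 400 and `payload` tells the client the line and column where parsing failed.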
