refactor: examples data loading for tests #17893
cc223ea to 0a5d2ae
Codecov Report
@@            Coverage Diff             @@
##           master   #17893      +/-   ##
==========================================
- Coverage   67.10%   66.92%   -0.18%
==========================================
  Files        1609     1612       +3
  Lines       64897    64988      +91
  Branches     6866     6872       +6
==========================================
- Hits        43547    43495      -52
- Misses      19484    19626     +142
- Partials     1866     1867       +1
A few minor comments, but I agree we could do with more consolidation on the example data front, as it has evolved over a long time and is inconsistent. One main question I have is can you elaborate on the need for the generator/factory/impl/singleton design being proposed here? I'm not opposed to it, but it feels like a lot of abstraction for something that may not be required. For example, in the fixture where we do `return list(BirthNamesGeneratorFactory.make().generate())`, I wonder if we couldn't just do something like `return BirthNamesExample.get_data()`? Point being, if we don't anticipate we'll need a generator (here we're just putting `list` around what's being yielded) or a full blown singleton pattern, why add one?
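To make the comparison concrete, here is a minimal sketch of the two call shapes side by side. Only the call expressions `BirthNamesGeneratorFactory.make().generate()` and `BirthNamesExample.get_data()` come from the comment above; the class bodies, the `BirthNamesGenerator` name, and the sample rows are hypothetical stand-ins, not the PR's actual code.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical stand-in rows; the PR generates birth_names data differently.
_SAMPLE_ROWS: List[Row] = [
    {"name": "Aaron", "gender": "boy", "num": 369},
    {"name": "Amy", "gender": "girl", "num": 494},
]


class BirthNamesGenerator:
    """Generator-style implementation: yields rows one at a time."""

    def generate(self) -> Iterable[Row]:
        for row in _SAMPLE_ROWS:
            yield dict(row)  # copy so callers cannot mutate the shared rows


class BirthNamesGeneratorFactory:
    """Factory in front of the generator, mirroring the call shape in the fixture."""

    @classmethod
    def make(cls) -> BirthNamesGenerator:
        return BirthNamesGenerator()


class BirthNamesExample:
    """The flatter alternative: one helper that returns the rows as a list."""

    @staticmethod
    def get_data() -> List[Row]:
        return [dict(row) for row in _SAMPLE_ROWS]


# What the fixture does today versus the suggested shortcut.
via_factory = list(BirthNamesGeneratorFactory.make().generate())
via_helper = BirthNamesExample.get_data()
```

In this sketch both paths return the same list, so the extra layers only pay off if a second generator implementation or lazy consumption is actually expected.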
tests/common/example_data_generator/birth_names_generator_factory.py
add tests for common
@villebro @amitmiran137 regarding the abstractions: this is how I'm coding ... design to interfaces
Lgtm
* refactor: replace the way the birth_names data is generated
* refactor: replace the way the birth_names data is generated
* refactor structure add tests for common
The way the examples data is generated and loaded into the database, both as SQL data and as Superset objects,
is neither flexible nor configurable, and it does not support deterministic data when you need it.
In addition, the same ideas are reimplemented in several places, which leads to duplicated code.
For example, in another PR I'm developing, I encountered surprising behavior when I tried to load the World Bank data.
I expected it to be loaded and cleaned up the same way as the birth_names data, but it is not.
This PR is the first step toward fixing this. It abstracts the way the raw data is generated.
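As a rough illustration of what "deterministic data behind an abstraction" could look like, here is a minimal sketch. Every name in it (`ExampleDataGenerator`, `SeededBirthNamesGenerator`, the `seed` and `rows` parameters, the column names) is hypothetical and not taken from this PR's code; it only shows the general idea of seeding a local random source so repeated runs yield identical rows.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable

Row = Dict[str, Any]

# Hypothetical name pool, used only for this illustration.
_NAMES = ("Aaron", "Amy", "Brian", "Chloe", "David", "Emma")


class ExampleDataGenerator(ABC):
    """Interface that callers depend on instead of a concrete loader."""

    @abstractmethod
    def generate(self) -> Iterable[Row]:
        """Yield raw rows, one dict per row."""


class SeededBirthNamesGenerator(ExampleDataGenerator):
    """Produces the same rows on every run for a given seed."""

    def __init__(self, rows: int, seed: int = 0) -> None:
        self._rows = rows
        self._seed = seed

    def generate(self) -> Iterable[Row]:
        # A local Random instance keeps the global random state untouched
        # and restarts the sequence on every call to generate().
        rng = random.Random(self._seed)
        for _ in range(self._rows):
            yield {
                "name": rng.choice(_NAMES),
                "gender": rng.choice(("boy", "girl")),
                "num": rng.randint(1, 1000),
            }


first_run = list(SeededBirthNamesGenerator(rows=5, seed=42).generate())
second_run = list(SeededBirthNamesGenerator(rows=5, seed=42).generate())
```

With the same seed, `first_run` and `second_run` are equal, which is the property a test fixture would rely on.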