Refactored Load_file Op. testcases #229
Codecov Report
@@ Coverage Diff @@
## main #229 +/- ##
==========================================
+ Coverage 89.78% 89.88% +0.09%
==========================================
Files 67 67
Lines 3594 3538 -56
Branches 342 341 -1
==========================================
- Hits 3227 3180 -47
+ Misses 325 316 -9
Partials 42 42
conftest.py
Outdated
@@ -64,32 +64,46 @@ def sample_dag():

 @pytest.fixture
-def tmp_table(sql_server):
+def tmp_table(request, sql_server):
We should just name this "test_table" if it's not actually a table or tmp_table.
-def tmp_table(request, sql_server):
+def test_table(request, sql_server):
Makes sense, updated it.
conftest.py
Outdated
@@ -64,32 +64,46 @@ def sample_dag():

 @pytest.fixture
-def tmp_table(sql_server):
+def tmp_table(request, sql_server):
+    table_type = True
You should name this is_tmp_table
Makes sense, updated it.
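The two renames requested in this thread (`test_table` for the fixture, `is_tmp_table` for the flag) could fit together roughly as follows. This is a minimal sketch, not the PR's actual code: `Table`, `TempTable`, `build_test_table`, and the `is_temp` / `table_name` parameter keys are hypothetical stand-ins, and the pytest wiring is shown only in a comment so that the snippet runs without pytest installed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Table:
    """Hypothetical stand-in for a named table object."""
    table_name: str
    conn_id: str


@dataclass
class TempTable:
    """Hypothetical stand-in for a temporary table object."""
    conn_id: str


def build_test_table(param: Optional[dict], conn_id: str) -> Union[Table, TempTable]:
    """Return a temp table by default, or a named table when requested."""
    param = param or {}
    # Assumed key name; a descriptive boolean reads better than `table_type`.
    is_tmp_table = param.get("is_temp", True)
    if is_tmp_table:
        return TempTable(conn_id=conn_id)
    return Table(table_name=param.get("table_name", "test_table"), conn_id=conn_id)


# In conftest.py this helper could be wrapped in a fixture (assumed wiring):
#
# @pytest.fixture
# def test_table(request, sql_server):
#     return build_test_table(getattr(request, "param", None), conn_id="postgres_conn")
```

Because the fixture can hand back either kind of object, the neutral name `test_table` describes it more accurately than `tmp_table`.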
},
)

def get_dataframe_from_table(sql_name: str, tmp_table: Table, hook):
-def get_dataframe_from_table(sql_name: str, tmp_table: Table, hook):
+def get_dataframe_from_table(sql_name: str, tmp_table: Optional[Table, TempTable], hook):
I think you mean Union[Table, TempTable]; if so, I have added it.
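For context on the `Optional` versus `Union` point: `typing.Optional` takes exactly one type (`Optional[X]` is shorthand for `Union[X, None]`), so subscripting it with two types fails with a `TypeError`, whereas `Union` accepts several alternatives. A small sketch, with `Table` and `TempTable` as empty hypothetical stand-ins for the project's classes and the function body omitted:

```python
from typing import Optional, Union


class Table:
    """Hypothetical stand-in for the named-table class."""


class TempTable:
    """Hypothetical stand-in for the temporary-table class."""


# Union accepts any number of alternatives.
TableLike = Union[Table, TempTable]

# Optional accepts exactly one type and adds None as the second alternative.
MaybeTable = Optional[Table]

try:
    Optional[Table, TempTable]  # two arguments are rejected when subscripting
    optional_two_args_ok = True
except TypeError:
    optional_two_args_ok = False


def get_dataframe_from_table(sql_name: str, test_table: TableLike, hook):
    """Signature only; the real body is not reproduced in this sketch."""
    raise NotImplementedError
```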
ea7fd5b to 5d035f4
One change, otherwise looks good!
    output_table=test_table,
)
test_utils.run_dag(sample_dag)
df = sql_hook.get_pandas_df(f"SELECT * FROM {test_table.qualified_name()}")
@utkarsharma2 we've run into issues with hanging tests in the past when people run get_pandas_df like this. Instead, could you please create an @adf-decorated function that validates the test? You should see an example of this in the test_transform file.
@dimberman, I think that would be because of conflicting table names, but since we are now using a fixture to get tables, it should not recur. But if we are adopting this as a best practice, I'll change it.
Also, I think this test should fail only when something is wrong with the load_file operator itself, and not with @adf.
I believe it is worth using get_pandas_df - it keeps the tests simple enough.
An example: a recent refactor on load has resulted in lots of broken tests in most of the other operators - which is quite an inconvenient side-effect. I believe the integration tests per operator should focus on the operator itself, and avoid - where possible - using other Astro operators.
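The isolation argument in this thread can be illustrated with a small sketch: the test calls only the operation under test and then checks the result with a plain query, so a failure points at that operation alone. Everything here is an assumption made for illustration: the standard-library `sqlite3` module stands in for the Airflow hook and `get_pandas_df`, and `load_rows` and `fetch_all` are hypothetical helpers, not part of the project.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, table_name: str, rows: list) -> None:
    """Hypothetical stand-in for the operator under test."""
    conn.execute(f'CREATE TABLE "{table_name}" (id INTEGER, name TEXT)')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES (?, ?)', rows)
    conn.commit()


def fetch_all(conn: sqlite3.Connection, table_name: str) -> list:
    """Read the table back with a plain query rather than another operator."""
    return conn.execute(f'SELECT * FROM "{table_name}" ORDER BY id').fetchall()


conn = sqlite3.connect(":memory:")
expected = [(1, "First"), (2, "Second")]

# Only the operation under test runs before the check.
load_rows(conn, "test_table", expected)

# Validation depends on the database driver only.
actual = fetch_all(conn, "test_table")
assert actual == expected
```

With this shape, a refactor of an unrelated operator cannot break the test, which is the side effect described above.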
OUTPUT_TABLE_NAME = "expected_table_from_s3_csv"

self.hook_target = PostgresHook(
    postgres_conn_id="postgres_conn", schema="pagila"
It's so satisfying to see all these lines deleted 🙌
Looks great, thanks, @utkarsharma2! 🚀
…r table object creation. 2. Used the modified fixture to write testcase that uses temp/named tables.
69a14f9 to d181021
Closes: #194

- [x] Each test should work across all databases
- [x] Use test_utils.run_dag
- [x] Use PyTest fixtures and parameterize to have a single main test that will validate transform across multiple databases
- [x] Loading against all formats (csv, parquet, avro, etc.) -- **Partially done, since it's not required for all tests.**
- [x] Loading to a temp_table or named table -- **Partially done, since it's not required for all tests.**
- [x] Loading to default schema and to named schema -- **Partially done, since it's not required for all tests.**