Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored Load_file Op. testcases #229

Merged
merged 12 commits into from
Mar 23, 2022
Merged

Refactored Load_file Op. testcases #229

merged 12 commits into from
Mar 23, 2022

Conversation

utkarsharma2
Copy link
Collaborator

@utkarsharma2 utkarsharma2 commented Mar 21, 2022

Closes: #194

  • Each test should work across all databases
  • Use test_utils.run_dag
  • Use PyTest fixtures and parameterize to have a single main test that will validate transform across multiple databases
  • Loading against all formats (csv, parquet, avro, etc.) -- Partially done, since it's not required for all tests.
  • Loading to a temp_table or named table -- Partially done, since it's not required for all tests.
  • Loading to default schema and to named schema -- Partially done, since it's not required for all tests.

@utkarsharma2 utkarsharma2 marked this pull request as draft March 21, 2022 17:47
@codecov
Copy link

codecov bot commented Mar 21, 2022

Codecov Report

Merging #229 (69a14f9) into main (8a4c23a) will increase coverage by 0.09%.
The diff coverage is 98.18%.

❗ Current head 69a14f9 differs from pull request most recent head d181021. Consider uploading reports for the commit d181021 to get more accurate results

@@            Coverage Diff             @@
##             main     #229      +/-   ##
==========================================
+ Coverage   89.78%   89.88%   +0.09%     
==========================================
  Files          67       67              
  Lines        3594     3538      -56     
  Branches      342      341       -1     
==========================================
- Hits         3227     3180      -47     
+ Misses        325      316       -9     
  Partials       42       42              
Impacted Files Coverage Δ
tests/integration_test_dag.py 0.00% <0.00%> (ø)
conftest.py 92.30% <94.11%> (-0.29%) ⬇️
tests/operators/test_agnostic_append.py 89.09% <100.00%> (ø)
tests/operators/test_agnostic_load_file.py 100.00% <100.00%> (+3.58%) ⬆️
tests/operators/test_agnostic_merge.py 96.47% <100.00%> (ø)
...sts/operators/transform/test_postgres_transform.py 96.66% <100.00%> (ø)
...ts/operators/transform/test_snowflake_transform.py 79.59% <100.00%> (ø)
tests/operators/transform/test_transform.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a4c23a...d181021. Read the comment docs.

conftest.py Outdated
@@ -64,32 +64,46 @@ def sample_dag():


@pytest.fixture
def tmp_table(sql_server):
def tmp_table(request, sql_server):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just name this "test_table" if it's not actually a table or tmp_table"

Suggested change
def tmp_table(request, sql_server):
def test_table(request, sql_server):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, updated it.

conftest.py Outdated
@@ -64,32 +64,46 @@ def sample_dag():


@pytest.fixture
def tmp_table(sql_server):
def tmp_table(request, sql_server):
table_type = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should name this is_tmp_table

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, updated it.

},
)

def get_dataframe_from_table(sql_name: str, tmp_table: Table, hook):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_dataframe_from_table(sql_name: str, tmp_table: Table, hook):
def get_dataframe_from_table(sql_name: str, tmp_table: Optional[Table, TempTable], hook):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean Union[Table, TempTable], if so I have added it.

@utkarsharma2 utkarsharma2 marked this pull request as ready for review March 22, 2022 16:24
Copy link
Collaborator

@dimberman dimberman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change otherwise looks good!

output_table=test_table,
)
test_utils.run_dag(sample_dag)
df = sql_hook.get_pandas_df(f"SELECT * FROM {test_table.qualified_name()}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utkarsharma2 we've run into issues with hanging tests in the past when people run get_pandas_df like this. Instead, could you please create an @adf decorated function that validates the test? You should see an example of this in the test_transform file

Copy link
Collaborator Author

@utkarsharma2 utkarsharma2 Mar 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimberman, I think that would be because of conflicting table names, but since now we are using fixture to get tables, it should not reoccur. But we are adopting this as best practice I'll change it.

Also, I think this test should only fail when there is something wrong with load_file operator only and not @adf.

Copy link
Collaborator

@tatiana tatiana Mar 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is worth using the get_pandas_df - it keeps the tests simple enough.

An example: a recent refactor on load has resulted in lots of broken tests in most of the other operators - which is quite an inconvenient side-effect. I believe the integration tests per operator should focus on the operator itself, and avoid - where possible - using other Astro operators.

OUTPUT_TABLE_NAME = "expected_table_from_s3_csv"

self.hook_target = PostgresHook(
postgres_conn_id="postgres_conn", schema="pagila"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so satisfying to see all these lines deleted 🙌

Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks, @utkarsharma2 ! 🚀

@dimberman dimberman merged commit 9ef45bf into main Mar 23, 2022
@dimberman dimberman deleted the refactor_load_file_tests branch March 23, 2022 14:38
utkarsharma2 added a commit that referenced this pull request Mar 30, 2022
Closes: #194 

- [x] Each test should work across all databases
- [x] Use test_utils.run_dag
- [x] Use PyTest fixtures and parameterize to have a single main test that will validate transform across multiple databases
- [x] Loading against all formats (csv, parquet, avro, etc.)  -- **Partially done, since it's not required for all tests.**
- [x] Loading to a temp_table or named table -- **Partially done, since it's not required for all tests.**
- [x] Loading to default schema and to named schema -- **Partially done, since it's not required for all tests.**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor load_file tests
3 participants