[#9570] Fix fixtures in fixtures/subfolders throwing parsing error #9714

slothkong · 2024-02-29T21:54:11Z

Resolves #9570

Problem

Given that:

The user has a valid unit test definition using csv fixtures
The user stores the csv fixtures under tests/fixtures directory
The execution of dbt build succeeds (target/manifest.json is created)
The execution of dbt test also succeeds
The user moves the csv fixtures to a subdirectory, such as tests/fixtures/my_model

Then:
When the user attempts to run once more dbt test the execution fails with an unexpected error, like:

 File "/path/to/my/site-packages/dbt/parser/manifest.py", line 407, in load
    (_, line, method) = formatted_lines[-3].split(", ")
    ^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 3, got 1)

--
Although at first glance, this looked to me like the PartialParsing class was not doing a good job at detecting changes or deleting fixture nodes in-between compilations, after further testing I managed to reproduce the issue in two different scenarios:

When the csv fixtures are moved to a subdirectory.
When the csv fixtures are initially located in subdirectories but then are pulled up to test/fixtures.

So the issue is somewhat more general than was originally reported. That leads me to believe is rather related to the logic that triggers partial parsing and what happens after that logic exits, e.g. error handling within the ManifestLoader class. In said class, the traceback is being parsed under the assumption that frames in the stack conform to an strict format:

dbt-core/core/dbt/parser/manifest.py

Line 407 in e4fe839

(_, line, method) = formatted_lines[-3].split(", ")

However, in this particular case the assumption does not seem to hold, plausibly due to the exception type, how/where it was thrown, or even the presence of extra newline ('\n') or caret ('^') characters in the traceback. As a result, the exception is not gracefully caught.

In this state, the user is left with only a few options, with the easier ones perhaps being:
a) Run dbt clean before proceeding (to trick the ManifestLoader into skipping partial parsing altogether)
b) Move the csv fixture back to their original directory — needless to say, a downgrade of the feature scope

Solution

Instead of making too many assumptions about what the traceback.format_exc() statement in line 405 could return (in regard to list item order or string content and form), I am suggesting to change the ManifestLoader to use Exception object attributes and traceback object functions/attributes to zero-in on the last frame in the stack of the traceback, and extract from it the last exception details, i.e. the values of code, line and method that should be added to the exc_info dictionary for graceful exception catching and logging.

I argue this should be a safer alternative to adding some additional logic like if-else statements and string comparisons to deal with edge-cases where extra newlines or caret characters are present or missing.

The approach does not affect any function API either, and the content of the exc_info dictionary is filled as I believe it should, something along the lines of:

# example evaluation of  `exc_info` dictionary in line 408 of manifest.py
exc_info = {
    "traceback": """Traceback (most recent call last):
      File "/path/to/my/site-packages/dbt-core/core/dbt/parser/manifest.py", line 398, in load

      ...

      unit_test = self.saved_manifest.unit_tests.pop(unique_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      KeyError: 'unit_test.my_dbt_project.my_model.test_is_valid_email_address'""",
    "exception": "KeyError: 'unit_test.my_dbt_project.my_model.test_is_valid_email_address'", 
    "code": "    unit_test = self.saved_manifest.unit_tests.pop(unique_id)", 
    "location": "line 616 in delete_fixture_node"
}

Last but not least, with the code changes I suggest, the user can of now freely move the csv fixture to subdirectories or out of them, since whenever partial parsing runs and fails due to a fixture node no longer existing, any exception traceback would be properly parsed and full parsing will kick in to ensure a working manifest.json is created as part of the unit test execution.

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
This PR includes type annotations for new and modified functions

Additional Context

I debugged, developed and successfully tested integration with help of the postgresql adapter.
After code changes where staged, make test and pre-commit run were successfully executed.
My development dbt project structure looks like:

.
├── dbt_project.yml
├── models
│   ├── my_model.sql
│   ├── my_model.yml
│   ├── top_level_domains.sql
│   ├── top_level_domains.yml
│   ├── unit_test.yml
│   ├── users.sql
│   └── users.yml
├── profiles.yml
└── tests
    └── fixtures
        ├── my_model_mock_output.csv
        ├── top_level_domains_mock_input.csv
        └── users_mock_input.csv

The development my_model.sql definition I used:

select
  users.user_id,
  users.email,
  case
    when regexp_like(users.email,'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$') = true
      and accepted_email_domains.tld is not null
        then true
     else false
  end as is_valid_email_address
from {{ ref('users')}} users
left join {{ ref('top_level_domains') }} accepted_email_domains
on users.email_top_level_domain = lower(accepted_email_domains.tld)

The development unit_test.yml definition I used:

unit_tests:
  - name: test_is_valid_email_address
    model: my_model
    given:
      - input: ref('users')
        format: csv
        fixture: users_mock_input
      - input: ref('top_level_domains')
        format: csv
        fixture: top_level_domains_mock_input
    expect:
      format: csv
      fixture: my_model_mock_output

…error

github-actions · 2024-02-29T21:54:27Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

…olders throwing parsing error

slothkong · 2024-03-01T07:24:25Z

Hello @graciegoheen, I believe this one fixes the issue you encountered, with about 5 code line changes only. Would you be so kind as to extend me a review? Or perhaps @jtcohen6 would be willing and able to share a thumbs up/ thumbs down? 🙈

Cheers

codecov · 2024-03-01T19:15:54Z

Codecov Report

Attention: Patch coverage is 88.37209% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 88.10%. Comparing base (e4fe839) to head (2c165fc).
Report is 6 commits behind head on main.

Files	Patch %	Lines
core/dbt/parser/manifest.py	88.37%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9714      +/-   ##
==========================================
+ Coverage   88.08%   88.10%   +0.01%     
==========================================
  Files         178      178              
  Lines       22433    22443      +10     
==========================================
+ Hits        19761    19774      +13     
+ Misses       2672     2669       -3

Flag	Coverage Δ
integration	`85.48% <46.51%> (-0.15%)`	⬇️
unit	`62.36% <81.39%> (+0.10%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

slothkong · 2024-03-01T20:48:48Z

I'll deal with coverage for the one line that is missing, shortly~

* Transform skip_parsing (private variable of ManifestLoader.load()) into instance-attribute of ManifestLoader(), with default value False (to enable splitting of ManifestLoader.load()) * Split ManifestLoader.load(), to extract operation of PartialParsing into new method called ManifestLoader.safe_update_project_parser_files_partially() (to simplify both cognitive complexity in the codebase and mocking in unittestest) * Add "ignore" type-comments in new ManifestLoader.safe_update_project_parser_files_partially() (to silence mypy warnings regarding instance-attributes which can be initialized as None or as something else, e.g. self.saved_manifest)[1] [1] Although I wanted avoid "ignore" type-comments, it seems like addressing these mypy warnings in a stricter sense requires technical alignment and broader code changes. For example, might need to initialize self.saved_manifest as Manifest, instead of Optional[Manifest], so that PartialParsing gets inputs with type it currently expects. ... perhaps too far beyond the scope of this fix?

slothkong · 2024-03-06T11:48:28Z

Hallo again @graciegoheen,

Hitting the missing line in the unittest was more difficult than I originally thought, hence the delay since my last comment.
Nevertheless, I've got good news:

I generated the coverage report locally and, as of the latest commit (2c165fc), lines with missing coverage in manifest.py went down from 491 to 466 📉
Naturally, the logic I had modified in previous commits is also part of the new bulk of lines with coverage.

I am happy to hear any feedback~

Regards

gshank

Looks good! Nice test case. Moving that code out into a separate function is definitely an improvement.

slothkong added 3 commits February 29, 2024 20:16

[dbt-labs#9570] Fix fixtures in fixtures/subfolders throwing parsing …

7a33075

…error

Fast-forward imports to match upstream

891ad69

Re-introduce doc strings on traceback info handling

f506cc9

slothkong requested a review from a team as a code owner February 29, 2024 21:54

cla-bot bot added the cla:yes label Feb 29, 2024

[dbt-labs#9570] Changelog update for fix of fixtures in fixtures/subf…

eb0bcba

…olders throwing parsing error

graciegoheen added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Mar 1, 2024

gshank approved these changes Mar 13, 2024

View reviewed changes

gshank merged commit 7b8ae21 into dbt-labs:main Mar 13, 2024
56 checks passed

gshank mentioned this pull request Mar 15, 2024

[Bug] fixtures in fixtures/subfolders throwing parsing error #9570

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#9570] Fix fixtures in fixtures/subfolders throwing parsing error #9714

[#9570] Fix fixtures in fixtures/subfolders throwing parsing error #9714

slothkong commented Feb 29, 2024 •

edited

Loading

github-actions bot commented Feb 29, 2024

slothkong commented Mar 1, 2024 •

edited

Loading

codecov bot commented Mar 1, 2024 •

edited

Loading

slothkong commented Mar 1, 2024

slothkong commented Mar 6, 2024

gshank left a comment

[#9570] Fix fixtures in fixtures/subfolders throwing parsing error #9714

[#9570] Fix fixtures in fixtures/subfolders throwing parsing error #9714

Conversation

slothkong commented Feb 29, 2024 • edited Loading

Problem

Solution

Checklist

Additional Context

github-actions bot commented Feb 29, 2024

slothkong commented Mar 1, 2024 • edited Loading

codecov bot commented Mar 1, 2024 • edited Loading

Codecov Report

slothkong commented Mar 1, 2024

slothkong commented Mar 6, 2024

gshank left a comment

Choose a reason for hiding this comment

slothkong commented Feb 29, 2024 •

edited

Loading

slothkong commented Mar 1, 2024 •

edited

Loading

codecov bot commented Mar 1, 2024 •

edited

Loading