Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FERC714 validation failure #3021

Closed
cmgosnell opened this issue Nov 7, 2023 · 6 comments · Fixed by #3023
Closed

Fix FERC714 validation failure #3021

cmgosnell opened this issue Nov 7, 2023 · 6 comments · Fixed by #3023
Labels
ferc714 Anything having to do with FERC Form 714

Comments

@cmgosnell
Copy link
Member

cmgosnell commented Nov 7, 2023

There are 90 missing records post #2999. In my understanding this stems from the processing within annualized_respondents_ferc714 and is filtering down into summarized_demand_ferc714 (which is a db table and output table being min/max row checked). It seems there are 6 respondents that are missing from the new table. I pulled the last-passed nightly build next to my local db:

>>> tbl_name = "summarized_demand_ferc714"
>>> tbl= pd.read_sql(tbl_name, pudl_engine)
>>> tbl_night = pd.read_sql_table(tbl_name, pudl_engine_night)
>>> pks = pudl.metadata.classes.Resource.from_id(
...    tbl_name
... ).schema.primary_key
>>> tbl = tbl.set_index(pks)
>>> tbl_night = tbl_night.set_index(pks)
>>> assert(tbl.index.difference(tbl_night.index).empty)
>>> missing.reset_index().respondent_id_ferc714.value_counts()

respondent_id_ferc714
153    15
155    15
204    15
229    15
288    15
291    15
Name: count, dtype: int64

@rousik you did some digging into this as well and if you have a similar view but maybe found a more broad issue with this change

@rousik
Copy link
Collaborator

rousik commented Nov 7, 2023

@cmgosnell yeah, I found that and reported on my findings on slack (still not sure what comms channel to use when) here: https://catalystcooperative.slack.com/archives/C02BSMJJVR8/p1699336003657749. I have dug into the data that feeds into this table and found out that report_date was up to recently calculated very incorrectly, so report_date year did not match the year of utc_datetime in about 80% of the time. With the recent change, we have eliminated null report_date values and aligned most of the date year with utc_datetime (0.07% discrepancy, likely around end of year). I suspect that perhaps the loss is due to shifting of the rows between the years?

@rousik
Copy link
Collaborator

rousik commented Nov 7, 2023

After further digging, I can see that the only piece of information that flows from demand_hourly_pa_ferc714 are distinct report_date non-null values and those have not changed as a result of my change. I suspect that the changes in here are likely due to changes of the other inputs, namely one of these:

    respondent_id_ferc714: pd.DataFrame,
    denorm_utilities_eia: pd.DataFrame,
    service_territory_eia861: pd.DataFrame,
    balancing_authority_eia861: pd.DataFrame,

@cmgosnell
Copy link
Member Author

Okay @rousik well I think I've isolated at which point in the processing for demand_hourly_pa_ferc714 these 6 now-missing respondents are dropped. These respondents don't show up in georeferenced_counties_ferc714. So when we are doing an inner merge here we are loosing these guys.

why they showed up before is still a bit of a mystery to me...

@cmgosnell
Copy link
Member Author

okay these records indeed don't have fipsified_respondents_ferc714 and in the old version they had no actual fips codes and we think that is okay! I am going to convert this inner merge to a left merge so they annualized data shows up. but fix the row could for the fipsified respondents.

(also i'm going to add in a validation test re the demand_hourly_pa_ferc714 no more than .1% of the years in the datetime column should be off from the report_date column)

@jdangerx
Copy link
Member

@cmgosnell is this closeable?

@e-belfer e-belfer changed the title Fix FERC717 validation failure Fix FERC714 validation failure Nov 23, 2023
@zaneselvans
Copy link
Member

@cmgosnell I'm closing since this seems to be working... let us know if there's still something here to address.

@zaneselvans zaneselvans added the ferc714 Anything having to do with FERC Form 714 label Jan 9, 2024
@zaneselvans zaneselvans moved this from New to Done in Catalyst Megaproject Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ferc714 Anything having to do with FERC Form 714
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants