Constrain allowable years and states for filtering #9

zaneselvans · 2022-04-07T02:49:23Z

The EPA CEMS dataset is composed of ~1300 row groups, each containing a unique combination of year and state to allow efficient pushdown filtering by time and location. Only a certain range of years (1995-2020) and set of state abbreviations (continental US plus DC) are valid for filtering. It would be nice if we could at least suggest, and preferably require, that users filter only with valid values, so that a request for something outside the allowable values raises an error right away instead of making them wait a long time for a query that won't return anything useful.

Is this easy to set up with the intake catalog? Can we designate an allowable set of values for years and states to be used as filters? How are user parameters meant to be used? I've seen that you can enumerate allowable values there, but they seem only to be for use in Jinja templating of the filenames, and not for things like the filters.
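Independent of whatever intake supports, the kind of up-front check described above could be sketched in plain Python. This is only an illustration: the names `VALID_YEARS`, `VALID_STATES`, and `check_filter_values` are hypothetical, and the state list assumes "continental US plus DC" means the 48 contiguous states plus DC.

```python
# Assumption: valid years are 1995 through 2020 inclusive, per the issue text.
VALID_YEARS = frozenset(range(1995, 2021))

# Assumption: "continental US plus DC" means the 48 contiguous states plus DC.
VALID_STATES = frozenset(
    "AL AZ AR CA CO CT DC DE FL GA ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT "
    "NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY".split()
)


def check_filter_values(years, states):
    """Raise ValueError if any requested year or state is outside the valid sets."""
    bad_years = sorted(set(years) - VALID_YEARS)
    bad_states = sorted(set(states) - VALID_STATES)
    if bad_years or bad_states:
        raise ValueError(
            f"Invalid filter values: years={bad_years}, states={bad_states}"
        )
```

Calling `check_filter_values([2019], ["CO"])` before issuing a query would fail fast on a typo or an out-of-range year instead of scanning the dataset.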


zaneselvans · 2022-04-21T05:21:48Z

The user parameters don't appear to be usable this way: they seem able to select only a single file path at a time. If we pass the DNF filters straight through to Dask/Pandas, we won't be able to constrain the allowable values. See this comment and this example
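One possible workaround outside the catalog would be to validate the DNF filters in user code before handing them to the reader. The sketch below is not something intake provides; `validate_dnf_filters`, `ALLOWED`, and the column names `year` and `state` are all assumptions made for illustration. It only checks equality and membership predicates and leaves every other operator alone.

```python
# Hypothetical allowed values, keyed by assumed partition column names.
ALLOWED = {
    "year": set(range(1995, 2021)),
    "state": {"CO", "TX", "DC"},  # truncated for the example
}


def validate_dnf_filters(filters, allowed=ALLOWED):
    """Return filters unchanged, or raise ValueError on a disallowed value.

    Accepts either a flat list of (column, op, value) tuples (one AND group)
    or a list of such lists (an OR of AND groups).
    """
    if not filters:
        return filters
    groups = [filters] if isinstance(filters[0], tuple) else filters
    for group in groups:
        for column, op, value in group:
            # Only equality and membership predicates name explicit values.
            if column not in allowed or op not in ("=", "==", "in"):
                continue
            requested = set(value) if op == "in" else {value}
            bad = requested - set(allowed[column])
            if bad:
                raise ValueError(
                    f"Invalid value(s) for {column!r}: {sorted(map(repr, bad))}"
                )
    return filters


# The validated filters could then be passed through, for example:
#   pandas.read_parquet(path, filters=validate_dnf_filters(filters))
```

Range predicates such as `("year", ">=", 2030)` pass through unchecked here, so this catches typos and out-of-range exact values but not every empty query.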

zaneselvans · 2022-04-23T05:58:47Z

Closing this as it doesn't seem to be workable.

zaneselvans added labels Apr 7, 2022: intake (Intake data catalogs), epacems (The EPA's Continuous Emissions Monitoring System hourly dataset), parquet (Apache Parquet is an open columnar data file format.)

zaneselvans mentioned this issue Apr 7, 2022

EPA CEMS Intake Catalog catalyst-cooperative/pudl#1564

Open

15 tasks

zaneselvans closed this as completed Apr 23, 2022
