Feature branch: Rename core + output assets to match new naming protocols #2818

Merged: 111 commits, Dec 16, 2023
e0ed4f2
Rename static tables
e-belfer Aug 30, 2023
5f815b3
Rename Census DP1 assets
e-belfer Sep 1, 2023
8da9db6
Test doc fix
e-belfer Sep 1, 2023
bb75aa1
Update core table names for EIA 860, 923, harvested tables, FERC1, code
e-belfer Sep 7, 2023
2bb53f5
Fix integration tests
e-belfer Sep 7, 2023
0cd5e9d
Fix alembic
e-belfer Sep 7, 2023
8790131
Rename 714, 861, epacems
e-belfer Sep 12, 2023
b851fe7
update tests and rest of assets
e-belfer Sep 13, 2023
9d7996d
Resolve merge conflict
e-belfer Sep 13, 2023
2232577
Fix validation tests
e-belfer Sep 13, 2023
7e7303a
Rename ferc output assets
bendnorman Sep 13, 2023
2fcf9f1
Merge branch 'rename-core-assets' into rename-ferc-output-assets
bendnorman Sep 13, 2023
2ee75b2
Rename denorm_cash_flow_ferc1 and remove leading underscore from cros…
bendnorman Sep 13, 2023
ec426d6
Merge branch 'dev' into rename-core-assets
e-belfer Sep 14, 2023
ada8b01
Rename a missing ferc output table and add migration
bendnorman Sep 14, 2023
3417344
Merge branch 'rename-core-assets' into rename-ferc-output-assets
e-belfer Sep 14, 2023
62f0f50
Rename EIA denorm assets
bendnorman Sep 15, 2023
00ce1e9
Recreate ferc rename migration
bendnorman Sep 15, 2023
34dba80
Add docs cross ref fix for intermediate assets
bendnorman Sep 15, 2023
ac79bd5
Resolve small denorm EIA rename issues
bendnorman Sep 17, 2023
ce16bbd
Clean up notebooks
e-belfer Sep 18, 2023
8a4e6ea
Apply naming convention to allocate generation fuel assets
bendnorman Sep 18, 2023
0a912e4
Merge pull request #2856 from catalyst-cooperative/rename-ferc-output…
bendnorman Sep 18, 2023
6b14304
Fix a missing gen fuel asset name in PudlTabl
bendnorman Sep 18, 2023
325fb52
Merge branch 'rename-core-assets' into rename-eia-output-assets
bendnorman Sep 18, 2023
a2042de
Update migrations post ferc1 output rename merge
bendnorman Sep 19, 2023
8b468db
Merge pull request #2858 from catalyst-cooperative/rename-eia-output-…
bendnorman Sep 19, 2023
5f182f9
Merge branch 'rename-core-assets' into rename-allocate-gen-fuel-assets
e-belfer Sep 20, 2023
77a16f5
Update contributor facing documentation with new asset naming convent…
bendnorman Sep 20, 2023
4d5b57d
Add new naming convention to user facing documentation
bendnorman Sep 20, 2023
efb2bbd
Correct allocate-get-fuel down revision
bendnorman Sep 20, 2023
239eb4d
Apply new naming convention to ferc714 respondents, hourly demand and…
bendnorman Sep 21, 2023
09a876d
Merge pull request #2865 from catalyst-cooperative/rename-allocate-ge…
e-belfer Sep 21, 2023
5ebedcd
Fix refs to renamed tables in release notes
bendnorman Sep 21, 2023
6ffe6a5
Rename ferc714 and eia861 output tables in integration tests
bendnorman Sep 21, 2023
d257e52
Merge branch 'rename-core-assets' into rename-annualized-respondents-…
e-belfer Sep 21, 2023
97149cb
Add missing balance authority fk migration
bendnorman Sep 25, 2023
6a5411a
Rename out_ferc714__fipsified_respondents to out_ferc714__respondents…
bendnorman Sep 26, 2023
4d256ec
Respond to first round of Austen's comments
bendnorman Sep 26, 2023
9d30977
Merge pull request #2882 from catalyst-cooperative/rename-annualized-…
bendnorman Sep 26, 2023
ef4b5ad
Merge branch 'rename-core-assets' into create-naming-convention-docs
bendnorman Sep 26, 2023
1a9028d
Update rename-core-assets and clarify raw asset sentence
bendnorman Sep 26, 2023
32dc9ac
Restrict astroid version to avoid random autoapi error
bendnorman Sep 26, 2023
ee23cba
Merge branch 'dev' into rename-core-assets
bendnorman Sep 27, 2023
d8884c2
Merge branch 'dev' into rename-core-assets
bendnorman Sep 28, 2023
765c420
Reset migrations and fix old table refs in docs
bendnorman Sep 28, 2023
7a7a441
Fix names of inputs to exploded tables and xbrl calculation fixes
bendnorman Sep 28, 2023
1aa5116
Rename mcoe and ppl assets
bendnorman Sep 29, 2023
01d3c73
Merge branch 'rename-core-assets' into rename-mcoe-assets
bendnorman Sep 29, 2023
f231452
Fix small ppl migration issue
bendnorman Sep 29, 2023
7b7dba1
Format and sort intermediate resource name cross refs in data dictionary
bendnorman Oct 2, 2023
50cea89
Add upstream mcoe assets back to metadata
bendnorman Oct 2, 2023
791a70b
Update stragler PudlTabl method name
bendnorman Oct 3, 2023
9f578b3
Add frequency to ppl asset name and some clean up
bendnorman Oct 5, 2023
1b5100e
Merge pull request #2904 from catalyst-cooperative/rename-mcoe-assets
bendnorman Oct 6, 2023
8d2ab9a
Merge branch 'dev' into rename-core-assets
cmgosnell Oct 31, 2023
c63dd8f
rename six of the non-contreversial FERC1 tables (core + out)
cmgosnell Oct 31, 2023
e8db0ad
initial rename of the FERC1 core and out tables
cmgosnell Nov 1, 2023
bb088af
add db migration
cmgosnell Nov 1, 2023
8d3b058
rename the ferc1 transformer classes in line with new table names
cmgosnell Nov 1, 2023
797d40e
Merge branch 'rename-core-assets' into create-naming-convention-docs
bendnorman Nov 1, 2023
33fab91
Incorporate some docs changes from #2912
bendnorman Nov 1, 2023
c5fb34f
FINAL FINAL rename of ferc assets
cmgosnell Nov 3, 2023
fc7de0e
ooooops remove the eia860m extraction edit bc that was not supposed t…
cmgosnell Nov 3, 2023
c2af359
Merge branch 'dev' into rename-core-assets
cmgosnell Nov 3, 2023
50e3eef
Merge branch 'rename-core-assets' into create-naming-convention-docs
bendnorman Nov 6, 2023
0c3b9ae
Merge pull request #2995 from catalyst-cooperative/rename-ferc1-assets
bendnorman Nov 6, 2023
10111e4
Remove README.rst from index.rst and move intro content to index
bendnorman Nov 7, 2023
85c6fe3
Add deprecation warnings to PudlTabl and add minor naming docs updates
bendnorman Nov 8, 2023
d61005d
Rename heat_rate_mmbtu_mwh -> heat_rate_mmbtu_mwh_by_unit
bendnorman Oct 4, 2023
53e2f2d
Rename heat rate mmbtu mwh to follow existing naming convention
bendnorman Oct 4, 2023
479ec7f
Remove PudlTabl removal data and make assn table name sources alphabe…
bendnorman Nov 8, 2023
c329804
Explain why CEMS is stored as parquet
bendnorman Nov 8, 2023
d8c01da
Rename heat_rate_mmbtu_mwh_eia/ferc1 columns to unit_heat_rate_mmbtu_…
bendnorman Nov 8, 2023
afaa449
Remove unused ppe_cols_to_grab variable
bendnorman Nov 8, 2023
53d5618
Merge pull request #3028 from catalyst-cooperative/create-renaming-re…
bendnorman Nov 9, 2023
f60592f
Make association asset names more consistent
bendnorman Nov 9, 2023
cb9b188
Merge pull request #2874 from catalyst-cooperative/create-naming-conv…
bendnorman Nov 10, 2023
19a9e7a
Merge branch 'rename-core-assets' into rename-assn-assets
bendnorman Nov 10, 2023
1d2d71c
Add association assset naming convention to docs
bendnorman Nov 10, 2023
0f90efa
Resolve migration issues with unit heat rate column
bendnorman Nov 15, 2023
578a033
Merge pull request #3035 from catalyst-cooperative/rename-assn-assets
bendnorman Nov 15, 2023
46a83b6
Merge branch 'dev' into rename-core-assets
bendnorman Nov 15, 2023
4a2be6a
Update conda-lock.yml and rendered conda environment files.
bendnorman Nov 15, 2023
f82b56e
Merge branch 'rename-core-assets' into rename-heat-rate-mmbtu-mwh-column
bendnorman Nov 15, 2023
7ef9c70
Recreate heat rate migration revision
bendnorman Nov 15, 2023
1396ad8
Merge pull request #3029 from catalyst-cooperative/rename-heat-rate-m…
bendnorman Nov 16, 2023
1e32e57
Merge branch 'dev' into rename-core-assets
bendnorman Nov 16, 2023
0fb7b9f
Use pudl_sqlite_io_manager for fuel_cost_by_generator assets
bendnorman Nov 17, 2023
3528935
Merge branch 'dev' into rename-core-assets
bendnorman Nov 17, 2023
271ffc3
Update conda-lock.yml and rendered conda environment files.
bendnorman Nov 17, 2023
9db6ec2
Checkout lock files from dev
bendnorman Nov 30, 2023
d27c0ca
Merge branch 'dev' into rename-core-assets
bendnorman Dec 1, 2023
77712e6
Update conda-lock.yml and rendered conda environment files.
bendnorman Dec 1, 2023
f07f0a5
Merge branch 'dev' into rename-core-assets
bendnorman Dec 1, 2023
3237907
Merge branch 'rename-core-assets' of github.com:catalyst-cooperative/…
bendnorman Dec 1, 2023
1928c14
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 1, 2023
5e85257
Remove intro.rst and update ferc s3 urls again
bendnorman Dec 1, 2023
e86daad
Merge branch 'rename-core-assets' of github.com:catalyst-cooperative/…
bendnorman Dec 1, 2023
68f2ec5
Merge branch 'dev' into rename-core-assets
bendnorman Dec 4, 2023
24968f4
Update conda-lock.yml and rendered conda environment files.
bendnorman Dec 4, 2023
41c4415
Merge branch 'dev' into rename-core-assets
bendnorman Dec 13, 2023
660eaff
Remove some old table names from metaddata
bendnorman Dec 14, 2023
aaa99ee
Update conda-lock.yml and rendered conda environment files.
bendnorman Dec 14, 2023
e17bd42
Merge branch 'dev' into rename-core-assets
bendnorman Dec 14, 2023
1ad3830
Merge branch 'rename-core-assets' of github.com:catalyst-cooperative/…
bendnorman Dec 14, 2023
be7e5c2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2023
2414de7
Remove ref to non existant doc page, remove files no longer in dev
bendnorman Dec 14, 2023
4ecfc86
Merge branch 'dev' into rename-core-assets
bendnorman Dec 15, 2023
9f3d293
Merge branch 'dev' into rename-core-assets
bendnorman Dec 15, 2023
9544618
Merge branch 'dev' into rename-core-assets
bendnorman Dec 15, 2023
81 changes: 63 additions & 18 deletions README.rst
Expand Up @@ -47,36 +47,81 @@ it's often difficult to work with. PUDL takes the original spreadsheets, CSV fil
and databases and turns them into a unified resource. This allows users to spend more
time on novel analysis and less time on data preparation.

Who is PUDL for?
----------------

The project is focused on serving researchers, activists, journalists, policy makers,
and small businesses that might not otherwise be able to afford access to this data from
commercial sources and who may not have the time or expertise to do all the data
processing themselves from scratch.
and small businesses that might not otherwise be able to afford access to this data
from commercial sources and who may not have the time or expertise to do all the
data processing themselves from scratch.

We want to make this data accessible and easy to work with for as wide an audience as
possible: anyone from a grassroots youth climate organizers working with Google sheets
to university researchers with access to scalable cloud computing resources and everyone
in between!
possible: anyone from grassroots youth climate organizers working with Google
Sheets to university researchers with access to scalable cloud computing
resources and everyone in between!

PUDL is comprised of three core components:

- **Raw Data Archives**

- PUDL `archives <https://github.com/catalyst-cooperative/pudl-archiver>`__
all the raw data inputs on `Zenodo <https://zenodo.org/communities/catalyst-cooperative/?page=1&size=20>`__
to ensure permanent, versioned access to the data. In the event that an agency
changes how they publish data or deletes old files, the ETL will still have access
to the original inputs. Each of the data inputs may have several different versions
archived, and all are assigned a unique DOI and made available through the REST API.
You can read more about the Raw Data Archives in the
`docs <https://catalystcoop-pudl.readthedocs.io/en/dev/intro.html#raw-data-archives>`__.
- **ETL Pipeline**

- The ETL pipeline (this repo) ingests the raw archives, cleans them,
integrates them, and outputs them to a series of tables stored in SQLite Databases,
Parquet files, and pickle files (the Data Warehouse). Each release of the PUDL
Python package is embedded with a set of DOIs to indicate which version of the
raw inputs it is meant to process. This process helps ensure that the ETL and its
outputs are replicable. You can read more about the ETL in the
`docs <https://catalystcoop-pudl.readthedocs.io/en/dev/intro.html#the-etl-process>`__.
- **Data Warehouse**

- The outputs from the ETL, sometimes called "PUDL outputs",
are stored in a data warehouse as a collection of SQLite and Parquet files so that
users can access the data without having to run any code. Learn more about how to
access the data `here <https://catalystcoop-pudl.readthedocs.io/en/dev/data_access.html>`__.
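
As a minimal sketch of what "without having to run any code" from the ETL can look like in practice, the snippet below opens a SQLite file read-only with Python's standard ``sqlite3`` module and lists its tables. The helper name ``list_tables`` and the throwaway demo database are assumptions made for illustration; they are not part of PUDL, and a real ``pudl.sqlite`` file would be passed in place of the demo path.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro makes SQLite fail on a wrong path instead of silently
    # creating a new, empty database there.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo against a throwaway database so the sketch runs without downloading
# anything. The table name is one that appears in this PR; the single column
# is a placeholder, not the real schema.
with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "demo.sqlite"
    conn = sqlite3.connect(demo_db)
    conn.execute(
        "CREATE TABLE core_eia923__monthly_fuel_receipts_costs (report_date TEXT)"
    )
    conn.commit()
    conn.close()
    demo_tables = list_tables(demo_db)
```

Opening the file read-only is a deliberate choice here: the warehouse is meant to be consumed, not modified, and it turns a mistyped path into an error rather than an empty database.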

What data is available?
-----------------------

PUDL currently integrates data from:

* `EIA Form 860 <https://www.eia.gov/electricity/data/eia860/>`__: 2001 - 2022
* `EIA Form 860m <https://www.eia.gov/electricity/data/eia860m/>`__: 2023-06
* `EIA Form 861 <https://www.eia.gov/electricity/data/eia861/>`__: 2001 - 2022
* `EIA Form 923 <https://www.eia.gov/electricity/data/eia923/>`__: 2001 - 2023-08
* `EPA Continuous Emissions Monitoring System (CEMS) <https://campd.epa.gov/>`__: 1995 - 2022
* `FERC Form 1 <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-1-electric-utility-annual>`__: 1994-2021
* `FERC Form 714 <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-no-714-annual-electric/data>`__: 2006-2020
* `US Census Demographic Profile 1 Geodatabase <https://www.census.gov/geographies/mapping-files/2010/geo/tiger-data.html>`__: 2010
* **EIA Form 860**: 2001-2022
- `Source Docs <https://www.eia.gov/electricity/data/eia860/>`__
- `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/dev/data_sources/eia860.html>`__
* **EIA Form 860m**: 2023-06
- `Source Docs <https://www.eia.gov/electricity/data/eia860m/>`__
* **EIA Form 861**: 2001-2022
- `Source Docs <https://www.eia.gov/electricity/data/eia861/>`__
- `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/dev/data_sources/eia861.html>`__
* **EIA Form 923**: 2001-2022
- `Source Docs <https://www.eia.gov/electricity/data/eia923/>`__
- `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/dev/data_sources/eia923.html>`__
* **EPA Continuous Emissions Monitoring System (CEMS)**: 1995-2022
- `Source Docs <https://campd.epa.gov/>`__
- `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/dev/data_sources/epacems.html>`__
* **FERC Form 1**: 1994-2021
- `Source Docs <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-1-electric-utility-annual>`__
- `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/dev/data_sources/ferc1.html>`__
* **FERC Form 714**: 2006-2020
- `Source Docs <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-no-714-annual-electric/data>`__
- `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/dev/data_sources/ferc714.html>`__
* **FERC Form 2**: 2021 (raw only)
- `Source Docs <https://www.ferc.gov/industries-data/natural-gas/industry-forms/form-2-2a-3-q-gas-historical-vfp-data>`__
* **FERC Form 6**: 2021 (raw only)
- `Source Docs <https://www.ferc.gov/general-information-1/oil-industry-forms/form-6-6q-historical-vfp-data>`__
* **FERC Form 60**: 2021 (raw only)
- `Source Docs <https://www.ferc.gov/form-60-annual-report-centralized-service-companies>`__
* **US Census Demographic Profile 1 Geodatabase**: 2010
- `Source Docs <https://www.census.gov/geographies/mapping-files/2010/geo/tiger-data.html>`__

Thanks to support from the `Alfred P. Sloan Foundation Energy & Environment
Program <https://sloan.org/programs/research/energy-and-environment>`__, from
2021 to 2024 we will be integrating the following data as well:
2021 to 2024 we will be cleaning and integrating the following data as well:

* `EIA Form 176 <https://www.eia.gov/dnav/ng/TblDefs/NG_DataSources.html#s176>`__
(The Annual Report of Natural Gas Supply and Disposition)
Expand Down
2 changes: 1 addition & 1 deletion devtools/debug-eia-etl.ipynb
Expand Up @@ -263,7 +263,7 @@
"outputs": [],
"source": [
"%%time\n",
"asset_key = \"fuel_receipts_costs_eia923\"\n",
"asset_key = \"core_eia923__monthly_fuel_receipts_costs\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
Expand Down
16 changes: 8 additions & 8 deletions devtools/debug-ferc1-etl.ipynb
Expand Up @@ -99,7 +99,7 @@
},
"outputs": [],
"source": [
"ferc1_xbrl_raw_dfs[\"fuel_ferc1\"][\"duration\"].report_year"
"ferc1_xbrl_raw_dfs[\"core_ferc1__yearly_steam_plants_fuel_sched402\"][\"duration\"].report_year"
]
},
{
Expand Down Expand Up @@ -206,7 +206,7 @@
"metadata": {},
"outputs": [],
"source": [
"table_name = \"other_regulatory_liabilities_ferc1\"\n",
"table_name = \"core_ferc1__yearly_other_regulatory_liabilities_sched278\"\n",
"TRANSFORMER = transformers[table_name] # add a table here"
]
},
Expand Down Expand Up @@ -326,8 +326,8 @@
"source": [
"transformed_tables = {}\n",
"for table_name, transformer in transformers.items():\n",
" if table_name == \"plants_steam_ferc1\":\n",
" # plants_steam_ferc1 is a special case. It depends on the transformed fuel_ferc1 table.\n",
" if table_name == \"core_ferc1__yearly_steam_plants_sched402\":\n",
" # core_ferc1__yearly_steam_plants_sched402 is a special case. It depends on the transformed core_ferc1__yearly_steam_plants_fuel_sched402 table.\n",
" continue\n",
" transformed_tables[transformer.table_id.value] = transformer.transform(\n",
" raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],\n",
Expand All @@ -345,13 +345,13 @@
},
"outputs": [],
"source": [
"# Handle special case for \"plants_steam_ferc1\"\n",
"transformer = transformers[\"plants_steam_ferc1\"]\n",
"# Handle special case for \"core_ferc1__yearly_steam_plants_sched402\"\n",
"transformer = transformers[\"core_ferc1__yearly_steam_plants_sched402\"]\n",
"transformed_tables[transformer.table_id.value] = transformer.transform(\n",
" raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],\n",
" raw_xbrl_instant=ferc1_xbrl_raw_dfs[transformer.table_id.value][\"instant\"],\n",
" raw_xbrl_duration=ferc1_xbrl_raw_dfs[transformer.table_id.value][\"duration\"],\n",
" transformed_fuel=transformed_tables[\"fuel_ferc1\"],\n",
" transformed_fuel=transformed_tables[\"core_ferc1__yearly_steam_plants_fuel_sched402\"],\n",
")"
]
}
Expand All @@ -372,7 +372,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.11.5"
}
},
"nbformat": 4,
Expand Down
Expand Up @@ -43,7 +43,7 @@
"# Local libraries\n",
"import pudl\n",
"from pudl.workspace.setup import PudlPaths\n",
"from pudl.analysis.ferc1_eia_train import *"
"from pudl.analysis.eia_ferc1_train import *"
]
},
{
Expand Down
Expand Up @@ -44,7 +44,7 @@
"# Local libraries\n",
"import pudl\n",
"from pudl.workspace.setup import PudlPaths\n",
"from pudl.analysis.ferc1_eia_train import *"
"from pudl.analysis.eia_ferc1_train import *"
]
},
{
Expand Down Expand Up @@ -188,7 +188,7 @@
"outputs": [],
"source": [
"current_training_df = pd.read_csv(\n",
" importlib.resources.files(\"pudl.package_data.glue\").joinpath(\"ferc1_eia_train.csv\")\n",
" importlib.resources.files(\"pudl.package_data.glue\").joinpath(\"eia_ferc1_train.csv\")\n",
")\n",
"path_to_overrides = \"./add_to_training/\"\n",
"override_files = [\n",
Expand Down Expand Up @@ -326,10 +326,10 @@
"# Get paths to CSVs.\n",
"from importlib import resources\n",
"one_to_many = path_to_one_to_many=resources.files(\"pudl.package_data.glue\").joinpath(\n",
" \"ferc1_eia_one_to_many.csv\",\n",
" \"eia_ferc1_one_to_many.csv\",\n",
" )\n",
"nulls = path_to_one_to_many=resources.files(\"pudl.package_data.glue\").joinpath(\n",
" \"ferc1_eia_null.csv\",\n",
" \"eia_ferc1_null.csv\",\n",
" )"
]
},
Expand Down
75 changes: 70 additions & 5 deletions devtools/inspect-assets.ipynb
Expand Up @@ -50,10 +50,61 @@
"\n",
"from pudl.etl import defs\n",
"\n",
"asset_key = \"raw_generator_existing_eia860\"\n",
"asset_key = \"exploded_balance_sheet_assets_ferc1\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
"#df[df.row_type_xbrl == \"correction\"].xbrl_factoid.value_counts()\n",
"#df[(df.xbrl_factoid.isin([\"operation_expense\", \"maintenance_expense\"]))&(df.rel_diff.notnull())&(df.rel_diff!=0)].sort_values(['utility_id_ferc1', 'report_year', 'xbrl_factoid', 'rel_diff']).head(50)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2d99594",
"metadata": {},
"outputs": [],
"source": [
"df[(df.xbrl_factoid==\"accumulated_depreciation\")&(df.plant_status==\"in_service\")&(df.plant_function==\"total\")]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "467111b1",
"metadata": {},
"outputs": [],
"source": [
"df[df.xbrl_factoid.isin(factoids)&(df.utility_id_ferc1==9)&(df.report_year==1998)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6f7427a",
"metadata": {},
"outputs": [],
"source": [
"factoids = ['distribution_maintenance_expense_electric',\n",
" 'hydraulic_power_generation_maintenance_expense',\n",
" 'maintenance_of_general_plant',\n",
" 'nuclear_power_generation_maintenance_expense',\n",
" 'other_power_generation_maintenance_expense',\n",
" 'regional_market_maintenance_expense',\n",
" 'steam_power_generation_maintenance_expense',\n",
" 'transmission_maintenance_expense_electric']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "951b718d",
"metadata": {},
"outputs": [],
"source": [
"asset_key = \"calculation_components_xbrl_ferc1\"\n",
"calcs = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"calcs[(calcs.xbrl_factoid_parent == \"accumulated_depreciation\")].head(50)"
]
},
{
Expand All @@ -77,10 +128,24 @@
"\n",
"from pudl.etl import defs\n",
"\n",
"asset_key = \"fuel_receipts_costs_eia923\"\n",
"asset_key = \"emissions_unit_ids_epacems\"\n",
"df = defs.load_asset_value(AssetKey(asset_key))\n",
"\n",
"df.head()"
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f0d118b",
"metadata": {},
"outputs": [],
"source": [
"from pudl.output.epacems import epacems\n",
"\n",
"test_epacems = epacems(states = [\"ID\"], years = [2022])\n",
"\n",
"test_epacems[test_epacems.operating_datetime_utc>=\"2022-01-04\"].head(40)"
]
}
],
Expand All @@ -100,7 +165,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.11.5"
}
},
"nbformat": 4,
Expand Down
14 changes: 7 additions & 7 deletions devtools/python-output-table-conversion-debug.ipynb
Expand Up @@ -48,14 +48,14 @@
"\n",
"You can create an asset by creating a new function and adding the `@asset` decorator. For now, the only attribute you should add to the decorator is `compute_kind=\"Python\"`. All this does is add a cute tag to the asset in the DAG to let people know how the asset is being processed.\n",
"\n",
"Next you'll want to figure out what tables the output table depends on. Read through the old output function to see which normalized tables or output functions are being used as inputs to the joins and imputations. Once you have the input table names, add them to the asset function parameters. For example, the `utilities_eia860()` function merges `utilities_entity_eia`, `utilities_eia860`, and `utilities_eia` tables together so the asset would look like this:\n",
"Next you'll want to figure out what tables the output table depends on. Read through the old output function to see which normalized tables or output functions are being used as inputs to the joins and imputations. Once you have the input table names, add them to the asset function parameters. For example, the `utilities_eia860()` function merges `core_eia__entity_utilities`, `core_eia860__scd_utilities`, and `core_pudl__assn_eia_pudl_utilities` tables together so the asset would look like this:\n",
"\n",
"```python\n",
"@asset(compute_kind=\"Python\")\n",
"def denorm_utilities_eia860(\n",
" utilities_entity_eia: pd.DataFrame,\n",
" utilities_eia860: pd.DataFrame,\n",
" utilities_eia: pd.DataFrame,\n",
" core_eia__entity_utilities: pd.DataFrame,\n",
" core_eia860__scd_utilities: pd.DataFrame,\n",
" core_pudl__assn_eia_pudl_utilities: pd.DataFrame,\n",
"):\n",
" ... # joining logic\n",
" return joined_df\n",
Expand Down Expand Up @@ -108,9 +108,9 @@
"```python\n",
"@asset(io_manager_key=\"pudl_sqlite_io_manager\", compute_kind=\"Python\")\n",
"def denorm_utilities_eia860(\n",
" utilities_entity_eia: pd.DataFrame,\n",
" utilities_eia860: pd.DataFrame,\n",
" utilities_eia: pd.DataFrame,\n",
" core_eia__entity_utilities: pd.DataFrame,\n",
" core_eia860__scd_utilities: pd.DataFrame,\n",
" core_pudl__assn_eia_pudl_utilities: pd.DataFrame,\n",
"):\n",
" ... # joining logic\n",
" return joined_df\n",
Expand Down
7 changes: 6 additions & 1 deletion docs/data_access.rst
Expand Up @@ -2,12 +2,17 @@
Data Access
=======================================================================================

We publish the :doc:`PUDL pipeline <intro>` outputs in several ways to serve
We publish the PUDL pipeline outputs in several ways to serve
different users and use cases. We're always trying to increase accessibility of the
PUDL data, so if you have a suggestion please `open a GitHub issue
<https://github.com/catalyst-cooperative/pudl/issues>`__. If you have a question you
can `create a GitHub discussion <https://github.com/orgs/catalyst-cooperative/discussions/new?category=help-me>`__.

PUDL's primary data output is the ``pudl.sqlite`` database. We recommend working with
tables that have the ``out_`` prefix, as these contain the most complete data and are
the easiest to work with. For more information about the different types
of tables, read through :ref:`PUDL's naming conventions <asset-naming>`.
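
The prefix recommendation above can be sketched in a few lines of Python. The pattern used here (a layer such as ``core`` or ``out``, then a source, then a double underscore, then a description) is inferred only from the renamed tables visible in this PR and is an assumption, not the full convention. ``split_asset_name`` is a hypothetical helper, and ``out_example__yearly_widgets`` is a made-up table name used purely for illustration.

```python
from collections import defaultdict


def split_asset_name(name: str) -> tuple[str, str, str]:
    """Split 'layer_source__description' into (layer, source, description)."""
    prefix, sep, description = name.partition("__")
    layer, underscore, source = prefix.partition("_")
    if not (sep and underscore and source and description):
        raise ValueError(f"unexpected table name: {name!r}")
    return layer, source, description


table_names = [
    "core_eia923__monthly_fuel_receipts_costs",  # appears in this PR
    "core_ferc1__yearly_steam_plants_sched402",  # appears in this PR
    "core_pudl__assn_eia_pudl_utilities",  # appears in this PR
    "out_example__yearly_widgets",  # hypothetical, for illustration only
]

# Group table names by layer so the recommended "out" tables are easy to find.
by_layer = defaultdict(list)
for name in table_names:
    layer, _source, _description = split_asset_name(name)
    by_layer[layer].append(name)

recommended = by_layer["out"]
```

Splitting on the double underscore first keeps multi-word descriptions such as ``monthly_fuel_receipts_costs`` intact, which a plain split on single underscores would not.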

.. _access-modes:

---------------------------------------------------------------------------------------
Expand Down