I'm trying to download a few years of data for one small table. It takes about 1 minute to download each 100kB file, which is quite slow.
When I set the logging level to DEBUG, I see that this library is making 37 requests for each file that gets downloaded, mostly to re-crawl the index pages.
For example, when I try to download data for 2024, the library downloads the page /Data_Archive/Wholesale_Electricity/MMSDM/2009/ twice.
What's the purpose of this?
For example, if I request data for Jan 2024, the library should only need to look in the folder http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2024/MMSDM_2024_01/MMSDM_Historical_Data_SQLLoader/DATA/. I don't see why it needs to look in any other folder. Isn't this file structure predictable?
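For concreteness, here is a sketch of what I mean, assuming the archive layout stays stable; `data_folder_url` is just an illustrative name, not something from this library:

```python
# Sketch only: build the DATA folder URL directly from year and month,
# assuming the MMSDM archive keeps its current directory layout.
BASE = "http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM"

def data_folder_url(year: int, month: int) -> str:
    return (
        f"{BASE}/{year}/MMSDM_{year}_{month:02d}/"
        "MMSDM_Historical_Data_SQLLoader/DATA/"
    )

print(data_folder_url(2024, 1))
# -> http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2024/MMSDM_2024_01/MMSDM_Historical_Data_SQLLoader/DATA/
```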
In fact, in the code on this line it looks like the exact filename is constructed, not scraped. So why are these requests for HTML pages made at all? If they are required, could you please add a caching decorator? That would be roughly a two-line change and should speed things up by about 37x.
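To be concrete about what I mean by a caching decorator, something like the sketch below, where `get_index_page` is a placeholder for whichever function in this library actually fetches an index page (I haven't checked its real name or signature):

```python
# Sketch of the kind of two-line change I have in mind: memoise the
# index-page fetch so each URL is only requested once per process.
from functools import lru_cache

import requests


@lru_cache(maxsize=None)  # the added line
def get_index_page(url: str) -> str:  # placeholder name
    return requests.get(url).text
```

Since the URL is a plain string it is hashable, so `lru_cache` works without any other changes; repeated calls for the same index page would then hit the in-memory cache instead of NEMWeb.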