Theme 2 Scenario 2A, Update usgs notebook #177
Conversation
#file_name_list.append(processTextFile(url,html_link,'wl'))
wl_count+=1
print "num water level:",wl_count
if (os.path.isdir("./data_files/")):
@birdage This will break on windows machines. Use os.path.join
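For illustration, a minimal sketch of the portable version (the `data_files` directory name is taken from the diff above):

```python
import os

# Hard-coded "./data_files/" assumes POSIX separators; os.path.join
# builds the path with the separator of the host OS instead.
data_dir = os.path.join('.', 'data_files')

if os.path.isdir(data_dir):
    print('data directory found:', data_dir)
```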
@ocefpaf want me to correct it before merge?
IMHO it is OK for the merge @birdage, but I do not use windows 😁 Maybe we can open an issue after the merge as a reminder that the paths need to be fixed.
@ocefpaf sounds like a good idea, thanks!
Almost done reviewing. I will merge as soon as I finish running it here.
for file_name in files:
print count,file_name
display(pb)
for fi,file_name in enumerate(files):
This loop is too complex and easily breakable (in fact I am having trouble running it right now). Maybe an abstraction to read the metadata is in order.
Also, the actual data and dates can easily be read with pandas and stored in the dictionary, instead of the numpy array in the next cell.
import os
import re

from pandas import read_csv

# Assumption: `non_decimal` strips everything but digits and decimal
# points before the float conversion; defined here for completeness.
non_decimal = re.compile(r'[^\d.]+')

full_data = {}

def parse_metadata(fname):
    meta_data = {}
    fields = {'Sensor location latitude': 'lat',
              'Sensor location longitude': 'lon',
              'Site id =': 'name',
              'Sensor elevation above NAVD 88 =': 'elevation',
              'Barometric sensor site (source of bp) =': 'bp_source',
              'Lowest recordable water elevation is': 'lowest_wl'}
    with open(os.path.join('data_files', fname)) as f:
        content = f.readlines()
    for k, ln in enumerate(content):
        content[k] = ln.strip()
        if content[k].startswith('#'):
            for field in fields:
                if field in content[k]:
                    if fields[field] == 'name':
                        meta_data[fields[field]] = content[k].split(field)[-1]
                    else:
                        val = content[k].split(field)[-1]
                        meta_data[fields[field]] = float(non_decimal.sub('', val))
                        if fields[field] == 'lon':
                            meta_data[fields[field]] = -meta_data[fields[field]]
    return meta_data

for count, fname in enumerate(files):
    print('{}: {} of {} '.format(fname, count+1, len(files)))
    meta_data = parse_metadata(fname)
    kw = dict(parse_dates=True, sep='\t', skiprows=29, index_col=0)
    actual_data = read_csv(os.path.join('data_files', fname), **kw)
    full_data[fname] = {'meta': meta_data,
                        'data': actual_data}

Done reviewing. Over to you @birdage.
@ocefpaf regarding the comments:
I would move away from %pylab, but that is just me...
OK. I missed the header info. That is what I would like to see, a link to the original data. Thanks!
I will prepare a PR to your branch tomorrow and let's take a look at the results then.
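As a sketch of the alternative to `%pylab`: explicit imports keep the notebook namespace clean (in a notebook, `%matplotlib inline` would still be used for inline figures):

```python
# Instead of `%pylab inline`, which floods the namespace with numpy and
# pylab names, import the libraries explicitly so it is clear where
# every name comes from:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel('x')
```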
@ocefpaf Thanks for the comments, we'll sort it tomorrow.
Hi John,
@ocefpaf thanks for the updates; the processing code runs a bit slower, but I think the refactor makes it easier to read. I've made some simple changes and will rebase and commit.
@ocefpaf looks like we're ready to merge; could you do it if possible, as I don't want to self-merge.
May I ask one last thing? Can you restart the kernel and re-run the notebook so we have a meaningful view at nbviewer:
@ocefpaf once Travis has run, I think we're good.
Nice!
@Bobfrat let's keep this open, and I will be pushing changes to it. Will note when complete.