Make duplicate dropping optional #24

Closed
veenstrajelmer opened this issue Feb 29, 2024 · 0 comments · Fixed by #37

veenstrajelmer commented Feb 29, 2024

  • ddlpy version: master
  • Python version: 3.11
  • Operating System: Windows

Description

ddlpy drops duplicate measurements. It would be good to make this optional for data-inspection purposes.

What I Did

Duplicate values for WALSODN in 2010 (and others):

import pandas as pd
import requests

url_ddl = 'https://waterwebservices.rijkswaterstaat.nl/ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen'
request_ddl = {
    'AquoPlusWaarnemingMetadata': {
        'AquoMetadata': {
            'Grootheid': {'Code': 'WATHTE'},
            'Groepering': {'Code': 'NVT'},
            'Hoedanigheid': {'Code': 'NAP'},
            'MeetApparaat': {'Code': '127'},
        },
    },
    'Locatie': {'Locatie_MessageID': 10716, 'X': 571389.152745295, 'Y': 5694632.62008149, 'Naam': 'Walsoorden', 'Code': 'WALSODN'},
    'Periode': {'Begindatumtijd': '2010-01-01T00:00:00.000+00:00', 'Einddatumtijd': '2010-01-01T00:10:00.000+00:00'},
}

# Post the query and fail loudly on HTTP errors
resp = requests.post(url_ddl, json=request_ddl)
if not resp.ok:
    raise Exception('%s for %s: %s'%(resp.reason, resp.url, str(resp.text)))
result = resp.json()
# The service reports its own status in the 'Succesvol' field
if not result['Succesvol']:
    raise Exception('query not successful, Foutmelding: %s from %s'%(result['Foutmelding'],url_ddl))

# Flatten the nested measurements into a DataFrame
result_pd = pd.json_normalize(result['WaarnemingenLijst'][0]["MetingenLijst"])
print(result_pd[["Tijdstip","Meetwaarde.Waarde_Numeriek"]]) # each timestamp appears three times

Gives (everything duplicated three times):

                        Tijdstip  Meetwaarde.Waarde_Numeriek
0  2010-01-01T01:00:00.000+01:00                        63.0
1  2010-01-01T01:00:00.000+01:00                        63.0
2  2010-01-01T01:00:00.000+01:00                        63.0
3  2010-01-01T01:10:00.000+01:00                        83.0
4  2010-01-01T01:10:00.000+01:00                        83.0
5  2010-01-01T01:10:00.000+01:00                        83.0
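
To quantify the duplication without calling the service again, the following is a minimal sketch that rebuilds the output above by hand (the frame is an assumption that mirrors the printed rows, not a live response) and counts how often each timestamp occurs:

```python
import pandas as pd

# Hand-built frame mirroring the printed output above (no network call).
result_pd = pd.DataFrame({
    "Tijdstip": ["2010-01-01T01:00:00.000+01:00"] * 3
                + ["2010-01-01T01:10:00.000+01:00"] * 3,
    "Meetwaarde.Waarde_Numeriek": [63.0] * 3 + [83.0] * 3,
})

# keep=False flags every copy of a duplicated row, not only the repeats.
is_dupe = result_pd.duplicated(keep=False)

# Number of rows per timestamp.
counts = result_pd.groupby("Tijdstip").size()

print(int(is_dupe.sum()))  # rows that have at least one identical twin
print(counts)
```

Using `keep=False` is a choice made here for inspection: it shows all copies of a duplicated row rather than only the second and later ones.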

When retrieving NAP/127/WALSODN for 2010, the service returns >157681 observations (waarnemingen), while ddlpy returns 52562 values because duplicates are dropped. Dropping duplicates is a useful default, but it would be good to make it optional. The line to adjust is measurements = measurements.drop_duplicates() in ddlpy.py.

Duplicates also occur for NORTHCMRT, but the affected period is unknown.
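
One possible shape for the requested change is sketched below. The function name `postprocess_measurements` and the keyword `drop_duplicates` are hypothetical and only illustrate the idea; they are not taken from ddlpy, and the fix that was eventually merged may look different.

```python
import pandas as pd


def postprocess_measurements(measurements: pd.DataFrame,
                             drop_duplicates: bool = True) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows only when requested."""
    if drop_duplicates:
        measurements = measurements.drop_duplicates()
    return measurements


# Hand-built example data: two unique measurements, each present three times.
raw = pd.DataFrame({
    "Tijdstip": ["2010-01-01T01:00:00.000+01:00"] * 3
                + ["2010-01-01T01:10:00.000+01:00"] * 3,
    "Meetwaarde.Waarde_Numeriek": [63.0] * 3 + [83.0] * 3,
})

deduped = postprocess_measurements(raw)                     # current behaviour
kept = postprocess_measurements(raw, drop_duplicates=False)  # for inspection

print(len(deduped), len(kept))  # 2 6
```

Defaulting the flag to `True` would keep the existing behaviour for current users while letting anyone inspecting the raw data opt out.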
