Incomplete dates are not interpreted as ambiguous by augur filter #747
Comments
@corneliusroemer this is what is shown by
I don't see any reference to ambiguity, so technically it is correct, but regardless I've created a PR to add support for this. Note that ambiguous dates still follow the format
Thinking a bit more, this feature probably isn't very useful unless there is support for incomplete date strings (#662 (comment)).
I'm just now looking at this issue in preparation for reviewing @victorlin's PR #756. The most recent version of Augur includes some more realistic metadata in the functional test suite, so a minimal way to confirm the issue here is with the following command:

augur filter \
--metadata filter/metadata.tsv \
--min-date 2015 \
--max-date 2017-01-01 \
--output-strains filtered_strains.txt \
  --output-log filtered_log.tsv

The strain
A potential functional test in
@huddlej the incomplete date on With current implementation, Thanks for the example! It helps me better understand how this component is used. I'll look into adding some functional testing to my PR.
@huddlej I implemented a fix here: victorlin@94efe60. I decided not to add this to the current PR since it's a pretty invasive change removing the need for
Ah... when I was working on this earlier, I failed to realize:
Here is another example of the general issue with date format handling; this test fails when it shouldn't: victorlin@8b24536
I just ran into this problem too, trying to build a tree for the country of Chad, which has 9 sequences, all of which have an ambiguous date of
Currently, date format handling is inaccurate (#747) because numeric dates are thrown out: https://github.com/nextstrain/augur/blob/a85194c243db8d85e6fc06ea2d614e0b6095a0c4/augur/utils.py#L115-L119

This change ensures that numeric dates are processed and that non-negative integers are evaluated as year-only ambiguous dates.

This also includes a few refactors:

- Remove the `raise_error` parameter. Its intent is unclear, and tests still pass without it.
- Use `return` instead of an intermediate variable.

Testing:

- Add tests that fail on the current code and verify that the new changes pass them.
- Fix inaccurate existing tests.
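To make the numeric-date change concrete, here is a minimal Python sketch of the normalization step the PR describes: a non-negative integer value is treated as a year-only ambiguous date. The helper name `numeric_date_to_ambiguous`, the `XX` output notation, and the choice to return `None` for floats and other types are assumptions for illustration; this is not the actual code in `augur/utils.py`.

```python
import numbers
from typing import Optional


def numeric_date_to_ambiguous(value: object) -> Optional[str]:
    """Hypothetical helper: normalize a metadata date value before parsing.

    - Non-negative integers (including NumPy integer types, which register
      as numbers.Integral) are treated as year-only dates -> 'YYYY-XX-XX'.
    - Strings are passed through unchanged for the normal date parser.
    - Anything else (bools, negative integers, floats, None) returns None,
      i.e. this sketch makes no attempt to interpret it.
    """
    # bool is a subclass of int in Python, so rule it out first.
    if isinstance(value, bool):
        return None
    if isinstance(value, numbers.Integral):
        year = int(value)
        if year < 0:
            return None
        # Zero-pad so the year field always has four digits.
        return f"{year:04d}-XX-XX"
    if isinstance(value, str):
        return value
    return None


print(numeric_date_to_ambiguous(2019))       # 2019-XX-XX
print(numeric_date_to_ambiguous("2019-04"))  # 2019-04
print(numeric_date_to_ambiguous(2019.5))     # None
```

Leaving floats unhandled is deliberate in this sketch: a value like `2019.5` could be a decimal (numeric) date rather than a year, so guessing a calendar interpretation would be riskier than rejecting it.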
Re-opening since the example of

cat > metadata.tsv << ~~
strain date
SEQ1 2019
SEQ2 2019-04
SEQ3 2019-04-13
~~
augur filter \
--metadata metadata.tsv \
--min-date 2018 \
--output-strains filtered.txt
cat filtered.txt
# SEQ3
# SEQ1
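In the example above, `SEQ2` (`2019-04`) is dropped even though every day in April 2019 is after the 2018 lower bound. A minimal Python sketch of interval-based filtering, assuming each incomplete date stands for the range of calendar days it could denote, keeps all three strains. The function names and the conservative "whole range must fit inside the window" policy are illustrative assumptions, not augur's behavior.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def date_bounds(text: str) -> Tuple[date, date]:
    """Earliest and latest days a YYYY, YYYY-MM or YYYY-MM-DD string can denote."""
    parts = text.strip().split("-")
    year = int(parts[0])
    if len(parts) == 1:  # year only
        return date(year, 1, 1), date(year, 12, 31)
    month = int(parts[1])
    if len(parts) == 2:  # year and month
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, int(parts[2]))  # full date
    return exact, exact


def passes_date_filter(strain_date: str,
                       min_date: Optional[str] = None,
                       max_date: Optional[str] = None) -> bool:
    """Conservative policy (an assumption): keep a strain only if its whole
    possible date range lies within [min_date, max_date]."""
    earliest, latest = date_bounds(strain_date)
    if min_date is not None and earliest < date_bounds(min_date)[0]:
        return False
    if max_date is not None and latest > date_bounds(max_date)[1]:
        return False
    return True


metadata = {"SEQ1": "2019", "SEQ2": "2019-04", "SEQ3": "2019-04-13"}
kept = [strain for strain, d in metadata.items() if passes_date_filter(d, min_date="2018")]
print(kept)  # ['SEQ1', 'SEQ2', 'SEQ3']
```

A more permissive alternative would keep a strain whenever its range overlaps the window (compare `latest` against the minimum and `earliest` against the maximum). Which policy is preferable is a design decision this sketch does not settle.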
Current Behavior
I was surprised to find that, despite the documentation of `augur filter` mentioning ambiguous date handling, incomplete dates (e.g. `2019` or `2019-04`, as opposed to `2019-04-13`) are counter-intuitively not supported by `augur filter`. In a query with `augur filter --min-date 2018`, samples with the date string `2019` or `2019-04` are currently excluded by augur, despite clearly satisfying the min-date condition. Rather than augur complaining that the dates are incomplete, the samples with incomplete dates are silently excluded.
Fludb provides incomplete dates, so it would be very convenient if they were accepted as such by `augur filter`. I tried adding `--exclude-ambiguous-dates-by year` as a workaround, but it did not change the behaviour. This is related to #662.