FYI: pandas can fail to read participants.tsv #474
Comments
That's a new one. Will take a look. |
I never deliver dated merchandise! ;) |
this might be related to the other failure:
that file's content is below -- I thought it must be the last row, but it seems to be kosher
$> cat /mnt/btrfs/datasets/datalad/crawl/openneuro/ds001890/participants.tsv
participant_id session sex genotype Weight SpO2 HR Temperature DOB Experiment_Date Age
c1NT 1 M 3xTG 32.3 98 272 35.8 2016-11-22 2017-03-23 3
c1NT 2 M 3xTG 36.2 94 311 35.8 2016-11-22 2017-05-31 6
c2NT 1 M 3xTG 24.8 98 257 36.9 2016-11-29 2017-03-23 3
c2NT 2 M 3xTG 30.6 99 450 35.8 2016-11-29 2017-05-31 6
c2R1 1 M 3xTG 25 97 360 36.1 2016-11-29 2017-03-23 3
c4NT 1 M C57BL/6 31.5 99 250 36 2016-12-10 2017-03-29 3
c4NT 2 M C57BL/6 35 96 208 34.8 2016-12-10 2017-06-19 6
c5NT 1 M C57BL/6 30.1 99 427 35.2 2016-12-10 2017-03-29 3
c5NT 2 M C57BL/6 31.9 93 223 34.6 2016-12-03 2017-06-05 6
c6NT 1 M 3xTG 26.2 99 330 36.3 2016-12-19 2017-03-30 3
c6NT 2 M 3xTG 30.2 90 250 36 2016-12-19 2017-06-20 6
c7NT 1 M C57BL/6 28.7 36 350 35 2016-12-10 2017-04-06 3
c7NT 2 M C57BL/6 28.3 80 266 35.8 2016-12-10 2017-06-19 6
c8NT 1 M 3xTG 25 97 379 36.3 2017-01-20 2017-04-25 3
c8NT 2 M 3xTG 25 97 379 36.3 2017-01-20 2017-04-25 6
c9NT 1 M 3xTG 23.7 99 457 35 2017-01-20 2017-04-25 3
c11L 1 M 3xTG 30.3 97 370 36.6 2016-11-22 2017-03-23 3
c11L 2 M 3xTG 34.7 97 270 35.4 2016-11-22 2017-05-31 6
c21L 1 M 3xTG 24.2 93 254 36.1 2016-11-29 2017-03-23 3
c31L 1 M 3xTG 29.6 93 331 36.1 2016-12-17 2017-03-24 3
c31R 1 M 3xTG 28.2 99 350 35.8 2016-12-17 2017-03-24 3
c31R 2 M 3xTG 32.3 88 380 35.2 2016-12-17 2017-06-20 6
c32L 1 M 3xTG 27.4 96 306 35.8 2016-12-17 2017-03-30 3
c32L 2 M 3xTG 32 82 240 35.8 2016-12-17 2017-06-20 6
c32R 1 M 3xTG 28 94 253 35.8 2016-12-17 2017-03-30 3
c32R 2 M 3xTG 34 89 254 35.8 2016-12-17 2017-06-20 6
c41L 1 M C57BL/6 30.2 99 300 35.8 2016-12-10 2017-03-29 3
c41L 2 M C57BL/6 32 95 248 35.4 2016-12-10 2017-06-19 6
c41R 1 M C57BL/6 26.3 99 280 36 2016-12-10 2017-03-29 3
c41R 2 M C57BL/6 27.2 98 276 36.6 2016-12-10 2017-06-19 6
c42L 1 M C57BL/6 30.6 99 250 35 2016-12-10 2017-03-29 3
c42L 2 M C57BL/6 36 99 269 35 2016-12-10 2017-06-19 6
c42R 1 M C57BL/6 25.7 99 250 36 2016-12-10 2017-03-29 3
c42R 2 M C57BL/6 29 96 180 34.5 2016-12-10 2017-06-19 6
c51L 1 M C57BL/6 30.5 99 250 36.6 2016-12-10 2017-03-29 3
c51L 2 M C57BL/6 31.6 99 211 34.8 2016-12-03 2017-06-05 6
c61L 1 M 3xTG 27 99 340 36.6 2016-12-19 2017-03-30 3
c61L 2 M 3xTG 31 89 184 36.3 2016-12-19 2017-06-20 6
c61R 1 M 3xTG 26 96 326 36.3 2016-12-19 2017-03-30 3
c61R 2 M 3xTG 30 92 238 36.3 2016-12-19 2017-06-20 6
c62L 1 M 3xTG 25.8 99 335 35.4 2016-12-19 2017-03-30 3
c71L 1 M C57BL/6 30.1 98 298 35.5 2016-12-10 2017-04-06 3
c71L 2 M C57BL/6 36.5 97 233 35 2016-12-10 2017-06-19 6
c71R 1 M C57BL/6 31.4 95 416 35.8 2016-12-10 2017-04-06 3
c71R 2 M C57BL/6 31.6 99 169 34.8 2016-12-10 2017-06-19 6
c81L 1 M 3xTG 30.1 99 336 35.2 2017-01-20 2017-04-25 3
c81L 2 M 3xTG 30.1 99 336 35.2 2017-01-20 2017-04-25 6
c81R 1 M 3xTG 27.9 89 340 35.4 2017-01-20 2017-04-25 3
c81R 2 M 3xTG 27.9 89 340 35.4 2017-01-20 2017-04-25 6
c91L 1 M 3xTG 25.3 95 613 37 2017-01-20 2017-04-25 3
c91L 2 M 3xTG 25.3 95 613 37 2017-01-20 2017-04-25 6
c91R 1 M 3xTG 26 95 266 35.2 2017-01-20 2017-04-25 3
c3NT2 1 M 3xTG_hydrocephalus 25 98 260 36.3 2016-12-17 2017-03-30 3
$> grep hydro /mnt/btrfs/datasets/datalad/crawl/openneuro/ds001890/participants.tsv | sed -e 's,\t, --- ,g'
c3NT2 --- 1 --- M --- 3xTG_hydrocephalus --- 25 --- 98 --- 260 --- 36.3 --- 2016-12-17 --- 2017-03-30 --- 3 |
@yarikoptic For that last, the |
ah! thanks @effigies ! It would probably be nice to have a dedicated exception here as well, with a message matching the one thrown by bids-validator for such a case (isn't bids-validator run on those datasets upon upload, so that such issues shouldn't happen with datasets from openneuro?) |
By the standard, "Each participant needs to be described by one and only one row.", so this definitely should be caught by the validator, but it probably isn't. Which also means that this should be a I'll open a BIDS validator issue. |
Hmm. Looks like it should be an error already. Do you want to try running the validator? I'm on public wifi, so can't quickly download 9.7GB. |
yeap, will do. And I did see that error from bids-validator appearing for some datasets, so the question was more whether openneuro accepts datasets which do not pass bids-validator |
We shouldn't... |
ha -- bids-validator doesn't error out since it is "too smart", but this participants.tsv is not BIDS compliant -- it has multiple entries for the participants with multiple sessions, one row per session:
$> head participants.tsv
participant_id session sex genotype Weight SpO2 HR Temperature DOB Experiment_Date Age
c1NT 1 M 3xTG 32.3 98 272 35.8 2016-11-22 2017-03-23 3
c1NT 2 M 3xTG 36.2 94 311 35.8 2016-11-22 2017-05-31 6
c2NT 1 M 3xTG 24.8 98 257 36.9 2016-11-29 2017-03-23 3
c2NT 2 M 3xTG 30.6 99 450 35.8 2016-11-29 2017-05-31 6
c2R1 1 M 3xTG 25 97 360 36.1 2016-11-29 2017-03-23 3
c4NT 1 M C57BL/6 31.5 99 250 36 2016-12-10 2017-03-29 3
c4NT 2 M C57BL/6 35 96 208 34.8 2016-12-10 2017-06-19 6
c5NT 1 M C57BL/6 30.1 99 427 35.2 2016-12-10 2017-03-29 3
c5NT 2 M C57BL/6 31.9 93 223 34.6 2016-12-03 2017-06-05 6
...
$> awk '{print $1;}' participants.tsv | sort | uniq -c | nl
1 2 c11L
2 2 c1NT
3 1 c21L
4 2 c2NT
5 1 c2R1
6 1 c31L
7 2 c31R
8 2 c32L
9 2 c32R
10 1 c3NT2
11 2 c41L
12 2 c41R
13 2 c42L
14 2 c42R
15 2 c4NT
16 2 c51L
17 2 c5NT
18 2 c61L
19 2 c61R
20 1 c62L
21 2 c6NT
22 2 c71L
23 2 c71R
24 2 c7NT
25 2 c81L
26 2 c81R
27 2 c8NT
28 2 c91L
29 1 c91R
30 1 c9NT
31 1 participant_id
$> ls -ld sub-jgrADc* | nl | tail
21 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc6NT/
22 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc71L/
23 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc71R/
24 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc7NT/
25 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc81L/
26 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc81R/
27 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc8NT/
28 drwx------ 1 yoh yoh 20 Aug 14 09:57 sub-jgrADc91L/
29 drwx------ 1 yoh yoh 10 Aug 14 09:57 sub-jgrADc91R/
30 drwx------ 1 yoh yoh 10 Aug 14 09:57 sub-jgrADc9NT/
So PyBIDS was kinda right, I guess, but the exception is not informative |
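For anyone who wants to reproduce the `awk ... | sort | uniq -c` count above from Python, here is a minimal sketch using only the standard library. The function name `duplicated_participants` and the three-row `SAMPLE` excerpt are illustrative assumptions, not part of pybids or bids-validator; the excerpt keeps only the first four columns of the real file.

```python
import csv
import io
from collections import Counter

# Three-row excerpt of the participants.tsv shown above (tab-separated).
# Only the first four columns are kept here for brevity.
SAMPLE = (
    "participant_id\tsession\tsex\tgenotype\n"
    "c1NT\t1\tM\t3xTG\n"
    "c1NT\t2\tM\t3xTG\n"
    "c2R1\t1\tM\t3xTG\n"
)


def duplicated_participants(fileobj):
    """Return the sorted participant_id values that appear on more than one row.

    Hypothetical helper for illustration; not a pybids function.
    """
    reader = csv.DictReader(fileobj, delimiter="\t")
    counts = Counter(row["participant_id"] for row in reader)
    return sorted(pid for pid, n in counts.items() if n > 1)


print(duplicated_participants(io.StringIO(SAMPLE)))
```

On the real file one would pass `open("participants.tsv", newline="")` instead of the in-memory sample; any non-empty result means the "one and only one row" rule is violated.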
Keep in mind that pybids doesn't call the JS bids-validator, it calls the (much more limited) Python |
Agreed that we don't call the validator, but when there are conditions that we can identify as only arising from invalid data, then raising an exception that basically says "Go run the validator for more details" could be useful. |
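A rough sketch of what such a dedicated exception could look like. Everything named here (`InvalidParticipantsFile`, `check_unique_participants`, the message text) is hypothetical and is not the pybids API; the check simply looks for repeated `participant_id` values, which can only arise from a non-compliant participants.tsv.

```python
import csv
import io
from collections import Counter


class InvalidParticipantsFile(ValueError):
    """Hypothetical exception: participants.tsv breaks a rule the validator covers."""


def check_unique_participants(fileobj):
    """Raise InvalidParticipantsFile if any participant_id occurs on several rows.

    Illustrative helper only; the name and message are assumptions.
    """
    reader = csv.DictReader(fileobj, delimiter="\t")
    counts = Counter(row["participant_id"] for row in reader)
    repeated = sorted(pid for pid, n in counts.items() if n > 1)
    if repeated:
        raise InvalidParticipantsFile(
            "participants.tsv has more than one row for: "
            + ", ".join(repeated)
            + ". Run bids-validator on the dataset for more details."
        )


# Example with two rows for the same participant, as in ds001890.
bad = io.StringIO("participant_id\tsession\nc1NT\t1\nc1NT\t2\n")
try:
    check_unique_participants(bad)
except InvalidParticipantsFile as exc:
    print(exc)
```

Subclassing `ValueError` is a design guess: it keeps existing `except ValueError` handlers working while letting callers catch the more specific case.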
On further investigation, the problem here was that pybids didn't know how to handle nested metadata, so reading in Assuming it now works, I think I'll close this. The issue about mismatching rows is to my mind strictly a validator issue. |
I can confirm that current master (0.9.2-66-g6751eec AKA 0.9.3-48-g6751eec since 0.9.3 was not annotated) no longer blows, thanks! |
edit 1: additional sample ds001810