cutadapt cannot read from stdin with xopen 2.0.0 #774

Closed
peterjc opened this issue Mar 27, 2024 · 3 comments

@peterjc
Contributor

peterjc commented Mar 27, 2024

Working with xopen 1.9.0 (and older), running here on macOS:

$ python --version
Python 3.10.12
$ cutadapt --version
4.7
$ python -c "import xopen; print(xopen.__version__)"
1.9.0
$ python -c "import dnaio; print(dnaio.__version__)"
1.2.0

Using the sample file tests/ncbi-import/multiple_hmm.fasta, this command pipes its 5 FASTA entries to cutadapt via stdin and counts the records written:

$ cat tests/ncbi-import/multiple_hmm.fasta | cutadapt -a GYRGGGACGAAAGTCYYTGC /dev/stdin | grep -c "^>"
This is cutadapt 4.7 with Python 3.10.12
Command line parameters: -a GYRGGGACGAAAGTCYYTGC /dev/stdin
Processing single-end reads on 1 core ...
Done           00:00:00             5 reads @ 846.8 µs/read;   0.07 M reads/minute
Finished in 0.006 s (1144.028 µs/read; 0.05 M reads/minute).

=== Summary ===

Total reads processed:                       5
Reads with adapters:                         4 (80.0%)
Reads written (passing filters):             5 (100.0%)

Total basepairs processed:         4,631 bp
Total written (filtered):          1,332 bp (28.8%)

=== Adapter 1 ===

Sequence: GYRGGGACGAAAGTCYYTGC; Type: regular 3'; Length: 20; Trimmed: 4 times

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Bases preceding removed adapters:
  A: 0.0%
  C: 0.0%
  G: 0.0%
  T: 100.0%
  none/other: 0.0%

Overview of removed sequences
length	count	expect	max.err	error counts
370	1	0.0	2	1
616	1	0.0	2	0 1
932	1	0.0	2	1
1381	1	0.0	2	1
5

Broken after updating to xopen 2.0.0 (released yesterday, 2024-03-26; see https://pypi.org/project/xopen/#history):

$ cutadapt --version
4.7
$ python -c "import dnaio; print(dnaio.__version__)"
1.2.0
$ python -c "import xopen; print(xopen.__version__)"
2.0.0
$ cat tests/ncbi-import/multiple_hmm.fasta | cutadapt -a GYRGGGACGAAAGTCYYTGC - | grep -c "^>"
This is cutadapt 4.7 with Python 3.10.12
Command line parameters: -a GYRGGGACGAAAGTCYYTGC -
Processing single-end reads on 1 core ...

No reads processed!
0

Using /dev/stdin instead of - is also broken:

$ cat tests/ncbi-import/multiple_hmm.fasta | cutadapt -a GYRGGGACGAAAGTCYYTGC /dev/stdin | grep -c "^>"
This is cutadapt 4.7 with Python 3.10.12
Command line parameters: -a GYRGGGACGAAAGTCYYTGC /dev/stdin
Processing single-end reads on 1 core ...

No reads processed!
0

This might be related to #772, but the timing doesn't fit with xopen 2.0.0 being released yesterday.

peterjc added a commit to peterjc/thapbi-pict that referenced this issue Mar 27, 2024
@rhpvorderman
Collaborator

Something is broken indeed:

(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ pip list | grep xopen
xopen              2.0.0
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ wc -l ~/test/5millionreads_R1.fastq
20000000 /home/rhpvorderman/test/5millionreads_R1.fastq
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("/dev/stdin", "rt"); print(f.read())' | wc -l
19999956
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("-", "rt"); print(f.read())' | wc -l
19999956
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ pip install xopen==1.9.0 >/dev/null
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("-", "rt"); print(f.read())' | wc -l
20000001
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ cat ~/test/5millionreads_R1.fastq | python -c 'import xopen; f=xopen.xopen("/dev/stdin", "rt"); print(f.read())' | wc -l
20000001
(xopen) rhpvorderman@tuxminator:~/PycharmProjects/xopen$ 

xopen 1.9.0 performs as it should (the extra newline is added by print).

xopen 2.0.0 loses some data when reading from stdin, which is really weird as all the xopen tests pass. I will see if I can fix this issue.
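
The shell check above can also be written as a small script, which makes it easier to rerun against different xopen versions. This is only a sketch: the helper names (`bytes_seen_via_stdin`, `make_fastq`) and the synthetic FASTQ payload are made up for illustration, and the xopen reader assumes xopen is installed in the interpreter that runs the child process. It compares bytes rather than lines, so there is no extra newline from print to account for.

```python
import subprocess
import sys

# Reader snippets run in a child interpreter; each prints how many bytes it
# managed to read from its standard input.
PLAIN_READER = "import sys; print(len(sys.stdin.buffer.read()))"
# Assumption: xopen is importable in the child interpreter's environment.
XOPEN_READER = 'import xopen; f = xopen.xopen("-", "rb"); print(len(f.read()))'


def bytes_seen_via_stdin(reader_code: str, payload: bytes) -> int:
    """Pipe payload into a child Python process and return the byte count it reports."""
    result = subprocess.run(
        [sys.executable, "-c", reader_code],
        input=payload,
        capture_output=True,
        check=True,
    )
    return int(result.stdout.strip())


def make_fastq(n_records: int) -> bytes:
    """Build a synthetic FASTQ payload (made-up data, 4 lines per record)."""
    record = b"@read\nACGTACGTAC\n+\nIIIIIIIIII\n"
    return record * n_records


if __name__ == "__main__":
    payload = make_fastq(100_000)
    for name, code in [("plain", PLAIN_READER), ("xopen", XOPEN_READER)]:
        try:
            seen = bytes_seen_via_stdin(code, payload)
        except subprocess.CalledProcessError:
            print(f"{name}: reader failed (is the library installed?)")
            continue
        status = "OK" if seen == len(payload) else "DATA LOST"
        print(f"{name}: expected {len(payload)} bytes, read {seen} -> {status}")
```

The plain `sys.stdin.buffer` reader acts as a control: if it reports the full byte count and the xopen reader does not, the loss is happening inside xopen rather than in the pipe.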

@rhpvorderman
Collaborator

I will yank the 2.0.0 release. This is quite serious. Ping @marcelm

@peterjc
Contributor Author

peterjc commented Mar 27, 2024

Ah - I hadn't taken the next step of seeing if this was an xopen bug vs cutadapt needing a tweak for an xopen change.

Let's close this and focus on pycompression/xopen#157

I've excluded v2.0.0 on my development branch so my CI works, but that is only a stopgap:

peterjc/thapbi-pict@5b7466d
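
For anyone else needing a stopgap: a requirement specifier such as `xopen!=2.0.0` (standard PEP 440 exclusion syntax) keeps pip from selecting the affected release. The contents of the commit linked above are not reproduced here, so treat that specifier as one possible way to do it rather than a description of that commit. As a belt-and-braces check, a CI job could also fail early if the bad release is installed anyway. A minimal sketch, where the function name `installed_bad_release` and the `get_version` parameter are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_bad_release(dist_name, bad_versions, get_version=version):
    """Return the installed version string if it is a known-bad release, else None.

    get_version is injectable so the check can be exercised without the
    distribution actually being installed.
    """
    try:
        installed = get_version(dist_name)
    except PackageNotFoundError:
        # Not installed at all, so nothing to complain about.
        return None
    return installed if installed in bad_versions else None


if __name__ == "__main__":
    bad = installed_bad_release("xopen", {"2.0.0"})
    if bad is not None:
        raise SystemExit(f"xopen {bad} is installed; it loses data when reading stdin")
    print("xopen version check passed")
```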

Yanking the xopen 2.0.0 release seems prudent, thanks - this is more serious than just a cutadapt issue as I first assumed 👍
