Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply seqtk PR to improve kseq.h parsing performance #1674

Merged
merged 2 commits into from
Sep 27, 2023

Conversation

jmarshall
Copy link
Member

@jmarshall jmarshall commented Sep 21, 2023

Port of @kloetzl's lh3/seqtk#123 and attractivechaos/klib#173 to HTSlib's copy of kseq.h.

Per the original PR: “Doubles throughput; can now parse FASTA at 2GB/s”.

For one large data file on my machine, this reduces klib's kseq_bench2's kstream time from ~1.65s to ~0.92s.

(I have not heavily tested it otherwise.)

As noted here, HTSlib doesn't use kseq itself though a few samtools commands do use it to read a few small BED files etc.

@jkbonfield jkbonfield merged commit b65bee5 into samtools:develop Sep 27, 2023
8 checks passed
@jkbonfield
Copy link
Contributor

Thanks both. A nice simple but effective change.

Sad face over kstream using int everywhere, giving a 2GB limit on fasta sequence length, but that's a totally different PR. :/

@jmarshall jmarshall deleted the kseq-memchr branch September 28, 2023 07:24
jmarshall pushed a commit to pysam-developers/pysam that referenced this pull request Sep 28, 2023
Apply the main part of PR samtools/htslib#1674. Pysam does use kseq itself,
so it is worth applying this now without waiting for the next HTSlib release
to be imported.

Co-Authored-By: Fabian Klötzl <fabian@kloetzl.info>
@kloetzl
Copy link
Contributor

kloetzl commented Oct 2, 2023

This is missing a cast to support C++ with its stricter typing rules. lh3/seqtk@7fe58c8

jkbonfield added a commit to jkbonfield/htslib that referenced this pull request Oct 3, 2023
Klist.h and kseq.h use calloc, realloc and memchr in static inline
code, but if we wish to permit our external headers to be used from a
C++ include then these all need explicit casts as "void *" doesn't
work the same in C++ as in C.

It's tempting to use `extern "C"` in an `#ifdef __cplusplus` guard,
but the nature of these pseudo-template klib headers is such that the
code will be expanded up later inside a C++ file so the extern "C"
doesn't solve it.

See samtools#1674 and samtools#1682
jkbonfield added a commit to jkbonfield/htslib that referenced this pull request Oct 4, 2023
Klist.h and kseq.h use calloc, realloc and memchr in static inline
code, but if we wish to permit our external headers to be used from a
C++ include then these all need explicit casts as "void *" doesn't
work the same in C++ as in C.

It's tempting to use `extern "C"` in an `#ifdef __cplusplus` guard,
but the nature of these pseudo-template klib headers is such that the
code will be expanded up later inside a C++ file so the extern "C"
doesn't solve it.

See samtools#1674 and samtools#1682
daviesrob pushed a commit that referenced this pull request Oct 6, 2023
Klist.h and kseq.h use calloc, realloc and memchr in static inline
code, but if we wish to permit our external headers to be used from a
C++ include then these all need explicit casts as "void *" doesn't
work the same in C++ as in C.

It's tempting to use `extern "C"` in an `#ifdef __cplusplus` guard,
but the nature of these pseudo-template klib headers is such that the
code will be expanded up later inside a C++ file so the extern "C"
doesn't solve it.

See #1674 and #1682
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants