monza:~/tmp$ time ./sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null sambamba 0.6.8-pre3 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6) real 0m6.930s user 0m26.940s sys 0m0.516s sambamba 0.6.8-pre2 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4) real 0m6.854s user 0m26.456s sys 0m0.584s linux-vdso.so.1 (0x00007ffd227fc000) librt.so.1 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/librt.so.1 (0x00007f5d31082000) libpthread.so.0 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libpthread.so.0 (0x00007f5d30e64000) libm.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libm.so.6 (0x00007f5d30b52000) libdl.so.2 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libdl.so.2 (0x00007f5d3094e000) libgcc_s.so.1 => /gnu/store/h3z6nshhdlc8zgh4mi13x1br03xipi9r-gcc-7.2.0-lib/lib/libgcc_s.so.1 (0x00007f5d30737000) libc.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libc.so.6 (0x00007f5d30398000) /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/ld-linux-x86-64.so.2 (0x00007f5d3128a000)
time ./build/sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4) real 0m2.869s user 0m21.972s sys 0m0.356s
This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1) real 0m3.150s user 0m24.668s sys 0m0.320s This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0) real 0m2.869s user 0m22.344s sys 0m0.344s
time ./sambamba_v0.6.6 sort -m 20615843020 -N -o /dev/null ENCFF696RLQ.bam -p
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4) real 7m50.558s user 89m10.808s sys 2m57.188s
and with 120GB RAM
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4) real 3m49.953s user 81m16.956s sys 1m58.332s
Wed Feb 7 03:43:14 CST 2018 sambamba 0.6.8-pre1 This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0) real 8m0.528s user 88m44.084s sys 2m45.888s
When sambamba is given enough RAM to hold everything in memory sambamba is twice as fast (apparently half the time goes to intermediate IO)
time ./sambamba sort -N -o /dev/null ENCFF696RLQ.bam -p -m 120G
real 3m46.856s user 81m44.524s sys 1m56.388s
with 64GB it is
real 5m36.062s user 88m43.176s sys 3m0.536s
and with 32GB it is
real 7m22.125s user 89m6.188s sys 2m51.228s
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0) real 18m15.809s user 158m30.148s sys 3m15.932s
Ouch! A regression in the shipped release 0.6.7.
This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1) ldc2 -wi -I. -IBioD -IundeaD/src -g -O3 -release -enable-inlining -boundscheck=off real 18m40.223s user 159m34.292s sys 3m19.300s
So, the same build is 2x slower than the previous version.
This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1) Using ldmd2 @sambamba-ldmd-release.rsp "-g" "-O2" "-c" "-m64" "-release" "-IBioD/" "-IundeaD/src/" "-ofbuild/sambamba.o" "-odbuild" "-I." gcc -Wl,--gc-sections -o build/sambamba build/sambamba.o -Lhtslib -Llz4/lib -Wl,-Bstatic -lhts -llz4 -Wl,-Bdynamic /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libphobos2-ldc.a /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libdruntime-ldc.a -lrt -lpthread -lm real 9m9.465s user 97m56.204s sys 2m50.512s
Updated the makefile to build with -singleobj. Now LLVM kicks in!
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0) real 8m1.978s user 89m13.936s sys 2m47.392s
Next I tried adding profile guided optimization. But that turned out to be slower
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0) real 11m16.267s user 116m15.556s sys 2m56.244s
So, the release is reverted an after a version bump:
This version was built with: LDC 0.17.1 using DMD v2.068.2 using LLVM 3.8.0 bootstrapped with version not available real 10m0.932s user 151m39.172s sys 3m7.596s This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1) real 9m22.501s user 98m24.748s sys 2m51.996s
Note, updating compiler shows a speed gain for 0.6.6.
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4) finding positions of the duplicate reads in the file... sorted 11286293 end pairs and 156042 single ends (among them 0 unmatched pairs) collecting indices of duplicate reads... done in 1325 ms found 6603388 duplicates collected list of positions in 0 min 16 sec marking duplicates... collected list of positions in 1 min 2 sec Command being timed: "./bin/sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam" User time (seconds): 406.49 System time (seconds): 3.86 Percent of CPU this job got: 649% Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.13 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 1709720 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 1140382 Voluntary context switches: 393213 Involuntary context switches: 8993 Swaps: 0 File system inputs: 0 File system outputs: 2663824 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0
Uses slightly more memory but is faster than
/usr/bin/time --verbose sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam finding positions of the duplicate reads in the file... sorted 11286293 end pairs and 156042 single ends (among them 0 unmatched pairs) collecting indices of duplicate reads... done in 1521 ms found 6603388 duplicates collected list of positions in 0 min 16 sec marking duplicates... total time elapsed: 1 min 4 sec Command being timed: "sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam" User time (seconds): 423.78 System time (seconds): 4.47 Percent of CPU this job got: 666% Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04.24 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 1542764 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 1839470 Voluntary context switches: 368082 Involuntary context switches: 8537 Swaps: 0 File system inputs: 0 File system outputs: 2643840 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0