Convert the submodule star-sys/STAR to a git subtree #29

Merged 594 commits on Aug 14, 2020
72ae23d
Passes chr22 TestSuite tests.
alexdobin Dec 1, 2017
b294afa
Merged master changes in to var.
alexdobin Dec 4, 2017
3e199f0
Implemented --readFilesPrefix option for specifying prefix (e.g. dire…
alexdobin Dec 7, 2017
c150910
Merge pull request #334 from drpowell/master
alexdobin Dec 7, 2017
6986377
Merged master changes (RG for chimeric junction output) into var.
alexdobin Dec 18, 2017
4e21924
Fixed a bug in chimeric detection code which sometimes led to uniniti…
alexdobin Dec 21, 2017
815e922
Implementing WASP algorithm.
alexdobin Dec 21, 2017
a3a4a8e
Implementing WASP algorithm.
alexdobin Dec 21, 2017
4995412
Merged chimeric bug-fix from master. WASP works for single variant pe…
alexdobin Dec 22, 2017
498bdbf
Implementing WASP algorithm.
alexdobin Dec 22, 2017
04c1f11
Implementing WASP algorithm.
alexdobin Dec 28, 2017
210deae
Added variant allele=4 for variant read base=N. Added various waspTyp…
alexdobin Jan 1, 2018
7316edc
Initial test release of WASP filtering.
alexdobin Jan 3, 2018
dd7566e
For BAM to signal conversion, process alignments without NH tags as u…
alexdobin Jan 5, 2018
bbd1139
Implemented --outBAMsortingBinsN option to control the number of sort…
alexdobin Jan 9, 2018
c889d3f
Fixed vW:i:2 type to mark multimappers that overlap variants. Recompi…
alexdobin Jan 17, 2018
3f51800
2.5.4a
alexdobin Jan 23, 2018
ea90fe0
Fixed a problem with non-default --sjdbOverhang genome generation.
alexdobin Jan 25, 2018
d7a903b
Merged 2.5.4a master into var.
alexdobin Jan 25, 2018
de4bffb
spelling fixes
satta Jan 26, 2018
30a79fd
Implemented merging and re-mapping for overlapping mates. To control …
alexdobin Jan 29, 2018
9c6df13
2.5.4b
alexdobin Feb 9, 2018
35e5360
Fixed a problem in the chimeric detection algorithm for overlapping P…
alexdobin Feb 25, 2018
ba0325e
Fixed some issues with the overlapping mates algorithm.
alexdobin Mar 13, 2018
920bbc5
Fixed some issues with the overlapping mates algorithm.
alexdobin Mar 13, 2018
347e063
Fixed some issues with the overlapping mates algorithm.
alexdobin Mar 13, 2018
048d9e7
Fixed a bug in the overlapping mates algorithm.
alexdobin Mar 19, 2018
a26ce03
Fixed a bug in the overlapping mates algorithm related to protruding …
alexdobin Mar 22, 2018
dfad0be
Fixed a bug in ReadAlign_stitchPieces.cpp.
alexdobin Mar 26, 2018
364456b
Fixed bugs in the algorithm that finds the best alignment.
alexdobin Mar 26, 2018
8b5775d
Fixed problems with merged PE chimeric output.
alexdobin Mar 30, 2018
b2b565d
Redefined TLEN in SAM output for overlapping mates, = distance betwee…
alexdobin Apr 6, 2018
3414de2
Fixed a problem with read name output in Chimeric.out.junction.
alexdobin Apr 6, 2018
8bd8798
Implemented --outSAMtlen options to specify old (1) or new (2) TLEN d…
alexdobin Apr 10, 2018
78e6740
Merge branch 'spelling' of https://github.com/satta/STAR into satta-s…
alexdobin Apr 10, 2018
9a31533
Implemented flushing of insertions to the right. This will prevent sp…
alexdobin Apr 10, 2018
d41a193
Fixed --outSAMtlen option.
alexdobin Apr 10, 2018
bddc60a
Cleaned up compilation warnings.
alexdobin Apr 11, 2018
8d2672c
Merged master into var. Preparing for 2.6.0
alexdobin Apr 11, 2018
4816bc9
--chimScoreJunctionNonGTAG is now added to chimeric score for thresho…
alexdobin Apr 20, 2018
18acf2c
WASP filtering switched off for the 1st pass of the 2-pass mapping.
alexdobin Apr 23, 2018
969af45
Recompiled Linux executables.
alexdobin Apr 23, 2018
cc05852
Manual, README, RELEASEnotes, CHANGES
alexdobin Apr 23, 2018
db557a2
Fixed a bug that accidentally turns on --peOverlap* option and causes…
alexdobin Apr 26, 2018
e89c1ef
Fixed a few bugs detected by valgrind.
alexdobin May 2, 2018
25d0e24
2.6.0b
alexdobin May 2, 2018
5ff5585
Fixed another bug in the peOverlap algorithm.
alexdobin May 3, 2018
811e5ab
Fixed the problem with Ns in the overlap region for the peOverlap alg…
alexdobin May 7, 2018
2c09273
Fixed problems with WASP filtering.
alexdobin May 10, 2018
2f632bd
Fixed valgrind error.
alexdobin Jun 14, 2018
c6c2cd9
Process substitution can now be used with zipped VCF files, e.g. --va…
alexdobin Jun 14, 2018
5cd05b3
Fixed the problem with alignment scoring with peOverlap option which …
alexdobin Jul 13, 2018
de24477
Fixed another problem with alignment scoring with peOverlap option wh…
alexdobin Jul 20, 2018
a591529
Fixed a bug with multiple RG lines when inputting reads in SAM format…
alexdobin Aug 9, 2018
c29d310
Fixed the problem with control characters (ASCII<32) in genome and in…
alexdobin Aug 12, 2018
63d626f
Implemented --chimOutJunctionFormat 1 option to output some metadata …
alexdobin Aug 13, 2018
f92f55f
Fixed a bug that caused serious problems with --sjdbInsertSave All op…
alexdobin Aug 14, 2018
2b4f1e3
Ready for 2.6.1a
alexdobin Aug 14, 2018
23dac25
fix spelling issues
satta Aug 18, 2018
15b8d40
Fixed a problem with --outSAMfilter KeepOnlyAddedReferences option. …
alexdobin Aug 27, 2018
32c6d6b
Preparing for 2.6.1b
alexdobin Sep 6, 2018
df39b26
Enforced the consistent choice of supplementary chimeric alignments f…
alexdobin Oct 1, 2018
9286e15
Changed back internal binary encoding for N in the genome to 4.
alexdobin Oct 3, 2018
106aa1b
Implementing single cell pipeline.
alexdobin Oct 3, 2018
2c5c5bd
Merge branch 'master' into solo
alexdobin Oct 3, 2018
808b341
Implementing solo.
alexdobin Oct 5, 2018
20e31cf
Preparing for 2.6.1c
alexdobin Oct 17, 2018
3b6743b
Solo: CB matching to whitelist.
alexdobin Oct 22, 2018
a9ec276
Binary search for CB matches.
alexdobin Oct 27, 2018
4c74ada
Binary search for CB matches.
alexdobin Oct 27, 2018
d5b601d
Implementing gene from transcript alignment
Oct 29, 2018
f8767ad
Implementing gene from transcript alignment 2
Oct 29, 2018
2fa56ca
Fixed the bug causing inconsistent output for mate1/2 in the Unmapped…
alexdobin Oct 30, 2018
a942b94
Solo: implemented gene from transcripts.
Oct 30, 2018
16b1149
Solo: implementing post-map aggregation.
Oct 30, 2018
fc1f88a
Implementing UMI collapsing.
alexdobin Nov 3, 2018
b61071e
Fixed the problem causing BAM sorting error with large number of thre…
alexdobin Nov 5, 2018
439bb08
Implemented exact UMI collapsing.
alexdobin Nov 5, 2018
e575454
Implementing UMI collapsing with 1MM
alexdobin Nov 6, 2018
1951f8d
Implemented UMI collapsing with 1MM
alexdobin Nov 7, 2018
1d93208
Fixed the non-thread safe error/exit (github.com/alexdobin/STAR/issue…
alexdobin Nov 16, 2018
ef9f7c6
Merge pull request #475 from satta/spelling
alexdobin Nov 16, 2018
04fbf78
Ready for 2.6.1d
alexdobin Nov 16, 2018
d267c4c
Merged 2.6.1d master into solo.
alexdobin Nov 16, 2018
5057327
Fixing bugs in solo.
alexdobin Dec 15, 2018
c96cd26
Allowed Ns in CBs in accordance with CellRanger pipeline.
alexdobin Dec 17, 2018
ee07ea5
Fixing bugs in solo, comparing with CellRanger. Implemented CR/CY/UR/…
alexdobin Dec 21, 2018
af331ec
Fixing bugs in solo, comparing with CellRanger. Last commit before sw…
alexdobin Dec 21, 2018
3f11605
Fixing bugs in solo, comparing with CellRanger. Last commit before sw…
alexdobin Dec 21, 2018
eafef6f
Switching to graph collapsing for UMIs. Last commit with the same res…
alexdobin Dec 21, 2018
dbaf3cf
Implemented 1MM UMI collapsing via graph connected components. Tests …
alexdobin Dec 22, 2018
6eb35e3
Changed solo file names.
alexdobin Dec 22, 2018
21f64cd
Streamlined solo stats
alexdobin Dec 23, 2018
6b17725
Finalized solo stats.
alexdobin Dec 23, 2018
8b464d0
solo: implementing SJ output.
alexdobin Dec 25, 2018
1770c84
Solo: implemented featureType, tests OK for Gene
alexdobin Dec 25, 2018
d4799f4
Solo: implementing SJ output
alexdobin Dec 25, 2018
f159747
Solo: finished SJ output
alexdobin Dec 25, 2018
286d802
Solo: SJ output works
alexdobin Dec 25, 2018
5075cbf
Solo: SJ output finalized.
alexdobin Dec 25, 2018
9c9a6c5
Added gene name and biotype output, and gene output for splice juncti…
alexdobin Dec 29, 2018
90fd93d
Solo: changing SJ output logic to better match gene output.
alexdobin Dec 30, 2018
b94e562
Decoupling CB/UMI matching and feature output.
alexdobin Dec 30, 2018
b2c26fd
Redesigning Solo.
alexdobin Dec 30, 2018
6693a34
Redesigning Solo.
alexdobin Dec 30, 2018
4f3f068
Redesigning Solo.
alexdobin Dec 30, 2018
6cc4f93
Finalized new Solo design, Gene output tests OK.
alexdobin Dec 31, 2018
d7e97fa
Solo: changed SJ output to better match Gene output
alexdobin Dec 31, 2018
b42dfe1
Preliminary STARsolo release.
alexdobin Jan 9, 2019
63b9eb5
Default transcriptome conversion options for STARsolo.
alexdobin Jan 15, 2019
76a0105
Implemented --umiDedup option to specify dedup types.
alexdobin Jan 17, 2019
5ba4e06
Fixed problems with the previous commit. Solo tests passed.
alexdobin Jan 17, 2019
714ae0c
Final changes for solo* parameters.
alexdobin Jan 22, 2019
88e3cbe
Merged solo into master. Getting ready for 2.7.0a.
alexdobin Jan 23, 2019
d45b1c5
Ready for 2.7.0a
alexdobin Jan 23, 2019
61019ad
2.7.0a
alexdobin Jan 24, 2019
6b90d11
header needed for python scipy mmread compatibility
k3yavi Jan 25, 2019
400a13d
Just a final blank in a Makefile
smoe Jan 29, 2019
39769ad
Fixed minor bugs in STARsolo.
alexdobin Feb 4, 2019
4cf002a
Merge pull request #550 from k3yavi/master
alexdobin Feb 4, 2019
286977c
Merge pull request #552 from smoe/patch-6
alexdobin Feb 4, 2019
60cf19c
Ready for 2.7.0b
alexdobin Feb 5, 2019
bc77b84
Fixed compilation problems and docker file.
alexdobin Feb 6, 2019
2a24d53
Replaced tabs with spaces in STARsolo matrix.mtx output.
alexdobin Feb 6, 2019
b297311
Fixed a problem with STARsolo genes.tsv output.
alexdobin Feb 7, 2019
a24b366
Ready for 2.7.0c
alexdobin Feb 8, 2019
aa0b138
Enforced genome version rules for 2.7.0
alexdobin Feb 13, 2019
4a5e088
* Implemented --soloBarcodeReadLength option for barcode read length …
alexdobin Feb 17, 2019
e76f068
Ready for 2.7.0d
alexdobin Feb 18, 2019
9b0e664
Ready for 2.7.0d
alexdobin Feb 18, 2019
fd73c09
Started implementing STARsolo GeneFull option.
alexdobin Feb 19, 2019
2fb6ad2
Debugging STARsolo GeneFull option.
alexdobin Feb 20, 2019
05140d3
Reverted to 2.7.0d for debugging.
alexdobin Feb 20, 2019
d2094c0
Fixed problems with --quantMode GeneCounts and --parametersFiles opti…
alexdobin Feb 21, 2019
5dd7c3a
Reverted the revert to merge master changes.
alexdobin Feb 21, 2019
511aa2f
Fixed problem in gene info input.
alexdobin Feb 22, 2019
4aff05c
Finished coding GeneFull option.
alexdobin Feb 25, 2019
fc73fd4
Ready for 2.7.0e
alexdobin Feb 25, 2019
295440b
Minor text fixes in parametersDefault.
alexdobin Mar 7, 2019
ee9ce30
Fixed problems with STARsolo and 2-pass.
alexdobin Mar 14, 2019
73ccc96
Merge branch 'master' into solo_GeneFull
alexdobin Mar 15, 2019
99e73e6
Implement CB and UMI output into Chimeric.out.junction file.
alexdobin Mar 15, 2019
39c8df7
Implementing solo no-WhiteList operation.
alexdobin Mar 15, 2019
09ffccb
Allow same tag for "--sjdbGTFtagExonParentTranscript" and "--sjdbGTFt…
ghuls Mar 18, 2019
94d7457
Implemented no-whitelist solo operation.
alexdobin Mar 19, 2019
7d4a7da
Fixed a problem with CR,CY,UR,UQ SAM tags in solo output. Issue #593.
alexdobin Mar 20, 2019
fc4d116
Removed trailing spaces and \r from all files.
alexdobin Mar 20, 2019
1932e34
Preparing for 2.7.0f
alexdobin Mar 20, 2019
bbffcfd
Fixed a problem in STARsolo with empty Unmapped.out.mate2 file. Issue…
alexdobin Mar 22, 2019
72f29d3
Fixed a solo output problem introduced in f8adfa4, now the results ar…
alexdobin Mar 25, 2019
55517da
Preparing for 2.7.0f
alexdobin Mar 25, 2019
df7ab1a
Fixed several problems with previous changes.
alexdobin Mar 27, 2019
8b645e3
Ready for 2.7.0f
alexdobin Mar 28, 2019
e33cd86
Fixed a problem which may cause seg-faults for reads with many blocks.
alexdobin Mar 28, 2019
ad92120
Minor changes before mergin soloFullGene branch.
alexdobin Mar 30, 2019
73b650c
Removed trailing spaces and CR with sed -i -e 's/\r$//' -e 's/ \+$//'…
alexdobin Mar 30, 2019
b7fcfa6
Merged in master 2.7.0f_
alexdobin Mar 30, 2019
c393cd9
Add support for cell barcodes which are longer than 16 nucleotides.
ghuls Mar 15, 2019
984bf42
Some tweaks and test on solo_GeneFull branch. First public release of…
alexdobin Apr 5, 2019
0a6080c
Pulled in Gert's request.
alexdobin Apr 8, 2019
411dcb1
Merge branch 'allow_same_tag_for_exon_parent_transcript_and_gene' of …
alexdobin Apr 8, 2019
21c8dc9
Pulled in Gert's request #592
alexdobin Apr 8, 2019
3a744d8
Fixed problems with CB>16.
alexdobin Apr 10, 2019
a80414d
Fixed problems with --soloFeatures GeneFull option.
alexdobin Apr 10, 2019
210953b
Fixed a problem with solo GeneFull
alexdobin Apr 11, 2019
5a754f7
2.7.0f_solo_GeneFull_0411
alexdobin Apr 11, 2019
fa21a5a
Stratified the GTF gene ID/name/type and transcript ID loading to saf…
alexdobin Apr 29, 2019
4af25ab
2.7.0f_solo_GeneFull_0430
alexdobin May 1, 2019
cee6568
Added collapsing of CB in whitelist. Added output of unmapped read nu…
alexdobin May 14, 2019
04118fd
Implemented extras/scripts/soloBasicCellFilter.awk script to perform …
alexdobin May 21, 2019
87cee09
Fixed the problem with ALT=* in STAR-WASP.
alexdobin Jun 19, 2019
333c34f
added comments
brianjohnhaas Aug 6, 2019
dd08bca
more comments
brianjohnhaas Aug 6, 2019
3917c79
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
9caaf5c
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
3c8e9aa
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
fb8c2b2
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
66f2bb4
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
04ea76f
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
3e4d7d9
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
c1f533d
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
16c035c
comparing chim score to non-chim align score
brianjohnhaas Aug 6, 2019
689e987
include max possible alignment score in chim output
brianjohnhaas Aug 7, 2019
5fc68be
need to count overlapping bases in chimeric reads in chim score for c…
brianjohnhaas Aug 7, 2019
0fa8c51
perform chimeric stitching on all chim alignments captured for evalua…
brianjohnhaas Aug 7, 2019
6b68211
perform chimeric stitching on all chim alignments captured for evalua…
brianjohnhaas Aug 7, 2019
5b2f141
perform chimeric stitching on all chim alignments captured for evalua…
brianjohnhaas Aug 7, 2019
eb17539
require chimscore to exceed non-chim score
brianjohnhaas Aug 7, 2019
b0a4524
include max possible alignment score in chim detection routine directly
brianjohnhaas Aug 7, 2019
6f9e2f5
include max possible alignment score in chim detection routine directly
brianjohnhaas Aug 7, 2019
41d86a7
incorporated Alex Chimeric_1 branch updates
brianjohnhaas Aug 7, 2019
53ef2c0
use mergedPE align score for comparison w/ chimeric mergedPE score
brianjohnhaas Aug 9, 2019
7d3c72b
Added header to Chimeric.out.junction. Remove all parameters output f…
alexdobin Aug 12, 2019
3966342
bugfix needed to output column headers for Chimeric.out.junction file
brianjohnhaas Aug 13, 2019
85c4977
Ready for 2.7.2a.
alexdobin Aug 13, 2019
a41d7ba
Initial orbit commit
Jul 17, 2019
f73c7bb
Removed deprecated print statements, added current output
Jul 17, 2019
3cca0c7
Added orbit.h which includes extern C for main API, refactored method…
Jul 17, 2019
1decdbc
Fixed a few bugs, and added a new barrage of commented-out print stat…
Jul 18, 2019
20ed7ba
Update orbit c++ building config
Jul 19, 2019
462e68b
Removed compiled static libraries
Jul 19, 2019
99faf2f
Updated Makefile to actually work with liborbit.a
Jul 20, 2019
ceb37eb
Updates and removing useless stuff and bug fixes galore
Jul 23, 2019
ed01608
Return pointer instead of string object to avoid memory issues
Jul 23, 2019
1b5c496
Removed leftover omp stuff
Jul 23, 2019
f1d4116
Constification of mapGen
Jul 23, 2019
48e417b
Simplified Makefile
Jul 24, 2019
ad00a99
Updated output to not include a dummy read name
Jul 24, 2019
72e0294
Remove extra print statements
Jul 25, 2019
0bc8097
Added support to API for read pair alignment
Jul 25, 2019
b0d96ee
Added support for cloning an aligner
Jul 26, 2019
8c8834f
Removing unneeded print statements
Jul 27, 2019
9486a8c
Updated makefile to default to liborbit.a, removed tmp directory crea…
Aug 1, 2019
ee09064
Added comments to main orbit source files to document what they do
Aug 5, 2019
6bb7381
Made parameters const and updated APIs to separate information shared…
Aug 7, 2019
e8dc505
Removed debug print statement
Aug 7, 2019
b301377
Updated destructors to make sense with aligners initialized from a re…
Aug 7, 2019
0875779
Fixed a few bugs that came up when running multiple instances in para…
Aug 7, 2019
ab15a5a
Bug fix
Aug 12, 2019
ca800b4
Add destructor for OutSJ
Aug 15, 2019
0f13b0b
Cleaned up some comments
Aug 15, 2019
e30f675
build on Mac
pmarks Aug 23, 2019
809cd9c
don't make the huge SJ output vec
pmarks Sep 1, 2019
a179e4d
Fix small memory leak (inspire by f0cadaf)
Sep 26, 2019
83b8674
Workaround for problems with older libc.
adam-azarchs Jul 10, 2017
17eb4ac
More workarounds for old glibc bug with strtoul.
adam-azarchs Jul 21, 2017
6e19a8a
More signed/unsiged fixes.
adam-azarchs Jul 24, 2017
17d52ad
Fix use of nonstandard istringstream buffer API.
adam-azarchs Jul 24, 2017
7adccf8
Fix output buffer initialization.
adam-azarchs Jul 24, 2017
f100935
Free the memory of the reference genome (#3)
sjackman Nov 12, 2019
54baf3d
fix resource cleanup code
Jan 3, 2020
81058be
make logging a no-op
Dec 18, 2019
704c8a6
fix whitespace issues cause by lack of vim config
Dec 18, 2019
9b3b380
orbit: fix memory leak
Jan 14, 2020
1cd2da1
orbit: fix missing includes
Jan 14, 2020
607a70d
Genome: fix use (delete[]) after free
Jan 15, 2020
2b11129
fix other memleak
Jan 16, 2020
a0ce436
trim links to unused code
pmarks Feb 2, 2020
bb4aad1
fixes for paired-end support
Apr 26, 2020
190e154
new interface requires fewer mallocs per alignment
Apr 27, 2020
6f511be
Fix Makefile (#9)
nlhepler May 12, 2020
2ff0e8c
Merge pull request #8 from 10XGenomics/lh/fewer-mallocs
sjackman May 12, 2020
b864977
Fix warnings and treat warnings as errors (#10)
sjackman May 12, 2020
e059bc7
re-instate reference version check
Jul 13, 2020
f49e976
forfeit STARsolo to bring back quantMode
Jul 14, 2020
05d6c8b
Remove the submodule star-sys/STAR
sjackman Aug 13, 2020
ab8c1ae
Add 'star-sys/STAR/source/' from commit '85c4977512e8454b503c11d49570…
sjackman Aug 13, 2020
cb1ccec
star-sys/STAR: Merge commit 'f49e976d9330e5306753c1f296093d79d6260169…
sjackman Aug 13, 2020
6e5be8d
Add star-sys/STAR/LICENSE
sjackman Aug 13, 2020
07e9bbd
Add LICENSE
sjackman Aug 13, 2020
4 changes: 0 additions & 4 deletions .gitmodules

This file was deleted.

1 change: 0 additions & 1 deletion star-sys/STAR
Submodule STAR deleted from 87af89
2,220 changes: 2,220 additions & 0 deletions star-sys/STAR/source/1.fastq

Large diffs are not rendered by default.

1,745 changes: 1,745 additions & 0 deletions star-sys/STAR/source/1.sam

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions star-sys/STAR/source/BAMbinSortByCoordinate.cpp
@@ -0,0 +1,74 @@
#include "BAMbinSortByCoordinate.h"
#include "ErrorWarning.h"
#include "serviceFuns.cpp"
#include "BAMfunctions.h"

void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen) {

if (binS==0) return; //nothing to do for empty bins
//allocate arrays
char *bamIn=new char[binS+1];
uint *startPos=new uint[binN*3];

uint bamInBytes=0;
//load all aligns
for (uint it=0; it<nThreads; it++) {
string bamInFile=dirBAMsort+to_string(it)+"/"+to_string((uint) iBin);
ifstream bamInStream;
bamInStream.open(bamInFile.c_str(),std::ios::binary | std::ios::ate);//open at the end to get file size
int64 s1=bamInStream.tellg();
if (s1>0) {
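bamInStream.seekg(0, std::ios::beg);//rewind to the start; the one-argument form only works because ios::beg happens to convert to position 0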
bamInStream.read(bamIn+bamInBytes,s1);//read the whole file
} else if (s1<0) {
ostringstream errOut;
errOut << "EXITING because of FATAL ERROR: failed reading from temporary file: " << dirBAMsort+to_string(it)+"/"+to_string((uint) iBin);
exitWithError(errOut.str(),std::cerr, P.inOut->logMain, 1, P);
};
bamInBytes += bamInStream.gcount();
bamInStream.close();
remove(bamInFile.c_str());
};
if (bamInBytes!=binS) {
ostringstream errOut;
errOut << "EXITING because of FATAL ERROR: number of bytes expected from the BAM bin does not agree with the actual size on disk: ";
errOut << "Expected bin size=" <<binS <<" ; size on disk="<< bamInBytes <<" ; bin number="<< iBin <<"\n";
exitWithError(errOut.str(),std::cerr, P.inOut->logMain, 1, P);
};

//extract coordinates

for (uint ib=0,ia=0;ia<binN;ia++) {
uint32 *bamIn32=(uint32*) (bamIn+ib);
startPos[ia*3] =( ((uint) bamIn32[1]) << 32) | ( (uint)bamIn32[2] );
startPos[ia*3+2]=ib;
ib+=bamIn32[0]+sizeof(uint32);//note that size of the BAM record does not include the size record itself
startPos[ia*3+1]=*( (uint*) (bamIn+ib) ); //read order
ib+=sizeof(uint);
};

//sort
qsort((void*) startPos, binN, sizeof(uint)*3, funCompareArrays<uint,3>);

BGZF *bgzfBin;
bgzfBin=bgzf_open((dirBAMsort+"/b"+to_string((uint) iBin)).c_str(),("w"+to_string((long long) P.outBAMcompression)).c_str());
if (bgzfBin==NULL) {
ostringstream errOut;
errOut <<"EXITING because of fatal ERROR: could not open temporary bam file: " << dirBAMsort+"/b"+to_string((uint) iBin) << "\n";
errOut <<"SOLUTION: check that the disk is not full, increase the max number of open files with Linux command ulimit -n before running STAR";
exitWithError(errOut.str(), std::cerr, P.inOut->logMain, EXIT_CODE_PARAMETER, P);
};

outBAMwriteHeader(bgzfBin,P.samHeaderSortedCoord,mapGen.chrNameAll,mapGen.chrLengthAll);
//send ordered aligns to bgzf one-by-one
for (uint ia=0;ia<binN;ia++) {
char* ib=bamIn+startPos[ia*3+2];
bgzf_write(bgzfBin,ib, *((uint32*) ib)+sizeof(uint32) );
};

bgzf_flush(bgzfBin);
bgzf_close(bgzfBin);
//release memory
delete [] bamIn;
delete [] startPos;
};
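Review note: the function above builds one key triplet per alignment (packed coordinate, read order, byte offset into the buffer), sorts the triplets, then writes records in that order. The comparator `funCompareArrays<uint,3>` lives in `serviceFuns.cpp`, which is not part of this view, so the sketch below assumes it compares the three fields lexicographically. `SortKey`, `packCoordinate` and `sortedOffsets` are hypothetical names used only for illustration; they are not part of STAR.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// One sort key per alignment: {packed coordinate, read order, byte offset in buffer}.
// The field order mirrors startPos[ia*3], startPos[ia*3+1], startPos[ia*3+2] above.
using SortKey = std::array<std::uint64_t, 3>;

// Packs the reference ID into the upper 32 bits and the position into the
// lower 32 bits, as the coordinate-extraction loop above does.
inline std::uint64_t packCoordinate(std::uint32_t refId, std::uint32_t pos) {
    return (static_cast<std::uint64_t>(refId) << 32) | pos;
}

// Returns the buffer offsets in output order. std::array compares
// lexicographically, so ties on coordinate fall back to read order
// (assumed to match what funCompareArrays<uint,3> does).
std::vector<std::uint64_t> sortedOffsets(std::vector<SortKey> keys) {
    std::sort(keys.begin(), keys.end());
    std::vector<std::uint64_t> offsets;
    offsets.reserve(keys.size());
    for (const SortKey &k : keys) offsets.push_back(k[2]);
    return offsets;
}
```

Because the reference ID occupies the high bits, every alignment on a lower-numbered reference sorts before any alignment on a higher-numbered one, regardless of position.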
11 changes: 11 additions & 0 deletions star-sys/STAR/source/BAMbinSortByCoordinate.h
@@ -0,0 +1,11 @@
#ifndef CODE_BAMbinSortByCoordinate
#define CODE_BAMbinSortByCoordinate
#include "IncludeDefine.h"
#include "Parameters.h"
#include "Genome.h"

#include SAMTOOLS_BGZF_H

void BAMbinSortByCoordinate(uint32 iBin, uint binN, uint binS, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen);

#endif
80 changes: 80 additions & 0 deletions star-sys/STAR/source/BAMbinSortUnmapped.cpp
@@ -0,0 +1,80 @@
#include "BAMbinSortUnmapped.h"
#include "ErrorWarning.h"
#include "BAMfunctions.h"

void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen) {

BGZF *bgzfBin;
bgzfBin=bgzf_open((dirBAMsort+"/b"+to_string((uint) iBin)).c_str(),("w"+to_string((long long) P.outBAMcompression)).c_str());
if (bgzfBin==NULL) {
ostringstream errOut;
errOut <<"EXITING because of fatal ERROR: could not open temporary bam file: " << dirBAMsort+"/b"+to_string((uint) iBin) << "\n";
errOut <<"SOLUTION: check that the disk is not full, increase the max number of open files with Linux command ulimit -n before running STAR";
exitWithError(errOut.str(), std::cerr, P.inOut->logMain, EXIT_CODE_PARAMETER, P);
};

outBAMwriteHeader(bgzfBin,P.samHeaderSortedCoord,mapGen.chrNameAll,mapGen.chrLengthAll);


vector<string> bamInFile;
std::map <uint,uint> startPos;

for (uint it=0; it<nThreads; it++) {//files from all threads, and BySJout
bamInFile.push_back(dirBAMsort+to_string(it)+"/"+to_string((uint) iBin));
bamInFile.push_back(dirBAMsort+to_string(it)+"/"+to_string((uint) iBin)+".BySJout");
};
vector<uint32> bamSize(bamInFile.size(),0);//record sizes

//allocate arrays
char **bamIn=new char* [bamInFile.size()];
ifstream *bamInStream = new ifstream [bamInFile.size()];

for (uint it=0; it<bamInFile.size(); it++) {//initialize
bamIn[it] = new char [BAMoutput_oneAlignMaxBytes];

bamInStream[it].open(bamInFile.at(it).c_str());//open all files

bamInStream[it].read(bamIn[it],sizeof(int32));//read BAM record size
if (bamInStream[it].good()) {
bamSize[it]=((*(uint32*)bamIn[it])+sizeof(int32));//true record size +=4 (4 bytes for uint-iRead)
bamInStream[it].read(bamIn[it]+sizeof(int32),bamSize.at(it)-sizeof(int32)+sizeof(uint));//read the rest of the record, including last uint = iRead
startPos[*(uint*)(bamIn[it]+bamSize.at(it))]=it;//startPos[iRead]=it : record the order of the files to output
} else {//nothing to do here, file is empty, do not record it
};
};

//send ordered aligns to bgzf one-by-one
while (startPos.size()>0) {
uint it=startPos.begin()->second;
uint startNext=startPos.size()>1 ? (++startPos.begin())->first : (uint) -1;

while (true) {
bgzf_write(bgzfBin, bamIn[it], bamSize.at(it));
bamInStream[it].read(bamIn[it],sizeof(int32));//read record size
if (bamInStream[it].good()) {
bamSize[it]=((*(uint32*)bamIn[it])+sizeof(int32));
bamInStream[it].read(bamIn[it]+sizeof(int32),bamSize.at(it)-sizeof(int32)+sizeof(uint));//read the rest of the record, including last uint = iRead
uint iRead=*(uint*)(bamIn[it]+bamSize.at(it));
if (iRead>startNext) {//this read from this chunk is > than a read from another chunk
startPos[iRead]=it;
break;
};
} else {//nothing to do here, reached the end of the file
break;
};
};
startPos.erase(startPos.begin());
};

bgzf_flush(bgzfBin);
bgzf_close(bgzfBin);


for (uint it=0; it<bamInFile.size(); it++) {//destroy at the end
bamInStream[it].close();
remove(bamInFile.at(it).c_str());
delete [] bamIn[it];
};
delete [] bamIn;
delete [] bamInStream;
};
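Review note: the loop above is a k-way merge. A `std::map` keyed by the read index at the head of each temporary file picks the stream to drain next, and a stream is re-queued once its next record would overtake another stream's head. The sketch below shows the same idea on in-memory vectors rather than files. `mergeByReadIndex` is a hypothetical name, and the sketch assumes each read index occurs in at most one stream, which the map-keyed design above also relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Merges several ascending sequences of read indices into one ascending sequence.
// Assumption: every read index appears in at most one input stream.
std::vector<std::uint64_t> mergeByReadIndex(
        const std::vector<std::vector<std::uint64_t>> &streams) {
    std::map<std::uint64_t, std::size_t> heads;        // smallest unread index -> stream number
    std::vector<std::size_t> cursor(streams.size(), 0);
    for (std::size_t s = 0; s < streams.size(); ++s) {
        if (!streams[s].empty()) heads[streams[s][0]] = s;   // empty streams are never queued
    }

    std::vector<std::uint64_t> merged;
    while (!heads.empty()) {
        const std::size_t s = heads.begin()->second;
        heads.erase(heads.begin());
        const bool lastStream = heads.empty();
        // Smallest head among the remaining streams (unused when none remain).
        const std::uint64_t nextHead = lastStream ? 0 : heads.begin()->first;
        // Drain this stream while it stays below every other stream's head.
        while (cursor[s] < streams[s].size() &&
               (lastStream || streams[s][cursor[s]] < nextHead)) {
            merged.push_back(streams[s][cursor[s]++]);
        }
        // Re-queue the stream under its new head if it still has records.
        if (cursor[s] < streams[s].size()) heads[streams[s][cursor[s]]] = s;
    }
    return merged;
}
```

Draining a stream until it crosses the next head keeps map updates to one per run of consecutive records rather than one per record.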
11 changes: 11 additions & 0 deletions star-sys/STAR/source/BAMbinSortUnmapped.h
@@ -0,0 +1,11 @@
#ifndef CODE_BAMbinSortUnmapped
#define CODE_BAMbinSortUnmapped
#include "IncludeDefine.h"
#include "Parameters.h"
#include "Genome.h"

#include SAMTOOLS_BGZF_H

void BAMbinSortUnmapped(uint32 iBin, uint nThreads, string dirBAMsort, Parameters &P, Genome &mapGen);

#endif
112 changes: 112 additions & 0 deletions star-sys/STAR/source/BAMfunctions.cpp
@@ -0,0 +1,112 @@
#include <stdexcept>
#include "BAMfunctions.h"
#include "htslib/htslib/kstring.h"

string bam_cigarString (bam1_t *b) {//output CIGAR string
// kstring_t strK;
// kstring_t *str=&strK;
const bam1_core_t *c = &b->core;

string cigarString("");
if ( c->n_cigar > 0 ) {
uint32_t *cigar = bam_get_cigar(b);
for (int i = 0; i < c->n_cigar; ++i) {
cigarString+=to_string((uint)bam_cigar_oplen(cigar[i]))+bam_cigar_opchr(cigar[i]);
};
};


// if (c->n_cigar) { // cigar
// for (int i = 0; i < c->n_cigar; ++i) {
// kputw(bam_cigar_oplen(cigar[i]), str);
// kputc(bam_cigar_opchr(cigar[i]), str);
// }
// } else kputc('*', str);
//
// string cigarString (str->s,str->l);
return cigarString;
};

int bam_read1_fromArray(char *bamChar, bam1_t *b) //modified from samtools bam_read1 to assign BAM record in memory to bam structure
{
bam1_core_t *c = &b->core;
int32_t block_len; //, ret, i;
// // uint32_t x[8];
// // if ((ret = bgzf_read(fp, &block_len, 4)) != 4) {
// // if (ret == 0) return -1; // normal end-of-file
// // else return -2; // truncated
// // }
uint32_t *x;

uint32_t *bamU32=(uint32_t*) bamChar;
block_len=bamU32[0];

// // if (bgzf_read(fp, x, 32) != 32) return -3;
// // if (fp->is_be) {
// // ed_swap_4p(&block_len);
// // for (i = 0; i < 8; ++i) ed_swap_4p(x + i);
// // }
x=bamU32+1;

c->tid = x[0]; c->pos = x[1];
c->bin = x[2]>>16; c->qual = x[2]>>8&0xff; c->l_qname = x[2]&0xff;
c->flag = x[3]>>16; c->n_cigar = x[3]&0xffff;
c->l_qseq = x[4];
c->mtid = x[5]; c->mpos = x[6]; c->isize = x[7];
b->l_data = block_len - 32;
if (b->l_data < 0 || c->l_qseq < 0) return -4;
if ((char *)bam_get_aux(b) - (char *)b->data > b->l_data)
return -4;
if (b->m_data < b->l_data) {
b->m_data = b->l_data;
kroundup32(b->m_data);
b->data = (uint8_t*)realloc(b->data, b->m_data);
if (!b->data)
return -4;
}
// // if (bgzf_read(fp, b->data, b->l_data) != b->l_data) return -4;
// // //b->l_aux = b->l_data - c->n_cigar * 4 - c->l_qname - c->l_qseq - (c->l_qseq+1)/2;
// // if (fp->is_be) swap_data(c, b->l_data, b->data, 0);
b->data=(uint8_t*) bamChar+4*9;

return 4 + block_len;
}


void outBAMwriteHeader (BGZF* fp, const string &samh, const vector <string> &chrn, const vector <uint> &chrl) {
throw std::runtime_error("Unimplemented!");
//bgzf_write(fp,"BAM\001",4);
int32 hlen=samh.size();
//bgzf_write(fp,(char*) &hlen,sizeof(hlen));
//bgzf_write(fp,samh.c_str(),hlen);
int32 nchr=(int32) chrn.size();
//bgzf_write(fp,(char*) &nchr,sizeof(nchr));
for (int32 ii=0;ii<nchr;ii++) {
int32 rlen = (int32) (chrn.at(ii).size()+1);
int32 slen = (int32) chrl[ii];
//bgzf_write(fp,(char*) &rlen,sizeof(rlen));
//bgzf_write(fp,chrn.at(ii).data(),rlen); //this includes \0 at the end of the string
//bgzf_write(fp,(char*) &slen,sizeof(slen));
};
//bgzf_flush(fp);
};

template <class TintType>
TintType bamAttributeInt(const char *bamAux, const char *attrName) {//not tested!!!
const char *attrStart=strstr(bamAux,attrName);
if (attrStart==NULL) return (TintType) -1;
switch (attrStart[2]) {
case ('c'):
return (TintType) *(int8_t*)(attrStart+3);
case ('s'):
return (TintType) *(int16_t*)(attrStart+3);
case ('i'):
return (TintType) *(int32_t*)(attrStart+3);
case ('C'):
return (TintType) *(uint8_t*)(attrStart+3);
case ('S'):
return (TintType) *(uint16_t*)(attrStart+3);
case ('I'):
return (TintType) *(uint32_t*)(attrStart+3);
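default:
return (TintType) -1; //unknown attribute type: avoid falling off the end of a non-void function
};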
};
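Review note: `bam_read1_fromArray` above unpacks several core fields from two 32-bit words with shifts and masks (`x[2]` holds bin, mapping quality and read-name length; `x[3]` holds flag and CIGAR operation count). The standalone sketch below isolates that step. `PackedCoreFields`, `loadU32` and `unpackCore` are hypothetical names for illustration and do not exist in htslib or STAR. `loadU32` uses `memcpy` instead of a pointer cast, which is one way to avoid unaligned reads; like the original, it assumes a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fields packed into two 32-bit words of a BAM record's fixed-size core.
struct PackedCoreFields {
    std::uint16_t bin;      // indexing bin
    std::uint8_t  mapq;     // mapping quality
    std::uint8_t  nameLen;  // read-name length, including the trailing NUL
    std::uint16_t flag;     // SAM flag bits
    std::uint16_t nCigar;   // number of CIGAR operations
};

// Reads a 32-bit word from an arbitrary byte offset without an unaligned
// pointer cast. Assumes the host byte order matches the buffer (little-endian).
inline std::uint32_t loadU32(const char *p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Same shifts and masks as the x[2] and x[3] lines in bam_read1_fromArray.
inline PackedCoreFields unpackCore(std::uint32_t binMqNl, std::uint32_t flagNc) {
    PackedCoreFields f;
    f.bin     = static_cast<std::uint16_t>(binMqNl >> 16);
    f.mapq    = static_cast<std::uint8_t>((binMqNl >> 8) & 0xff);
    f.nameLen = static_cast<std::uint8_t>(binMqNl & 0xff);
    f.flag    = static_cast<std::uint16_t>(flagNc >> 16);
    f.nCigar  = static_cast<std::uint16_t>(flagNc & 0xffff);
    return f;
}
```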
10 changes: 10 additions & 0 deletions star-sys/STAR/source/BAMfunctions.h
@@ -0,0 +1,10 @@
#ifndef DEF_BAMfunctions
#define DEF_BAMfunctions

#include "IncludeDefine.h"
#include SAMTOOLS_BGZF_H
#include SAMTOOLS_SAM_H
void outBAMwriteHeader (BGZF* fp, const string &samh, const vector <string> &chrn, const vector <uint> &chrl);
int bam_read1_fromArray(char *bamChar, bam1_t *b);
string bam_cigarString (bam1_t *b);
#endif
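Review note: `bam_cigarString`, declared above, relies on htslib's `bam_cigar_oplen` and `bam_cigar_opchr` macros. For readers without htslib at hand, the sketch below shows the underlying BAM encoding directly: each CIGAR operation is one 32-bit value with the length in the upper 28 bits and the operation code in the lower 4 bits. `cigarToString` is a hypothetical helper, not part of STAR or htslib. Unlike `bam_cigarString`, which returns an empty string when there are no operations, it returns `*` as the SAM text format does. Printing `?` for codes above 8 is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Converts BAM-encoded CIGAR operations to the SAM text form, e.g. "10M2I5S".
std::string cigarToString(const std::vector<std::uint32_t> &cigar) {
    static const char opChars[] = "MIDNSHP=X";   // operation codes 0..8 in the BAM encoding
    if (cigar.empty()) return "*";               // SAM convention for "no CIGAR"
    std::string out;
    for (std::uint32_t packed : cigar) {
        const std::uint32_t len = packed >> 4;   // upper 28 bits: operation length
        const std::uint32_t op  = packed & 0xf;  // lower 4 bits: operation code
        out += std::to_string(len);
        out += (op < 9) ? opChars[op] : '?';     // '?' for codes outside 0..8 (sketch-only choice)
    }
    return out;
}
```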