Is it correct to assume that "Clean" sam files should be usable by other Picard tools? It would be useful if CleanSam removed duplicate alignments from BAM files. Tophat alignments with duplicate alignment records that are then cleaned with CleanSam do not play nicely with MarkDuplicates.
Tophat v2.1.1
Picard v2.2.2
>/apps/sys/galaxy/external_packages/jdk1.8.0_60/bin/java -jar /apps/sys/galaxy/external_packages/picard-tools-2.2.2/picard.jar CleanSam INPUT=mark_dup.bam OUTPUT=mark_dup.cleaned.sorted.bam
[Sun Apr 24 15:49:51 EDT 2016] picard.sam.CleanSam INPUT=mark_dup.bam OUTPUT=mark_dup.cleaned.sorted.bam VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
...
INFO 2016-04-24 16:07:24 CleanSam Processed 125,000,000 records. Elapsed time: 00:17:33s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2016-04-24 16:07:32 CleanSam Processed 126,000,000 records. Elapsed time: 00:17:41s. Time for last 1,000,000: 8s. Last read position: */*
[Sun Apr 24 16:07:37 EDT 2016] picard.sam.CleanSam done. Elapsed time: 17.78 minutes.
Runtime.totalMemory()=1691877376
>/apps/sys/galaxy/external_packages/jdk1.8.0_60/bin/java -jar /apps/sys/galaxy/external_packages/picard-tools-2.2.2/picard.jar MarkDuplicates INPUT=mark_dup.cleaned.sorted.bam OUTPUT=/dev/null METRICS_FILE=test_hisat.$
upstats.txt VALIDATION_STRINGENCY=SILENT
[Sun Apr 24 16:43:14 EDT 2016] picard.sam.markduplicates.MarkDuplicates INPUT=[mark_dup.cleaned.sorted.bam] OUTPUT=/dev/null METRICS_FILE=test$
hisat.duplicate_stats.txt VALIDATION_STRINGENCY=SILENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000
SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUP$
ICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized c$
pture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVE$
=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Sun Apr 24 16:43:14 EDT 2016] Executing as ralstonm@kraken.pri.bms.com on Linux 2.6.32-431.23.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit S$
rver VM 1.8.0_60-b27; Picard version: 2.2.2(20d49152d0840a960fcb97df76dbaca260b39244_1461168806) IntelDeflater
INFO 2016-04-24 16:43:15 MarkDuplicates Start of doWork freeMemory: 2046008680; totalMemory: 2058354688; maxMemory: 28631367680
INFO 2016-04-24 16:43:15 MarkDuplicates Reading input file and constructing read end information.
INFO 2016-04-24 16:43:15 MarkDuplicates Will retain up to 110120644 data points before spilling to disk.
INFO 2016-04-24 16:43:34 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 18s. Last read position: chr2:153,537,837
INFO 2016-04-24 16:43:34 MarkDuplicates Tracking 13593 as yet unmatched pairs. 218 records in RAM.
[Sun Apr 24 16:43:35 EDT 2016] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.33 minutes.
Runtime.totalMemory()=3172990976
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 3: null:NB501257:32:HWCMVBGXX:1:21209:24100:6206
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:442)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:193)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
>samtools view mark_dup.bam | grep NB501257:32:HWCMVBGXX:1:21209:24100:6206
NB501257:32:HWCMVBGXX:1:21209:24100:6206 177 chr1 113980481 50 75M chr12 95825708 0 TCTTCTTCTTCTTCTTCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTCCTTCTCCT EEEEEEEEEEAAEEAEEAEEE/EEEEE/EEEEE/EEEEE/EEEEEAEEEEEEEAEEEEEEEEEEEEEE6EAAAA6 AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5C69 YT:Z:UU NH:i:1
NB501257:32:HWCMVBGXX:1:21209:24100:6206 177 chr1 113980481 50 75M chr12 95825708 0 TCTTCTTCTTCTTCTTCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTCCTTCTCCT EEEEEEEEEEAAEEAEEAEEE/EEEEE/EEEEE/EEEEE/EEEEEAEEEEEEEAEEEEEEEEEEEEEE6EAAAA6 AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5C69 YT:Z:UU NH:i:1
NB501257:32:HWCMVBGXX:1:21209:24100:6206 113 chr12 95825708 50 76M chr1 113980481 0 AGAAGTAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAA AE/EE/EE/E/EEA/EE<AE/EA/EAEEAEEEEEEEEEEE<AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAAAS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YT:Z:UU NH:i:1
NB501257:32:HWCMVBGXX:1:21209:24100:6206 113 chr12 95825708 50 76M chr1 113980481 0 AGAAGTAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAA AE/EE/EE/E/EEA/EE<AE/EA/EAEEAEEEEEEEEEEE<AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAAAS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YT:Z:UU NH:i:1
From the usage of CleanSam: "Cleans the provided SAM/BAM, soft-clipping
beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped
reads". Looking at the code, that is exactly what it does, no more, no
less.
So, no, CleanSam does not remove duplicate alignments. If you don't want
duplicate alignments in your BAM, it might be easier to tell TopHat to
only emit one alignment per read... (I think -g 1 is the flag you want.)
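
If re-running TopHat is not an option, the byte-identical records shown in the `samtools view` output above could in principle be filtered out before MarkDuplicates. The following is only a minimal sketch, not anything Picard or TopHat provides: the function name `drop_exact_duplicates` is made up for illustration, it works on SAM text lines, and it assumes the input is coordinate-sorted so that identical records sit at the same reference name and position. It drops a record only when the whole line matches one already seen at that position, so distinct reads that merely share a position are kept.

```python
from typing import Iterable, Iterator


def drop_exact_duplicates(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines, skipping records that repeat an identical line
    already seen at the same (RNAME, POS).

    Assumes coordinate-sorted input, so memory use is bounded by the
    number of records at a single position.
    """
    seen = set()         # full record lines seen at the current position
    current_key = None   # (RNAME, POS) of the block being processed
    for line in lines:
        if line.startswith("@"):
            # Header lines pass through untouched.
            yield line
            continue
        fields = line.split("\t")
        key = (fields[2], fields[3])  # RNAME and POS columns
        if key != current_key:
            # New position: forget the records from the previous one.
            current_key = key
            seen = set()
        if line in seen:
            continue  # exact repeat of an earlier record, drop it
        seen.add(line)
        yield line
```

One way to use it would be to add `sys.stdout.writelines(drop_exact_duplicates(sys.stdin))` at the bottom of a script, feed it the output of `samtools view -h mark_dup.bam`, and pipe the result back into `samtools view -b` to get a BAM again. Whether MarkDuplicates is then happy with the file is something you would have to check on your own data.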