sourmash compare protein error #781

Closed
bluegenes opened this issue Nov 30, 2019 · 3 comments

@bluegenes
Contributor

I have some protein signatures with protein, dayhoff, and hp moltypes. When running compare on the protein sigs, I get the following error:

$ sourmash compare --protein --no-dna --no-dayhoff --no-hp  -k 11  -o testing smash-testing/gingivalis/protein/sigs/*.sig
== This is sourmash version 2.3.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 106 signatures total.
downsampling to scaled value of 2000

Traceback (most recent call last):
  File "/home/ntpierce/2019-protein-work/.snakemake/conda/db49f6a3/bin/sourmash", line 11, in <module>
    sys.exit(main())
  File "/home/ntpierce/2019-protein-work/.snakemake/conda/db49f6a3/lib/python3.7/site-packages/sourmash/__main__.py", line 83, in main
    cmd(sys.argv[2:])
  File "/home/ntpierce/2019-protein-work/.snakemake/conda/db49f6a3/lib/python3.7/site-packages/sourmash/commands.py", line 148, in compare
    n_jobs=args.processes)
  File "/home/ntpierce/2019-protein-work/.snakemake/conda/db49f6a3/lib/python3.7/site-packages/sourmash/compare.py", line 209, in compare_all_pairs
    similarities = compare_serial(siglist, ignore_abundance, downsample)
  File "/home/ntpierce/2019-protein-work/.snakemake/conda/db49f6a3/lib/python3.7/site-packages/sourmash/compare.py", line 39, in compare_serial
    similarities[i][j] = similarities[j][i] = siglist[i].similarity(siglist[j], ignore_abundance, downsample)
  File "/home/ntpierce/2019-protein-work/.snakemake/conda/db49f6a3/lib/python3.7/site-packages/sourmash/signature.py", line 121, in similarity
    return self.minhash.similarity(other.minhash, ignore_abundance)
  File "sourmash/_minhash.pyx", line 413, in sourmash._minhash.MinHash.similarity
  File "sourmash/_minhash.pyx", line 392, in sourmash._minhash.MinHash.jaccard
  File "sourmash/_minhash.pyx", line 387, in sourmash._minhash.MinHash.compare
  File "sourmash/_minhash.pyx", line 375, in sourmash._minhash.MinHash.intersection
ValueError: DNA/prot minhashes cannot be compared
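The traceback shows the error is raised in `MinHash.intersection`, by a molecule-type check that should only fire when the two sketches really differ; here the flags select protein sketches on both sides. As a rough illustration of what such a guard is meant to do, here is a minimal pure-Python sketch that works on the sketch dicts found in a signature JSON file. The helper name `jaccard`, the decision to also check `ksize` and `max_hash`, and returning 0 for two empty sketches are all assumptions for illustration, not sourmash's actual code.

```python
from typing import Any, Dict

Sketch = Dict[str, Any]  # one entry of a signature file's "signatures" list


def jaccard(a: Sketch, b: Sketch) -> float:
    """Jaccard similarity of two scaled sketches (hypothetical helper).

    Sketches are refused, not compared, when their molecule type, k-mer size
    or max_hash differ; two protein k=11 sketches should always pass.
    """
    for field in ("molecule", "ksize", "max_hash"):
        if a[field] != b[field]:
            raise ValueError(
                f"cannot compare sketches with different {field}: "
                f"{a[field]!r} vs {b[field]!r}"
            )
    mins_a, mins_b = set(a["mins"]), set(b["mins"])
    union = mins_a | mins_b
    if not union:
        return 0.0  # both sketches empty: report 0 rather than divide by zero
    return len(mins_a & mins_b) / len(union)


# Toy hash values, not taken from the issue.
prot_a = {"molecule": "protein", "ksize": 11, "max_hash": 9223372036854776,
          "mins": [1, 2, 3, 4]}
prot_b = {"molecule": "protein", "ksize": 11, "max_hash": 9223372036854776,
          "mins": [3, 4, 5]}
print(jaccard(prot_a, prot_b))  # 2 shared hashes out of 5 distinct -> 0.4
```

Comparing `prot_a` with a copy of `prot_b` whose `molecule` is `"dayhoff"` raises `ValueError`, which is the case such a guard exists for; two protein sketches should never reach that branch.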

An example signature (note: there are no DNA signatures):

signature:

[{"class":"sourmash_signature","email":"","filename":"smash-testing/gingivalis/protein/GCA_000007585.1_ASM758v1_protein.faa.gz","hash_function":"0.murmur64","license":"CC0","signatures":[{"ksize":7,"max_hash":9223372036854776,"md5sum":"8f14e45fceea167a5a36dedd4bea2543","mins":[],"molecule":"protein","num":0,"seed":42},{"ksize":7,"max_hash":9223372036854776,"md5sum":"8f14e45fceea167a5a36dedd4bea2543","mins":[],"molecule":"dayhoff","num":0,"seed":42},{"ksize":7,"max_hash":9223372036854776,"md5sum":"8f14e45fceea167a5a36dedd4bea2543","mins":[],"molecule":"hp","num":0,"seed":42},{"ksize":11,"max_hash":9223372036854776,"md5sum":"eddce6d9ac96be04ed806aadf53665d1","mins":[935333424279453,2538868037911012,7421288836408751,7960286979232677,8654912015719540],"molecule":"protein","num":0,"seed":42},{"ksize":11,"max_hash":9223372036854776,"md5sum":"6512bd43d9caa6e02c990b0a82652dca","mins":[],"molecule":"dayhoff","num":0,"seed":42},{"ksize":11,"max_hash":9223372036854776,"md5sum":"6512bd43d9caa6e02c990b0a82652dca","mins":[],"molecule":"hp","num":0,"seed":42},{"ksize":17,"max_hash":9223372036854776,"md5sum":"4e7f2da275e0d3cd3488a7429f23c0e7","mins":[80545341491115,95893197637407,115032912017982,167876181889216,186411802953277,222466650340651,224967183012543,258505356847424,309875701424311,355213055946019,386676749766432,412910340994277,462537020181146,494965971291721,559141879168033,561484232050194,644615490355418,658172460965864,720535257532265,738352407205548,745810431880043,759181896175271,850588167607652,887659205846183,894164001336358,905078595321889,970165616477383,1017856994095351,1028412790624984,1050795138079130,1139932302760059,1186388344819559,1192818915503467,1198212766664640,1216892715424691,1219438185071604,1237070334261946,1387123735448234,1434001421971507,1485329804804021,1506460396271450,1557881761202533,1621319275541247,1652996636263092,1759458369106525,1775445012461498,1778569708504615,1790264270235488,1814476471419496,1836560547212168,1860873491028851,1896569881953525,1897420327685705,1923345118834304,1936637912433824,1937255081379236,1947138500658091,1975618333674647,2144882621679500,2149108682537798,2161921385516085,2246818310352884,2290340663993180,2304543670495485,2364205861776990,2412514993466726,2452585910741483,2509528096806449,2539884144415183,2541107353815135,2572053948720929,2622479875739911,2647527190109029,2713096700045735,2761919595787895,2867643251943479,2925231867832341,3025758098634438,3033763278178401,3058704182312610,3115261468695424,3166876962072314,3233211247123021,3235567354197305,3243245537683617,3259305016866742,3337410579699057,3418776874767941,3423000771650184,3507216917735704,3595029584959389,3634370995551791,3768235701772236,3775046865943208,3817698542949254,3828452511096857,3833691931514236,4038649211585583,4067919805126295,4097280795487840,4221222492196035,4343951716797518,4367658693942699,4450087885545040,4509903538197388,4513107976808259,4527368785345773,4562209560900537,4639168970839038,4668382471614239,4669352267366042,4679521416287799,4686839507440048,4709562456621538,4724594069568581,4736749713837036,4738083948924482,4759958931942220,4789425451192135,4800497165006063,4820147941509539,4850297769952409,4850600607797904,4868447772682418,4905552855041575,4971930424454368,4985072012605862,4995739741362592,5001422416309153,5009161631707149,5030854929253622,5035217786452507,5065729136452008,5077023046760465,5082191099848633,5176553408615803,5187575110109033,5315950415531956,5380346145125625,5432557230189792,5443962862203433,5504634862131301,5523415209316411,5539462263412859,5578575768148920,5594872593367517,5602170740228452,5627969550666066,5645214919396423,5715244891476417,5716776196844718,5756975294595510,5791158195561112,5804298783161609,5859298762196680,5941703205411035,5984207756172447,6006605546729755,6040624656956813,6079431920736680,6120850189740657,6143463234120488,6403824852243549,6434823327466671,6452924095770729,6486607607852558,6493745783442873,6609514638731456,6621397331436691,6639512072137206,6651392661523258,6663467668433959,6746635771795261,6774460282204914,6798792706747163,6818258524917823,6853148606664897,6862857851511771,6863419022655845,6884221467246173,6960794347010868,6971091824735669,6996177331699958,7020119488788226,7041050337195108,7093433891425227,7105809197035780,7123502524719655,7152363307939419,7180066834856704,7230192109807876,7268204104636229,7369432586890070,7369545607316536,7452432790013634,7475586378544421,7518091663474830,7555974865996135,7567184934833446,7598164152364795,7691459572863761,7709831060098954,7733841383672718,7823392534007031,7863676561152009,7870327771274750,7886730508543846,7920363556752832,7939090045786917,7956077051018542,8006111745484961,8014778875958321,8022584532939757,8066519196340294,8068684551622667,8139161380118085,8156857187140640,8161593522327026,8243833392562765,8267006323898637,8308586921720912,8354821843549256,8375847106291424,8417181145027020,8463616731079599,8649752374463547,8676699653973820,8678492387990270,8746431528629889,8764389844740277,8806802453026237,8846826235070247,8937418166974287,8969593290271317,8982504059107137,9012010336229099,9027180495854695,9045704090610147,9104390213566336,9118607030191614,9198853303101481],"molecule":"protein","num":0,"seed":42},{"ksize":17,"max_hash":9223372036854776,"md5sum":"61b7ea9355dd355ea5d6670c919a45cf","mins":[1816309557216154,3456445761862780,7650346105396480],"molecule":"dayhoff","num":0,"seed":42},{"ksize":17,"max_hash":9223372036854776,"md5sum":"70efdf2ec9b086079795c442636b55fb","mins":[],"molecule":"hp","num":0,"seed":42}],"version":0.4}]
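A single `.sig` file holds many sketches, one per molecule type and k-mer size, so `compare --protein --no-dna --no-dayhoff --no-hp -k 11` has to pick out only the protein k=11 sketch from each file. The helper below is a minimal sketch of that selection using the standard `json` module; `select_sketches` is a hypothetical name, not a sourmash function, and the example input is abbreviated from the signature above.

```python
import json
from typing import Any, Dict, List


def select_sketches(sig_text: str, molecule: str,
                    ksize: int) -> List[Dict[str, Any]]:
    """Return the sketches in a signature file matching molecule and ksize."""
    selected = []
    for record in json.loads(sig_text):      # the file is a JSON list of records
        for sketch in record["signatures"]:  # each record lists several sketches
            if sketch["molecule"] == molecule and sketch["ksize"] == ksize:
                selected.append(sketch)
    return selected


# Abbreviated from the example signature: only the two k=11 sketches shown.
example = json.dumps([{
    "class": "sourmash_signature",
    "signatures": [
        {"ksize": 11, "molecule": "protein", "max_hash": 9223372036854776,
         "mins": [935333424279453, 2538868037911012, 7421288836408751,
                  7960286979232677, 8654912015719540]},
        {"ksize": 11, "molecule": "dayhoff", "max_hash": 9223372036854776,
         "mins": []},
    ],
}])

for sketch in select_sketches(example, "protein", 11):
    print(sketch["molecule"], sketch["ksize"], len(sketch["mins"]))
# prints: protein 11 5
```

Run over the full signature, the same loop also shows that several sketches (all three k=7 sketches, the k=11 dayhoff and hp sketches, and the k=17 hp sketch) have empty `mins`.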

compute command:
sourmash compute --scaled 2000 -k 7,11,17 smash-testing/gingivalis/protein/GCA_000007585.1_ASM758v1_protein.faa.gz -o smash-testing/gingivalis/protein/sigs/GCA_000007585.1_ASM758v1_protein.sig -p 1 --input-is-protein --protein --dayhoff --hp
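With `--scaled 2000`, each sketch keeps only hashes up to a cutoff, which is why every sketch above carries `"max_hash":9223372036854776`. Assuming the cutoff is 2**64 / scaled rounded to the nearest integer (this reproduces that value exactly), the minimal sketch below computes it and shows how a sketch could be downsampled to a larger scaled value, in the spirit of the "downsampling to scaled value" step that `compare` reports. The function names are hypothetical, and the `<=` boundary test is an assumption.

```python
HASH_SPACE = 2 ** 64  # "hash_function": "0.murmur64" gives 64-bit hashes


def max_hash_for_scaled(scaled: int) -> int:
    """Cutoff for a scaled sketch: 2**64 / scaled, rounded to nearest integer."""
    return (HASH_SPACE + scaled // 2) // scaled  # integer math avoids float error


def downsample(mins: list, new_scaled: int) -> list:
    """Keep only the hashes a sketch with a larger scaled value would keep."""
    cutoff = max_hash_for_scaled(new_scaled)
    return [h for h in mins if h <= cutoff]


# The protein k=11 hashes from the example signature.
prot_k11 = [935333424279453, 2538868037911012, 7421288836408751,
            7960286979232677, 8654912015719540]

print(max_hash_for_scaled(2000))   # 9223372036854776, as in the signature
print(downsample(prot_k11, 4000))  # [935333424279453, 2538868037911012]
```

Downsampling can only discard hashes, never add them, which is why two sketches are brought to the larger of their scaled values before they are compared.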

Note that dayhoff and hp comparisons do work:

dayhoff example: sourmash compare --no-protein --no-dna --dayhoff --no-hp -k 11 -o testing smash-testing/gingivalis/protein/sigs/*protein.sig

I got the same error with protein sigs computed with scaled=1.

@luizirber
Member

Thanks for the minimal test case, @bluegenes! I managed to reproduce the error in a test in #782. Still need to fix it now 😬

@luizirber luizirber added the bug label Nov 30, 2019
@luizirber
Member

This was released in 2.3.1; can you test it and see if it works, @bluegenes?

@ctb
Contributor

ctb commented Jan 8, 2020

Closing for now.
