Problems when generating matrix (episcanpy.ct.bld_mtx_fly()) #100

Open
malumbres opened this issue Jun 6, 2021 · 0 comments

Dear Anna,
Congratulations on this package! I am a big fan of the scanpy environment.

I was wondering whether you have a tutorial for scATAC-seq data from 10XGenomics (or for combined scRNA-seq/scATAC-seq data).
Specifically, I get the following error when reading 10X ATAC-seq data:

epi.ct.bld_mtx_fly(tsv_file="atac_fragments.tsv.gz", annotation="atac_peak_annotation.tsv", save="test.h5ad")

ERROR:

loading barcodes

---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-12-4093bfdf2ff8> in <module>
      4 filename = P + "test.h5ad"
      5 
----> 6 epi.ct.bld_mtx_fly(tsv_file="atac_fragments.tsv.gz",
      7                    annotation="atac_peak_annotation.tsv",
      8                    save="test.h5ad",

~/opt/anaconda3/lib/python3.8/site-packages/episcanpy/count_matrix/_bld_atac_mtx.py in bld_mtx_fly(tsv_file, annotation, csv_file, genome, save)
     39 
     40         print('loading barcodes')
---> 41         barcodes = sorted(pd.read_csv(tsv_file, sep='\t', header=None).loc[:, 3].unique().tolist())
     42 
     43         # barcodes

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    603     kwds.update(kwds_defaults)
    604 
--> 605     return _read(filepath_or_buffer, kwds)
    606 
    607 

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    461 
    462     with parser:
--> 463         return parser.read(nrows)
    464 
    465 

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read(self, nrows)
   1050     def read(self, nrows=None):
   1051         nrows = validate_integer("nrows", nrows)
-> 1052         index, columns, col_dict = self._engine.read(nrows)
   1053 
   1054         if index is None:

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read(self, nrows)
   2054     def read(self, nrows=None):
   2055         try:
-> 2056             data = self._reader.read(nrows)
   2057         except StopIteration:
   2058             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 53, saw 5

These are lines 52 and 53 in the tsv file:

# primary_contig=JH584295.1
chr1	3000087	3000282	GCCAATTAGCACTAAC-1	1
chr1	3001599	3001786	AAGGTATAGCAGGTGG-1	1
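
The error message and the lines above suggest a likely cause: the file starts with "#" header lines that contain no tabs, so pandas infers a single column from the first line and then fails on the first 5-column fragment row. Below is a minimal sketch of that behaviour and of one possible workaround, the `comment="#"` argument of `pd.read_csv`. It runs on a small in-memory sample and is not a fix inside episcanpy; the helper name `read_barcodes` and the sample text are assumptions made for illustration.

```python
import io

import pandas as pd

# Small stand-in for the fragments file: one '#' header line followed by
# tab-separated rows (chrom, start, end, barcode, count), as in the issue.
sample = (
    "# primary_contig=JH584295.1\n"
    "chr1\t3000087\t3000282\tGCCAATTAGCACTAAC-1\t1\n"
    "chr1\t3001599\t3001786\tAAGGTATAGCAGGTGG-1\t1\n"
)


def read_barcodes(source):
    """Hypothetical helper: unique, sorted barcodes from column 3.

    comment='#' makes pandas skip lines that start with '#'.
    """
    df = pd.read_csv(source, sep="\t", header=None, comment="#")
    return sorted(df.loc[:, 3].unique().tolist())


# Without comment='#', the header line fixes the width at one field and
# the first fragment row raises the same kind of ParserError as above.
failed_without_comment = False
try:
    pd.read_csv(io.StringIO(sample), sep="\t", header=None)
except pd.errors.ParserError:
    failed_without_comment = True

barcodes = read_barcodes(io.StringIO(sample))
print(failed_without_comment, barcodes)
```

If this is indeed the cause, the same effect could presumably be had without touching the package by writing a copy of atac_fragments.tsv.gz with the "#" lines removed and passing that copy as tsv_file.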

Many thanks!
Marcos

malumbres changed the title from "Problems when generating matrix" to "Problems when generating matrix (episcanpy.ct.bld_mtx_fly())" on Jun 6, 2021