Hi,
I am using gffutils to plot genes in a given region. This is very useful but quite slow because I am doing gffutils.create_db(file_path, ':memory:') which creates a database for the full genome. Is it possible to create a database only for a region?
Thank you very much,
Lucille
GFF and GTF files don't have a defined order, so in the general case you'd need to read the whole file even if you only want a subset. Even if you know your files are sorted, gffutils doesn't have a good mechanism for extracting the regions you want without first loading the whole file into the database.
If your plotting requires a gffutils database but only of a small genomic region, I would use bedtools or pybedtools to do the subsetting first, and then only make a database on the subset.
Here's some code to get you started:
    import pybedtools
    import gffutils

    gff = pybedtools.BedTool('full_genome_annotation.gff')

    def db_for_region(region):
        # only the GFF features that intersect region.
        # May want u=True to keep entire GFF features,
        # or keep as-is to truncate them to the region
        subset = gff.intersect(region)

        # create database on subset
        return gffutils.create_db(subset.fn, ':memory:')

    region = [pybedtools.Interval('chr1', 1, 100)]

    # or
    region = pybedtools.BedTool('chr1 1 100', from_string=True)

    db = db_for_region(region)
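If installing bedtools is not an option, the same subset-first idea can be sketched in plain Python. This is only a rough illustration, not part of gffutils or pybedtools: the helper names `gff_lines_in_region` and `subset_gff_text` are made up for this example, and it assumes an uncompressed, tab-delimited GFF/GTF file with 1-based inclusive coordinates. Unlike the default `intersect` behaviour above, it keeps whole features rather than truncating them to the region.

```python
def gff_lines_in_region(lines, chrom, start, end):
    """Yield GFF/GTF lines whose feature overlaps chrom:start-end.

    Hypothetical helper; coordinates are 1-based and inclusive, as in GFF.
    """
    for line in lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.rstrip('\n').split('\t')
        # A feature line has 9 tab-separated columns.
        if len(fields) < 9 or fields[0] != chrom:
            continue
        feat_start, feat_end = int(fields[3]), int(fields[4])
        # Keep the whole feature if it overlaps the region at all.
        if feat_start <= end and feat_end >= start:
            yield line if line.endswith('\n') else line + '\n'


def subset_gff_text(path, chrom, start, end):
    """Read the file once and return only the overlapping lines as one string."""
    with open(path) as handle:
        return ''.join(gff_lines_in_region(handle, chrom, start, end))
```

The returned string could then be handed to `gffutils.create_db(text, ':memory:', from_string=True)`. The whole file is still read once, but only the matching lines end up in the database.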