Is it possible to create a db only for one chromosome? #143

Closed
lldelisle opened this issue Sep 18, 2019 · 2 comments

@lldelisle

Hi,
I am using gffutils to plot genes in a given region. This is very useful but quite slow, because I call gffutils.create_db(file_path, ':memory:'), which builds a database for the full genome. Is it possible to create a database only for a region?
Thank you very much,

Lucille

@daler
Owner

daler commented Dec 29, 2019

Sorry for the late reply . . .

GFF and GTF files don't have a defined order, so in the general case you'd need to read the whole file even if you only want a subset. Even if you know your files are sorted, gffutils doesn't have a good mechanism for extracting the regions you want without first entering them into the database.

If your plotting requires a gffutils database but only of a small genomic region, I would use bedtools or pybedtools to do the subsetting first, and then only make a database on the subset.

Here's some code to get you started:

import pybedtools
import gffutils

gff = pybedtools.BedTool('full_genome_annotation.gff')

def db_for_region(region):

    # Keep only the GFF features that intersect the region.
    # May want gff.intersect(region, u=True) to keep entire GFF features,
    # or keep as-is to truncate them to the region
    subset = gff.intersect(region)

    # create database on subset
    return gffutils.create_db(subset.fn, ':memory:')

region = [pybedtools.Interval('chr1', 1, 100)]
# or
region = pybedtools.BedTool('chr1 1 100', from_string=True)

db = db_for_region(region)
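
If bedtools isn't available, the same "subset first, then build the database" idea can be sketched in plain Python. This is only a rough illustration, not part of gffutils or pybedtools: `filter_gff_lines` is a hypothetical helper name, and it assumes a tab-delimited GFF/GTF with 1-based inclusive start/end in columns 4 and 5. Since the file has no guaranteed order, it still scans every line once, but only the matching lines ever reach the database.

```python
def filter_gff_lines(lines, chrom, start=None, end=None):
    """Yield GFF/GTF lines on `chrom` that overlap [start, end].

    Hypothetical helper for illustration. Coordinates are 1-based and
    inclusive, as in GFF. With start/end left as None, every feature
    on the chromosome is kept.
    """
    for line in lines:
        # Skip comments, directives (e.g. ##gff-version) and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        # Skip malformed lines and features on other chromosomes.
        if len(fields) < 9 or fields[0] != chrom:
            continue
        feat_start, feat_end = int(fields[3]), int(fields[4])
        # Drop features that lie entirely outside the requested window.
        if start is not None and feat_end < start:
            continue
        if end is not None and feat_start > end:
            continue
        yield line
```

The filtered lines can be joined into one string and passed to gffutils.create_db(subset_text, ':memory:', from_string=True). Unlike the default intersect call above, this keeps whole features rather than truncating them to the region, and it drops header lines.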

@daler daler closed this as completed Dec 29, 2019
@lldelisle
Author

Many thanks
