greymonroe/polymorphology2

polymorphology2

Overview

polymorphology2 is an R package that offers a general toolkit for efficiently handling genomic data. Its functions focus on analyzing polymorphisms in the context of genome features such as gene bodies, epigenome enrichments, and single-base-substitution (SBS) mutation profiles.

The package provides core functionality for identifying overlaps between genome features, and between sites and features, including calculating the distribution of sites across all genome features. It also includes functions our lab uses routinely, such as filtering somatic mutations identified by Strelka2, constructing windows around genome features like genes, and evaluating the enrichment of ChIP-seq experiments in specific genome features.
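As a rough illustration of what constructing windows around features involves, the sketch below splits each feature into an upstream flank, the feature body, and a downstream flank. It uses only base R. `flank_windows()` and its `flank` argument are hypothetical names for this example, not the API of the package's `feature_windows()`. Strand is ignored, so "upstream" here simply means lower coordinates.

```r
# Hypothetical helper, not polymorphology2::feature_windows().
# Splits each feature into an upstream flank, the body, and a downstream flank.
# Strand is ignored: "upstream" means lower coordinates.
flank_windows <- function(features, flank = 1000) {
  n <- nrow(features)
  make <- function(start, stop, region) {
    data.frame(CHROM = features$CHROM, START = start, STOP = stop,
               ID = features$ID, REGION = rep(region, n),
               stringsAsFactors = FALSE)
  }
  out <- rbind(
    make(pmax(1, features$START - flank), features$START - 1, "upstream"),
    make(features$START, features$STOP, "body"),
    make(features$STOP + 1, features$STOP + flank, "downstream")
  )
  # Drop empty flanks (e.g. a feature that starts at position 1)
  out <- out[out$STOP >= out$START, ]
  out <- out[order(out$CHROM, out$START), ]
  rownames(out) <- NULL
  out
}

genes <- data.frame(CHROM = c("Chr1", "Chr1"), START = c(500, 5000),
                    STOP = c(1500, 6000), ID = c("geneA", "geneB"),
                    stringsAsFactors = FALSE)
w <- flank_windows(genes, flank = 1000)
print(w)
```

Upstream flanks are clipped at position 1, so geneA's upstream window here runs from 1 to 499 rather than extending past the chromosome start.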

Example Functions

The package includes the following core functions, among others:

  1. bedGraph_total(): Calculates the total depth of a bedGraph file.
  2. feature_windows(): Constructs windows around features.
  3. features_chip_enrich(): Calculates the enrichment of ChIP-seq experiments in genome features.
  4. features_in_features(): Finds overlaps between different feature sets.
  5. features_in_sites(): Finds the features for given sites.
  6. motif_hunter(): Searches for a specific motif in the provided sequence.
  7. plot_feature_windows(): Plots feature windows.
  8. plot_tricontexts(): Plots trinucleotide contexts.
  9. read.GFF(): Reads GFF files.
  10. read.VCF(): Reads VCF files.
  11. read.bedGraph(): Reads bedGraph files.
  12. sites_in_features(): Finds the sites that are located within features.
  13. strelka2_filter(): Filters somatic mutations called by Strelka2.
  14. tricontexts(): Finds the trinucleotide context for given mutations.
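The idea behind trinucleotide contexts can be sketched in base R. The sketch takes the base on either side of each single-base substitution and, following the common SBS convention, reports the mutation from the pyrimidine (C or T) strand. `tri_context()` and `revcomp()` are hypothetical helpers, not the package's `tricontexts()`. The sketch assumes one reference sequence given as a character string, 1-based positions, and uppercase single-base REF/ALT alleles.

```r
# Hypothetical helpers, not polymorphology2::tricontexts().
revcomp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# seq: one reference sequence (character string); pos: 1-based positions;
# ref/alt: uppercase single-base alleles at those positions.
tri_context <- function(seq, pos, ref, alt) {
  seq <- toupper(seq)
  ctx <- substr(rep(seq, length(pos)), pos - 1, pos + 1)
  # Sites at either end of the sequence have no full 3-mer
  ctx[pos < 2 | pos > nchar(seq) - 1] <- NA
  # Report purine-reference (A/G) sites from the opposite strand
  flip <- ref %in% c("A", "G") & !is.na(ctx)
  ctx[flip] <- vapply(ctx[flip], revcomp, character(1), USE.NAMES = FALSE)
  ref[flip] <- chartr("ACGT", "TGCA", ref[flip])
  alt[flip] <- chartr("ACGT", "TGCA", alt[flip])
  out <- paste0(substr(ctx, 1, 1), "[", ref, ">", alt, "]", substr(ctx, 3, 3))
  out[is.na(ctx)] <- NA
  out
}

res <- tri_context("ACGTTGCA", pos = c(2, 6, 1),
                   ref = c("C", "G", "A"), alt = c("T", "A", "G"))
print(res)  # "A[C>T]G" "G[C>T]A" NA
```

The G>A change at position 6 is reported as its reverse complement, so it falls in the same C>T class as the site at position 2.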

Installation

The package can be installed directly from GitHub using devtools:

devtools::install_github("greymonroe/polymorphology2")

Data Structure

The package works primarily with two kinds of objects:

  1. Features - data.table objects with CHROM, START, STOP, and ID columns.
  2. Sites - data.table objects with CHROM, POS, and ID columns.

Both sites and features can include additional columns, which can then be used in other calculations (e.g., calculating the average depth of ChIP results).
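The following minimal sketch shows the two object types and a sites-within-features overlap. Base R data.frames stand in for data.table objects to keep the example dependency-free. A data.table is also a data.frame, so the logic carries over. `overlap_sites()` and the DEPTH column are hypothetical. START and STOP are treated as inclusive, 1-based coordinates, which is an assumption about the package's convention. For large inputs, an interval join such as `data.table::foverlaps()` would likely be more efficient.

```r
# Base R stand-ins for the package's data.table objects.
features <- data.frame(
  CHROM = c("Chr1", "Chr1", "Chr2"),
  START = c(100, 400, 50),
  STOP  = c(300, 600, 250),
  ID    = c("gene1", "gene2", "gene3"),
  stringsAsFactors = FALSE
)
sites <- data.frame(
  CHROM = c("Chr1", "Chr1", "Chr1", "Chr2"),
  POS   = c(150, 250, 500, 400),
  ID    = c("s1", "s2", "s3", "s4"),
  DEPTH = c(10, 20, 30, 40),   # hypothetical extra column
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per (site, feature) overlap, with
# START and STOP treated as inclusive coordinates.
overlap_sites <- function(sites, features) {
  hits <- lapply(seq_len(nrow(features)), function(i) {
    f <- features[i, ]
    in_f <- sites$CHROM == f$CHROM & sites$POS >= f$START & sites$POS <= f$STOP
    if (!any(in_f)) return(NULL)
    data.frame(sites[in_f, , drop = FALSE], feature_ID = f$ID,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

hits <- overlap_sites(sites, features)
grp <- factor(hits$feature_ID, levels = features$ID)

# Count sites per feature and average the extra column (NA if no sites)
features$n_sites <- as.vector(table(grp))
features$mean_depth <- as.vector(tapply(hits$DEPTH, grp, mean))
print(features)
```

Using the feature IDs as factor levels keeps features with no overlapping sites (gene3 here) in the summary, with a count of 0 and an NA mean.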

Future Development

Future updates will include:

  • Parsing of VCF files from specific variant callers (e.g., DeepVariant, Strelka2, pbsv, HaplotypeCaller) that extracts fields such as GT and DP into columns, plus additional functions for VCFs with multiple samples.
  • N-mer amino acid frequencies across protein sequences.
  • User tutorials that explain package usage and functionality in more depth.

Dependencies

The package relies on these R packages (automatically installed):

  • data.table
  • ggplot2
  • seqinr
  • vcfR
  • stringr

License

This package is open-source and free to use, modify, and repurpose as needed.

Contact

For any issues or inquiries, feel free to reach out to the maintainer:
