Biology.md

File metadata and controls

executable file
·
384 lines (330 loc) · 39.8 KB

Bioinformatics, genomics, agriculture, food science, medicine, genetic engineering, etc.


  • longevity :: A tool to model global population under various life extension scenarios.
  • population_pyramid :: Uses matplotlib to create a population pyramid graph.

BIOTOOLS

  • annotations :: Multimodal stimulus annotation in Python.
  • antz :: Bioinformatic tools for biologists, made pythonic!
  • BioBERT :: a pre-trained biomedical language representation model. https://doi.org/10.1093/bioinformatics/btz682.
  • bionetwork :: Graph Database, a network of everything bio.
  • Bionitio :: provides a template for command line bioinformatics tools in various programming languages.
  • biostar-central :: The software that runs the Biostars Bioinformatics Q&A.
  • Bio_Eutils :: The standalone version of the Entrez and Medline BioPython modules.
  • bipy :: Lightweight bioinformatics pipeline tools using iPython.
  • CloudBioLinux :: Configure virtual (or real) machines with tools for biological analyses. Source Code
  • chowda :: Python module for analysis of CLAMS (comprehensive lab animal monitoring system) data.
  • Cosmos :: A Python workflow management library for massively parallel computing clusters as well as cloud-based services. Download it here.
  • CTDopts :: CTDopts is a module for enabling tools with CTD reading/writing, argument parsing, validating and manipulating capabilities.
  • Encode-dataframe :: Convert UCSC's ENCODE metadata into pandas DataFrames.
  • Galaxy is an open, web-based platform for data intensive biomedical research. Use it online
  • rabix :: Reproducible Analyses for Bioinformatics.
  • samtools-trio-nexus :: An applet for running samtools on trios (child and both parents) on DNAnexus.
  • tiny-test-data :: Super small biological datasets for unit testing.

Docker


EMR

  • AuShadha :: AuShadha (औषध) means medicine in Sanskrit. This is an Electronic Medical Records (EMR) and Public Health Management system for small clinics, written in Django and Dojo.

ECOLOGY

Bioacoustics

  • Chirp :: A set of related tools for pitch-based analysis and comparison of bioacoustic signals. Source code

EPIDEMIOLOGY

  • Epibayes :: (Rudimentary) tools for epidemiological modeling w/Bayesian statistics.
  • Epipy :: Epipy is a Python package for epidemiology.
  • episounds :: A sonification of human to human transmission of Middle East Respiratory Syndrome Coronavirus.
  • Epitopes → Python interface to immunology and bioinformatics datasets (e.g. IEDB, cancer antigens, TCGA mutant peptides).


GENOMICS

  • allbiotc2 :: Benchmark pipeline for Structural Variation analyses, funded by the ALLBio.
  • anhima :: Analyse genetic variation
  • ASAP :: Amino-Acid Sequence Annotation Predictor.
  • bamslider :: Sliding windows in BAM/SAM files with Python's deques.
  • Bamsurgeon :: Tools for adding mutations to existing .bam files, used for testing mutation callers.
  • batch_clustalo :: Multiple Sequences Alignments in Batch.
  • bcbio-nextgen is validated, scalable, community developed variant calling and RNA-seq analysis. Documentation
  • biomartpy :: Simple interface to BioMart (Python -> rpy2 -> R/BioConductor's biomaRt).
  • BioSeq is a python lib for Sequence Alignment Map (SAM), a standard data storage format for DNA sequencing.
  • brat :: Brat rapid annotation tool - for all your textual annotation needs http://brat.nlplab.org
  • BreakSeq2 :: Ultrafast and accurate nucleotide-resolution analysis of structural variants.
  • Chanjo :: This package provides a better way to analyze coverage data in clinical sequencing. Source Code.
  • chrom_sweep :: Sweep-line algorithm for genomic features. Detect overlaps on large files w/ minimal memory.
  • codachrom :: Chromosomal copy number tools.
  • CompleteGenomicsTools → Software for manipulating and visualizing Complete Genomics data, with a focus on cancer. Complete Genomics provides whole-genome sequencing using DNA nanoball arrayed sequencing.
  • CaPSID (Computational Pathogen Sequence IDentification) :: A comprehensive open source platform which integrates a high-performance computational pipeline for pathogen sequence identification and characterization in human genomes and transcriptomes together with a scalable results database and a user-friendly web-based software application for managing, querying and visualizing results. Source code and the Documentation Wiki.
  • cyvcf :: A fast Python library for VCF files leveraging Cython for speed.
  • collage-dicty :: Gene prioritization by compressive data fusion and chaining. Paper.
  • ETE - Environment for Tree Exploration :: A Python programming toolkit that assists in the automated manipulation, analysis and visualization of any type of hierarchical trees. This includes phylogenetic trees, clustering results and profile-based trees. It supports node annotation, programmatic tree drawing, circular visualization, SVG, PNG and PDF image rendering, and more! View the source code on GitHub.
  • figmop :: Finding Genes using Motif Patterns.
  • fusenet :: Gene network inference by fusing data from diverse distributions. Paper.
  • Gemini :: A lightweight db framework for disease and population genetics.
  • genomeEdit :: Python script to count SMRT sequenced reads of amplicons generated from genome editing using CRISPR or TALEN methods.
  • gffutils is a Python package for working with and manipulating the GFF and GTF format files typically used for genomic annotations. Documentation.
  • gtf-parse-off :: Experiments with parsing gene transfer format (GTF).
  • gvl_flavor :: Genomics Virtual Lab (GVL) flavor for CloudBioLinux.
  • g-quad :: Calculates the G-quadruplex score for a FASTA file of sequences.
  • Harvest is a suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial genomes. It includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Combined they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees.
  • HasBin :: A project for keeping track of genes/diagnoses and custom annotations.
  • hapmuc :: A somatic mutation caller, which can utilize the information of heterozygous germline variants near candidate mutations.
  • hts-python :: pythonic wrapper for libhts.
  • HTSeq :: A framework to process and analyze data from high-throughput sequencing (HTS) assays. Documentation.
  • khmer :: In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more. Documentation
  • KGML :: Parse, manipulate, download, and visualise KGML (KEGG markup language) biological pathway data.
  • KneadData :: A tool designed to perform quality control on metagenomic sequencing data, especially data from microbiome experiments.
  • MAF :: a light framework to pipeline short read mapper/aligner testing.
  • metaseq :: A Framework for integrated analysis and plotting of ChIP/RIP/RNA/*-seq data. Documentation.
  • methtools :: Tools for the processing of genome-wide bisulfite sequencing data.
  • MetaSVMerge :: An accurate method-aware merging algorithm for structural variations.
  • mmgenome :: Tools for extracting individual genomes from metagenomes.
  • MixClone :: A mixture model for inferring tumor subclonal populations.
  • nanopore-scripts :: Various scripts and recipes for working with nanopore data.
  • nexus-fusetester :: A simple applet to test use of python-llfuse to access files within a DNAnexus node.
  • Oncotator :: A web application for annotating human genomic point mutations and indels with data relevant to cancer researchers. See http://www.broadinstitute.org/oncotator
  • PAAP :: Preprocessing and Alignment Pipeline (PAAP) using Slurm array jobs.
  • peddy :: An API for dealing with pedigree files.
  • piper :: A genomics pipeline built on top of the GATK Queue framework.
  • Platypus :: The Platypus variant caller.
  • poretools :: A toolkit for working with Oxford nanopore data.
  • PyBEDtools is a Python wrapper for Aaron Quinlan's BEDtools programs, which are widely used for genomic interval manipulation or "genome algebra". Pybedtools extends BEDTools by offering feature-level manipulations from within Python. See the online documentation, including installation instructions.
  • PyVCF :: A Variant Call Format reader for Python. Documentation.
  • pygenetorrent → A Python client for GeneTorrent.
  • rbac-23andme-oauth2 :: Genetic Access Control using the 23andMe API.
  • rubra :: Infrastructure code to support DNA pipeline.
  • scurgen :: A tool for detecting patterns in genomic data with space filling curves.
  • seq2seq-attn :: Sequence-to-sequence model with LSTM encoder/decoders and attention. http://nlp.seas.harvard.edu/code
  • seqscripts :: Scripts for parsing the output from the Tophat/Cufflinks/Cuffcompare pipeline into other formats and analyzing and editing GTF files.
  • singlecell :: Data analysis incubator for analysing single-cell data generated by the method outlined here: http://biorxiv.org/content/early/2014/03/05/003236
  • Smash :: A benchmarking toolkit for variant calling.
  • theprimerdirective :: A Python interface to Primer3.
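
The sliding-window idea behind an entry such as bamslider can be sketched with the standard library alone. This is a minimal illustration, not bamslider's actual code or API: the function name `sliding_window_counts`, the `window` parameter, and the example read start positions are all hypothetical, and real BAM/SAM input would need a parser that is omitted here.

```python
from collections import deque


def sliding_window_counts(positions, window):
    """Yield (position, count) for each read start position.

    `count` is the number of read starts inside the half-open interval
    (position - window, position]. `positions` must be sorted ascending,
    as in a coordinate-sorted alignment file.
    """
    buf = deque()
    for pos in positions:
        buf.append(pos)
        # Discard read starts that have fallen out of the window on the left.
        while buf[0] <= pos - window:
            buf.popleft()
        yield pos, len(buf)


# Hypothetical read start positions on a single chromosome.
starts = [100, 105, 110, 250, 255, 400]
for pos, count in sliding_window_counts(starts, window=50):
    print(pos, count)
```

A deque suits this pattern because appending on the right and popping on the left are both constant-time, so each position is added and removed at most once.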

DNA

  • dna-traits :: A fast 23andMe genome text file parser.
  • PyPore :: Built on a few core data analysis packages to provide a consistent and easy framework for handling nanopore data in the UCSC nanopore lab.
  • Railroadtracks :: A Python package to handle connected computation steps for DNA and RNA Seq.
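
To give a feel for what a 23andMe text file parser such as dna-traits has to do, here is a minimal pure-Python sketch. It assumes the usual raw-data layout (comment lines starting with `#`, then tab-separated rsid, chromosome, position and genotype columns). The names `Snp` and `parse_23andme` and the sample records are hypothetical, and this is not the dna-traits API.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(handle: TextIO) -> Iterator[Snp]:
    """Yield one Snp per data line, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield Snp(rsid, chromosome, int(position), genotype)


# Made-up records for illustration only; not real genotype data.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tX\t2000\tCT\n"
)
snps = list(parse_23andme(sample))
print(snps[1].chromosome, snps[1].genotype)
```

The chromosome is kept as a string because values such as X, Y and MT are not numeric.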

RNA

  • findorf :: ORF prediction of de novo transcriptome assemblies and contig annotation tool designed to be non-model organism-friendly.
  • flotilla :: A Python package for visualizing transcriptome (RNA expression) data from hundreds of samples - Reproducible machine learning analysis of gene expression and alternative splicing data. Documentation.
  • gimme :: A lightweight reference-guided Alignment-based assembler for transcriptome analysis.
  • RNASeqReadSimulator :: A simple tool to generate simulated single-end or paired-end RNA-Seq reads. Source Code.
  • tcgaparse :: Python Scripts to Parse TCGA data.
  • Wikipedia's list of RNA-Seq bioinformatics tools, not all of which are in Python, but they may have an API one can use.
  • YAP :: An extensible parallel framework, written in Python using OpenMPI libraries, that allows researchers to quickly build high-throughput big data pipelines without extensive knowledge of parallel programming.
Research Publications
Resources

Jupyter notebooks/Cookbooks, tutorials and learning materials from Workshops, hackathon codebases, etc.



MICROBIOLOGY

  • Ebola :: Data for the 2014 Global Ebola outbreak.
  • fylm :: Extracts data from Fission Yeast Lifespan Microdissector images.
  • molecbio :: Molecular biology informatics functions.
  • patternHmm :: Matching patterns of symbols using a profile Hidden Markov Model.
  • phageParser :: A parser to extract the relevant data from the PhagesDB Mycobacteriophage database.
  • ProkaryMetrics :: Visualize and analyze 3D biofilm data from fluorescent micrographs. http://justicelab.org/pkm
  • burrito :: Python framework for controlling command-line applications.
  • burrito-fillings :: Application controllers for command line bioinformatics applications
  • emperor :: Emperor is a tool for the analysis and visualization of large microbial ecology datasets. http://emperor.colorado.edu
  • glowing-dangerzone :: Easy SQL connection handlers.
  • QIIME :: Quantitative Insights Into Microbial Ecology. Official repository for software and unit tests. Fork
  • mustached-octo-ironman :: Easy dispatched compute in a Tornado environment.
  • my-microbes :: A set of tools for delivering personal microbiome results to individuals participating in microbiome sequencing studies.
  • PyNAST :: Python Nearest Alignment Space Termination tool - Official repository for software and unit tests.
  • qiita :: A QIIME databasing effort.
  • Platypus Conquistador :: A bioinformatic command line tool for the confirmation of the presence/absence of a specific taxon (or set of taxa) in environmental shotgun sequence reads.
  • pyqi :: Tools for developing and testing command line interfaces in Python.
  • scikit-bio :: Core objects, functions and statistics for working with biological data in Python. Source code.
  • StrainDB :: A database to store strain genomes.
  • tax2tree :: Automated taxonomy decoration onto a tree.
  • verman :: Python module version information.

MOLECULAR BIOLOGY

  • BioPython :: The Biopython project is an international association of developers of freely available Python tools for computational molecular biology. Source code
  • bcbb :: An incubator (collection) of useful bioinformatics code related to biological analysis, primarily in Python and R. Blog
  • mdtraj :: A modern, open library for the analysis of molecular dynamics trajectories. Source code on GitHub.
  • MMTK :: The Molecular Modelling Toolkit is an Open Source program library for molecular simulation applications.
  • MSMBuilder :: An open source software package for automating the construction and analysis of Markov State Models for Biomolecular Conformational Dynamics. MSMs are a powerful means of modeling the structure and dynamics of molecular systems, like proteins. Installation is easy via Anaconda: read the documentation, or see the source code to report bugs and contribute patches.
  • tRNA_evo :: This project analyzes the evolution of tRNAs, from across the tree of life. How prevalent is anticodon switching?
Resources

Jupyter notebooks/Cookbooks, tutorials and learning materials from Workshops, hackathon codebases, etc.


NEUROSCIENCE

  • BrainImagingPipelines → Optimized Nipype pipelines for brain imaging.
  • brainx :: Tools for analysis of brain imaging-derived networks, based on NetworkX.
  • CodeNeuro :: Bringing neuroscience and data science together.
  • CogSlots :: Simulated Slot Machine for the behavioral sciences.
  • CPAC :: A configurable pipeline for rsfMRI/connectome analyses.
  • DCMPy :: A Python module for longitudinal surface-based DCM of fMRI data.
  • dicomsort :: A project to provide custom sorting and renaming of dicom files.
  • dipy → Diffusion MR Imaging in Python. Source code
  • ERPy :: An open-source neuroscience application for visualization and analysis of electroencephalographic data using the Python programming language.
  • fit_neuron :: A neuroscience python package for the estimation and evaluation of neural models from patch clamp neural recordings, including a library of spike distance metrics.
  • fmri-analysis-vm :: A VM setup for use in fMRI analysis and education.
  • ICA-AROMA :: Software package of ICA-AROMA; a data-driven method to identify and remove motion-related independent components from functional MRI data.
  • MIEN :: Python tool chain framework and data visualization suite for neuroscience. Source code
  • MNE :: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python. Source code
  • nengo :: A Python library for creating and simulating large-scale brain models.
  • NeuroDebian. Source code
  • Neurosciences :: Computational Neurosciences repository.
  • NIPY developers group on GH.
  • nibabel :: Python package to access a cacophony of neuro-imaging file formats. Source code.
  • NIDM :: Neuroimaging Data Model (NIDM) describing neuroimaging data and provenance. http://nidm.nidash.org
  • nipype :: Workflows and interfaces for neuroimaging packages, with nightly builds.
  • nilearn is a machine learning tool for NeuroImaging in Python. Source code
  • Nitime :: Timeseries analysis for neuroscience data.
  • pipelines :: Neuroimaging data processing pipelines used in the lab.
  • PredPy :: is a collection of IPython notebooks predicting multiple sclerosis functional composite (MSFC) disability from MRI scans in people with MS.
  • PyBrain and its installation
  • PyCogMo :: is a modular Python framework to develop computational experiments in Cognitive Neuroscience. It makes use of PyNN and adds task-level scheduling and facilities (learning and testing), and visualisation functions.
  • pycone :: Python in Computational Neuroscience
  • pydcemri :: Python module for processing dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data. Given a T1-weighted, dynamic, contrast-enhanced data set, a multiflip data set, and either an AIF or blood curves, produce maps of Ktrans, ve, and vp.
  • Pycortex :: A python-based toolkit for surface visualization in fMRI data. Source Code on Github.
  • pydicom :: Read, modify and write DICOM files with python code.
  • PYEZMINC :: is a python module to read and write MINC files.
  • PyMVPA → MultiVariate Pattern Analysis in Python. Source code
  • pyneurovault :: A python wrapper for the NeuroVault API (in dev) with documentation
  • PySurfer :: Cortical neuroimaging visualization in Python. Online documentation (stable) and Mailing list
  • PyView :: A project written in Python to perform experiments on learning and decision making used in the department of Neuroscience at Stony Brook University.

Neural Networks

  • biaxial-rnn-music-composition :: A recurrent neural network designed to generate classical music.
  • blocks :: A Theano framework for building and training neural networks.
  • cnn-vis :: Use CNNs to generate images.
  • Kayak :: a library for automatic differentiation with applications to deep neural networks.
  • nupic :: The Numenta Platform for Intelligent Computing, a brain-inspired machine intelligence platform and biologically accurate neural network based on cortical learning algorithms by Numenta.
  • nupic.studio :: NuPIC Studio is a powerful all-in-one tool that allows users to create an HTM neural network from scratch, train it, collect statistics, and share it among the members of the community.
  • neurobank :: A simple, low-overhead data management system designed for neural and behavioral data, but could be used for other kinds of experiments.
  • optofit :: A python framework for fitting biophysical models to optically recorded neural signals.
  • remote_associates_test :: Neural network simulation of the Remote Associates Test.
  • thunder :: neural data analysis in spark.
Resources

Neuropsychology

  • OpenPsyc :: Open source Python scripts for Psychology and the Neurosciences.
  • posner :: The Posner task as a demo for workshop.
  • PsychoPy :: An open-source package for creating psychology stimuli in Python. PsychoPy combines the graphical strengths of OpenGL with the easy Python syntax to give psychophysics a free and simple stimulus presentation and control package. Source code
  • python-pyepl :: A module for coding psychology experiments in Python.
  • VisionEgg :: is another open-source package for creating psychology stimuli in Python, with a specific emphasis on visual stimuli. Source code.

Large-scale electrophysiological data analysis framework in Python.

  • phy :: Interactive electrophysiological data analysis package.

Pharma


OpenWorm

  • The OpenWorm project aims to build the first comprehensive computational model of Caenorhabditis elegans (C. elegans), a microscopic roundworm. Read more on Wikipedia and their various projects on GitHub.
  • PyOpenWorm :: Unified, simple data access library for data & facts about C. elegans anatomy.
  • org.geppetto :: Geppetto is a web-based multi-algorithm, multi-scale Systems Biology simulation platform engineered to support the simulation of complex biological systems and their surrounding environment.

Synthetic Biology

  • SynBio :: A Python library collecting synthetic biology code.

Structural Biology

  • CheShift is software for predicting 13Cα and 13Cβ chemical shifts and validating protein structures.
  • Halpain-Lab :: Automated analysis for Fluorescent Microscopy of iPSC and in-vitro cell culture images. Source Code.
Resources

Jupyter notebooks/Cookbooks, tutorials and learning materials from Workshops, hackathon codebases, etc.

  • SBioA :: An introduction to Structural Bioinformatics Algorithms and scientific computing using Python/PyMOL.
  • PymolWiki :: Guide for PyMol users.

Resources-Teaching

Nota bene: Some resources and teaching aids listed here are not specific to Python, but you may find something useful that can be reused and shared with attribution if it is released under a CC license.

Other Resources

  • NIH project reporter
  • ExPASy :: The SIB Bioinformatics Resource Portal which provides access to scientific databases and software tools (i.e., resources) in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, population genetics, transcriptomics etc.
  • MSDS
  • Open WetWare :: A wiki for molecular biology protocols.
  • MolBio Tools :: Tools for molecular biology.
  • Protocol-online :: A Q&A portal for molecular biology protocols.