rishidev edited this page Jan 2, 2020 · 14 revisions

Welcome to the homepage for the Large Scale Genomics Work Stream, part of the Global Alliance for Genomics and Health. Led by Oliver Hofmann and Thomas Keane, this Work Stream creates standardized methods for accessing large-scale genomic data (reads, variants, and expression data) by file-based, API-based, cloud-based, and distributed access.

To understand the role of Work Streams in GA4GH, please visit https://www.ga4gh.org/howwework.

This Work Stream meets quarterly at a high level, mainly to report the developments of its sub-groups to Driver Projects. The GA4GH strategic roadmap details the planned standards developments of this Work Stream. Minutes from the meetings are available here.

Task Teams

The work of the Large Scale Genomics Work Stream is mainly done in sub-groups that usually meet every four weeks. Each of these teams has leads. All meetings are minuted, and links to the minutes are available for all to view.

File Formats

Chair: James Bonfield (Sanger), Vice-Chair: Louis Bergelson (Broad)

This team deals with the development and maintenance of standard file formats for the following:

Encrypted Container Formats

Chair: Alexander Senf, Vice-Chair: Rob Davies (Sanger)

There is also a team to look at encrypted versions of these formats. Meeting Minutes

Future of VCF

Chair: Yossi Farjoun (Broad), Vice-Chair: Daniel Cameron (Walter and Eliza Hall Institute of Medical Research)

This team looks at the longer-term roadmap for variant container formats. Its discussions overlap with those of the File Formats group above and are still minuted as part of the normal File Formats meeting, but the team also has its own agenda and minutes document. Meeting Minutes

htsget

Chair: Mike Lin (DNAnexus), Vice-Chair: Jerome Kelleher (University of Oxford)

A standardised, non-file-based API for securely streaming the file formats listed above.

GitHub home - Meeting Minutes
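As a rough illustration of what API-based streaming looks like from a client's side, the sketch below builds a request URL for a genomic region and reads the data-block URLs out of a JSON "ticket" response. The `/reads/{id}` path, the query parameter names, and the `htsget.urls` ticket layout are given here as assumptions drawn from the htsget specification, and the server address and read identifier are hypothetical; check the specification before relying on any of these details.

```python
from urllib.parse import urlencode


def htsget_reads_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build a request URL for a region of a reads dataset.

    The path layout and parameter names are assumptions for illustration.
    """
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{read_id}?{query}"


def ticket_urls(ticket):
    """Return the data-block URLs from a ticket dict, in order.

    Assumes the ticket looks like {"htsget": {"urls": [{"url": ...}, ...]}}.
    """
    return [block["url"] for block in ticket["htsget"]["urls"]]


# Hypothetical server and identifier, for illustration only.
url = htsget_reads_url("https://htsget.example.org", "sample1", "chr1", 100, 200)
```

A client would fetch `url`, parse the JSON response, then download and concatenate the blocks returned by `ticket_urls` to obtain the requested slice of the file.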

RNASeq

Chair: Sean Upchurch

Developing scalable ways of storing and transmitting expression information related to RNASeq data.

GitHub home - Meeting Minutes

refget API

Chair: Andy Yates (EMBL-EBI)

A framework to retrieve ‘reference sequences’ by a unique checksum, allowing users to retrieve such reference sequences without ambiguity from different databases and servers.

GitHub home - Meeting Minutes
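The idea of retrieving a reference sequence by checksum can be sketched as follows: the identifier is derived from the sequence content itself, so the same sequence yields the same identifier on any server. This minimal sketch assumes MD5 over the uppercased sequence as the checksum and a `/sequence/{id}` path; the server address is hypothetical, and the refget specification remains the authority on the supported algorithms and endpoints.

```python
import hashlib


def sequence_checksum(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Whitespace and line breaks are removed and the sequence is uppercased
    first, so formatting differences do not change the identifier.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def sequence_url(base_url: str, checksum: str) -> str:
    """Build a retrieval URL; the /sequence/ path is an assumption."""
    return f"{base_url.rstrip('/')}/sequence/{checksum}"


# Hypothetical server, for illustration only.
checksum = sequence_checksum("acgt\nACGT\n")
url = sequence_url("https://refget.example.org", checksum)
```

Because the identifier depends only on the sequence content, two databases holding the same sequence under different names would both resolve this one checksum to identical data.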

Specifications are open to suggestions made through participation at meetings and through PRs and Issues raised on GitHub. For changes to be incorporated, a majority decision by the named Specification Maintainers is required; the maintainers are listed in the Maintainers.md documents in the repositories.

New Specification Maintainers are proposed by an existing Specification Maintainer and can then be accepted by majority vote of the existing Specification Maintainers, excluding the individual whose position is being replaced. The two Large Scale Genomics Work Stream leads must then approve the individual.
