Home
Welcome to the homepage for the Large Scale Genomics Work Stream, part of the Global Alliance for Genomics and Health (GA4GH). Led by Oliver Hofmann and Thomas Keane, this Work Stream creates standardized methods for accessing large-scale genomic data (reads, variants, and expression data) through file-based, API-based, cloud-based, and distributed access.
To understand the role of Work Streams in GA4GH, please visit https://www.ga4gh.org/howwework.
This Work Stream holds a high-level meeting quarterly, focusing mainly on reporting sub-group developments to Driver Projects. The GA4GH strategic roadmap details the standards development planned for this Work Stream. Minutes from the meetings are available here.
The work of the Large Scale Genomics Work Stream is done mainly in sub-groups, which usually meet every four weeks. Each sub-group has leads. All meetings are minuted, and links to the minutes are available for all to view.
Chair: James Bonfield (Sanger), Vice-Chair: Louis Bergelson (Broad)
This team deals with the development and maintenance of standard file formats for the following:
- standard read formats (BAM/CRAM/SAM)
- standard variant file formats (VCF/BCF)
GitHub home - Meeting Minutes
Chair: Alexander Senf, Vice-Chair: Rob Davies (Sanger)
A separate team looks at encrypted versions of these formats. Meeting Minutes
Chair: Yossi Farjoun (Broad), Vice-Chair: Daniel Cameron (Walter+Eliza Hall Institute of Medical Research)
This team looks at the longer-term roadmap for variant container formats. Its discussions overlap with those of the File Formats group above, and minutes are still taken as part of the regular File Formats meeting, although the team also has its own agenda/minuting document. Meeting Minutes
Chair: Mike Lin (DNAnexus), Vice-Chair: Jerome Kelleher (University of Oxford)
A standardised, non-file-based API for securely streaming the file formats listed above.
Chair: Sean Upchurch
Developing scalable ways of storing and transmitting expression information related to RNASeq data.
Chair: Andy Yates (EMBL-EBI)
A framework to retrieve ‘reference sequences’ by a unique checksum, allowing users to retrieve such reference sequences without ambiguity from different databases and servers.
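The idea of checksum-based retrieval can be illustrated with a minimal sketch. It is not the Work Stream's specification or implementation: the choice of digests (MD5, and SHA-512 truncated to 24 bytes and base64url-encoded), the uppercase normalisation, and the in-memory `SequenceStore` class are all assumptions made for illustration.

```python
import base64
import hashlib


def md5_checksum(sequence: str) -> str:
    """Hex MD5 digest of the uppercased sequence (assumed normalisation)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def truncated_sha512_checksum(sequence: str) -> str:
    """First 24 bytes of the SHA-512 digest, base64url-encoded (assumed scheme)."""
    digest = hashlib.sha512(sequence.upper().encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


class SequenceStore:
    """Hypothetical in-memory store that indexes sequences by their checksums."""

    def __init__(self):
        self._by_checksum = {}

    def add(self, sequence: str):
        """Store a sequence and return the checksums it can be retrieved by."""
        normalized = sequence.upper()
        checksums = (
            md5_checksum(normalized),
            truncated_sha512_checksum(normalized),
        )
        for checksum in checksums:
            self._by_checksum[checksum] = normalized
        return checksums

    def get(self, checksum: str, start: int = 0, end=None) -> str:
        """Return the sequence (or a 0-based, end-exclusive slice) for a checksum."""
        return self._by_checksum[checksum][start:end]


if __name__ == "__main__":
    store = SequenceStore()
    md5_id, sha_id = store.add("acgtACGT")
    print(md5_id, sha_id)
    print(store.get(sha_id, 2, 6))
```

Because the identifier is derived from the sequence content alone, two servers that hold the same sequence would hand it out under the same checksum, whatever name each database gives it.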
Specifications are open to suggestions made through participation at meetings and through PRs and Issues raised on GitHub. For a change to be incorporated, a majority decision by the named Specification Maintainers is required; the maintainers are listed in the Maintainers.md document in each repository.
New Specification Maintainers are proposed by an existing Specification Maintainer and can then be accepted by majority vote of the existing Specification Maintainers, excluding the individual whose position is being replaced. The two Large Scale Genomics Work Stream leads must then approve the individual.