The Matrix Service consumes data from the HCA Data Store to dynamically generate cell-by-gene expression matrices. Users select the cells to include in a matrix by specifying metadata and expression value filters via the API. Matrices also include per-cell metadata, and the metadata fields to include can be specified in the POST request. For a quick example to get started, try this Jupyter Notebook vignette.
For information on the technical architecture of the service, please see Matrix Service Technical Architecture.
The API is available at https://matrix.data.humancellatlas.org. The complete API documentation is available here.
Version v0 of the API exposes a bundle-centric interface and has been deprecated in favor of v1, which provides the query-based interface described below. v0 will continue to be maintained for internal use only.
Expression matrices are generated asynchronously, and results are retrieved by polling. To request a matrix, submit a POST request to /v1/matrix; the response contains a job ID. Poll /v1/matrix/<ID> with this ID to retrieve the status and results of your request.
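A minimal Python sketch of this submit-and-poll flow, using only the standard library. It assumes that the POST response carries the job ID under a request_id key and that the polled response has a status field whose terminal values are Complete and Failed; these key names and values are assumptions, so check them against the API documentation. The fetch and sleep parameters exist so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

API_ROOT = "https://matrix.data.humancellatlas.org"

# ASSUMPTION: terminal status values are not specified in this README.
TERMINAL_STATUSES = {"Complete", "Failed"}


def submit_matrix_request(body, api_root=API_ROOT):
    """POST a matrix request to /v1/matrix and return the decoded JSON response."""
    req = urllib.request.Request(
        api_root + "/v1/matrix",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_matrix_status(request_id, api_root=API_ROOT):
    """GET /v1/matrix/<ID> and return the decoded JSON response."""
    with urllib.request.urlopen(api_root + "/v1/matrix/" + request_id) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_until_done(request_id, fetch=get_matrix_status, interval=15.0,
                    timeout=3600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(request_id)
        # ASSUMPTION: the job state is reported under a "status" key.
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("matrix request %s is still running" % request_id)
        sleep(interval)
```

For example, job = submit_matrix_request(body) followed by poll_until_done(job["request_id"]) blocks until the job finishes or timeout seconds pass. A fixed polling interval keeps the sketch simple; a production client might back off between polls.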
When requesting a matrix, users must select cells by specifying metadata and/or expression data filters. Optionally, they may also specify which metadata fields to include in the matrix, the output format, and the feature type. These four fields are defined in the body of the POST request and are described below:
{
"filter": {},
"fields": [
"string"
],
"format": "string",
"feature": "string"
}
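The request body can be assembled in Python with a small helper. build_matrix_request is a hypothetical name used for illustration; it sets the optional keys only when they are given, so omitted keys fall back to the service's defaults (for example, all metadata fields and the gene feature type).

```python
def build_matrix_request(filter_, fields=None, format_=None, feature=None):
    """Assemble a POST /v1/matrix body (hypothetical helper); only 'filter' is required."""
    body = {"filter": filter_}
    # Optional keys are left out entirely so that the service's defaults apply.
    if fields is not None:
        body["fields"] = list(fields)
    if format_ is not None:
        body["format"] = format_
    if feature is not None:
        body["feature"] = feature
    return body
```

For example, build_matrix_request(some_filter, feature="transcript") requests a cell-by-transcript matrix with all metadata fields in the default format.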
To select cells, the API supports a simple yet expressive JSON filter language that can represent complex metadata and expression data filters with nested AND/OR structures. There are two types of filter objects: comparison filters, which test a single field, and logical filters, which combine other filters:
{
"op": one of [ =, !=, >, <, >=, <=, in ],
"field": a filter name,
"value": string or int or list
}
{
"op": one of [ and, or, not ],
"value": array of 2 filter objects if op==and|or, filter object if op==not
}
These filter types can be recursively nested via the value field of a logical filter.
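The grammar is small enough to check client-side before submitting a request. The following sketch is a hypothetical validator, not part of the Matrix Service; it follows the rules as written here (exactly two children for and/or, one for not) and additionally assumes that in takes a list. The service's own validation remains authoritative.

```python
COMPARISON_OPS = {"=", "!=", ">", "<", ">=", "<=", "in"}


def validate_filter(f):
    """Raise ValueError if f does not follow the filter grammar; return None otherwise."""
    if not isinstance(f, dict) or "op" not in f or "value" not in f:
        raise ValueError("filter must be an object with 'op' and 'value': %r" % (f,))
    op, value = f["op"], f["value"]
    if op in COMPARISON_OPS:
        # Comparison filter: a filter name plus a string, int, or list value.
        if not isinstance(f.get("field"), str):
            raise ValueError("comparison filter needs a string 'field': %r" % (f,))
        if not isinstance(value, (str, int, list)):
            raise ValueError("value must be a string, int, or list: %r" % (value,))
        # ASSUMPTION: 'in' is only meaningful with a list of candidate values.
        if op == "in" and not isinstance(value, list):
            raise ValueError("'in' expects a list value: %r" % (value,))
    elif op in ("and", "or"):
        # Logical filter: exactly two child filters, each validated recursively.
        if not isinstance(value, list) or len(value) != 2:
            raise ValueError("'%s' expects an array of 2 filter objects" % op)
        for child in value:
            validate_filter(child)
    elif op == "not":
        # Logical filter: a single child filter.
        validate_filter(value)
    else:
        raise ValueError("unknown op: %r" % (op,))
```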
Select all full-length cells:
...
"filter": {
"op": "=",
"value": "full length",
"field": "library_preparation_protocol.end_bias"
}
...
Select all cells from the "Single cell transcriptome analysis of human pancreas" project with at least 3000 genes detected:
...
"filter": {
"op": "and",
"value": [
{
"op": "=",
"value": "Single cell transcriptome analysis of human pancreas",
"field": "project.project_core.project_short_name"
},
{
"op": ">=",
"value": 3000,
"field": "genes_detected"
}
]
}
...
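When filters are generated programmatically, small constructor functions keep the nesting readable. The helper names compare, and_, or_ and not_ are hypothetical; the sketch rebuilds the pancreas example above as a plain dictionary that can be placed under "filter" in the request body.

```python
def compare(op, field, value):
    """Build a comparison filter object."""
    return {"op": op, "value": value, "field": field}


def and_(left, right):
    """Build a logical AND of exactly two filters."""
    return {"op": "and", "value": [left, right]}


def or_(left, right):
    """Build a logical OR of exactly two filters."""
    return {"op": "or", "value": [left, right]}


def not_(inner):
    """Negate a single filter."""
    return {"op": "not", "value": inner}


# The pancreas example above, rebuilt with the helpers.
pancreas_3000_genes = and_(
    compare("=", "project.project_core.project_short_name",
            "Single cell transcriptome analysis of human pancreas"),
    compare(">=", "genes_detected", 3000),
)
```

Because and and or take exactly two children, three conditions are combined by nesting, for example and_(a, and_(b, c)).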
The list of available filter names can be retrieved from /v1/filters. For more information about a specific filter, GET /v1/filters/<filter>.
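The discovery endpoints in this README (/v1/filters, /v1/fields, /v1/formats and /v1/features) all follow a /v1/<resource>[/<name>] pattern, so one hypothetical helper can build their URLs. The sketch assumes the endpoints return JSON; the shape of their responses is not specified in this document. Names are percent-encoded so that a value containing "/" cannot alter the path.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://matrix.data.humancellatlas.org"
DISCOVERY_RESOURCES = ("filters", "fields", "formats", "features")


def discovery_url(resource, name=None, api_root=API_ROOT):
    """Build /v1/<resource> or /v1/<resource>/<name> (hypothetical helper)."""
    if resource not in DISCOVERY_RESOURCES:
        raise ValueError("unknown discovery resource: %r" % (resource,))
    url = "%s/v1/%s" % (api_root, resource)
    if name is not None:
        # safe="" also encodes "/", keeping the name a single path segment.
        url += "/" + urllib.parse.quote(name, safe="")
    return url


def get_json(url):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, get_json(discovery_url("filters", "genes_detected")) would fetch the description of the genes_detected filter.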
Users can specify a list of metadata fields to be exported with an expression matrix. By default, all available metadata fields are included in the generated matrix. The list of available metadata fields can be retrieved from /v1/fields, and more information about a specific field is available at /v1/fields/<field>.
The metadata fields in the expression matrix come either from metadata submitted to the HCA along with the raw data or from the outputs of a secondary analysis pipeline. One important exception is cellkey, which is generated by the Matrix Service itself: the service needs a unique identifier for each cell, and no such identifier exists elsewhere in the HCA. How the cellkey is created varies by assay type. For Smart-seq2, the cellkey is simply the ID of the cell's cell suspension. For 10x, the cellkey is a hash of the "bundle" containing the cell's data and the cell's barcode.
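To illustrate why these two rules yield unique keys, the following sketch derives a key from the documented inputs. It is HYPOTHETICAL: this README does not say which hash function or input encoding the Matrix Service uses, so SHA-256 over a bundle identifier and barcode is an assumption, and keys produced this way will not match real cellkey values. The assay labels smartseq2 and 10x are likewise illustrative.

```python
import hashlib


def illustrative_cellkey(assay, cell_suspension_id=None, bundle_id=None, barcode=None):
    """HYPOTHETICAL cellkey: mirrors the documented rules, not the service's real hash."""
    if assay == "smartseq2":
        # One cell per cell suspension, so its ID already identifies the cell.
        return cell_suspension_id
    if assay == "10x":
        # Many cells share a bundle; the barcode distinguishes them within it.
        digest = hashlib.sha256()
        digest.update(bundle_id.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
        digest.update(barcode.encode("utf-8"))
        return digest.hexdigest()
    raise ValueError("unsupported assay: %r" % (assay,))
```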
The Matrix Service supports generating matrices in 3 formats. The list of formats is available at /v1/formats, with additional information for a specific format at /v1/formats/<format>.
The Matrix Service also supports generating cell-by-transcript matrices in addition to cell-by-gene matrices. To select the feature type, specify either gene (default) or transcript in the POST request. Note that some assay types are incompatible with certain feature types. For example, if transcript is selected as the feature type, data from 3' assays will not be included in the generated matrix. The list of available features can be retrieved from /v1/features, with additional information for a specific feature at /v1/features/<feature>.
- Python >= 3.6
- Terraform == 0.11.10
- Clone the matrix-service repo
- Create a virtualenv (recommended)
- Install requirements
- Run tests
git clone https://github.com/HumanCellAtlas/matrix-service.git && cd matrix-service
virtualenv -p python3 venv
. venv/bin/activate
pip install -r requirements-dev.txt --upgrade
make test
To run unit tests, in the top-level directory, run make test.
Functional tests test the end-to-end functionality of a deployed environment of the service. To choose the deployment environment the tests run against, set the DEPLOYMENT_STAGE environment variable to an existing deployment name (predev | dev | integration | staging | prod).
To run functional tests, in the top-level directory, run make functional-test.
To deploy the Matrix API/Chalice app from your local machine for development purposes:
cd chalice
make build && cd ..
./scripts/matrix-service-api.py