bio-vcf can output many types of formats. In this exercise we will load a triple store (4store) with VCF data and do some queries on that.
See https://github.com/pjotrp/guix-notes/blob/master/packages/4store.org
Get root
su
apt-get install avahi-daemon
apt-get install raptor-utils
exit
As normal user
guix package -i sparql-query curl
Initialize and start the server again as root (or another user)
su
export PATH=/home/user/.guix-profile/bin:$PATH
mkdir -p /var/lib/4store
dbname=test
4s-backend-setup $dbname
4s-backend $dbname
4s-httpd -p 8000 $dbname
Try the web browser and point it to http://localhost:8000/status/
Open a new terminal as user.
Generate rdf with bio-vcf template
=HEADER
@prefix : <http://biobeat.org/rdf/ns#> .
=BODY
<%
id = ['chr'+rec.chr,rec.pos,rec.alt].join('_')
%>
:<%= id %>
:query_id "<%= id %>";
:chr "<%= rec.chr %>" ;
:alt "<%= rec.alt.join("") %>" ;
:pos <%= rec.pos %> .
so it looks like
:chrX_134713855_A
:query_id "chrX_134713855_A";
:chr "X" ;
:alt "A" ;
:pos 134713855 .
and test with rapper using gatk_exome.vcf
cat gatk_exome.vcf |bio-vcf -v --template rdf_template.erb
cat gatk_exome.vcf |bio-vcf -v --template rdf_template.erb > my.rdf
rapper -i turtle my.rdf
Load into 4store (when no errors)
rdf=my.rdf
uri=http://localhost:8000/data/http://biobeat.org/data/$rdf
curl -X DELETE $uri
curl -T $rdf -H 'Content-Type: application/x-turtle' $uri
201 imported successfully
This is a 4store SPARQL server
First SPARQL query
SELECT ?id
WHERE
{
?id <http://biobeat.org/rdf/ns#chr> "X".
}
cat sparql1.rq |sparql-query "http://localhost:8000/sparql/" -p
┌──────────────────────────────────────────────┐
│ ?id │
├──────────────────────────────────────────────┤
│ <http://biobeat.org/rdf/ns#chrX_107911706_C> │
│ <http://biobeat.org/rdf/ns#chrX_55172537_A> │
│ <http://biobeat.org/rdf/ns#chrX_134713855_A> │
└──────────────────────────────────────────────┘
A simple python query may look like
import requests
import subprocess
host = "http://localhost:8000/"
query = """
SELECT ?s ?p ?o WHERE {
?s ?p ?o .
} LIMIT 10
"""
r = requests.post(host, data={ "query": query, "output": "text" })
# print r.url
print r.text
renders
?id
<http://biobeat.org/rdf/ns#chrX_107911706_C>
<http://biobeat.org/rdf/ns#chrX_55172537_A>
<http://biobeat.org/rdf/ns#chrX_134713855_A>
A working example if you are using the server http://guix.genenetwork.org and the correct PREFIX:
#! /usr/bin/env python
import requests
import subprocess
host = "http://guix.genenetwork.org/sparql/"
query = """
PREFIX : <http://biobeat.org/rdf/pjotr/ns#>
SELECT ?id ?chr ?pos ?alt
WHERE
{
{ ?id :chr "X" . }
UNION
{ ?id :chr "1" . }
?id :chr ?chr .
?id :alt ?alt .
?id :pos ?pos .
FILTER (?pos > 107911705) .
}
"""
r = requests.post(host, data={ "query": query, "output": "text" })
print r.text
EBI SPARQL has some advanced examples of queries, such as
https://www.ebi.ac.uk/rdf/services/ensembl/sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX identifiers: <http://identifiers.org/>
PREFIX ensembl: <http://rdf.ebi.ac.uk/resource/ensembl/>
PREFIX ensembltranscript: <http://rdf.ebi.ac.uk/resource/ensembl.transcript/>
PREFIX ensemblexon: <http://rdf.ebi.ac.uk/resource/ensembl.exon/>
PREFIX ensemblprotein: <http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
SELECT DISTINCT ?transcript ?id ?typeLabel ?reference ?begin ?end ?location {
?transcript obo:SO_transcribed_from ensembl:ENSG00000139618 ;
a ?type;
dc:identifier ?id .
OPTIONAL {
?transcript faldo:location ?location .
?location faldo:begin [faldo:position ?begin] .
?location faldo:end [faldo:position ?end ] .
?location faldo:reference ?reference .
}
OPTIONAL {?type rdfs:label ?typeLabel}
}
See https://www.ebi.ac.uk/rdf/services/ensembl/sparql
Today's exercise is to create a graph using bio-vcf and/or a small program using RDF triples and define a SPARQL query.
The more interesting the graph/SPARQL the better.