gStore is an open-source graph database engine (or "triple store") designed to manage large RDF datasets with the SPARQL query language. It runs on Linux on amd64, arm64, and LoongArch processors. gStore is a collaborative effort between the Data Management Lab of Peking University, the University of Waterloo, and awesome contributors from the open-source community.
🔑 gStore is released under the BSD 3-Clause License, with several third-party libraries under their own licenses. Check LICENSE for details.
🐛 Check out FAQ for frequently asked questions. Known bugs and limitations are listed in BUGS and LIMIT. If you find any bugs, please feel free to open an issue.
🎤 If you have any questions or suggestions, please open a thread in GitHub Discussions.
📖 For recommendations, the project roadmap, and more, check the online documentation.
The formal help documentation is available in English (EN) and Chinese (中文, ZH).
The formal experimental results are in Experiment.
We have set up an IRC channel named #gStore on freenode. You can also visit the gStore homepage.
gStore is mirrored on Gitee, which is recommended for faster downloads in mainland China: https://gitee.com/PKUMOD/gStore.
You can also open https://github.com/pkumod/gStore, download gStore.zip, then decompress the zip package.
Alternatively, pull the prebuilt Docker image:
$ docker pull pkumodlab/gstore-docker:latest
Complete instructions are in the Docker Deployment Instructions.
To compile gStore, first clone the repository:
git clone https://github.com/pkumod/gStore.git
Complete instructions are in the Installation Instructions.
N-Triple Data format introduction
RDF data should be provided in N-Triples format (XML is not currently supported), and queries must be written in SPARQL 1.1 syntax. The following is an example of an N-Triples file:
_:a <http://xmlns.com/foaf/0.1/name> "Johnny Lee Outlaw" .
_:a <http://xmlns.com/foaf/0.1/mbox> <mailto:jlow@example.com> .
_:b <http://xmlns.com/foaf/0.1/name> "Peter Goodguy" .
_:b <http://xmlns.com/foaf/0.1/mbox> <mailto:peter@example.org> .
_:c <http://xmlns.com/foaf/0.1/mbox> <mailto:carol@example.org> .
Triples are typically stored in the W3C-defined N-Triples (NT) file format, with one RDF triple per line. Values wrapped in angle brackets (< and >) are IRIs of entities; values wrapped in double quotes ("") are literals giving the value of an attribute of an entity, optionally followed by ^^ and a datatype IRI that indicates the type of the value. Of the three triples below, the first two represent two attributes of John, gender and age, with the values male and 28 respectively. The last one indicates that John and Li have a friend relationship.
<John> <gender> "male"^^<http://www.w3.org/2001/XMLSchema#string> .
<John> <age> "28"^^<http://www.w3.org/2001/XMLSchema#int> .
<John> <friend> <Li> .
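To make the term syntax concrete, here is a minimal Python sketch of a line parser for the subset of N-Triples used in the examples above: IRIs, blank nodes, and plain or typed literals. It is a hypothetical illustration, not gStore's parser; it assumes literals contain no escaped quotes and carry no @language tags, and the names Term, parse_term, and parse_line are invented here.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Term(NamedTuple):
    kind: str                       # "iri", "bnode", or "literal"
    value: str                      # IRI text, blank-node label, or literal lexical form
    datatype: Optional[str] = None  # datatype IRI of a typed literal, else None


# A term is <IRI>, _:label, or "literal" with an optional ^^<datatype IRI>.
# Simplification (assumption): no escaped quotes in literals, no @language tags.
_TERM = r'<[^>]*>|_:[A-Za-z0-9]+|"[^"]*"(?:\^\^<[^>]*>)?'
_TRIPLE = re.compile(rf'^\s*({_TERM})\s+({_TERM})\s+({_TERM})\s*\.\s*$')


def parse_term(text: str) -> Term:
    if text.startswith("<"):
        return Term("iri", text[1:-1])
    if text.startswith("_:"):
        return Term("bnode", text[2:])
    end = text.index('"', 1)        # closing quote of the lexical form
    rest = text[end + 1:]           # either "" or "^^<datatype>"
    datatype = rest[3:-1] if rest.startswith("^^<") else None
    return Term("literal", text[1:end], datatype)


def parse_line(line: str) -> Tuple[Term, Term, Term]:
    match = _TRIPLE.match(line)
    if match is None:
        raise ValueError(f"unsupported N-Triples line: {line!r}")
    subject, predicate, obj = (parse_term(g) for g in match.groups())
    return subject, predicate, obj


if __name__ == "__main__":
    example = [
        '<John> <gender> "male"^^<http://www.w3.org/2001/XMLSchema#string> .',
        '<John> <age> "28"^^<http://www.w3.org/2001/XMLSchema#int> .',
        '<John> <friend> <Li> .',
    ]
    for line in example:
        print(parse_line(line))
```

Splitting each literal into its lexical form and datatype IRI mirrors how the ^^ suffix is described above; a production parser would also need escape handling and language tags.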
For more details on the format, see N-Triple. Note that gStore does not parse and answer every SPARQL 1.1 construct; for example, property paths are not supported.
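As a conceptual illustration of what a SPARQL basic graph pattern asks for, the hypothetical Python sketch below evaluates the pattern of a query such as SELECT ?p ?g ?f WHERE { ?p <gender> ?g . ?p <friend> ?f } over the three John triples, using a naive backtracking join. It is not how gStore evaluates queries (gStore uses its own graph-based engine); the function names are invented here, and terms are kept as their N-Triples text.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# The three example triples, with each term kept as its N-Triples text.
DATA: List[Triple] = [
    ("<John>", "<gender>", '"male"^^<http://www.w3.org/2001/XMLSchema#string>'),
    ("<John>", "<age>", '"28"^^<http://www.w3.org/2001/XMLSchema#int>'),
    ("<John>", "<friend>", "<Li>"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                        # variable
            if extended.setdefault(term, value) != value:
                return None                             # already bound to another value
        elif term != value:                             # constant must match exactly
            return None
    return extended


def evaluate_bgp(patterns: List[Triple], data: List[Triple],
                 binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Naive backtracking join: match patterns left to right, sharing variables."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in data:
        extended = match_pattern(patterns[0], triple, binding)
        if extended is not None:
            yield from evaluate_bgp(patterns[1:], data, extended)


# WHERE { ?p <gender> ?g . ?p <friend> ?f }
query = [("?p", "<gender>", "?g"), ("?p", "<friend>", "?f")]
for row in evaluate_bgp(query, DATA):
    print(row)
```

The shared variable ?p is what joins the two triple patterns: only subjects that have both a gender and a friend survive.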
Initialize the system database
bin/ginit
Create database
bin/gbuild -db lubm -f data/lubm/lubm.nt
Database list
bin/gshow
Database query
bin/gquery -db lubm -q data/lubm/lubm_q0.sql
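If you want to script the Quick Start, the hypothetical Python helpers below run the same four commands, with exactly the flags shown above, via subprocess. They assume gStore has already been compiled and that the script runs from the repository root; the helper names are invented for this sketch.

```python
import subprocess
from typing import List


def quick_start_commands(db: str = "lubm",
                         data: str = "data/lubm/lubm.nt",
                         query: str = "data/lubm/lubm_q0.sql") -> List[List[str]]:
    """The Quick Start steps as argument lists (same commands and flags as above)."""
    return [
        ["bin/ginit"],                           # initialize the system database
        ["bin/gbuild", "-db", db, "-f", data],   # create a database from an N-Triples file
        ["bin/gshow"],                           # list databases
        ["bin/gquery", "-db", db, "-q", query],  # run the SPARQL query stored in a file
    ]


def run_quick_start(**kwargs: str) -> None:
    """Run each step in order; assumes the working directory is the gStore root."""
    for cmd in quick_start_commands(**kwargs):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
```

Calling run_quick_start() performs the four steps in order and stops at the first command that exits with a nonzero status.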
Complete instructions are in the Quick Start.
If you use gStore in your research, please cite the following paper:
@article{zou2014gstore,
title={gStore: a graph-based SPARQL query engine},
author={Zou, Lei and {\"O}zsu, M Tamer and Chen, Lei and Shen, Xuchuan and Huang, Ruizhe and Zhao, Dongyan},
journal={The VLDB journal},
volume={23},
pages={565--590},
year={2014},
publisher={Springer}
}
Or cite this repository:
@misc{gStore,
author = {gStore Authors},
title = {gStore},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/pkumod/gStore}},
}
1.2 (stable): 2023-11-11
New features in gStore 1.2 are listed as follows:
- Optimizing ORDER BY statements: streamlining the execution logic of ORDER BY, removing unnecessary type judgments and conversions, and significantly improving execution efficiency.
- Optimized Build Module: Supports building empty databases.
- Optimizing the Triple Parser: Supports pure numeric IRIs, IRIs consisting only of numbers and letters, and IRIs starting with numbers.
- New API interfaces: gStore 1.2's ghttp and gRPC services have added five interfaces for uploading files, downloading files, counting system resources, renaming, and obtaining backup paths.
- New built-in advanced functions: gStore 1.2 adds seven advanced functions, namely single-source shortest path (SSSP, SSSPLen), label propagation (labelProp), weakly connected components (WCC), global/local clustering coefficient (clusteringCoeff), the Louvain algorithm (louvain), K-hop count (kHopCount), and K-hop neighbors (kHopNeighbor).
- Added support for calling the CONCAT function in SELECT statements.
- Optimizing some local commands and API interfaces: Optimized the local command gconsole, optimized the interfaces for building, loading, and collecting statistics on graph databases, and fixed potential bugs that could lead to memory leaks.
- Support for Multiple Data Formats: Added support for multiple formats such as Turtle, TriG, RDF/XML, RDFa, and JSON-LD.
- Optimization of custom graph analysis algorithm editing function: Redesign the interface of the custom graph analysis algorithm editing function, optimize the dynamic compilation algorithm, and improve compilation efficiency.
- Bug fixes: Fixed a series of bugs.
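WCC groups vertices that are connected when edge direction is ignored. For intuition only, here is a textbook union-find sketch of that definition in Python; it is not gStore's implementation (whose calling convention and output format are not described here), and the function name and sample vertices are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def weakly_connected_components(
        edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Union-find over directed edges, treating each edge as undirected."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:                     # direction u -> v is ignored
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                # merge the two components

    groups: Dict[Hashable, Set[Hashable]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


print(weakly_connected_components([("John", "Li"), ("Li", "Wang"), ("Zhao", "Qian")]))
```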
1.0: 2022-10-01
New features in gStore 1.0 are listed as follows:
- Support of user-defined graph analysis functions: users can manage their own graph analysis functions through the API interfaces or the visual management platform gStore-workbench. Users can obtain the number of nodes and edges of the graph and neighbors of any given node, etc. through interface functions and use them as basic units to implement their own graph analysis functions. Dynamic compilation and execution of user-defined graph analysis functions are supported.
- The gRPC network interface service: gRPC is a high-performance network interface service based on the HTTP protocol, implemented on top of the open-source library workflow, which further improves the efficiency and stability of the interface service. Experiments show that gRPC greatly improves concurrent access performance compared with ghttp, the previous network interface; for example, at 2000 QPS the rate of denied requests is 0%.
- gConsole module: in gStore 1.0, we launched the gConsole module, which enables long-session operation of gStore with contextual information.
- Decoupling of the optimizer and executor: gStore 1.0 decouples the optimizer and executor, converting from the original deeply coupled greedy strategy to a query optimizer based on dynamic programming and a query executor based on breadth-first traversal.
- Optimization of Top-K queries: We implemented a Top-K SPARQL processing framework based on the DP-B algorithm in gStore, including query segmentation and sub-result aggregation.
- Support of ACID transactions: by introducing the multi-version management mechanism, gStore 1.0 can start ACID transactions for insert and delete operations, which users can open, commit, and roll back. Currently gStore 1.0 supports four isolation levels: read-uncommitted, read-committed, repeatable read and serializable.
- Reconstruction of database kernel and optimization of the plan tree generation logic: in gStore 1.0, two types of join operations (worst-case-optimal joins and binary joins) are introduced to optimize query execution and further improve query efficiency.
- Optimized logging module: based on the log4cplus library, the system logs can be output in a unified format. Users can configure the log output mode (console output or file output), output format, and output level.
- New built-in advanced functions: gStore 1.0 supports four new advanced functions, namely triangleCounting, closenessCentrality, bfsCount and kHopEnumeratePath.
- Extended support for BIND statements: gStore 1.0 supports assigning values to variables using algebraic or logical expressions in BIND statements.
- Optimization of some local commands and API interfaces (e.g., the shutdown command), and fixing a series of bugs (e.g., more accurate gmonitor statistics).
0.9.1: 2021-11-25
New features in gStore 0.9.1 are listed as follows:
- Decoupling of query parsing and execution in the kernel, with further improvements to query performance through optimized join ordering and other techniques. On complex queries, performance improves by over 40%.
- Rewriting of the HTTP service component, ghttp, with improved robustness and the addition of functions such as user permission, heartbeat detection, batch import, and batch deletion; API documents are added.
- Implementation of the Personalized PageRank (PPR) extension function, which can be invoked in the SELECT clause to calculate the correlation between entities.
- Support for arithmetic operations (e.g., ?x + ?y = 5) in the FILTER clause.
- Support for transactional operations, such as begin, tquery (transactional query), commit, and rollback.
- A new executable component, gserver, provides another pathway for remote access to gStore besides the ghttp component; it implements two-way communication via the socket API.
- Unification of the command-line argument format of executable components. The --help option is introduced uniformly (e.g., $ bin/gbuild --help or $ bin/gbuild -h), letting users view the command manual, including the meaning of each option.
- A number of bug fixes.
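Personalized PageRank scores each node by the probability that a random walk, which jumps back to a chosen source node with probability alpha at every step, is found there in the long run. Below is a minimal power-iteration sketch of that standard definition in Python, assuming a plain adjacency-list graph and sending the mass of dangling nodes back to the source; the parameter names and defaults are illustrative assumptions, not gStore's PPR function.

```python
from typing import Dict, Hashable, List


def personalized_pagerank(graph: Dict[Hashable, List[Hashable]], source: Hashable,
                          alpha: float = 0.15, iters: int = 200,
                          tol: float = 1e-12) -> Dict[Hashable, float]:
    """Power iteration for r = alpha * e_source + (1 - alpha) * (one walk step of r)."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs} | {source}
    rank = {n: 0.0 for n in nodes}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] = alpha                    # restart: total mass is 1, so alpha returns
        for u, r in rank.items():
            outs = graph.get(u, [])
            if outs:                           # follow a uniformly random out-edge
                share = (1 - alpha) * r / len(outs)
                for v in outs:
                    nxt[v] += share
            else:                              # dangling node: restart at the source
                nxt[source] += (1 - alpha) * r
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:                        # converged
            break
    return rank


print(personalized_pagerank({"John": ["Li", "Wang"], "Li": ["Wang"]}, "John"))
```

Higher scores mean a node is more closely related to the source, which matches how PPR is used above to measure the correlation between entities.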
0.9: 2021-02-10
New features in version 0.9 include:
- Upgrade of the SPARQL parser generator from ANTLR v3 to the newest, well-documented and well-maintained v4;
- Support for writing numeric literals without datatype suffixes in SPARQL queries;
- Support for arithmetic and logical operators in SELECT clause;
- Support for the aggregates SUM, AVG, MIN and MAX in SELECT clause;
- Additional support for built-in functions in FILTERs, including datatype, contains, ucase, lcase, strstarts, now, year, month, day, and abs;
- Support for path-related functions as an extension of SPARQL 1.1, including cycle detection, shortest paths and K-hop reachability;
- Support for full & incremental backup and recovery of databases, and automatic full backup can be enabled upon admin configuration;
- Support for log-based rollback operations;
- Support for transactions with three levels of isolation: read committed, snapshot isolation and serializable;
- Expanding data structures to hold large-scale graphs of up to five billion triples.
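Shortest-path and K-hop reachability queries both reduce to a breadth-first search over the edges of the RDF graph. Here is a minimal, hypothetical Python sketch of the idea on a directed adjacency list; the function names and sample graph are invented here, and this is not gStore's SPARQL extension syntax.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional


def shortest_hops(graph: Dict[Hashable, List[Hashable]], source: Hashable,
                  target: Hashable, max_hops: Optional[int] = None) -> Optional[int]:
    """BFS over directed edges: hop count of a shortest path, or None if none exists
    (within max_hops, when given)."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()        # nodes leave in nondecreasing distance
        if max_hops is not None and dist >= max_hops:
            continue                           # expanding would exceed the hop limit
        for nxt in graph.get(node, []):
            if nxt == target:
                return dist + 1                # first discovery is a shortest path
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def k_hop_reachable(graph: Dict[Hashable, List[Hashable]], source: Hashable,
                    target: Hashable, k: int) -> bool:
    return shortest_hops(graph, source, target, max_hops=k) is not None


g = {"John": ["Li"], "Li": ["Wang"], "Wang": []}
print(shortest_hops(g, "John", "Wang"), k_hop_reachable(g, "John", "Wang", 1))
```

The seen set also makes the search safe on graphs with cycles, since each vertex is enqueued at most once.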
If you want to understand the details of the gStore system, or try some advanced operations (for example, using the API or the server/client), please see the chapters below.
Known bugs are recorded in BUG REPORT. If you discover a bug that is not listed in that file, you are welcome to submit it through Community Web questioning.
We have written a series of short essays on recurring challenges in building applications with gStore; they are collected in Recipe Book.
You are welcome to report suggestions or errors in the GitHub Issues section of this repository if you do not need a timely reply. If you need us to handle your report urgently, please email your suggestions and bug reports to gstore@pku.edu.cn. A full list of our team is in Mailing List.
The current gStore project has some restrictions; see Limit Description for details.
Sometimes you may notice strange behavior (not necessarily a bug), or something hard to understand or resolve (not knowing what to do next); in that case, do not hesitate to visit the Frequently Asked Questions page.
Graph database engines are a new area, and we are still pushing forward. Our plans are listed in the Future Plan chapter, and we hope more and more people will support or even join us. You can support us in many ways:
- watch/star our project
- fork this repository and submit pull requests to us
- download and use this system, and report bugs or suggestions
- ...
People who inspire us or contribute to this project will be listed in the Thanks List chapter.