Optimized bidirected sequence graph implementations for graph genomics
The main purpose of libbdsg is to provide high-performance implementations of sequence graphs for graph-based pangenomics applications. The repository contains two graph implementations with different performance tradeoffs:
- HashGraph: prioritizes speed
- PackedGraph: prioritizes low memory usage
Previously, a third implementation, ODGI, was provided, but that implementation is now part of its own odgi project.
All of these graph objects implement a common interface defined by libhandlegraph, so they can be used interchangeably and swapped easily.
Additionally, libbdsg provides a few "overlays", which are applied to the graph implementations in order to expand their functionality. The expanded functionality is also described generically using libhandlegraph interfaces.
libbdsg is written in C++. Using the instructions below, it is also possible to generate Python bindings to the underlying C++ library. The Python API is documented here. The documentation also includes a tutorial that serves as a useful introduction to libhandlegraph and libbdsg concepts.
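As a small illustration of the common interface, the following sketch builds the same toy graph in both implementations through one shared helper. It assumes the Python bindings expose HashGraph and PackedGraph under bdsg.bdsg with the create_handle, create_edge, get_sequence, and get_node_count methods; build_toy_graph and spell_walk are hypothetical helper names introduced here, not part of the library.

```python
def build_toy_graph(graph):
    """Add two nodes joined by one edge, using only calls from the shared interface."""
    left = graph.create_handle("GATT")
    right = graph.create_handle("ACA")
    graph.create_edge(left, right)
    return [left, right]


def spell_walk(graph, handles):
    """Concatenate the sequences of the given handles in order."""
    return "".join(graph.get_sequence(handle) for handle in handles)


if __name__ == "__main__":
    try:
        # Assumed module path for the bindings; adjust if your build differs.
        from bdsg.bdsg import HashGraph, PackedGraph
    except ImportError:
        print("bdsg is not installed; see the installation instructions")
    else:
        # The helpers never name a concrete class, so either one can be passed in.
        for graph_class in (HashGraph, PackedGraph):
            graph = graph_class()
            walk = build_toy_graph(graph)
            print(graph_class.__name__, graph.get_node_count(), spell_walk(graph, walk))
```

Because the helpers only touch the shared interface, switching between the speed-oriented and memory-oriented implementation is a matter of changing which class is constructed.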
A journal article that discusses the implementation and functionality of libbdsg is available under the following citation:
Eizenga, JM, Novak, AM, Kobayashi, E, Villani, F, Cisar, C, Heumos, S, Hickey, G, Colonna, V, Paten, B, Garrison, E. (2020) Efficient dynamic variation graphs. Bioinformatics. doi:10.1093/bioinformatics/btaa640.
The peer-reviewed article was drafted in this GitHub repository, and a preprint is available here.
There are several ways to install libbdsg.
If you only want the Python bindings (bdsg module), you can install via pip:
pip install bdsg
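One way to confirm that the install worked is to ask Python whether the module can be located. This is a minimal sketch using only the standard library; module_available is a hypothetical helper name, and the check only shows that a module named bdsg is importable, not that it functions correctly.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    status = "found" if module_available("bdsg") else "not found; try: pip install bdsg"
    print(f"bdsg: {status}")
```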
Full CMake-based installation instructions, including tips on dependency installation, are available in the documentation. A basic guide is provided here.
When obtaining the source repo, make sure to clone with --recursive to get all the submodules:
git clone --recursive https://github.com/vgteam/libbdsg.git
cd libbdsg
With CMake, we are able to build Python bindings that use pybind11. However, we only support out-of-source builds from a directory named build, and we still put the built artifacts in lib in the main project directory.
To run a CMake-based build:
mkdir build
cd build
cmake ..
make -j 8
If the build fails, the Python bindings may be out of date with respect to the source files. See PYBIND_README.md for instructions on updating them. You may also need to install Doxygen. If you cannot install Doxygen, you can bypass the Doxygen portion of the build with cmake .. -DRUN_DOXYGEN=OFF.
The documentation for libbdsg is built using Sphinx. Building it will invoke the CMake-based build process if that has not already been run. To build the documentation, from the main project directory:
# Install Sphinx
virtualenv --python python3 venv
. venv/bin/activate
pip3 install -r bdsg/docs/requirements.txt
# Build the documentation
make docs
The documentation can then be found at docs/_build/html/index.html.
libbdsg has a few external dependencies:
The build process with make assumes that these libraries and their headers have been installed in a place on the system where the compiler can find them (e.g. in CPLUS_INCLUDE_PATH).
The libbdsg-easy repository provides a simple method to coordinate these dependencies for a make build using git submodules.
The following commands will create the libbdsg.a library in the lib directory.
git clone https://github.com/vgteam/libbdsg.git
cd libbdsg
make -j 8
To install system-wide (in /usr/local/):
make install
Or to install in an alternate location:
INSTALL_PREFIX=/other/path/ make install