TuningBitsy.md
Since Bitsy is an in-memory database, you need to provide an appropriately high -Xmx setting to the Java process.
Make sure this setting is not so large that the heap no longer fits in physical memory and spills into swap (virtual memory). For example, a machine with 8GB RAM could easily support -Xmx6g if there are no other applications, but setting -Xmx to 8g could cause a severe drop in performance due to thrashing.
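As a rough illustration of the sizing rule above, the following sketch compares a heap limit against the machine's RAM. The class name, the helper and the 80% cut-off are assumptions chosen for this example and are not part of Bitsy; pick a fraction that reflects what else runs on the machine.

```java
class HeapHeadroomCheck {
    /**
     * Returns true when the heap limit is at most maxHeapFraction of physical RAM,
     * leaving the remainder for the OS, JVM overhead and file caches.
     * The fraction is an assumption for illustration, not a Bitsy recommendation.
     */
    static boolean leavesHeadroom(long maxHeapBytes, long physicalRamBytes, double maxHeapFraction) {
        return maxHeapBytes <= (long) (physicalRamBytes * maxHeapFraction);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Effective heap limit of the running JVM (what -Xmx resolved to).
        long heapLimit = Runtime.getRuntime().maxMemory();
        // Assumed machine size, matching the 8GB example in the text.
        long physicalRam = 8 * gib;
        System.out.println("Heap limit in bytes: " + heapLimit);
        System.out.println("Leaves headroom on an 8GB machine: "
                + leavesHeadroom(heapLimit, physicalRam, 0.8));
    }
}
```

With these assumed numbers, a 6GB heap on an 8GB machine passes the check and an 8GB heap does not, which mirrors the -Xmx6g versus -Xmx8g example.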
Transactions that perform a lot of reads should use the READ_COMMITTED isolation level, which is faster and consumes less memory than the REPEATABLE_READ isolation level. Small transactions are better off with the default isolation level of REPEATABLE_READ because it offers cleaner semantics to the application developer.
Please refer to the page on Optimistic Concurrency for more information about isolation levels.
BitsyGraph supports a simple constructor that just takes a database path, as shown in the "Embedding Bitsy" section on the home page. In addition, Bitsy supports a detailed constructor which takes four parameters:
- Path dbPath: This is the path to the database directory which must be created before launching the application. The directory should preferably not be used for other purposes.
- boolean allowFullGraphScans: If set to false, the methods getVertices() and getEdges() defined in BitsyGraph will throw an exception. The methods getVertices(key, value) and getEdges(key, value) will also throw an exception unless a key index is defined for the given key. You can set this to false to ensure that all Bitsy operations are executed quickly. Defaults to true.
- int txLogThreshold: The size of the transaction log, in bytes, at which the transaction log flusher moves the records to the vertex and edge logs. A higher number means fewer file operations but, in the worst case, more disk space and a longer startup time. Refer to the "transaction log flush" algorithm in the Write Algorithms page. Defaults to 4MB.
- double reorgFactor: The reorganization factor. Reorganization is triggered only when the total number of new vertices and edges added is more than the factor multiplied by the original number of vertices and edges. Refer to the "vertex and edge log reorganization" algorithm in the Write Algorithms page. A reasonable range for this value is between 0.1 and 10. Defaults to 1.
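To make the parameters above concrete, here is a minimal sketch that uses only the JDK. It creates the database directory ahead of time and spells out the reorganization threshold exactly as worded in the list. The class and method names are hypothetical, the 4MB default is assumed to mean 4 * 1024 * 1024 bytes, and the real reorganization algorithm may apply further conditions. The constructor call appears only as a comment because it needs the Bitsy library on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BitsyParamsSketch {
    // Defaults as documented above; 4MB is assumed to mean 4 * 1024 * 1024 bytes.
    static final int DEFAULT_TX_LOG_THRESHOLD = 4 * 1024 * 1024;
    static final double DEFAULT_REORG_FACTOR = 1.0;

    /** Creates the database directory (and any missing parents) before the graph is opened. */
    static Path ensureDbDir(Path dbPath) {
        try {
            return Files.createDirectories(dbPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Threshold condition as described in the parameter list: the number of newly added
     * vertices and edges must exceed reorgFactor times the original number.
     */
    static boolean reorgThresholdReached(long originalCount, long addedCount, double reorgFactor) {
        return addedCount > reorgFactor * originalCount;
    }

    // With Bitsy on the classpath, the four-parameter constructor described above
    // would be called along these lines:
    //   new BitsyGraph(dbPath, false, DEFAULT_TX_LOG_THRESHOLD, DEFAULT_REORG_FACTOR);
}
```

For example, with 1,000 existing vertices and edges and a factor of 1, the threshold is crossed by the 1,001st addition. With a factor of 0.1 it is crossed after 101 additions, so a lower factor trades more frequent reorganizations for smaller logs.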
All of the above parameters, except dbPath, can be changed at runtime. The application can programmatically change the parameters using the setter methods in BitsyGraph.
These setter methods are also exposed as JMX attributes and can be changed using jconsole or an alternate JMX client.
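The sketch below shows, using only the standard javax.management API, what a JMX client does when it changes such an attribute. The TunableMXBean interface, its implementation and the object name passed in are hypothetical stand-ins for illustration; they are not Bitsy's actual MBean or its registered name, which you would look up in jconsole.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxTuningSketch {
    // Hypothetical management interface; NOT Bitsy's real MBean.
    public interface TunableMXBean {
        double getReorgFactor();
        void setReorgFactor(double reorgFactor);
    }

    // Hypothetical component holding one tunable setting.
    static class Tunable implements TunableMXBean {
        private volatile double reorgFactor = 1.0;
        public double getReorgFactor() { return reorgFactor; }
        public void setReorgFactor(double reorgFactor) { this.reorgFactor = reorgFactor; }
    }

    /** Registers the stand-in bean if needed, sets the attribute via JMX and reads it back. */
    static double setAndReadBack(String objectName, double newValue) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (!server.isRegistered(name)) {
                server.registerMBean(new Tunable(), name);
            }
            // This is the same operation jconsole performs when an attribute is edited.
            server.setAttribute(name, new Attribute("ReorgFactor", newValue));
            return (Double) server.getAttribute(name, "ReorgFactor");
        } catch (Exception e) {
            throw new IllegalStateException("JMX operation failed", e);
        }
    }
}
```

A remote client would obtain an MBeanServerConnection through a JMX connector instead of the platform MBean server, but the setAttribute call has the same shape.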
Bitsy creates eight files under the database directory. Each of the four file types has two files: one ending with A.txt and one ending with B.txt.
- metaA/B.txt: These files hold the meta-level information about the database like the configured indexes, etc. These files are very small and rarely modified.
- txA/B.txt: These files hold the transactional logs committed to the database. These files are small to medium in size, depending on the txLogThreshold setting. They are frequently written to, read from, and "forced to disk".
- vA/B.txt: These files hold the vertex logs. They are large append-only files.
- eA/B.txt: These files hold the edge logs. They are large append-only files.
To get the best performance, the database folder should be placed on the fastest disk/RAID available to the server. If there is more than one available disk, you can partition the files (using symbolic links) as follows:
- 2 disks: txA/B.txt on the first, other files on the second. This spreads out the write activity.
- 3 disks: txA/B.txt on the first, vA.txt and eA.txt on the second, and vB.txt and eB.txt on the third. This spreads out the read as well as write activity. txA/B.txt files are small and will be cached in memory.
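The two-disk layout above could be set up with a small helper like the following. It is a sketch under stated assumptions: the database is shut down while it runs, the file names follow the txA.txt/txB.txt pattern listed earlier, the platform permits symbolic links, and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BitsyFileRelocator {
    /**
     * Moves the named files from dbDir to targetDir (for example, a directory on another disk)
     * and leaves a symbolic link with the original name behind in dbDir.
     * Files that are missing or already symlinked are skipped.
     */
    static void relocate(Path dbDir, Path targetDir, String... fileNames) {
        try {
            Files.createDirectories(targetDir);
            for (String fileName : fileNames) {
                Path original = dbDir.resolve(fileName);
                if (Files.isSymbolicLink(original) || !Files.exists(original)) {
                    continue;
                }
                Path moved = targetDir.resolve(fileName).toAbsolutePath();
                Files.move(original, moved);               // copies across file systems if needed
                Files.createSymbolicLink(original, moved); // dbDir/<name> -> targetDir/<name>
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the two-disk layout, a call such as `BitsyFileRelocator.relocate(dbDir, firstDiskDir, "txA.txt", "txB.txt")` would place the transaction logs on the first disk while the other files stay where they are.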
All write benchmarks for Bitsy are based on the files mapped to a single 7200rpm hard disk.
BitsyGraph supports an empty constructor that creates a non-durable memory-only graph. This could be useful for unit tests and non-durable applications. It is similar to TinkerGraph, the reference implementation for Blueprints, but implements optimistic concurrency control.