Refactor mapping of files to memory locations #4952

TheMarex · 2018-03-12T10:49:27Z

Currently the logic to map a section of a file to a location in memory is hand-written in storage.cpp for every file in the function PopulateLayout. This is tedious and error-prone, as this logic needs to be adapted every time we change the structure of the files.

To change this we should move the logic for figuring out the blocks of memory contained in each file into its own function.

Unblocks mapping separate files in #1947 and makes #4873 easier.


TheMarex · 2018-03-13T12:56:18Z

While thinking about how to implement this it became clear very quickly that the main blocker for us is that our on-disk format is relatively unstructured and does not allow us to compute the size of memory segments and read/map the data in a general way.

To address these problems we need to change our file format. Seems like #2242 is rearing its ugly head again. Our new constraints would be:

List content of file and its size
Content of file mmap-able to memory (e.g. no decoding step with memory buffer)
Writable in a streamable way
Data should have a human-readable name
Data should be hierarchical

This already sounds very much like a tar file, and any implementation of it would probably be very close to a tar-lite format. However, using tar files would additionally solve a lot of problems around tooling: they would be easy to inspect, extend and modify. Eventually we would not need to care about how data is packaged; we could accept any split of data.
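To illustrate why a tar-like layout satisfies the "list content and size" and "mmap-able" constraints, here is a minimal sketch that walks the headers of an uncompressed tar archive held in memory. It relies only on the classic tar header layout (name in bytes 0..99, size as octal text in bytes 124..135, 512-byte records). The names TarEntry, ParseOctal and ListTar are made up for this example and are not OSRM or microtar APIs; the sketch ignores the ustar prefix field, type flags and checksums.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one archive member: its name, payload size,
// and the byte offset at which the (uncompressed) payload starts.
struct TarEntry
{
    std::string name;
    std::uint64_t size;
    std::uint64_t data_offset;
};

constexpr std::size_t kTarRecord = 512; // tar headers and padding use 512-byte records

// Parse an octal number stored as text in a header field.
// Leading spaces are skipped; parsing stops at the first non-octal character.
inline std::uint64_t ParseOctal(const char *field, std::size_t length)
{
    std::size_t i = 0;
    while (i < length && field[i] == ' ')
        ++i;
    std::uint64_t value = 0;
    for (; i < length && field[i] >= '0' && field[i] <= '7'; ++i)
        value = value * 8 + static_cast<std::uint64_t>(field[i] - '0');
    return value;
}

// List all members without touching their payload.
inline std::vector<TarEntry> ListTar(const std::vector<char> &archive)
{
    std::vector<TarEntry> entries;
    std::size_t pos = 0;
    while (pos + kTarRecord <= archive.size())
    {
        assert(pos % kTarRecord == 0); // headers always start on a record boundary
        const char *header = archive.data() + pos;
        // Simplification: an empty name is treated as the end-of-archive marker
        // (a real archive ends with all-zero records).
        if (header[0] == '\0')
            break;
        std::size_t name_length = 0;
        while (name_length < 100 && header[name_length] != '\0')
            ++name_length;
        const std::uint64_t size = ParseOctal(header + 124, 12);
        entries.push_back({std::string(header, name_length), size, pos + kTarRecord});
        // The payload is padded up to the next record boundary.
        const std::uint64_t padded = (size + kTarRecord - 1) / kTarRecord * kTarRecord;
        pos += kTarRecord + static_cast<std::size_t>(padded);
    }
    return entries;
}
```

Because the payload is stored verbatim at data_offset, an entry could be mapped directly once the file itself is mapped, without a decoding step.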

Storage

Our investment in using a general abstraction layer with FileWriter and FileReader seems to pay off: we can swap out the current implementation to write named data. As a low-level library to read/write tar files we can use microtar, which we can easily wrap/bundle as a third-party library.

We can use a "filesystem" like structure to get a human-readable hierarchical representation of data. For example the current .osrm.mldgr file could be represented as:

/mld/multilevelgraph/node_array
/mld/multilevelgraph/edge_array
/mld/multilevelgraph/node_to_edge_offsets
OSRM_VERSION
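A rough sketch of how such names could be used as keys, assuming a sorted name-to-block index: because entries under the same "directory" share a prefix, they are contiguous in a std::map and can be enumerated with a single lower_bound. Block, BlockIndex, ListPrefix and FindBlock are hypothetical names for this example only, and the offsets and sizes are placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one named block of data.
struct Block
{
    std::uint64_t offset;
    std::uint64_t byte_size;
};

// Sorted index from hierarchical name to block.
using BlockIndex = std::map<std::string, Block>;

// Return all names that start with the given prefix,
// e.g. "/mld/multilevelgraph/" lists everything below that "directory".
inline std::vector<std::string> ListPrefix(const BlockIndex &index, const std::string &prefix)
{
    std::vector<std::string> names;
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it)
    {
        names.push_back(it->first);
    }
    return names;
}

// Look up a block that is expected to exist.
inline const Block &FindBlock(const BlockIndex &index, const std::string &name)
{
    const auto it = index.find(name);
    assert(it != index.end());
    return it->second;
}
```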

Memory

In conjunction with the new on-disk format we can now make osrm-datastore a lot more "stupid" as it only needs to care about the following things:

Discovering data in files
Building an index where to find the data in files
Allocating enough in-memory storage (potentially using multiple memory blocks)
Building an index to find the data in memory
Reading the data to memory

The last three steps would be optional if we go for an mmap based approach in the future.
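The last three steps could look roughly like the sketch below, assuming the first two steps already produced a name-to-location index for the file. All type and function names (FileBlock, MemoryBlock, BuildMemoryIndex, ReadIntoMemory) and the 8-byte alignment are assumptions for illustration, and a plain std::vector stands in for the shared memory block.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical index entries: where a block lives in the file and in memory.
struct FileBlock
{
    std::uint64_t file_offset;
    std::uint64_t byte_size;
};
struct MemoryBlock
{
    std::uint64_t offset;
    std::uint64_t byte_size;
};
using FileIndex = std::map<std::string, FileBlock>;
using MemoryIndex = std::map<std::string, MemoryBlock>;

constexpr std::uint64_t kAlignment = 8; // assumed alignment of every block

inline std::uint64_t AlignUp(std::uint64_t value)
{
    return (value + kAlignment - 1) / kAlignment * kAlignment;
}

// Steps 3 and 4: assign every block an aligned offset in memory and
// return the total number of bytes that need to be allocated.
inline std::uint64_t BuildMemoryIndex(const FileIndex &files, MemoryIndex &memory)
{
    std::uint64_t next = 0;
    for (const auto &entry : files)
    {
        memory[entry.first] = MemoryBlock{next, entry.second.byte_size};
        next = AlignUp(next + entry.second.byte_size);
    }
    return next;
}

// Step 5: copy every block from the file contents to its place in memory.
inline void ReadIntoMemory(const std::vector<char> &file,
                           const FileIndex &files,
                           const MemoryIndex &memory,
                           std::vector<char> &storage)
{
    for (const auto &entry : files)
    {
        const FileBlock &source = entry.second;
        const MemoryBlock &target = memory.at(entry.first);
        assert(source.file_offset + source.byte_size <= file.size());
        assert(target.offset + target.byte_size <= storage.size());
        std::memcpy(storage.data() + target.offset,
                    file.data() + source.file_offset,
                    static_cast<std::size_t>(source.byte_size));
    }
}
```

Nothing in this sketch knows what a block contains; only names, offsets and sizes are involved.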

Data organization

Using hierarchical naming makes it very easy to load multiple datasets/profiles with osrm-datastore by using namespaces:

osrm-datastore bike_data.osrm --name=bike
osrm-datastore walk_data.osrm --name=walk

This would create data in the namespaces:

/bike/ch/*
/bike/*
/walk/ch/*
/walk/*

How this data is split up between shared memory segments could be determined inside osrm-datastore by simple rules, such as: all files matching */metric/routability/* get their own shared memory segment.
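Such a rule could be sketched as a list of (pattern, segment) pairs where the first matching pattern wins. In this sketch '*' matches any run of characters, including '/'. MatchesPattern, SegmentRule, SegmentFor and the segment names "updatable" and "static" are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal wildcard match: '*' matches any (possibly empty) run of characters,
// every other character has to match literally.
inline bool MatchesPattern(const std::string &pattern, const std::string &name)
{
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos; // position of the last '*' seen
    std::size_t mark = 0;                 // position in name where that '*' started
    while (n < name.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            star = p++;
            mark = n;
        }
        else if (p < pattern.size() && pattern[p] == name[n])
        {
            ++p;
            ++n;
        }
        else if (star != std::string::npos)
        {
            // Backtrack: let the last '*' swallow one more character.
            p = star + 1;
            n = ++mark;
        }
        else
        {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}

using SegmentRule = std::pair<std::string, std::string>; // pattern -> segment name

// Return the segment of the first matching rule, or a default segment.
inline std::string SegmentFor(const std::vector<SegmentRule> &rules, const std::string &name)
{
    assert(!name.empty());
    for (const auto &rule : rules)
    {
        if (MatchesPattern(rule.first, name))
            return rule.second;
    }
    return "static";
}
```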

Impact

Moving on this refactor will make a range of issues much easier to implement:

#4007
#10
#2242
#4873

/cc @oxidase @danpat

TheMarex · 2018-04-06T00:06:12Z

This shipped.

TheMarex added the Refactor label Mar 12, 2018

TheMarex self-assigned this Mar 12, 2018

TheMarex mentioned this issue Mar 13, 2018

Refactor shared memory layout to expose Block as interface #4955

Merged

2 tasks

danpat added the Refactor 🐢🔥 label Mar 13, 2018

TheMarex mentioned this issue Mar 15, 2018

Use tar-format to encapsulate data #4960

Merged

3 tasks

TheMarex mentioned this issue Mar 26, 2018

Refactor shared memory data layout to use named identifiers #4975

Merged

4 tasks

TheMarex closed this as completed Apr 6, 2018
