Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[redux] gzip decompression inside vtquery #95

Merged
merged 3 commits into from
Jun 27, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions binding.gyp
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,9 @@
'./src/module.cpp',
'./src/vtquery.cpp'
],
'link_settings': {
'libraries': [ '-lz' ]
},
'ldflags': [
'-Wl,-z,now',
],
Expand Down
1 change: 1 addition & 0 deletions scripts/install_deps.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@ install spatial-algorithms 0.1.0
install boost 1.65.1
install cheap-ruler 2.5.3
install vector-tile f4728da
install gzip-hpp 0.1.0
2 changes: 1 addition & 1 deletion scripts/setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
set -eu
set -o pipefail

export MASON_RELEASE="${MASON_RELEASE:-d7b7c17}"
export MASON_RELEASE="${MASON_RELEASE:-bca9fe4}"
export MASON_LLVM_RELEASE="${MASON_LLVM_RELEASE:-5.0.0}"

PLATFORM=$(uname | tr A-Z a-z)
Expand Down
50 changes: 40 additions & 10 deletions src/vtquery.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@

#include <algorithm>
#include <exception>
#include <gzip/decompress.hpp>
#include <gzip/utils.hpp>
#include <iostream>
#include <map>
#include <mapbox/geometry/algorithms/closest_point.hpp>
Expand All @@ -28,9 +30,12 @@ const char* getGeomTypeString(int enumVal) {
return GeomTypeStrings[enumVal];
}

using materialized_prop_type = std::pair<std::string, mapbox::feature::value>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using std::pair here. What's supposed to be stored here? materialized_prop_type makes me think a name value pair representing a type of property. Also just curious why you use the term "materialized"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like your comments later on in this PR answer my second question: de2e131#diff-d293222b917c4d9edf75c1288538554fR375

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that comment is the intended documentation for this. Basically I think we need to copy the data to safely use it outside of the scope of the std::string that represents the original decompressed tile buffer. I used materialized as a fancy way to describe this copy. Other terms could be standalone_prop_type or persistent_prop_type or something.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Since T const& instead of const T& is the style used throughout this code file, it might make sense to use this style for pointers as well. So instead of const char* and static const char* you would use char const* and static char const* respectively. It looks weird to me but it's consistent!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alliecrevier not quite sure which chunk of code this comment refers to. Perhaps you meant to place against another line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah must be

vtquery/src/vtquery.cpp

Lines 28 to 29 in f98b2b6

static const char* GeomTypeStrings[] = {"point", "linestring", "polygon", "unknown"};
const char* getGeomTypeString(int enumVal) {
(which I could not see in the github comment display since it only shows the first couple lines above the comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to not touch this code right now since the differences are a little above my knowledge to feel comfortable with. Let's discuss in a ticket: #96


/// main storage item for returning to the user
struct ResultObject {
std::vector<vtzero::property> properties_vector;
std::vector<materialized_prop_type> properties_vector_materialized;
std::string layer_name;
mapbox::geometry::point<double> coordinates;
double distance;
Expand All @@ -39,6 +44,7 @@ struct ResultObject {
uint64_t id;

ResultObject() : properties_vector(),
properties_vector_materialized(),
layer_name(),
coordinates(0.0, 0.0),
distance(std::numeric_limits<double>::max()),
Expand Down Expand Up @@ -150,11 +156,9 @@ struct property_value_visitor {
};

/// used to create the final v8 (JSON) object to return to the user
void set_property(vtzero::property const& property,
void set_property(materialized_prop_type const& property,
v8::Local<v8::Object>& properties_obj) {

auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
mapbox::util::apply_visitor(property_value_visitor{properties_obj, std::string(property.key())}, val);
mapbox::util::apply_visitor(property_value_visitor{properties_obj, property.first}, property.second);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

GeomType get_geometry_type(vtzero::feature const& f) {
Expand Down Expand Up @@ -268,11 +272,24 @@ struct Worker : Nan::AsyncWorker {
// query point lng/lat geometry.hpp point (used for distance calculation later on)
mapbox::geometry::point<double> query_lnglat{data.longitude, data.latitude};

// for each tile
gzip::Decompressor decompressor;
std::string uncompressed;
std::vector<std::string> buffers;
std::vector<std::tuple<vtzero::vector_tile, std::int32_t, std::int32_t, std::int32_t>> tiles;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32_t tells me things would break if 64 bits were used for an int is that a correct assumption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. How about you ticket this question separately because I've been feeling like there may be some loose ends here. I've not fully digested #88 and #89 and we should probably recap if those were correct and if anything remains to be more careful about.

tiles.reserve(data.tiles.size());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

for (auto const& tile_ptr : data.tiles) {
TileObject const& tile_obj = *tile_ptr;
vtzero::vector_tile tile{tile_obj.data};

if (gzip::is_compressed(tile_obj.data.data(), tile_obj.data.size())) {
decompressor.decompress(uncompressed, tile_obj.data.data(), tile_obj.data.size());
buffers.emplace_back(std::move(uncompressed));
tiles.emplace_back(vtzero::vector_tile(buffers.back()), tile_obj.z, tile_obj.x, tile_obj.y);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is buffers needed as midpoint? why can't you do tiles.push_back(vtzero::vector_tile(uncompressed))? I think I might be missing some kind of optimization reasoning behind this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've noticed the critical thing here. So, vtzero is written with top performance in mind. This means that it cannot/does not/refuses to accept the cost of copying or re-allocation internally. Instead, by design, it expects that the memory you pass it will stay alive for as long as vtzero needs to work on it. What this means is that the vtzero calls at

vtquery/src/vtquery.cpp

Lines 381 to 382 in de2e131

auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val));
need the buffer passed to the original vtzero::vector_tile object to still be alive. Otherwise they will reference dangling pointers internally. While this is dangerous, its by design for top performance. As long as the data passed to vtzero stays in scope (and is controlled outside of vtzero) we can have perfectly correct code and top performance without copies.

So, the std::vector<std::string> buffers; is my best attempt to deliver vtzero this promise. The std::vector<std::string> buffers; is at the correct scope such that the data it holds is perfectly alive and in memory for as long as vtzero needs it. More info at mapbox/vtzero#36 and mapbox/vtzero@a42889d

} else {
tiles.emplace_back(vtzero::vector_tile(tile_obj.data), tile_obj.z, tile_obj.x, tile_obj.y);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you do tile_obj.data.data() here like you did above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because tile_obj.data is a protozero::data_view (and vtzero::data_view is the same thing because of https://github.com/mapbox/vtzero/blob/3484299f270d3f1b3f8233bfbb7f3321ab7d1c0d/include/vtzero/types.hpp#L43). So in this case vtzero's vector_tile constructor is fancy enough to know about this type and has a custom constructor to save us the typing (this one https://github.com/mapbox/vtzero/blob/3484299f270d3f1b3f8233bfbb7f3321ab7d1c0d/include/vtzero/vector_tile.hpp#L67-L70). But gzip-hpp is not so smart (because we did not want to depend directly on protozero in gzip-hpp to keep it light).

Overall a string_view or data_view concept is so useful I think we should plan to either:

@joto has been thinking that 🍊 is ideal and that is the reason that protozero has https://github.com/mapbox/protozero/blob/a44efc34e5a86e93c06390aa19c89f8e115971f6/include/protozero/data_view.hpp#L29. The idea is that you can use protozero's data view until you want to opt-into using the standard, if you have it.

}
}
// for each tile
for (auto& tile_obj : tiles) {
vtzero::vector_tile& tile = std::get<0>(tile_obj);
while (auto layer = tile.next_layer()) {

// check if this is a layer we should query
Expand All @@ -282,8 +299,11 @@ struct Worker : Nan::AsyncWorker {
}

std::uint32_t extent = layer.extent();
std::int32_t tile_obj_z = std::get<1>(tile_obj);
std::int32_t tile_obj_x = std::get<2>(tile_obj);
std::int32_t tile_obj_y = std::get<3>(tile_obj);
// query point in relation to the current tile the layer extent
mapbox::geometry::point<std::int64_t> query_point = utils::create_query_point(data.longitude, data.latitude, extent, tile_obj.z, tile_obj.x, tile_obj.y);
mapbox::geometry::point<std::int64_t> query_point = utils::create_query_point(data.longitude, data.latitude, extent, tile_obj_z, tile_obj_x, tile_obj_y);

while (auto feature = layer.next_feature()) {
auto original_geometry_type = get_geometry_type(feature);
Expand All @@ -306,7 +326,7 @@ struct Worker : Nan::AsyncWorker {

// if distance from the query point is greater than 0.0 (not a direct hit) so recalculate the latlng
if (cp_info.distance > 0.0) {
ll = utils::convert_vt_to_ll(extent, tile_obj.z, tile_obj.x, tile_obj.y, cp_info);
ll = utils::convert_vt_to_ll(extent, tile_obj_z, tile_obj_x, tile_obj_y, cp_info);
meters = utils::distance_in_meters(query_lnglat, ll);
}

Expand Down Expand Up @@ -352,6 +372,16 @@ struct Worker : Nan::AsyncWorker {
} // end tile.layer.feature loop
} // end tile.layer loop
} // end tile loop
// Here we create "materialized" properties. We do this here because, when reading from a compressed
// buffer, it is unsafe to touch `feature.properties_vector` once we've left this loop.
// That is because the buffer may represent uncompressed data that is not in scope outside of Execute()
for (auto& feature : results_queue_) {
feature.properties_vector_materialized.reserve(feature.properties_vector.size());
for (auto const& property : feature.properties_vector) {
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you use push_back here or do you prefer emplace_back for all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, this was has always been confusing to me. My sense is that both work, but one will be slightly faster if it can reduce allocations. The hope here is that emplace_back forwards the arguments such that they are not allocated once and then moved. But in reading about this in the past I've not felt 100% confident in this assertion and it may depend on the C++ version and library. Seems like https://abseil.io/tips/112 is relevant here.

}
}
} catch (const std::exception& e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Since T const& instead of const T& is the style used throughout this code file, it might make sense to use this style for exceptions as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👁 . Fixed in f98b2b6

SetErrorMessage(e.what());
}
Expand Down Expand Up @@ -384,7 +414,7 @@ struct Worker : Nan::AsyncWorker {

// create properties object
v8::Local<v8::Object> properties_obj = Nan::New<v8::Object>();
for (auto const& prop : feature.properties_vector) {
for (auto const& prop : feature.properties_vector_materialized) {
set_property(prop, properties_obj);
}

Expand Down