Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[redux] gzip decompression inside vtquery #95

Merged
merged 3 commits into from
Jun 27, 2018
Merged

Conversation

springmeyer
Copy link
Contributor

#73 got stale and was hard to merge after a few other things landed (like #89), so I've re-created the PR from scratch. As a next step would love PR review from @alliecrevier.

Copy link

@sssoleileraaa sssoleileraaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could chat in person about your use of emplace_back and buffers, otherwise looks good to me!

@@ -28,9 +30,12 @@ const char* getGeomTypeString(int enumVal) {
return GeomTypeStrings[enumVal];
}

using materialized_prop_type = std::pair<std::string, mapbox::feature::value>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using std::pair here. What's supposed to be stored here? materialized_prop_type makes me think a name value pair representing a type of property. Also just curious why you use the term "materialized"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like your comments later on in this PR answer my second question: de2e131#diff-d293222b917c4d9edf75c1288538554fR375

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that comment is the intended documentation for this. Basically I think we need to copy the data to safely use it outside of the scope of the std::string that represents the original decompressed tile buffer. I used materialized as a fancy way to describe this copy. Other terms could be standalone_prop_type or persistent_prop_type or something.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Since T const& instead of const T& is the style used throughout this code file, it might make sense to use this style for pointers as well. So instead of const char* and static const char* you would use char const* and static char const* respectively. It looks weird to me but it's consistent!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alliecrevier not quite sure which chunk of code this comment refers to. Perhaps you meant to place against another line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah must be

vtquery/src/vtquery.cpp

Lines 28 to 29 in f98b2b6

static const char* GeomTypeStrings[] = {"point", "linestring", "polygon", "unknown"};
const char* getGeomTypeString(int enumVal) {
(which I could not see in the github comment display since it only shows the first couple lines above the comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to not touch this code right now since the differences are a little above my knowledge to feel comfortable with. Let's discuss in a ticket: #96


auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
mapbox::util::apply_visitor(property_value_visitor{properties_obj, std::string(property.key())}, val);
mapbox::util::apply_visitor(property_value_visitor{properties_obj, property.first}, property.second);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

feature.properties_vector_materialized.reserve(feature.properties_vector.size());
for (auto const& property : feature.properties_vector) {
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you use push_back here or do you prefer emplace_back for all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, this was has always been confusing to me. My sense is that both work, but one will be slightly faster if it can reduce allocations. The hope here is that emplace_back forwards the arguments such that they are not allocated once and then moved. But in reading about this in the past I've not felt 100% confident in this assertion and it may depend on the C++ version and library. Seems like https://abseil.io/tips/112 is relevant here.

gzip::Decompressor decompressor;
std::string uncompressed;
std::vector<std::string> buffers;
std::vector<std::tuple<vtzero::vector_tile, std::int32_t, std::int32_t, std::int32_t>> tiles;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32_t tells me things would break if 64 bits were used for an int is that a correct assumption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. How about you ticket this question separately because I've been feeling like there may be some loose ends here. I've not fully digested #88 and #89 and we should probably recap if those were correct and if anything remains to be more careful about.

std::string uncompressed;
std::vector<std::string> buffers;
std::vector<std::tuple<vtzero::vector_tile, std::int32_t, std::int32_t, std::int32_t>> tiles;
tiles.reserve(data.tiles.size());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

buffers.emplace_back(std::move(uncompressed));
tiles.emplace_back(vtzero::vector_tile(buffers.back()), tile_obj.z, tile_obj.x, tile_obj.y);
} else {
tiles.emplace_back(vtzero::vector_tile(tile_obj.data), tile_obj.z, tile_obj.x, tile_obj.y);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you do tile_obj.data.data() here like you did above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because tile_obj.data is a protozero::data_view (and vtzero::data_view is the same thing because of https://github.com/mapbox/vtzero/blob/3484299f270d3f1b3f8233bfbb7f3321ab7d1c0d/include/vtzero/types.hpp#L43). So in this case vtzero's vector_tile constructor is fancy enough to know about this type and has a custom constructor to save us the typing (this one https://github.com/mapbox/vtzero/blob/3484299f270d3f1b3f8233bfbb7f3321ab7d1c0d/include/vtzero/vector_tile.hpp#L67-L70). But gzip-hpp is not so smart (because we did not want to depend directly on protozero in gzip-hpp to keep it light).

Overall a string_view or data_view concept is so useful I think we should plan to either:

@joto has been thinking that 🍊 is ideal and that is the reason that protozero has https://github.com/mapbox/protozero/blob/a44efc34e5a86e93c06390aa19c89f8e115971f6/include/protozero/data_view.hpp#L29. The idea is that you can use protozero's data view until you want to opt-into using the standard, if you have it.

if (gzip::is_compressed(tile_obj.data.data(), tile_obj.data.size())) {
decompressor.decompress(uncompressed, tile_obj.data.data(), tile_obj.data.size());
buffers.emplace_back(std::move(uncompressed));
tiles.emplace_back(vtzero::vector_tile(buffers.back()), tile_obj.z, tile_obj.x, tile_obj.y);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is buffers needed as midpoint? why can't you do tiles.push_back(vtzero::vector_tile(uncompressed))? I think I might be missing some kind of optimization reasoning behind this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've noticed the critical thing here. So, vtzero is written with top performance in mind. This means that it cannot/does not/refuses to accept the cost of copying or re-allocation internally. Instead, by design, it expects that the memory you pass it will stay alive for as long as vtzero needs to work on it. What this means is that the vtzero calls at

vtquery/src/vtquery.cpp

Lines 381 to 382 in de2e131

auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val));
need the buffer passed to the original vtzero::vector_tile object to still be alive. Otherwise they will reference dangling pointers internally. While this is dangerous, its by design for top performance. As long as the data passed to vtzero stays in scope (and is controlled outside of vtzero) we can have perfectly correct code and top performance without copies.

So, the std::vector<std::string> buffers; is my best attempt to deliver vtzero this promise. The std::vector<std::string> buffers; is at the correct scope such that the data it holds is perfectly alive and in memory for as long as vtzero needs it. More info at mapbox/vtzero#36 and mapbox/vtzero@a42889d

src/vtquery.cpp Outdated
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value());
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val));
}
}
} catch (const std::exception& e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Since T const& instead of const T& is the style used throughout this code file, it might make sense to use this style for exceptions as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👁 . Fixed in f98b2b6

@springmeyer springmeyer merged commit 378659e into master Jun 27, 2018
@springmeyer springmeyer deleted the gzip-decode branch June 27, 2018 19:00
@springmeyer
Copy link
Contributor Author

Thanks for your review and approval @alliecrevier. Just merged and then realized I missed tests, so created #97

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants