-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[redux] gzip decompression inside vtquery #95
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could chat in person about your use of emplace_back and buffers
, otherwise looks good to me!
@@ -28,9 +30,12 @@ const char* getGeomTypeString(int enumVal) { | |||
return GeomTypeStrings[enumVal]; | |||
} | |||
|
|||
using materialized_prop_type = std::pair<std::string, mapbox::feature::value>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using std::pair
here. What's supposed to be stored here? materialized_prop_type
makes me think a name value pair representing a type of property. Also just curious why you use the term "materialized"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like your comments later on in this PR answer my second question: de2e131#diff-d293222b917c4d9edf75c1288538554fR375
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that comment is the intended documentation for this. Basically I think we need to copy the data to safely use it outside of the scope of the std::string
that represents the original decompressed tile buffer. I used materialized
as a fancy way to describe this copy. Other terms could be standalone_prop_type
or persistent_prop_type
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Since T const& instead of const T& is the style used throughout this code file, it might make sense to use this style for pointers as well. So instead of const char*
and static const char*
you would use char const*
and static char const*
respectively. It looks weird to me but it's consistent!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alliecrevier not quite sure which chunk of code this comment refers to. Perhaps you meant to place against another line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah must be
Lines 28 to 29 in f98b2b6
static const char* GeomTypeStrings[] = {"point", "linestring", "polygon", "unknown"}; | |
const char* getGeomTypeString(int enumVal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to not touch this code right now since the differences are a little above my knowledge to feel comfortable with. Let's discuss in a ticket: #96
|
||
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value()); | ||
mapbox::util::apply_visitor(property_value_visitor{properties_obj, std::string(property.key())}, val); | ||
mapbox::util::apply_visitor(property_value_visitor{properties_obj, property.first}, property.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
feature.properties_vector_materialized.reserve(feature.properties_vector.size()); | ||
for (auto const& property : feature.properties_vector) { | ||
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value()); | ||
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you use push_back
here or do you prefer emplace_back
for all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question, this was has always been confusing to me. My sense is that both work, but one will be slightly faster if it can reduce allocations. The hope here is that emplace_back
forwards the arguments such that they are not allocated once and then moved. But in reading about this in the past I've not felt 100% confident in this assertion and it may depend on the C++ version and library. Seems like https://abseil.io/tips/112 is relevant here.
gzip::Decompressor decompressor; | ||
std::string uncompressed; | ||
std::vector<std::string> buffers; | ||
std::vector<std::tuple<vtzero::vector_tile, std::int32_t, std::int32_t, std::int32_t>> tiles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32_t
tells me things would break if 64 bits were used for an int
is that a correct assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string uncompressed; | ||
std::vector<std::string> buffers; | ||
std::vector<std::tuple<vtzero::vector_tile, std::int32_t, std::int32_t, std::int32_t>> tiles; | ||
tiles.reserve(data.tiles.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
buffers.emplace_back(std::move(uncompressed)); | ||
tiles.emplace_back(vtzero::vector_tile(buffers.back()), tile_obj.z, tile_obj.x, tile_obj.y); | ||
} else { | ||
tiles.emplace_back(vtzero::vector_tile(tile_obj.data), tile_obj.z, tile_obj.x, tile_obj.y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you do tile_obj.data.data()
here like you did above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because tile_obj.data
is a protozero::data_view
(and vtzero::data_view
is the same thing because of https://github.com/mapbox/vtzero/blob/3484299f270d3f1b3f8233bfbb7f3321ab7d1c0d/include/vtzero/types.hpp#L43). So in this case vtzero
's vector_tile
constructor is fancy enough to know about this type and has a custom constructor to save us the typing (this one https://github.com/mapbox/vtzero/blob/3484299f270d3f1b3f8233bfbb7f3321ab7d1c0d/include/vtzero/vector_tile.hpp#L67-L70). But gzip-hpp
is not so smart (because we did not want to depend directly on protozero
in gzip-hpp to keep it light).
Overall a string_view
or data_view
concept is so useful I think we should plan to either:
- 🍒 break out the
data_view
implemention in protozero and start using it everywhere. Things like gzip-hpp would then have a new dep - 🍎 use someone else's string view like https://abseil.io/tips/1
- 🍊 or wait until we can use C++17 everywhere and use https://en.cppreference.com/w/cpp/header/string_view
@joto has been thinking that 🍊 is ideal and that is the reason that protozero
has https://github.com/mapbox/protozero/blob/a44efc34e5a86e93c06390aa19c89f8e115971f6/include/protozero/data_view.hpp#L29. The idea is that you can use protozero
's data view until you want to opt-into using the standard, if you have it.
if (gzip::is_compressed(tile_obj.data.data(), tile_obj.data.size())) { | ||
decompressor.decompress(uncompressed, tile_obj.data.data(), tile_obj.data.size()); | ||
buffers.emplace_back(std::move(uncompressed)); | ||
tiles.emplace_back(vtzero::vector_tile(buffers.back()), tile_obj.z, tile_obj.x, tile_obj.y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is buffers
needed as midpoint? why can't you do tiles.push_back(vtzero::vector_tile(uncompressed))
? I think I might be missing some kind of optimization reasoning behind this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've noticed the critical thing here. So, vtzero
is written with top performance in mind. This means that it cannot/does not/refuses to accept the cost of copying or re-allocation internally. Instead, by design, it expects that the memory you pass it will stay alive for as long as vtzero
needs to work on it. What this means is that the vtzero calls at
Lines 381 to 382 in de2e131
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value()); | |
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val)); |
vtzero::vector_tile
object to still be alive. Otherwise they will reference dangling pointers internally. While this is dangerous, its by design for top performance. As long as the data passed to vtzero stays in scope (and is controlled outside of vtzero) we can have perfectly correct code and top performance without copies.
So, the std::vector<std::string> buffers;
is my best attempt to deliver vtzero
this promise. The std::vector<std::string> buffers;
is at the correct scope such that the data it holds is perfectly alive and in memory for as long as vtzero needs it. More info at mapbox/vtzero#36 and mapbox/vtzero@a42889d
src/vtquery.cpp
Outdated
auto val = vtzero::convert_property_value<mapbox::feature::value, mapbox::vector_tile::detail::property_value_mapping>(property.value()); | ||
feature.properties_vector_materialized.emplace_back(std::string(property.key()), std::move(val)); | ||
} | ||
} | ||
} catch (const std::exception& e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Since T const&
instead of const T&
is the style used throughout this code file, it might make sense to use this style for exceptions as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👁 . Fixed in f98b2b6
Thanks for your review and approval @alliecrevier. Just merged and then realized I missed tests, so created #97 |
#73 got stale and was hard to merge after a few other things landed (like #89), so I've re-created the PR from scratch. As a next step would love PR review from @alliecrevier.