-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory usage for raster source handling. #5572
Conversation
…eeking the file).
include/extractor/raster_source.hpp
Outdated
storage::io::FileReader file_reader(filepath, storage::io::FileReader::HasNoFingerprint); | ||
std::string buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the whole word buffer
. In previous version it was buffer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Thanks for your review and comment.
I will fix as your comment. (buf --> buffer)
include/extractor/raster_source.hpp
Outdated
storage::io::FileReader file_reader(filepath, storage::io::FileReader::HasNoFingerprint); | ||
std::string buf; | ||
buf.resize(xdim * 11); // INT32_MAX = 2147483647 = 10 chars + 1 white space = 11 | ||
BOOST_ASSERT(xdim * 11 <= buf.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this check relevant? Can buf.size() < xdim * 11
without exception at previous line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment.
In original code, I found below code.
buffer.resized(file_reader.GetSize())
My understanding is "to expand the buffer size to same with raster file size."
By this pull request, I changed from whole size buffer to line buffer to reduce the memory usage.
So, just in case (and respect to original code), I expand the buffer size to "maximum size of one line (logically)".
Maybe this code is not needed or there is better way.
I'm glad if you could give me some advice.
Best Regards,
Tomonobu Saito.
include/extractor/raster_source.hpp
Outdated
std::vector<std::string> result; | ||
boost::split( | ||
result, buf, boost::is_any_of(" \r\n\0"), boost::algorithm::token_compress_on); | ||
// boost::split(result, buf, boost::is_any_of(" \r\n\0")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this comment in the final code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your checking. As you said, this comment line is not needed.
I will remove soon.
include/extractor/raster_source.hpp
Outdated
result, buf, boost::is_any_of(" \r\n\0"), boost::algorithm::token_compress_on); | ||
// boost::split(result, buf, boost::is_any_of(" \r\n\0")); | ||
unsigned int x = 0; | ||
BOOST_FOREACH (std::string s, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] Use range based loop from 11 standard. for (const string & s: result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your suggestion.
I will use 11 style loop. Soon, I will modify.
}; | ||
|
||
// << singletone >> RasterCache | ||
class RasterCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you design this class as a singleton?
It seems that it will be used only inside the RasterContainer
. And RasterContainer
has one instance in Lua scripting environment. I think you can make it as a regular class or even implement this logic in RasterContainer
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for your check and comment.
This is most important point of my pull request.
In original code, LoadedSources and LoadedSourcePaths are owned by RasterContainer class.
osrm-extract create more than one RasterContainer instances, and this causes more than one time to load same raster file (with rasterbot.lua).
My first idea is to change LoadesSources and LoadedSourcePaths to static.
But this cases "double free" problem. see error log at below URL, you can find "double free or corrption (fasttop)" error.
https://travis-ci.org/Project-OSRM/osrm-backend/jobs/593429031?utm_medium=notification&utm_source=github_status
This is the reason I introduce the singletone pattern.
Singletone can solve these two problems.
- Multiple load of same raster file.
- Double free error.
Best Regards,
Tomonobu Saito
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Thank you for the detailed explanation.
It seems that RasterContainer binds directly to the lua functions https://github.com/Project-OSRM/osrm-backend/blob/master/src/extractor/scripting_environment_lua.cpp#L188
So isn't it a lua problem that the profile creates several containers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for your comment.
This is usage of raster:load
in the lua file (rasterbot.lua).
https://github.com/Project-OSRM/osrm-backend/blob/master/profiles/rasterbot.lua
I think we cannot avoid to create more than one RasterContainer instance, since setup()
is called more than one time during osrm-extract (according to checking the log of osrm-extract).
But, just in case, I will check the possibility to avoid creating multiple RasterContainer by lua file.
Then, I will update the comment.
Best Regards,
Tomonobu Saito.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
The instance of RasterContainer is created at here.
https://github.com/Project-OSRM/osrm-backend/blob/master/include/extractor/scripting_environment_lua.hpp#L38
So, we cannot control the creation of RasterContainer instance in lua files (such as rasterbot.lua).
Best Regards,
Tomonobu Saito.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
I briefly checked scripting_environment_lua.{hpp|cpp}.
I think one LuaScriptingContext instance is created for every threads.
So, the number of instance of LuaScriptingContext is equal to the number of threads osrm-extract uses.
RasterContainer is member of LuaScriptingContext.
There is no doubt the number of RasterContainer is equal to the number of LuaScriptingContext.
This cannot control in lua file (such as rasterbot.lua).
If we want to avoid multiple load of same raster file, there are two strategies.
A) Use singletone pattern to reuse same resource from multiple thread. (This is my strategy.)
B) Share single LuaScriptingContext by all threads.
I think B has very big impact for current architecture design.
Best Regards,
Tomonobu Saito.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
thank you for this research. Now I am not against A) strategy. Can you add a briefly comment before this class to avoid confusion of future readers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for your checking and comment.
Yes, I will add a comment then commit again.
include/storage/io.hpp
Outdated
input_stream.getline(reinterpret_cast<char *>(dest), count * sizeof(T)); | ||
for (std::size_t n = ios.gcount(); n < count; ++n) | ||
{ | ||
reinterpret_cast<char *>(dest)[n] = '\0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we init string with this value on resize()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment. I got it. I will update my code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dest is not std::string but template T. So, I could not apply resize().
instead of that, I modified to use memset.
memset(dest, 0, count * sizeof(T));
Is this satisfy your intention?
Hi, https://travis-ci.org/Project-OSRM/osrm-backend/jobs/607969210?utm_medium=notification&utm_source=github_status I think this failure is not because of my code. Best Regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI! Thank you for your answers.
I have several additional questions regarding this PR I hope you will not be difficult to resolve them.
unsigned int x = 0; | ||
for (const auto &s : result) | ||
{ | ||
if (x < xdim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Do we need this check if we will cover it with BOOST_ASSERT(x == xdim);
? Maybe this check can also be assertion?
I'm not confident with the file format and maybe it is normal to have more values in row than we expect in header: then assertion is not valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for your comment.
I afraid the difference of width (= the number of pixels) between actual raster file and setting in lua.
Especially, if (width of actual file) < (setting in lua) then we will face buffer overflow reading.
As you know, the result is not defined (beyond of control).
To avoid this situation, I put the if (x < xdim)
checking, just in case.
If you prefer to remove this checking, I can remove.
Which do you prefer?
a) leave if (x < xdim)
just in case.
b) remove. we believe user's setting in lua :)
Best Regards,
Tomonobu Saito
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI! Thank you for the explanation.
Now I understand that we got xdim setting from the lua profile and it can be a source of mistakes.
I think we should keep this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for your comment.
OK, I keep this modification.
}; | ||
|
||
// << singletone >> RasterCache | ||
class RasterCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
Thank you for the detailed explanation.
It seems that RasterContainer binds directly to the lua functions https://github.com/Project-OSRM/osrm-backend/blob/master/src/extractor/scripting_environment_lua.cpp#L188
So isn't it a lua problem that the profile creates several containers?
include/extractor/raster_source.hpp
Outdated
#include <storage/io.hpp> | ||
|
||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems the header is not used. Is this a left-over from debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your check and comment.
I will remove it after confirmation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Mr. DennisOSRM,
Thanks for your strong support.
May I ask you to proceed reviewing?
I have already modified the code and committed.
Best Regards,
Tomonobu Saito
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good from my side 👍
include/extractor/raster_source.hpp
Outdated
#include <iterator> | ||
#include <list> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems the header is not used. Is this a left-over from debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your check and comment.
I will remove it after confirmation.
include/storage/io.hpp
Outdated
input_stream.seekg(position, std::ios::beg); | ||
|
||
if (fingerprint == FingerprintFlag::VerifyFingerprint) | ||
catch (boost::filesystem::filesystem_error &ex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your comment.
I will add const
(since ex should be read-only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution.
Hi, |
Hi, Mr. DennisOSRM, |
- Changes from 5.22.0 - Build: - FIXED: pessimistic calls to std::move [Project-OSRM#5560](Project-OSRM#5561) - Features: - ADDED: new API parameter - `snapping=any|default` to allow snapping to previously unsnappable edges [Project-OSRM#5361](Project-OSRM#5361) - ADDED: keepalive support to the osrm-routed HTTP server [Project-OSRM#5518](Project-OSRM#5518) - ADDED: flatbuffers output format support [Project-OSRM#5513](Project-OSRM#5513) - ADDED: Global 'skip_waypoints' option [Project-OSRM#5556](Project-OSRM#5556) - FIXED: Install the libosrm_guidance library correctly [Project-OSRM#5604](Project-OSRM#5604) - FIXED: Http Handler can now deal witch optional whitespace between header-key and -value [Project-OSRM#5606](Project-OSRM#5606) - Routing: - CHANGED: allow routing past `barrier=arch` [Project-OSRM#5352](Project-OSRM#5352) - CHANGED: default car weight was reduced to 2000 kg. [Project-OSRM#5371](Project-OSRM#5371) - CHANGED: default car height was reduced to 2 meters. [Project-OSRM#5389](Project-OSRM#5389) - FIXED: treat `bicycle=use_sidepath` as no access on the tagged way. [Project-OSRM#5622](Project-OSRM#5622) - FIXED: fix table result when source and destination on same one-way segment. [Project-OSRM#5828](Project-OSRM#5828) - FIXED: fix occasional segfault when swapping data with osrm-datastore and using `exclude=` [Project-OSRM#5844](Project-OSRM#5844) - FIXED: fix crash in MLD alternative search if source or target are invalid [Project-OSRM#5851](Project-OSRM#5851) - Misc: - CHANGED: Reduce memory usage for raster source handling. [Project-OSRM#5572](Project-OSRM#5572) - CHANGED: Add cmake option `ENABLE_DEBUG_LOGGING` to control whether output debug logging. [Project-OSRM#3427](Project-OSRM#3427) - CHANGED: updated extent of Hong Kong as left hand drive country. [Project-OSRM#5535](Project-OSRM#5535) - FIXED: corrected error message when failing to snap input coordinates [Project-OSRM#5846](Project-OSRM#5846) - Infrastructure - REMOVED: STXXL support removed as STXXL became abandonware. [Project-OSRM#5760](Project-OSRM#5760)
Issue
This PR solve Issue #5571
osrm-extract try to read (load) whole raster file before parse it.
This consume very large memory when raster file is large (such as 40960 x 24576).
To reduce the memory usage, this PR change reading method
from whole file to line by line.
osrm-extract had the cache system for raster sources.
If this works correct, from second time to load raster source,
first raster source (on memory) is reused and no more memory is required.
But unfortunately, this system is not worked by some bug.
I fix it.
Tasklist
No need Tasklist
Requirements / Relations
No relations.