Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage for raster source handling. #5572

Merged
merged 16 commits into from
Nov 14, 2019

Conversation

Tomonobu3110
Copy link

@Tomonobu3110 Tomonobu3110 commented Oct 3, 2019

Issue

This PR solve Issue #5571

  1. Reading line by line instead of whole file.

osrm-extract try to read (load) whole raster file before parse it.
This consume very large memory when raster file is large (such as 40960 x 24576).
To reduce the memory usage, this PR change reading method
from whole file to line by line.

  1. Enable Raster Source cache system

osrm-extract had the cache system for raster sources.
If this works correct, from second time to load raster source,
first raster source (on memory) is reused and no more memory is required.
But unfortunately, this system is not worked by some bug.
I fix it.

Tasklist

No need Tasklist

  • update relevant Wiki pages : No new feature added by this PR.
  • add tests (see testing documentation : No new feature added by this PR.

Requirements / Relations

No relations.

storage::io::FileReader file_reader(filepath, storage::io::FileReader::HasNoFingerprint);
std::string buf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the whole word buffer. In previous version it was buffer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Thanks for your review and comment.
I will fix as your comment. (buf --> buffer)

storage::io::FileReader file_reader(filepath, storage::io::FileReader::HasNoFingerprint);
std::string buf;
buf.resize(xdim * 11); // INT32_MAX = 2147483647 = 10 chars + 1 white space = 11
BOOST_ASSERT(xdim * 11 <= buf.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this check relevant? Can buf.size() < xdim * 11 without exception at previous line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment.
In original code, I found below code.
buffer.resized(file_reader.GetSize())
My understanding is "to expand the buffer size to same with raster file size."
By this pull request, I changed from whole size buffer to line buffer to reduce the memory usage.
So, just in case (and respect to original code), I expand the buffer size to "maximum size of one line (logically)".
Maybe this code is not needed or there is better way.
I'm glad if you could give me some advice.

Best Regards,
Tomonobu Saito.

std::vector<std::string> result;
boost::split(
result, buf, boost::is_any_of(" \r\n\0"), boost::algorithm::token_compress_on);
// boost::split(result, buf, boost::is_any_of(" \r\n\0"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this comment in the final code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your checking. As you said, this comment line is not needed.
I will remove soon.

result, buf, boost::is_any_of(" \r\n\0"), boost::algorithm::token_compress_on);
// boost::split(result, buf, boost::is_any_of(" \r\n\0"));
unsigned int x = 0;
BOOST_FOREACH (std::string s, result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] Use range based loop from 11 standard. for (const string & s: result)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your suggestion.
I will use 11 style loop. Soon, I will modify.

};

// << singletone >> RasterCache
class RasterCache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you design this class as a singleton?
It seems that it will be used only inside the RasterContainer. And RasterContainer has one instance in Lua scripting environment. I think you can make it as a regular class or even implement this logic in RasterContainer class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks for your check and comment.

This is most important point of my pull request.
In original code, LoadedSources and LoadedSourcePaths are owned by RasterContainer class.
osrm-extract create more than one RasterContainer instances, and this causes more than one time to load same raster file (with rasterbot.lua).
My first idea is to change LoadesSources and LoadedSourcePaths to static.
But this cases "double free" problem. see error log at below URL, you can find "double free or corrption (fasttop)" error.
https://travis-ci.org/Project-OSRM/osrm-backend/jobs/593429031?utm_medium=notification&utm_source=github_status
This is the reason I introduce the singletone pattern.
Singletone can solve these two problems.

  • Multiple load of same raster file.
  • Double free error.

Best Regards,
Tomonobu Saito

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!
Thank you for the detailed explanation.

It seems that RasterContainer binds directly to the lua functions https://github.com/Project-OSRM/osrm-backend/blob/master/src/extractor/scripting_environment_lua.cpp#L188
So isn't it a lua problem that the profile creates several containers?

Copy link
Author

@Tomonobu3110 Tomonobu3110 Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks for your comment.

This is usage of raster:load in the lua file (rasterbot.lua).
https://github.com/Project-OSRM/osrm-backend/blob/master/profiles/rasterbot.lua

I think we cannot avoid to create more than one RasterContainer instance, since setup() is called more than one time during osrm-extract (according to checking the log of osrm-extract).
But, just in case, I will check the possibility to avoid creating multiple RasterContainer by lua file.
Then, I will update the comment.

Best Regards,
Tomonobu Saito.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

The instance of RasterContainer is created at here.
https://github.com/Project-OSRM/osrm-backend/blob/master/include/extractor/scripting_environment_lua.hpp#L38

So, we cannot control the creation of RasterContainer instance in lua files (such as rasterbot.lua).

Best Regards,
Tomonobu Saito.

Copy link
Author

@Tomonobu3110 Tomonobu3110 Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

I briefly checked scripting_environment_lua.{hpp|cpp}.
I think one LuaScriptingContext instance is created for every threads.
So, the number of instance of LuaScriptingContext is equal to the number of threads osrm-extract uses.
RasterContainer is member of LuaScriptingContext.
There is no doubt the number of RasterContainer is equal to the number of LuaScriptingContext.
This cannot control in lua file (such as rasterbot.lua).

If we want to avoid multiple load of same raster file, there are two strategies.
A) Use singletone pattern to reuse same resource from multiple thread. (This is my strategy.)
B) Share single LuaScriptingContext by all threads.
I think B has very big impact for current architecture design.

Best Regards,
Tomonobu Saito.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

thank you for this research. Now I am not against A) strategy. Can you add a briefly comment before this class to avoid confusion of future readers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks for your checking and comment.
Yes, I will add a comment then commit again.

input_stream.getline(reinterpret_cast<char *>(dest), count * sizeof(T));
for (std::size_t n = ios.gcount(); n < count; ++n)
{
reinterpret_cast<char *>(dest)[n] = '\0';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we init string with this value on resize()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. I got it. I will update my code.

Copy link
Author

@Tomonobu3110 Tomonobu3110 Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dest is not std::string but template T. So, I could not apply resize().
instead of that, I modified to use memset.
memset(dest, 0, count * sizeof(T));
Is this satisfy your intention?

@Tomonobu3110
Copy link
Author

Tomonobu3110 commented Nov 6, 2019

Hi,

https://travis-ci.org/Project-OSRM/osrm-backend/jobs/607969210?utm_medium=notification&utm_source=github_status
apt-get install failed

I think this failure is not because of my code.
Is there anyone who knows how can I kick build again?
Or do I need to do dummy commit to kick the build system?

Best Regards,
Tomonobu Saito

@Tomonobu3110 Tomonobu3110 requested a review from gardster November 6, 2019 04:17
Copy link
Contributor

@gardster gardster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI! Thank you for your answers.
I have several additional questions regarding this PR I hope you will not be difficult to resolve them.

unsigned int x = 0;
for (const auto &s : result)
{
if (x < xdim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!
Do we need this check if we will cover it with BOOST_ASSERT(x == xdim); ? Maybe this check can also be assertion?
I'm not confident with the file format and maybe it is normal to have more values in row than we expect in header: then assertion is not valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks for your comment.

I afraid the difference of width (= the number of pixels) between actual raster file and setting in lua.
Especially, if (width of actual file) < (setting in lua) then we will face buffer overflow reading.
As you know, the result is not defined (beyond of control).

To avoid this situation, I put the if (x < xdim) checking, just in case.
If you prefer to remove this checking, I can remove.

Which do you prefer?
a) leave if (x < xdim) just in case.
b) remove. we believe user's setting in lua :)

Best Regards,
Tomonobu Saito

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI! Thank you for the explanation.
Now I understand that we got xdim setting from the lua profile and it can be a source of mistakes.
I think we should keep this check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks for your comment.
OK, I keep this modification.

};

// << singletone >> RasterCache
class RasterCache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!
Thank you for the detailed explanation.

It seems that RasterContainer binds directly to the lua functions https://github.com/Project-OSRM/osrm-backend/blob/master/src/extractor/scripting_environment_lua.cpp#L188
So isn't it a lua problem that the profile creates several containers?

include/extractor/raster_source.hpp Show resolved Hide resolved
#include <storage/io.hpp>

#include <iostream>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems the header is not used. Is this a left-over from debugging?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your check and comment.
I will remove it after confirmation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Mr. DennisOSRM,
Thanks for your strong support.
May I ask you to proceed reviewing?
I have already modified the code and committed.
Best Regards,
Tomonobu Saito

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good from my side 👍

#include <iterator>
#include <list>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems the header is not used. Is this a left-over from debugging?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your check and comment.
I will remove it after confirmation.

input_stream.seekg(position, std::ios::beg);

if (fingerprint == FingerprintFlag::VerifyFingerprint)
catch (boost::filesystem::filesystem_error &ex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding const

Copy link
Author

@Tomonobu3110 Tomonobu3110 Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your comment.
I will add const (since ex should be read-only).

Copy link
Contributor

@gardster gardster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.

@Tomonobu3110
Copy link
Author

Hi,
Thanks for your comment and approval !
I wait this PR is merged to main branch.

@Tomonobu3110
Copy link
Author

Hi, Mr. DennisOSRM,
Thanks for approval !

@gardster gardster merged commit 0b139ff into Project-OSRM:master Nov 14, 2019
datendelphin added a commit to fossgis-routing-server/osrm-backend that referenced this pull request Nov 19, 2020
  - Changes from 5.22.0
    - Build:
      - FIXED: pessimistic calls to std::move [Project-OSRM#5560](Project-OSRM#5561)
    - Features:
      - ADDED: new API parameter - `snapping=any|default` to allow snapping to previously unsnappable edges [Project-OSRM#5361](Project-OSRM#5361)
      - ADDED: keepalive support to the osrm-routed HTTP server [Project-OSRM#5518](Project-OSRM#5518)
      - ADDED: flatbuffers output format support [Project-OSRM#5513](Project-OSRM#5513)
      - ADDED: Global 'skip_waypoints' option [Project-OSRM#5556](Project-OSRM#5556)
      - FIXED: Install the libosrm_guidance library correctly [Project-OSRM#5604](Project-OSRM#5604)
      - FIXED: Http Handler can now deal witch optional whitespace between header-key and -value [Project-OSRM#5606](Project-OSRM#5606)
    - Routing:
      - CHANGED: allow routing past `barrier=arch` [Project-OSRM#5352](Project-OSRM#5352)
      - CHANGED: default car weight was reduced to 2000 kg. [Project-OSRM#5371](Project-OSRM#5371)
      - CHANGED: default car height was reduced to 2 meters. [Project-OSRM#5389](Project-OSRM#5389)
      - FIXED: treat `bicycle=use_sidepath` as no access on the tagged way. [Project-OSRM#5622](Project-OSRM#5622)
      - FIXED: fix table result when source and destination on same one-way segment. [Project-OSRM#5828](Project-OSRM#5828)
      - FIXED: fix occasional segfault when swapping data with osrm-datastore and using `exclude=` [Project-OSRM#5844](Project-OSRM#5844)
      - FIXED: fix crash in MLD alternative search if source or target are invalid [Project-OSRM#5851](Project-OSRM#5851)
    - Misc:
      - CHANGED: Reduce memory usage for raster source handling. [Project-OSRM#5572](Project-OSRM#5572)
      - CHANGED: Add cmake option `ENABLE_DEBUG_LOGGING` to control whether output debug logging. [Project-OSRM#3427](Project-OSRM#3427)
      - CHANGED: updated extent of Hong Kong as left hand drive country. [Project-OSRM#5535](Project-OSRM#5535)
      - FIXED: corrected error message when failing to snap input coordinates [Project-OSRM#5846](Project-OSRM#5846)
    - Infrastructure
      - REMOVED: STXXL support removed as STXXL became abandonware. [Project-OSRM#5760](Project-OSRM#5760)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants