Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flat Loops Shouldn't Break Opposing Edge Index #2385

Merged
merged 13 commits into from
Jun 10, 2020
Merged

Flat Loops Shouldn't Break Opposing Edge Index #2385

merged 13 commits into from
Jun 10, 2020

Conversation

kevinkreiser
Copy link
Member

@kevinkreiser kevinkreiser commented May 19, 2020

Loops strike again! So here is another problem of handling a data edge case (we seem to be hitting a lot of these lately). In this case we have a "flat loop": https://www.openstreetmap.org/way/747414277#map=19/42.56012/-70.96623

Where you can see the way goes from node 6993574477 to 6993574481 and back to 6993574477. The tile building code should then force a graph node to happen on node 6993574481 which should mean two separate pairs of edges; one pair from 6993574477 to 6993574481 and another pair from 6993574481 back to 6993574477. The last step in tile building should then link up the opposing edge indices between the two pairs of edges between those nodes. However something is going wrong here. For some reason this code cant differentiate between the two pairs and ends up creating edges whose opposing edges dont match.

Specifically one invarient in the code should be should be that: edge == opposing(opposing(edge))

However this way breaks that.. The failing test confirms it but I still need to figure out why it cant differentiate. It may be the case that we will have to collapse flat loops as a special case in the parsing logic.

So yeah in graph validator we go hook up edges with their opposing edges by matching them based on their begin and end nodes and their properties. In this case these cases the code will see two potential matching opposing edges and it always takes the last one. Because of this two edges end up picking the same opposing edge and sharing it. We could potentially devise a way to make the tie braker scenario work (pick the one that wasnt already picked) but instead what I did was make it so that we dont duplicate edges in the first place.

@@ -138,7 +138,8 @@ void ConstructEdges(const OSMData& osmdata,
auto way_node = *way_nodes[current_way_node_index];
const auto way = *ways[way_node.way_index];
const auto first_way_node_index = current_way_node_index;
const auto last_way_node_index = first_way_node_index + way.node_count() - 1;
const auto last_way_node_index =
first_way_node_index + way.node_count() - way_node.way_shape_node_index - 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we can visit a way twice now (when it doubles back) we need to worry about not starting at the 0th node of a way

@@ -159,7 +160,7 @@ void ConstructEdges(const OSMData& osmdata,
// Remember this edge starts here
Edge prev_edge = Edge{0};
Edge edge = Edge::make_edge(way_node.way_index, current_way_node_index, way);
edge.attributes.way_begin = true;
edge.attributes.way_begin = way_node.way_shape_node_index == 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could go either way on this. @gknisely what is preferable, i know this has implications for lane stuff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without testing I am not sure.

size += 1;
// remember what edge this node will end, its complicated by the fact that we delay adding the
// edge until the next iteration of the loop, ie once the edge becomes prev_edge
uint32_t end_of = static_cast<uint32_t>(edges.size() + prev_edge.is_valid());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried to make this easier to understand. let me know if the comment makes sense

@@ -200,27 +201,40 @@ void ConstructEdges(const OSMData& osmdata,
prev_edge.attributes.way_prior = true;
}

if (!edge.attributes.way_begin)
// We should add the previous edge now that we know its done
if (prev_edge.is_valid())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again i tried to make this more clear. if we have a prev_edge finished and waiting to get pushed on then we do so now because we are about to overwrite it. ie queue the next one

way_node = next_way_node;
++current_way_node_index;
doubled_back = current_way_node_index != last_way_node_index;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the above block is where we end seeing and deciding to skip over doubled back sections of a way

// backed over itself and we need to skip it
if (current_way_node_index == last_way_node_index || doubled_back) {
edges.push_back(prev_edge);
current_way_node_index += !doubled_back;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the trick. if we doubled back we want to treat it like we are processing a new way where the doubling back ends (ie splits off again). in the case that we are doubling back the index is already fast forwarded (from the loop above) to the right one so we dont need to increment in that case. in the other case where we want the next way we do need to increment.

OSMNode osm_node{node};
auto inserted = loop_nodes_.insert(std::make_pair(node, i));
osm_node.duplicate_ =
!inserted.second || (i != 0 && i != nodes.size() - 1 && nodes[i - 1] == nodes[i + 1]);
Copy link
Member Author

@kevinkreiser kevinkreiser May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when parsing ways, we mark any nodes that a way references as "duplicate" if they were already referenced by the way once or more befor OR if they were a turn around (the point at which it doubles back)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for readability: can you pull out (i != 0 && i != nodes.size() - 1 && nodes[i - 1] == nodes[i + 1]) into a variable and give it a meaningful name?

@@ -288,6 +288,8 @@ struct graph_callback : public OSMPBF::Callback {
sequence<OSMWayNode>::iterator element = (*way_nodes_)[current_way_node_index_];
while (current_way_node_index_ < way_nodes_->size() &&
(way_node = element = (*way_nodes_)[current_way_node_index_]).node.osmid_ == osmid) {
// we need to keep the duplicate flag that way parsing set
n.duplicate_ = way_node.node.duplicate_;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the way parser was only setting the nodeid for the nodes it referenced you can see below we just overwrite the whole thing with what we parsed from the nodes tags. however now that the way callback also writes this single bit saying its a duplicate we need to copy that forward before we overwrite the node data

@kevinkreiser
Copy link
Member Author

Currently we also track any ways that are loops in a file that is output when building tiles called loops.txt. We arent doing anything with it at the moment it was just useful to look at problematic ways. Now that the ways will no longer be problematic can I remove this? Other projects that interface better with map roulette et al are more suited for this anyway in my opinion.

* @return true if the edge has at least 1 shape point
*/
bool is_valid() const {
return attributes.llcount > 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the loop where we create edges in the graph, we only add the edge in the next iteration of the loop. because of this there is always a previous edge that we are tracking but we only want to add it if its not the first previous edge (ie not a real edge). this little function helps make that easier to see. when you initialize an edge without actual data you end up with an edge that has no shape (nodes associated) yet so this function uses that to decide whether the edge is valid or not

@kevinkreiser kevinkreiser added this to the Sprint 10 - Left Turn milestone May 20, 2020
@@ -1828,18 +1830,6 @@ struct graph_callback : public OSMPBF::Callback {
bss_nodes_.reset(bss_nodes);
}

// Output list of wayids that have loops
void output_loops() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed this because its no longer useful now that the code successfully handles loops. we can let other projects more suited to data cleanup handle calling out degeneracies in the data

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pluck out a few previously loop ways and verify we can route on them successfully now?

danpaz
danpaz previously approved these changes May 20, 2020
Copy link
Collaborator

@danpaz danpaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a readability nit and a question.

check_opposing(map);
}

TEST(loops, split_lolli) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love these test case names 😍

@@ -1828,18 +1830,6 @@ struct graph_callback : public OSMPBF::Callback {
bss_nodes_.reset(bss_nodes);
}

// Output list of wayids that have loops
void output_loops() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pluck out a few previously loop ways and verify we can route on them successfully now?

OSMNode osm_node{node};
auto inserted = loop_nodes_.insert(std::make_pair(node, i));
osm_node.duplicate_ =
!inserted.second || (i != 0 && i != nodes.size() - 1 && nodes[i - 1] == nodes[i + 1]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for readability: can you pull out (i != 0 && i != nodes.size() - 1 && nodes[i - 1] == nodes[i + 1]) into a variable and give it a meaningful name?


void check_opposing(const gurka::map& map) {
// then we test the invariant that any edge should be the opposing edge of its opposing edge
// edge == opposing(opposing(edge))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should also include this assertion somewhere in data processing to catch bad data in development, what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we kind of do that already! its here: https://github.com/valhalla/valhalla/blob/master/src/mjolnir/graphvalidator.cc#L164-L179

it basically says if im assigning an opp index to an edge that already got one that means there were duplicates

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's only logging a debug statement to the console right? If violating the invariant edge == opposing(opposing(edge)) leads to invalid memory access when querying the service, it seems like we should consider that a fatal error at data processing time.

Copy link
Member

@gknisely gknisely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a major change. We need to RAD test this in NA and EU.

@coxchapman coxchapman modified the milestones: Sprint 10 - Left Turn, Sprint 11 - Left Turn May 26, 2020
@mookerji mookerji self-assigned this May 27, 2020
@mookerji mookerji assigned kevinkreiser and unassigned mookerji May 27, 2020
@kevinkreiser
Copy link
Member Author

I have to harden the tests on this one: https://www.openstreetmap.org/way/759927840#map=19/51.35764/4.63997

The updated logic does not handle the case where two nodes have been seen before but werent adjacent which means we lose these edges. Specifically we need to track unique pairs of nodes to know whether or not they should be marked as duplicated.

nodes[i + 1] == nodes[inserted.first->second - 1];
bool unflattening = i > 0 && inserted.first->second < nodes.size() - 1 &&
nodes[i - 1] == nodes[inserted.first->second + 1];
osm_node.flat_loop_ = flattening || unflattening;
Copy link
Member Author

@kevinkreiser kevinkreiser Jun 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a previous revision of this pr this got triggered whenever we found the point at which at double back occured or when a node was not unique. it turns out we need to allow nodes to be not unique in the case of legitimate self intersections. so instead we harden to just the case where the going and outgoing paths at a non unique node are the same (ie is a flat loop). the cool thing is that this work also for the node where the double back occurs even though its not duplicated it has the same ingoing and outgoing path. in this case path means neighboring node ids.

danpaz
danpaz previously approved these changes Jun 3, 2020
Copy link
Collaborator

@danpaz danpaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, approved pending RAD.

@kevinkreiser
Copy link
Member Author

@gknisely can i get another look. I RAD'd the planet. Because of the time it takes for the planet to complete the planet file used between runs was different. I used a test request file with 14k auto routes in it. Of those there were 166 differences. I reviewed manually 50 of them. And they all ended up being data edits that either: changed the shape slightly so simple diffing failed, changed connectivity slightly so that a detour was taken.

@coxchapman coxchapman modified the milestones: Sprint 11 - Left Turn, Sprint 12 - Left Turn Jun 9, 2020
@kevinkreiser kevinkreiser merged commit 56802f1 into master Jun 10, 2020
@kevinkreiser kevinkreiser deleted the flat_loop branch August 19, 2020 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants