-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that subnets are dissociated from routers in the ManageIQ inventory when their interfaces are removed on the OSP side #182
Conversation
# Always default this to nil so that if a interface that connects this | ||
# to a router are removed, then the association is properly broken when | ||
# the subnet is refreshed. | ||
subnet.network_router = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just quick question @mansam, where is the code which connects subnet to a network router?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, this might not be the best. I think that targeted refresh would be then nullifying the relation (unless it always refreshes together with the interface)
Best would be to just define dependent => :nullify on the relation? That way if router is deleted, the relation disappears. But we probably can't do that on the network_port? Hm we could do custom disconnect method that would nullify it when port is being deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like we could do dependent => :nullify since the port is the thing that gets deleted, but the relationship that has to change is the subnet -> router relationship. I looked into using a custom destroy hook, but save_inventory_multi
uses delete
so the destroy hook never gets called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, save_inventory_multi is for old refresh (that doesn't have targeted refresh)
for having custom delete method you need to specify it like https://github.com/Ladas/manageiq/blob/0815e847bd63282129eee90869bf9bf321c0224d/app/models/manager_refresh/inventory_collection_default.rb#L7
then it expects the instance method on the model, https://github.com/Ladas/manageiq/blob/2a1d3e3a5941a74aafe65663547047b255031ba1/app/models/vm_or_template.rb#L648
it "removes subnet link to router if the interface is deleted" do | ||
setup_mocked_collector | ||
|
||
EmsRefresh.refresh(@ems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will also need targeted refresh spec, checking refresh of the subnet is not nullifying the relation
4f7d63d
to
3b2706e
Compare
def disconnect_port | ||
# Some ports link subnets to routers, so | ||
# sever that association if the port is removed | ||
cloud_subnets.each do |subnet| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These might be the wrong subnets? I think we fill these in only for Vm's network ports?
We should look at belongs_to :device, :polymorphic => true
, that relations for subnets is being fixed in https://github.com/ManageIQ/manageiq-providers-openstack/pull/188/files (could you verify it actually returns subnet here?)
Also we should check if we can conventionalize this by subnet.network_router = nil, only if the router is the router mentioned in the NetowrkPort (we might not store this). To avoid nullifying when the subnet was just connected somewhere else.
Hm, this approach is getting more complex than it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should even remodel the relation, to not use subnet.network_router
, but to just use the network_port and sending the query through it (i wound need to check how this affects other providers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The router gets returned out of find_device_object
correctly there. Both parser.network_ports
and find_device_object
use the network_port.fixed_ips
to associate subnets so I was thinking those were the correct subnets. The tests pass correctly, at least. (though perhaps this catches unrelated subnets? I don't have a good mental model of all the circumstances in which subnets get attached to ports and routers.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right so there is NetworkPort which is interface of a VM (Vm is the device) and there is a NetworkPort that is interface of router. I think the network_port.cloud_subnets works only for Vm's interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does appear to work in my test case for this PR, so that must not be the case that it only works on ports that are for VMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, the spec looks correct. So it must work :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though a spec case where this might break is that the targeted refresh contains both removing of old interface and adding new one. That might cause CloudSubnet.first.network_router_id to be nil, while it should point to a new router. Could you add this spec scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I'll write another spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote a spec for the situation where the port is removed and a new one is created in the same refresh. network_router_id
becomes nil as you suspected, but I don't understand why exactly.
I inspected what was happening when the custom delete is called, and it looks like at that time the replacement NetworkPort doesn't exist yet. So if the old NetworkPort is the only one in the database at the time, wouldn't the new NetworkPort have to be created afterwards and set the id again?
|
||
it "targeted refresh shouldn't remove subnet link to router unless the interface is deleted" do | ||
setup_mocked_collector | ||
::Settings.ems_refresh.openstack_network.allow_targeted_refresh = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. there is a helper for stubbing settings
Does the direct assignment work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, the ::Settings.ems_refresh.openstack_network.allow_targeted_refresh = false
might bite us, since specs run in parallel in CI. So we should use the stub that is isolated to 1 spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct assignment does work, but you're right that there's risk of cross contamination since the tests are in parallel. I'll adjust to use the settings stub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, probably do another cleanup PR, since it's in all specs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, Nice specs
2303d7a
to
6755c7f
Compare
This comment got buried by github after I touched the code the review was on, so I am repeating it here: @Ladas I wrote a spec for the situation where the port is removed and a new one is created in the same refresh. network_router_id becomes nil as you suspected, but I don't understand why exactly. I inspected what was happening when the custom delete is called, and it looks like at that time the replacement NetworkPort doesn't exist yet. So if the old NetworkPort is the only one in the database at the time, wouldn't the new NetworkPort have to be created afterwards and set the id again? The timing is confusing to me, and if this isn't possible to solve with a custom delete method then I'm pretty stumped on how to fix this bug. |
@mansam right, so this is because of the ordering in the graph. The subnet table is processed first and is storing the new relation to new router. After that, the network_ports are processed, where part of it we delete the network port and touch a subnet that was already saved correctly. |
# Some ports link subnets to routers, so | ||
# sever that association if the port is removed | ||
cloud_subnets.each do |subnet| | ||
subnet.network_router = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to correctly cover the latest spec. We would need info on network_port, to what router it relates to (ems_ref). Then here, we could check that if it's still the same router, we can nullify it. If it's a different router we should not touch it.
But it's kind of clunky (and it will not work in parallel). I think in the end, we should look into how to change the modeling. So that 1 model doesn't have to touch relations of other models. So e.g. doing the actual relation through NetworkPort and deleting the subnet.network_router foreign key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subnet gets network_router set on it as part of parsing network ports. I take your comment about timing mean that the whole inventory is collected and parsed, and then subnets are saved before network ports, resulting in the property being cleared?
We can't compare whether the router is the same, because the user could delete and recreate the same router relationship. That would shrink the edge case though. I agree that the appropriate long term solution is to fix the DB modeling, but a fix that touches the DB is probably too high impact for this part of the release cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't compare whether the router is the same, because the user could delete and recreate the same router relationship
@mansam @aufi that sounds like very rare, race condition kind of situation that we can fix with the more elegant solution involving DB Schema change. Would the current simpler fix cover 99.99% of use cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, just comparing will probably get to 99% cases being correct. And the 1% will be fixed by next full refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also just set the test as pending, the current state is like 95% correct, where the 5% will be fixed by full refresh. Setting/changing router interfaces is a very rare operation, comparing to others, so it should not be an issue.
6755c7f
to
4d147e9
Compare
4d147e9
to
81dbf4e
Compare
Checked commits mansam/manageiq-providers-openstack@22febb8~...81dbf4e with ruby 2.3.3, rubocop 0.52.0, haml-lint 0.20.0, and yamllint 1.10.0 |
@mansam the CI failure looks unrelated |
The code looks good to me, I'm afraid of failing tests (re-triggered test and still failing). |
@aufi the failing tests are being fixed in ManageIQ/manageiq#16824, I think you can merge all PRs, that have only these 3 failures |
@Ladas Ok, thanks for info. LGTM, merging! |
…uring-refresh Ensure that subnets are dissociated from routers in the ManageIQ inventory when their interfaces are removed on the OSP side (cherry picked from commit 5acca17) Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1535155
Gaprindashvili backport details:
|
Defaults the
network_router
field on subnets tonil
during refresh. If the interface linking subnet to router still exists on the OSP side, the association will be restored while updating ports.Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1449260