Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patching front port to virtual chassis member interface is broken in UI #11478

Closed
cs-1 opened this issue Jan 12, 2023 · 29 comments · Fixed by #13296
Closed

Patching front port to virtual chassis member interface is broken in UI #11478

cs-1 opened this issue Jan 12, 2023 · 29 comments · Fixed by #13296
Assignees
Labels
severity: low Does not significantly disrupt application functionality, or a workaround is available status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application

Comments

@cs-1
Copy link

cs-1 commented Jan 12, 2023

NetBox version

v3.4.2

Python version

3.8

Steps to Reproduce

  1. Create an empty rack (default settings with 42 HUs works).
  2. Add a device with 1 front port to rack.
  3. Create a virtual chassis with at least two members (pos. 1, 2,...) with at least one interface each, no master chassis (!), add members to rack.
  4. Attach a cable from front port to an interface of VC member at pos. 2 or higher.

Expected Behavior

The cable should be attached between front port and VC member's interface. Only the interfaces of selected VC member are shown.

Observed Behavior

The interface selection always show all interfaces of all VC members even if selected VC member is no master. This can lead to confusion because you don't expect to see other VC members' interfaces.

We've made the following observations.

  • This issue only occurs for VC members, not for single devices in the same rack.
  • If VC member's interface X is already patched, interface X is always marked as not available to connect for all other VC members.
  • Patching from the VC member's interface view works which suggests that this is indeed a VC related problem.
  • When in the cable creation UI and when selecting any VC member, scrolling through the interface selector will always list interfaces of all VC members, even if selected VC member isn't a master chassis (or, as in our case, no VC member is configured as master chassis).

I believe that this is an UI issue. My guess is that in /dcim/cables/add, the Interface selector always displays VC member 1's interfaces instead the actual selected VC member.

@cs-1 cs-1 added the type: bug A confirmed report of unexpected behavior in the application label Jan 12, 2023
@cs-1 cs-1 changed the title Patching front port to virtual chassis interface is broken Patching front port to virtual chassis interface is broken in UI Jan 12, 2023
@cs-1 cs-1 changed the title Patching front port to virtual chassis interface is broken in UI Patching front port to virtual chassis member interface is broken in UI Jan 12, 2023
@DanSheps
Copy link
Member

DanSheps commented Jan 20, 2023

So what is happening is we are actually getting all interfaces from the VC when querying this method (if you search for chassis specific interfaces, you will see them). I will need to confirm that this is the intended result, I don't believe so though.

@cs-1
Copy link
Author

cs-1 commented Jan 23, 2023

Hi Dan! The behaviour was different in earlier versions. Regarding the intended result: the observed behaviour would be correct for a master chassis where all interfaces of members of the same VC should be listed. For non-master chassis it should only list the interfaces of that particular chassis. (We don't use the master chassis functionality in our scenario.)

@cs-1
Copy link
Author

cs-1 commented Jan 23, 2023

Quick P.S.: the problem also exists in v3.4.3.

@lp-2
Copy link

lp-2 commented Jan 30, 2023

hi @DanSheps
Could you please tell me the exact path in the source code where the problem is located? I would like to try to fix it.

Thanks in advance

@kkthxbye-code
Copy link
Contributor

Seems to be caused by: #10743

Changing the API to always return all VC interfaces when filtering on device id seems like an error, though I get why it was needed for LAG assignment. Not sure what the correct solution is here. Reverting the change would break LAG assignment again I assume.

@cs-1
Copy link
Author

cs-1 commented Feb 13, 2023

The change definitely breaks what used to work in the UI. Does it make sense to discuss this with @jeremystretch ? @lp-2 is currently trying to figure out a solution but we probably don't have enough insight into the design of NetBox to tell whether our proposed solution is acceptable (if we find one, that is).

@kkthxbye-code
Copy link
Contributor

@cs-1 - A couple of things:

Cable is actually always attached to VC member 1's interface with the same number.

This is not the case for me. When attaching the cable, all interfaces show up but it's still possible to attach them to the right device, it's just impossible to know which device is which if the interfaces are named the same.

but we probably don't have enough insight into the design of NetBox to tell whether our proposed solution is acceptable (if we find one, that is).

I guess one option is to:

Wether this makes sense in a modelling sense I can't tell you, I know next to nothing about VC's.

@cs-1
Copy link
Author

cs-1 commented Feb 13, 2023

This is not the case for me. When attaching the cable, all interfaces show up but it's still possible to attach them to the right device, it's just impossible to know which device is which if the interfaces are named the same.

Yes, same here. But that's exactly the problem. In VC scenarios, interface numbers "y" can be the same on each individual VC member. Only the VC chassis position "x" gives your an identification "x/y" of an interface. There's typically no need to add "x" to each chassis member's interface name, it's redundant information and it also makes using "Device Type" templating difficult to use because in these templates, interfaces are numerated as if not part of a VC (y = 1..48).

These two options sound reasonable. I tend to think that refixing the LAG select field issue is the most plausible solution. It's not logical to get all interfaces of all VC members when querying only a single, non-master VC member.

@kkthxbye-code kkthxbye-code added the status: needs owner This issue is tentatively accepted pending a volunteer committed to its implementation label Feb 13, 2023
@kkthxbye-code
Copy link
Contributor

Yes, same here. But that's exactly the problem. In VC scenarios, interface numbers "y" can be the same on each individual VC member. Only the VC chassis position "x" gives your an identification "x/y" of an interface. There's typically no need to add "x" to each chassis member's interface name, it's redundant information and it also makes using "Device Type" templating difficult to use because in these templates, interfaces are numerated as if not part of a VC (y = 1..48).

I understand the issue, just wanted to clarify that what I quoted from the original issue wasn't correct, as it makes it seem like regardless of which interface you pick it is always connected to the lowest index VC member. If you don't mind, please update the issue text for posterity.

It's not logical to get all interfaces of all VC members when querying only a single, non-master VC member.

I agree, and I'm sure it was unintended, and thus a bug. Let me know if you want to try to contribute a fix yourself :)

@cs-1
Copy link
Author

cs-1 commented Feb 13, 2023

I understand the issue, just wanted to clarify that what I quoted from the original issue wasn't correct, as it makes it seem like regardless of which interface you pick it is always connected to the lowest index VC member. If you don't mind, please update the issue text for posterity.

Oops! Yes, I'll fix the description. Sorry for the confusion.

I agree, and I'm sure it was unintended, and thus a bug. Let me know if you want to try to contribute a fix yourself :)

Yes, that's what we try to figure out whether we can help to fix it. @lp-2 is currently looking into the code.

@lp-2
Copy link

lp-2 commented Feb 16, 2023

Seems to be caused by: #10743

Changing the API to always return all VC interfaces when filtering on device id seems like an error, though I get why it was needed for LAG assignment. Not sure what the correct solution is here. Reverting the change would break LAG assignment again I assume.

I looked at the code in the /dcim/filtersets.py file. In the filter_device_id function, you could add an optional passing parameter that sets the filter if_master to true for the virtual chassis and false for the LAG setting,
but I couldn't find a call to this function, as it seems to be abstracted away by MultiValueNumberFilter.

@kkthxbye-code
Copy link
Contributor

The filterset doesn't really know if it's being used by the cable form or the interface form, as the fields are DynamicModelChoiceFields which just call the API to get the data. As I said the solution is probably to add a toggle to the interface API signifying if LAG members should be included for non-master members and then the new toggle could be added to query_params for the lag field of the InterfaceForm.

@cs-1
Copy link
Author

cs-1 commented Feb 16, 2023

We'd like to help solve this but our Django knowledge is still a little limited. Could you be so kind to point us to the InterfaceForm that you speak of? The idea you mention is what we have in mind but we simply don't seem to find the right "entry point" in the code where to implement it. :) Thanks for your patience, we're slowly but surely getting the hang of it. :)

@kkthxbye-code
Copy link
Contributor

This is the problematic lag field on the InterfaceForm:

https://github.com/netbox-community/netbox/blob/develop/netbox/dcim/forms/model_forms.py#L1353-L1361

You can pass API parameters to the query_params dict. Not sure of the best way to modify the filter/api though.

@cs-1
Copy link
Author

cs-1 commented Feb 16, 2023

Thanks for the pointer. We looked at the code and we can say for sure that the patch in #10743 needs to be reverted because of its side-effects. We're currently trying to figure out how to modify the queryset / query_params to get all cross-VC interfaces in case a device is part of a VC. But we're pretty sure that modifying

vc_interface_ids += device.vc_interfaces(if_master=False).values_list('id', flat=True)
with if_master=False is not the correct way to do this.

@DanSheps
Copy link
Member

DanSheps commented Feb 17, 2023

Yes, same here. But that's exactly the problem. In VC scenarios, interface numbers "y" can be the same on each individual VC member. Only the VC chassis position "x" gives your an identification "x/y" of an interface. There's typically no need to add "x" to each chassis member's interface name, it's redundant information and it also makes using "Device Type" templating difficult to use because in these templates, interfaces are numerated as if not part of a VC (y = 1..48).

Not sure what platform you are using, but on Cisco, switch(or fex or whatever)/slot/interface are all critical to proper interface naming and you shouldn't be skipping on them. Yes, you have to rename them after, but you could easily handle that with a plugin that functions on a trigger or something and a bulk rename is not difficult. Not sure what other platforms do, TBH, but I imagine they are similar in that it is a integral part of the name.

Now, if you are using VC for it's non-intended purpose (MC-LAG, VPC, etc), then that is, unfortunately not something we can account for.

@cs-1
Copy link
Author

cs-1 commented Feb 20, 2023

We don't use VC in a non-intended purpose, see below. What I'm saying is that the patch in #10743 renders the purpose of a master chassis in a VC obsolete because it will turn every chassis into a master chassis when using the filter_device_id() function. That clearly doesn't make sense and it has side effects for every part of NetBox where filter_device_id() is called. In model_forms.py for example, it makes no sense: when attaching a cable, NetBox lets you select a specific device so that NetBox's UI only shows you interfaces on that particular device. With the current patch for filter_device_id(), every interface of a VC is listed instead of only the interface of the selected device (which may be part of a VC). So #10743 breaks the selection of a single device in the UI. This was not the behaviour of the UI before the patch was introduced. This has nothing to do with interface naming in VCs. It's only about breaking the UI when adding a cable (and maybe other places where filter_device_id() is used which we haven't found yet). Note that @arthanson himself says in in #10743 that he doesn't know if the patch may have side effects.

RE: VCs (though this is independent of what's stated above): we do use the VC function as intended. We chose not to make a chassis a master chassis in NetBox because our configuration management will look at every member of a VC to determine the interface configuration to use. The "x/y" notation of an interface is derived from x = position number of VC member in NetBox / y = interface number (as it is done by most switch vendors; x will change when changing the position of a chassis in a stack which is analogue to changing the VC position of a chassis in NetBox). I don't see how that is a problem. We don't use MC-LAG or VPC in that context. That way we don't need to do any interface renaming and can simply use Device Types without changes. I can't find NetBox documentation that states that this is an unintended approach. Using a master chassis and the functionality that comes with it is clearly optional in the docs.

@cs-1
Copy link
Author

cs-1 commented Mar 7, 2023

@kkthxbye-code sorry for the confusion in #11838
My main concern was that changing the filterset may have unintended effects elsewhere, not only in this cable adding UI issue. I'll patiently wait for the UI issue to be fixed. We tried to fix the cable UI issue ourself but found no other way than to revert the filterset.py patch and to patch the LAG issue in model_forms.py rather than changing the UI for adding cables. However, we're not familiar enough with the code yet to find a good fix. Sorry for the fuss, wasn't my intention.

@amk1969
Copy link

amk1969 commented Mar 20, 2023

This bug is leading to unexpected outcomes in the API as well as GUI. Compare results of
https://demo.netbox.dev/dcim/interfaces/?device=mysampledevice2 and
https://demo.netbox.dev/dcim/interfaces/?device_id=110

The proposal from kkthxbye-code appears very reasonable. Allow an additional parameter to the query where the caller can specify whether all interfaces of the virtual-chassis or of the member only should be returned. Defaulting to True on the master and False on the rest. Challenge comes when not specified and devices from different VCs are requested but then it can return 400 error for "ambiguous query".
LAG selection or other GUI/API use cases can select what is appropriate for them.

@thefreakquency
Copy link

I understand that one day it was decided to go a different path, but to me, all this would have been easier to deal with if the complete listing of interfaces would happen on the VC instead of the master member.

In our use case, I have to model a device without any physical interfaces (including power) to represent the stack/VC. This non-racked "device" contains only the virtual interfaces (VLAN interfaces, loopbacks, lags, etc) of the stack. Then I add the physical members to the stack and rack them. This way, each member only holds their physical interfaces. Ideally, the information I put in the device I create first to represent the stack would lie under the VC.

Putting all interfaces under the master member of VC is like mixing the concept of virtual and physical device.

Not sure if it would be too much of a breaking change to consider this again...

@troeger
Copy link

troeger commented Mar 21, 2023

@thefreakquency Totally agreed, here is my argument for having interface listings on virtual chassis ...

#10953 (comment)

@cs-1
Copy link
Author

cs-1 commented Mar 21, 2023

@thefreakquency Yes, that's the reason why we completely avoid using the "master chassis" option in NetBox because it's no logical that a physical chassis seems to have more interfaces than it actually has. Not using the master chassis option may create other problems but for us that approach works. @troeger I agree what you write in the comment that you linked to. That being said, I think it would be a good idea to start a discussion on the Discussions page. My hope still is that the patch that causes unwanted side effects (on top of the UI issues described here) will be reverted.

@DanSheps
Copy link
Member

Totally agreed, here is my argument for having interface listings on virtual chassis ...

Put in a FR for this, this somewhat makes sense but we still need to include master interfaces for lags and other devices in the API queries so this likely won't solve the problem you are seeking to resolve.

In our use case

I understand you have a specific use case, however from your description you are likely, for lack of a better word, abusing how the virtual chassis and devices are suppose to be used.

My hope still is that the patch that causes unwanted side effects (on top of the UI issues described here) will be reverted.

This isn't going to happen. As mentioned, it needs to be properly resolved and not just reverted to the previous bad state.

Personally, my view is this:

All members queried should return all members of the stack no matter which member you are quering as this is how it behaves in real life with a shared management and control plane, which is what VirtualChassis is for.

In an instance where there is separate management and control, you should not be using VirtualChassis.

@thefreakquency
Copy link

thefreakquency commented Mar 30, 2023

@DanSheps Having all interfaces under each members would be a really good idea, as long we are able to have a management IP on the VC. Stacked switches or firewalls can have both separate & shared management.

In an ideal world full of unicorns, the VC should represent the virtual instance of that switch, which has management. A few more items would be also needed to represent the non-physical characteristics of a stack/cluster, like:

  • Status,
  • Role,
  • Platform
  • Tags,
  • Services

This would replicate real-life scenario where you document & manage both the physical device (Rack, serial number, interfaces and anything tangible), and the virtual stack (Configuration, management, etc...). Each physical device could have it's own primary management IP, as it is sometime possible to manage each member individually, while having a Stack management IP).
This is not technically a separate management and control per se. They share the control plane as they are aggregated in one logical instance (VC), but have separate physical characteristics.

As an example, in Juniper world, each physical management IP would likely be on me0 or fxp0 of each device, and the shared management would be on vme0. In Fortinet, you would assign an IP to mgmt port of each device, while having a cluster management IP.

@DanSheps
Copy link
Member

Stacked switches or firewalls can have both separate & shared management.

Can you give me an example of this? I have never met a stacked switch that has a separate MP. I have also never met a stacked firewall.

@DanSheps
Copy link
Member

If we changed the serializer to provide the device name as part of the interface lookup, would that suffice for most people?

@cs-1
Copy link
Author

cs-1 commented Jun 20, 2023

If we changed the serializer to provide the device name as part of the interface lookup, would that suffice for most people?

Yes, that would be a very useful and a good solution.

@jeremystretch jeremystretch added the severity: low Does not significantly disrupt application functionality, or a workaround is available label Jun 23, 2023
@DanSheps DanSheps added status: accepted This issue has been accepted for implementation and removed status: needs owner This issue is tentatively accepted pending a volunteer committed to its implementation labels Jul 11, 2023
@DanSheps DanSheps self-assigned this Jul 11, 2023
DanSheps added a commit that referenced this issue Jul 11, 2023
@DanSheps
Copy link
Member

After discussion in the maintainers meeting, we are going to shift the direction:

  1. Restore the existing functionality
  2. Add a new "vc" attribute to return all VC interfaces on all devices if selected.

@jeremystretch
Copy link
Member

Because this results in a breaking change to the device and device_id filters for interfaces, it's being introduced in NetBox v3.6.0. The virtual_chassis_member and virtual_chassis_member_id filters introduced in PR #13296 can be used to replicate the previous behavior.

jeremystretch added a commit that referenced this issue Aug 30, 2023
…aces (#13296)

* Add `vc_interfaces` flag to control interface queryset

* Fix test failure

* Add new filters instead of using undocumented query params

* Cleanup filterset, add test

* Rename filter and re-introduce virtual_chassis filtering method (required)

* Fix test

* Adjust tests to more accurately provide coverage

* Add breaking change note

* Misc cleanup

---------

Co-authored-by: Jeremy Stretch <jstretch@netboxlabs.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
severity: low Does not significantly disrupt application functionality, or a workaround is available status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application
Projects
None yet
8 participants