Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HLD] SONiC GNMI Server Interface Design #996

Merged
merged 22 commits into from
Sep 9, 2022

Conversation

Signed-off-by: Gang Lv ganglv@microsoft.com
@ganglyu
Copy link
Contributor Author

ganglyu commented May 6, 2022

@qiluo-msft would you please review this PR?

@qiluo-msft
Copy link
Contributor


# 1 Feature Overview

SONiC should provide a gNMI/gNOI server interface where the client can communicate with SONiC (server) through the interface using their respective gRPC libraries based on ConfigDB/ApplDB/StateDB/CountersDB schema or SONiC Yang schema.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are opening up gNMI to write directly to APPDB (any DB) without any application - this may result in data consistency issues? for security concerns, someone bypasses some checks and directly operates the DB. what is the plan to mitigate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gNMI can only update table in ApplDB with Yang models.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madhupalu The requirement comes from DASH project. The gNMI server will only write to limited tables in ApplDB, and these tables must have Yang model defined. In this use case, gNMI play the same role as a RestAPI server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

# 1 Feature Overview

SONiC should provide a gNMI/gNOI server interface where the client can communicate with SONiC (server) through the interface using their respective gRPC libraries based on ConfigDB/ApplDB/StateDB/CountersDB schema or SONiC Yang schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will gNMI expose the capability info like which table can be changed, which one can not?

Copy link
Contributor Author

@ganglyu ganglyu Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no Yang models for a table, this table can't be changed.
And we can also follow RFC to use "config false" tag, please refer to link.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good with it.


### 1.2.2 Container

All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are introducing new container - Why introduce a new container instead of reusing telemetry? Telemetry container has been used for config purposes by some applications today, can telemetry container be renamed and continue to be used? If a new container is required, what is the backward compatible/migration plan?

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not have a consensus to ahead with the two containers approach one for config and another one for get and subscription in the community call 6/28, please revisit the design and explain why cant we combine all the gNMI operations in one container (today it is telemetry container but can be renamed as gNMI container), this helps to avoid code duplication.

@bhagatyj mentioned in the call today that there is a need for telemetry container restart for new paths addition, if you could explain why cant we do this on-the-fly without container restart, would be helpful to understand the rationale better.
@prvattem Please add if I have missed to capture anything.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the above comments, we should keep a single gnmi server in SONiC, maintaining two gnmi servers in the same NOS unnecessarily divide the code and future contributions. If there are enhancement or capability that needs to be addressed, it should be directed towards the current gnmi server. The gnmi server in telemetry container was introduced 2018, there are active PR contributions in github

Copy link

@mm0930 mm0930 Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gnmi container (Telemetry) already exists today. Is there a reason why the existing gnmi server does not fit / cannot be extended for the intended functions ?
Introducing duplicate containers in the architecture comes with a lot of overhead the community will pay in terms of maintenance.

  • The developer community will be divided on which one to contribute to. Does the developer need to ensure both work ? This is an overhead on multiple contributions.
  • If it's not mandated that both should work, there will be wide divergence on what features will be supported through which container
  • There will be code duplication. Why should the same NOS have two GNMI servers?
    If there is something deficient in the existing container kindly list it clearly, and articulate why the existing container cannot be enhanced (if required) to meet the requirements.

All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container.

# 2 Error Handling
supervisord will autorestart gnmi service if it exits after it has successfully started up.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the design handles do config apply errors?

Copy link
Contributor Author

@ganglyu ganglyu Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, SONiC GNMI server is on par with SONiC CLI, it will update ConfigDB.
For incremental configuration, GNMI will invoke GCU (generic config updater), and GCU can take care of manual service restart and verify services have absorbed change correctly. And then GNMI can report any error detected by GCU.

https://github.com/sonic-net/SONiC/blob/master/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md#3115-change-applier

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be an application to handle SAI errors for the configs from controller via gNMI server, what's the plan handles this use-case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, gNMI can't handle SAI errors.
GCU has implemented a framework to take care of manual service restart and verify services have absorbed change correctly. So gNMI server with GCU has better error handling than SONiC CLI.

### 1.2.2 Container

All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this feature controlled via build time flags?

Copy link

@chrispsommers chrispsommers Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly suggest you not embed .png files into specs, but instead use editable SVG files, which are compact; act as source; and are directly viewable in a browser or GitHub. PNGs are not editable, so it's not possible to maintain drawings down the road unless you store a separate source file. PNG files are larger as well as appear
blurry. We've switched over to using .svg almost exclusively in the https://github.com/Azure/DASH project. Check out https://www.diagrams.net/. BTW you can export Visio as SVG to do a one-time conversion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this feature controlled via build time flags?

Not now, any suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add a build time flag similar to other features ex: sonic_mgmt_framework

@bhagatyj
Copy link
Contributor

bhagatyj commented Jun 29, 2022 via email


# 1 Feature Overview

SONiC should provide a gNMI/gNOI server interface where the client can communicate with SONiC (server) through the interface using their respective gRPC libraries based on ConfigDB/ApplDB/StateDB/CountersDB schema or SONiC Yang schema.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain the need of a requirement to write directly to APPL_DB / STATE_DB. This is against SONIC configuration model where any configuration is written into CONFIG_DB only. Such a bypassing is not correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed in the meeting. There are some usecases such as route add/deletes (for e.g vnet routes) which shall be programmed by an external controller. These routes are not intended to be persistent across reboots. It is essentially extending the existing RestAPI model to gNMI.

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny We need to selectively allow (e.g VNET routes, DASH objects in APPL_DB) to control via REST/gNMI server, exposing DB objects without any restriction via north bound interfaces would be a big security concern. I like to see some design for this in this HLD or in a separate HLD.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny I don't think we can make infrastructure changes to allow specific need for a select-isolated deployment. Tomorrow someone might comeup with a need to directly write to ASIC_DB for effecting shutdown of all port-channels quickly. I don't think these things should be allowed. I don't agree that 'some external entity' wants to write to internal database tables of SONIC' is a requirements that needs to be fulfilled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Venkat that it shall be only selectively allowed. @bandaru-viswanath, i tend to disagree that this is infrastructure change. For your comment, if there is an external orchagent, may be it is a valid scenario of writing directly to ASIC_DB. But currently that's not the case. As mentioned, this is not a new design as APP_DB means application DB and depending on where the application sits (like BGP etc), it can act as the producer. Why should we restrict that application must be localized. Also this is not a persitent config. So config_db is out of scope.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Venkat that it shall be only selectively allowed

Please suggest how it can be selectively enabled, without hardcoding specific details in the container. The only place this can be done is in the GNMI container, and if we do so, we will be hardcoding some specific usecase/deployment requirements into UI.

i tend to disagree that this is infrastructure change.

IMO, it is an infra item to allow access to a DB otherwise considered below user level.

For your comment, if there is an external orchagent, may be it is a valid scenario of writing directly to ASIC_DB. But currently that's not the case. As mentioned, this is not a new design as APP_DB means application DB and depending on where the application sits (like BGP etc), it can act as the producer. Why should we restrict that application must be localized.

I am not arguing whether it is technically feasible. I see and know that it is quite possible the way you described. I wouldn't also be surprised if some non-community SONIC adaptations already do so. I am arguing that it is not the definition of SONiC in terms of how the databases are positioned in the architecture. So that sense, at least to me, its a deviation from the SONiC philosophy and warrants a bigger discussion in community.

Would community accept a PR to enhance this design to write something to ASIC_DB because it is their use case ?

Also this is not a persitent config. So config_db is out of scope.

Its should not be that it has to be APPL_DB because config-db is out of scope. That, at least to me, is not the right argument. It has to be APPL_DB, because that the right place.

Copy link
Contributor

@prsunny prsunny Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please suggest how it can be selectively enabled, without hardcoding specific details in the container. The only place this can be done is in the GNMI container, and if we do so, we will be hardcoding some specific usecase/deployment requirements into UI.

As suggested above, this can be done using Yang.

Yes, APPL_DB is the right place. Im just saying config_db is out of scope for programming something that is not intended to be persistent. I'm not sure on how positioning of APPL database in architecture warrants that only config_db can be modified for write operations. My point is APPL_DB is for application data (irrespective of whether it is local or remote). From an architecture point, it is intended to be written by various different applications (E.g PINS write to APP_DB directly, SDN controller programs Vnet routes directly to APP_DB directly via RestAPI). Its not a new design. Agree with you on the position of ASIC_DB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a simple clarification: there is no requirement to write directly to STATE_DB, in this HLD, the gNMI server will read from StateDB, like a telemetry server.

* Set and get RPCs must be supported. Customer will use get RPC to retrieve configurations including FRR and VNET route, and use set RPC to apply new configurations.
* SetRequest message must support to incrementally update configuration and fully update configurations.
* Data models can be SONiC Yang models or CONFIG_DB schema.
* Ability to configure/read different DBs - ApplDB, ConfigDB, StateDB, CountersDB etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment above, writing directly to APPL_DB/STATE_DB/COUNTERS_DB is against the general norm for SONIC management model.

Copy link
Contributor

@qiluo-msft qiluo-msft Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rewrite this line to

  • Ability to read different DBs - ApplDB, ConfigDB, StateDB, CountersDB etc.
  • Ability to write different DBs - ApplDB, ConfigDB

Understanding that you still have concern on writing ApplDB. We can keep it in other discussion threads.


### 1.2.2 Container

All the introduced features will be part of the sonic-gnmi package installed in sonic-gnmi container.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for a new GNMI server. There is already one GNMI container which runs the server. Please enhance the existing container instead of creating a new one.

| ---- | ---- |
| replace {<br>&ensp;path {<br>&ensp;&ensp;elem {name: "CONFIG_DB"} elem {name: "DEVICE_NEIGHBOR"} elem {name: "Ethernet96"}<br>&ensp;}<br>}<br>replace {<br>&ensp;path {<br>&ensp;&ensp;elem {name: "CONFIG_DB"} elem {name: "DEVICE_NEIGHBOR"} elem {name: "Ethernet8"} elem {name: "port"}<br>&ensp;}<br>&ensp;val { json_ietf_val: "eth1" }<br>} | [<br>&ensp;{<br>&ensp;&ensp;"op": "remove", "path": "/DEVICE_NEIGHBOR/Ethernet96"}, <br>&ensp;&ensp;{ "op": "add", "path": "/DEVICE_NEIGHBOR/Ethernet8/port", "value": "eth1"}<br>] |

#### 1.2.1.5 Full Configurations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hows the simultaneous configuration from CLI and GNMI protected/serialized ?

Copy link
Contributor Author

@ganglyu ganglyu Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree. Simultaneous configuration from CLI and GNMI should be protected.
We have plan to update CLI implementation, and it's not covered in this design.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, please enhance the HLD to add appropriate protection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bandaru-viswanath this is a great question! Currently the HLD of gNMI server does not protect against confliction with CLI, similar to one CLI does not protect against confliction with another CLI today.

In the future vision, there should be a centralized service inside SONiC responsible for all kinds of configuration. The requests could come from CLI/gNMI/RestAPI or SONiC applications. And the request should be serialized if there is confliction. The gNMI server or a sub service under gNMI server may be a candidate for such centralized service.

### 1.1.1 Phase 1 Requirements
* Set and get RPCs must be supported. Customer will use get RPC to retrieve configurations including FRR and VNET route, and use set RPC to apply new configurations.
* SetRequest message must support to incrementally update configuration and fully update configurations.
* Data models can be SONiC Yang models or CONFIG_DB schema.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current SONIC-Yang models are flat and are not meant for Northbound configurations. How are the dependencies resolved in flat-models ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a little bit more on "SONIC-Yang models are flat and are not meant for Northbound configurations"? What is "flat"?

This gNMI server depends on "Generic Config Updater" design, which could figure out the dependencies from SONiC Yang models. ref: https://github.com/sonic-net/SONiC/blob/master/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md#stage-2-json-patch-ordering

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Please see the hierarchical yang models here. https://www.openconfig.net/projects/models/ . It shows how data is defined in a hierarchy and also cross referenced. They are useful as NB yang models because of the hierarchy and referencing.

SONIC yang models are flat, and are really meant only for validating config data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!
For your original question "How are the dependencies resolved in flat-models ?" SONiC Yang models use must to express the dependencies, for example https://github.com/Azure/sonic-buildimage/blob/a4b9838231b50018193d7ac2927067f0a03b8582/src/sonic-yang-models/yang-models/sonic-vlan.yang#L118. And it could be used for configuration.

@ganglyu
Copy link
Contributor Author

ganglyu commented Jul 4, 2022

@madhupalu @venkatmahalingam @adyeung @bandaru-viswanath @bhagatyj

SONiC telemetry is using prefix target to identify target database, and we can use the same method to support backward compatibility.

  • If prefix target is ConfigDB or ApplDB, gNMI server should follow current design for set request.
  • If prefix target is empty, gNMI server should invoke translib API for set request.

Now we have two gNMI server, one for read and one for configure. When SONiC telemetry and SONiC gNMI are stable in the future, we can use this prefix target solution to merge them to a single container.

@ganglyu ganglyu marked this pull request as draft July 12, 2022 23:49
2. User could use SONiC CLI command "config apply-patch" to incrementally update configurations systematically and validate target configuration using SONiC Yang models. With generic_config_updater, all operations in a 'config apply-patch' command are processed in a single transaction that will either succeed or fail as a whole.
- "config apply-patch" supports ConfigDB, and Yang models.
- The green arrow indicates this process, and dotted arrow indicates Yang validation.
3. SONiC CLI commands can use "config reload" to fully update configurations and validate target configuration using SONiC Yang models.
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

To be more accurate:
fully update configurations -> erase original and fully reload configurations from a file #Closed

madhupalu
madhupalu previously approved these changes Aug 25, 2022
@zhangyanzhao
Copy link
Collaborator

@ganglyu can this PR be merged?

@ganglyu
Copy link
Contributor Author

ganglyu commented Aug 26, 2022

@ganglyu can this PR be merged?

we are still waiting for community review

* In the authentication of client certificate, after the certificate chain is validated by TLS, it will further check if the common name of the end-entity certificate is in the trusted common name list of the server config.

#### 1.2.1.9 ACL
GNMI server will not support OpenConfig Yang models, so ACL configuration will use CONFIG_DB schema and SONiC Yang Models.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNMI server already supports OpenConfig YANG models via Management framework. This statement is not right and needs to be removed.

* Data models can be SONiC Yang models or CONFIG_DB schema.
* Ability to read different DBs - ApplDB, ConfigDB, StateDB, CountersDB etc.
* Ability to write different DBs - ApplDB, ConfigDB.
* Configurations must be verified using YANG models even it is in CONFIG_DB schema.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we planning to achieve this? Are we planning to integrate with the python validator(In which case validation of the bulking configurations will become time-consuming due to loading and unloading of pythong files)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For CONFIG_DB, we can accept time-consuming configurations. Validation is more important to us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation can be done by other means(go lang based validator that will not be time consuming) why integrate with python validator knowing that it is going to be time-consuming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use generic config updater to support incremental configuration update.
For now, generic config updater is implemented with Python and it's time consuming, and we can improve generic config updater in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anand-kumar-subramanian Could you provide some sample of "bulking configurations will become time-consuming" in your mind. If the "bulking configurations" is in one SetRequest, the "loading and unloading of python files" will happen only once and it will not be a significant performance issue. If "bulking configurations" is in multiple SetRequest, it is better to ask client to combine them into one.

@qiluo-msft
Copy link
Contributor

@adyeung @bandaru-viswanath @bhagatyj Could you review again? We believe already resolved all your comments.

@qiluo-msft
Copy link
Contributor

/easycla

@qiluo-msft qiluo-msft merged commit ebde65a into sonic-net:master Sep 9, 2022
// For none-DB data
OTHERS = 100;
// For mixed schema
MIXED_SCHEMA = 101;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIXED_SCHEMA

The MIXED_SCHEMA is not necessary. The most industry keep Target as empty. We also need to refine the "SetRequest flow chart"

@zhangyanzhao
Copy link
Collaborator

@qiluo-msft can you please double check if this feature has been included in 202211 branch? Thanks.

@qiluo-msft
Copy link
Contributor

Yes, this feature has been included in 202211 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.