OpenNMS Drift deployment in Kubernetes.
For learning purposes, Helm charts and operators are avoided for the main components of this solution; the only exceptions are the Ingress Controller and Cert-Manager. That might change in the future to take advantage of these technologies. Nevertheless, the content of this repository is not intended for production environments, as it was designed for learning and testing purposes only.
This deployment contains a fully distributed version of all OpenNMS components and features, with high availability in mind when possible.
This solution also includes some additional features, like Hasura, Cassandra Reaper, and Kafka Manager (or CMAK). All of them are optional and were added for learning purposes.
- Install the latest kubectl binary. We'll be using its embedded kustomize to apply the manifests (kubectl apply -k). For troubleshooting purposes, you could also install the standalone version of kustomize.
- Install the jq command.
NOTE: Depending on the chosen platform, additional requirements might be needed. Check the respective README files for more information.
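A quick way to confirm the prerequisites above is a small shell helper like the following. It's only a convenience sketch: check_tools is a hypothetical name, not part of this repository, and it simply reports any command that is missing from the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command that is not on the PATH.
# Returns non-zero when at least one of them is missing.
check_tools() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_tools kubectl jq && kubectl version --client before deploying anything.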
WARNING: All the manifests were verified against Kubernetes 1.20. If you're going to use a newer version, adjust the API versions of the manifests accordingly; in particular, batch/v1beta1 for CronJobs in elasticsearch.curator.yaml, and policy/v1beta1 for PodDisruptionBudget in zookeeper.yaml (their replacements, batch/v1 and policy/v1, are available since Kubernetes 1.21, and the v1beta1 versions were removed in 1.25). Similarly, if you're planning to use a version older than 1.20, make sure to do the same for networking.k8s.io/v1 in external-access.yaml.
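If you prefer to script the change for newer clusters, the following sketch rewrites the two deprecated apiVersion values. The bump_api_versions helper is hypothetical (not part of this repository), and it assumes the rest of each resource needs no changes; that is worth confirming with kubectl apply --dry-run=server -k . from the directory containing kustomization.yaml. It does not handle the networking.k8s.io/v1 case for older clusters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the deprecated apiVersion values named in the
# warning above (batch/v1beta1 -> batch/v1, policy/v1beta1 -> policy/v1).
# Only lines that are exactly "apiVersion: <old>" are changed.
bump_api_versions() {
  local file tmp
  for file in "$@"; do
    tmp=$(mktemp) || return 1
    # Write to a temporary file first to avoid GNU/BSD "sed -i" differences.
    if ! sed -e 's#^apiVersion: batch/v1beta1$#apiVersion: batch/v1#' \
             -e 's#^apiVersion: policy/v1beta1$#apiVersion: policy/v1#' \
             "$file" > "$tmp"; then
      rm -f "$tmp"
      return 1
    fi
    # Copy back with cat so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  done
}
```

Usage (adjust the paths to where those files live in your checkout): bump_api_versions elasticsearch.curator.yaml zookeeper.yaml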
Proceed with the preferred cluster technology:
- Using Kops on AWS.
- Using EKS on AWS.
- Using GKE on Google Cloud Platform.
- Using AKS on Microsoft Azure.
- Using Minikube on your machine (with restrictions).
To facilitate the process, everything is done through kustomize.
To update the default settings, find common-settings under configMapGenerator inside kustomization.yaml.
To update the default passwords, find onms-passwords under secretGenerator inside kustomization.yaml.
The README for each cluster technology explains how to deploy the manifests.
As part of the deployment, complementary RBAC permissions are added in case you need to grant operators and/or administrators access to the opennms namespace. Check namespace.yaml for more details.
Use the following to check whether or not all the resources have been created:
kubectl get all --namespace opennms
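The output of kubectl get all can be long and hard to scan. The sketch below, a hypothetical not_ready helper (not part of this repository), filters the output of kubectl get pods --no-headers and prints only the pods that are not fully ready. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) and ignores Completed pods, such as finished CronJob runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get pods --no-headers" output on stdin
# and print NAME READY STATUS for every pod that is not fully ready.
not_ready() {
  awk '$3 == "Completed" { next }
       { split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}
```

Usage: kubectl get pods --namespace opennms --no-headers | not_ready (an empty output means every pod is ready).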
This deployment already contains Minions inside the opennms namespace for monitoring devices within the cluster. Minions running outside the Kubernetes cluster should use the following resources to connect to OpenNMS and the dependent applications.
For instance, on AWS using the domain aws.agalue.net, the resources would be:
- OpenNMS Core: https://onms.aws.agalue.net/opennms
- gRPC: grpc.aws.agalue.net:443
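For example, to run a Minion outside the cluster with Docker, extract the client certificate and key from the minion-cert secret, convert the key to PKCS#8, and start the container: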
kubectl get secret minion-cert -n opennms -o json | jq -r '.data["tls.crt"]' | base64 --decode > client.pem
kubectl get secret minion-cert -n opennms -o json | jq -r '.data["tls.key"]' | base64 --decode > client-key.pem
openssl pkcs8 -topk8 -nocrypt -in client-key.pem -out client-pkcs8_key.pem
docker run --name minion \
-e OPENNMS_HTTP_USER=admin \
-e OPENNMS_HTTP_PASS=admin \
-p 8201:8201 \
-p 1514:1514/udp \
-p 1162:1162/udp \
-p 8877:8877/udp \
-p 11019:11019 \
-v "$(pwd)/client.pem:/opt/minion/etc/client.pem" \
-v "$(pwd)/client-pkcs8_key.pem:/opt/minion/etc/client-key.pem" \
-v "$(pwd)/minion.yaml:/opt/minion/minion-config.yaml" \
opennms/minion:28.1.1 -c
IMPORTANT: Make sure to use the same version for the Minion as for OpenNMS. The above assumes a custom value for the INSTANCE_ID (see minion.yaml); make sure it matches the one defined in kustomization.yaml.
WARNING: Make sure to use your own Domain and Location, and verify that the URLs for OpenNMS and GRPC are correct.
CRITICAL: If you're planning to use the UDP Listeners (Telemetry, Flows, SNMP Traps, Syslog), and you're going to use Docker, make sure to do it on a server running Linux, not a VM, Docker for Mac or Docker for Windows, because of the reasons explained here.
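Before starting the Minion, it can be worth checking the files extracted from the minion-cert secret. The following sketch uses a hypothetical check_minion_tls helper (not part of this repository) to verify that the certificate and the private key contain the same public key, and that the converted key is an unencrypted PKCS#8 key (the format produced by openssl pkcs8 -topk8 -nocrypt).

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the client certificate and key
# before mounting them into the Minion container.
check_minion_tls() {
  local cert=$1 key=$2
  # 1. The certificate and the private key must share the same public key.
  if [ "$(openssl x509 -in "$cert" -noout -pubkey)" != "$(openssl pkey -in "$key" -pubout)" ]; then
    echo "certificate and key do not match" >&2
    return 1
  fi
  # 2. The key must be an unencrypted PKCS#8 key.
  if [ "$(head -n 1 "$key")" != "-----BEGIN PRIVATE KEY-----" ]; then
    echo "$key is not an unencrypted PKCS#8 key" >&2
    return 1
  fi
}
```

Usage: check_minion_tls client.pem client-pkcs8_key.pem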
When troubleshooting mTLS with gRPC, the following can help:
curl -o ipc.proto https://raw.githubusercontent.com/OpenNMS/opennms/master/core/ipc/grpc/common/src/main/proto/ipc.proto 2>/dev/null
grpcurl -v --key client-key.pem --cert client.pem --proto ipc.proto grpc.aws.agalue.net:443 OpenNMSIpc/RpcStreaming
A correct output would look like this:
Resolved method descriptor:
// Streams RPC messages between OpenNMS and Minion.
rpc RpcStreaming ( stream .RpcResponseProto ) returns ( stream .RpcRequestProto );
Request metadata to send:
(empty)
Response headers received:
(empty)
Response trailers received:
content-length: 0
content-type: application/grpc
date: Tue, 26 Oct 2021 20:24:31 GMT
strict-transport-security: max-age=15724800; includeSubDomains
Sent 0 requests and received 0 responses
If you don't specify the client certificate and key, you'll get:
Resolved method descriptor:
// Streams RPC messages between OpenNMS and Minion.
rpc RpcStreaming ( stream .RpcResponseProto ) returns ( stream .RpcRequestProto );
Request metadata to send:
(empty)
Response trailers received:
(empty)
Sent 0 requests and received 0 responses
ERROR:
Code: Internal
Message: Bad Request: HTTP status code 400; transport: received the unexpected content-type "text/html"
This is the expected behavior, according to the Ingress Nginx documentation.
When using AWS with my domain, the endpoints are:
- OpenNMS Core: https://onms.aws.agalue.net/opennms/ (for administrative tasks)
- OpenNMS UI: https://onmsui.aws.agalue.net/opennms/ (for users/operators)
- Grafana: https://grafana.aws.agalue.net/
- Kibana: https://kibana.aws.agalue.net/ (remember to enable monitoring)
- Kafka Manager: https://kafka-manager.aws.agalue.net/ (make sure to register the cluster using zookeeper.opennms.svc.cluster.local:2181/kafka for the Cluster Zookeeper Hosts, and enable SASL like all the other clients)
- Hasura GraphQL API: https://hasura.aws.agalue.net/v1alpha1/graphql
- Hasura GraphQL Console: https://hasura.aws.agalue.net/console
- Jaeger UI: https://tracing.aws.agalue.net/
- Cassandra Reaper: https://cassandra-reaper.aws.agalue.net/webui/
WARNING: Make sure to use your own Domain.
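To adapt the list above to your own domain, the following sketch prints the same endpoints for any domain and can optionally probe them with curl. Both endpoints and probe_endpoints are hypothetical helpers (not part of this repository); the host names and paths are copied from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the public endpoints for the given domain.
endpoints() {
  local domain=$1
  printf '%s\n' \
    "https://onms.$domain/opennms/" \
    "https://onmsui.$domain/opennms/" \
    "https://grafana.$domain/" \
    "https://kibana.$domain/" \
    "https://kafka-manager.$domain/" \
    "https://hasura.$domain/v1alpha1/graphql" \
    "https://hasura.$domain/console" \
    "https://tracing.$domain/" \
    "https://cassandra-reaper.$domain/webui/"
}

# Hypothetical helper: print the HTTP status code returned by each endpoint.
# curl reports 000 when it gets no HTTP response at all (DNS, TLS, timeout).
probe_endpoints() {
  local url
  endpoints "$1" | while read -r url; do
    printf '%s %s\n' "$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" "$url"
  done
}
```

For example, probe_endpoints example.com prints one status code per URL, replacing example.com with your own domain.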
Future enhancements:
- Add Network Policies to control the communication between components (for example, only OpenNMS needs access to PostgreSQL and Cassandra; other components should not access those resources). A network plugin that enforces Network Policies, like Calico, is required.
- Design a solution to manage OpenNMS configuration files (the /opt/opennms/etc directory), or use an existing one like ksync.
- Add support for Cluster Autoscaler.
- Add support for monitoring through Prometheus using Prometheus Operator. Expose the UI (including Grafana) through the Ingress controller.
- Expose the Kubernetes Dashboard through the Ingress controller.
- Explore Helm, and potentially add support for it.
- Improve state management:
  - Explore a solution for Cassandra to reattach nodes and scale up or down, or migrate to an existing operator like k8ssandra.
  - Replace Cassandra with Cortex, using the TSS Plugin.
  - Explore a solution to manage HA for PostgreSQL, like Postgres Operator or Crunchy Data Operator.
  - Explore a Kafka solution like Strimzi, an operator that supports encryption and authentication.