OpenNMS Drift in Kubernetes

OpenNMS Drift deployment in Kubernetes.

Diagram

For learning purposes, Helm charts and operators are avoided for the main components of this solution, except for the Ingress Controller and Cert-Manager. That might change in the future to take advantage of these technologies. Nevertheless, the content of this repository is not intended for production environments, as it was designed for learning and testing purposes only.

This deployment contains a fully distributed version of all OpenNMS components and features, with high availability in mind when possible.

This solution also includes some additional components, like Hasura, Cassandra Reaper, and Kafka Manager (or CMAK). All of them are optional and were added for learning purposes.

Minimum Requirements

  • Install the latest kubectl binary. We'll be using the embedded kustomize to apply the manifests. For troubleshooting purposes, you could install its standalone version.
  • Install the jq command.
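To verify the tools are in place (any reasonably recent kubectl ships with an embedded kustomize), a quick check like the following can help:

kubectl version --client
kubectl kustomize --help | head -n 1
jq --version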

NOTE: Depending on the chosen platform, additional requirements might be needed. Check the respective README files for more information.

WARNING: Please note that all the manifests were verified for Kubernetes 1.20. If you're going to use a newer version, please adjust the API versions of the manifests; in particular, batch/v1beta1 for CronJobs in elasticsearch.curator.yaml, and policy/v1beta1 for PodDisruptionBudget in zookeeper.yaml. Similarly, if you're planning to use a version older than 1.20, make sure to do the same for networking.k8s.io/v1 in external-access.yaml.
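For instance, on Kubernetes 1.21 or newer those two resources would move to the stable API groups. This is only a sketch of the lines to adjust; the rest of each resource stays as it is:

# elasticsearch.curator.yaml (excerpt)
apiVersion: batch/v1        # was batch/v1beta1 when verified against Kubernetes 1.20
kind: CronJob

# zookeeper.yaml (excerpt)
apiVersion: policy/v1       # was policy/v1beta1 when verified against Kubernetes 1.20
kind: PodDisruptionBudget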

Cluster Configuration

Proceed with the preferred cluster technology:

  • Using Kops on AWS.
  • Using EKS on AWS.
  • Using GKE on Google Cloud Platform.
  • Using AKS on Microsoft Azure.
  • Using Minikube on your machine (with restrictions).

Deployment

To facilitate the process, everything is done through kustomize.

To update the default settings, find the common-settings under configMapGenerator inside kustomization.yaml.

To update the default passwords, find the onms-passwords under secretGenerator inside kustomization.yaml.
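For illustration, those generators in kustomization.yaml follow this shape; the literal entries below are hypothetical placeholders, so check the actual file for the real keys and values:

configMapGenerator:
- name: common-settings
  literals:
  - DOMAIN=aws.agalue.net            # hypothetical entry; see the real kustomization.yaml

secretGenerator:
- name: onms-passwords
  literals:
  - POSTGRES_PASSWORD=postgres       # hypothetical entry; see the real kustomization.yaml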

Each cluster technology explains how to deploy the manifests.
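In general terms, applying the manifests with the embedded kustomize boils down to something like the following, executed from the directory that contains kustomization.yaml (the platform-specific instructions remain the authoritative reference):

kubectl apply -k .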

As part of the deployment, some complementary RBAC permissions are added in case operators and/or administrators need access to the OpenNMS namespace. Check namespace.yaml for more details.

Use the following to check whether or not all the resources have been created:

kubectl get all --namespace opennms
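To wait until every pod reports Ready, something like the following can also be used:

kubectl wait --namespace opennms --for=condition=Ready pods --all --timeout=600s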

Minion

This deployment already contains Minions inside the opennms namespace for monitoring devices within the cluster. Minions running outside the Kubernetes cluster should use the following resources to connect to OpenNMS and its dependent applications.

For instance, for AWS using the domain aws.agalue.net, the resources should be:

  • OpenNMS Core: https://onms.aws.agalue.net/opennms
  • GRPC: grpc.aws.agalue.net:443

For example:

kubectl get secret minion-cert -n opennms -o json | jq -r '.data["tls.crt"]' | base64 --decode > client.pem
kubectl get secret minion-cert -n opennms -o json | jq -r '.data["tls.key"]' | base64 --decode > client-key.pem
openssl pkcs8 -topk8 -nocrypt -in client-key.pem -out client-pkcs8_key.pem

docker run --name minion \
 -e OPENNMS_HTTP_USER=admin \
 -e OPENNMS_HTTP_PASS=admin \
 -p 8201:8201 \
 -p 1514:1514/udp \
 -p 1162:1162/udp \
 -p 8877:8877/udp \
 -p 11019:11019 \
 -v $(pwd)/client.pem:/opt/minion/etc/client.pem \
 -v $(pwd)/client-pkcs8_key.pem:/opt/minion/etc/client-key.pem \
 -v $(pwd)/minion.yaml:/opt/minion/minion-config.yaml \
 opennms/minion:28.1.1 -c

IMPORTANT: Make sure to use the same version as OpenNMS. The above assumes a custom value for the INSTANCE_ID (see minion.yaml). Make sure it matches the content of kustomization.yaml.

WARNING: Make sure to use your own Domain and Location, and verify that the URLs for OpenNMS and GRPC are correct.

CRITICAL: If you're planning to use the UDP Listeners (Telemetry, Flows, SNMP Traps, Syslog), and you're going to use Docker, make sure to do it on a server running Linux, not a VM, Docker for Mac or Docker for Windows, because of the reasons explained here.
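To give an idea of what the mounted minion.yaml contains, here is a minimal sketch; the property names and values are assumptions based on the Minion configuration format, so treat the repository's minion.yaml as the reference, especially for the INSTANCE_ID and gRPC/TLS settings:

# Hypothetical sketch; confirm every property against the repository's minion.yaml
id: minion-1
location: Apex
http-url: https://onms.aws.agalue.net/opennms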

When troubleshooting mTLS with gRPC, the following can help:

curl -o ipc.proto https://raw.githubusercontent.com/OpenNMS/opennms/master/core/ipc/grpc/common/src/main/proto/ipc.proto 2>/dev/null
grpcurl -v --key client-key.pem --cert client.pem --proto ipc.proto grpc.aws.agalue.net:443 OpenNMSIpc/RpcStreaming

A correct output would look like this:

Resolved method descriptor:
// Streams RPC messages between OpenNMS and Minion.
rpc RpcStreaming ( stream .RpcResponseProto ) returns ( stream .RpcRequestProto );

Request metadata to send:
(empty)

Response headers received:
(empty)

Response trailers received:
content-length: 0
content-type: application/grpc
date: Tue, 26 Oct 2021 20:24:31 GMT
strict-transport-security: max-age=15724800; includeSubDomains
Sent 0 requests and received 0 responses

If you don't specify the client certificate and key, you'll get:

Resolved method descriptor:
// Streams RPC messages between OpenNMS and Minion.
rpc RpcStreaming ( stream .RpcResponseProto ) returns ( stream .RpcRequestProto );

Request metadata to send:
(empty)

Response trailers received:
(empty)
Sent 0 requests and received 0 responses
ERROR:
  Code: Internal
  Message: Bad Request: HTTP status code 400; transport: received the unexpected content-type "text/html"

This is the expected behavior according to the Ingress Nginx documentation.

Users Resources

When using AWS with my domain (aws.agalue.net):

  • OpenNMS Core: https://onms.aws.agalue.net/opennms/ (for administrative tasks)
  • OpenNMS UI: https://onmsui.aws.agalue.net/opennms/ (for users/operators)
  • Grafana: https://grafana.aws.agalue.net/
  • Kibana: https://kibana.aws.agalue.net/ (remember to enable monitoring)
  • Kafka Manager: https://kafka-manager.aws.agalue.net/ (make sure to register the cluster using zookeeper.opennms.svc.cluster.local:2181/kafka for the Cluster Zookeeper Hosts, and enable SASL similar to all the clients)
  • Hasura GraphQL API: https://hasura.aws.agalue.net/v1alpha1/graphql
  • Hasura GraphQL Console: https://hasura.aws.agalue.net/console
  • Jaeger UI: https://tracing.aws.agalue.net/
  • Cassandra Reaper: https://cassandra-reaper.aws.agalue.net/webui/

WARNING: Make sure to use your own Domain.

Future Enhancements

  • Add Network Policies to control the communication between components (for example, only OpenNMS needs access to PostgreSQL and Cassandra; other components should not access those resources). A network manager like Calico is required. See the sketch after this list.
  • Design a solution to manage OpenNMS Configuration files (the /opt/opennms/etc directory), or use an existing one like ksync.
  • Add support for Cluster Autoscaler.
  • Add support for monitoring through Prometheus using Prometheus Operator. Expose the UI (including Grafana) through the Ingress controller.
  • Expose the Kubernetes Dashboard through the Ingress controller.
  • Explore Helm, and potentially add support for it.
  • Improve State Management
    • Explore a solution for Cassandra to reattach nodes and scale up or down; or migrate to use existing operators like k8ssandra
    • Replace Cassandra with Cortex, using the TSS Plugin.
    • Explore a solution for PostgreSQL to manage HA like Postgres Operator, or Crunchy Data Operator
    • Explore a Kafka solution like Strimzi, an operator that supports encryption and authentication.
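As a sketch of the Network Policies item above, a policy restricting PostgreSQL access to OpenNMS could look roughly like this; the pod label selectors are hypothetical and would have to match the labels actually used by the manifests:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgresql-access
  namespace: opennms
spec:
  podSelector:
    matchLabels:
      app: postgres                  # hypothetical label for the PostgreSQL pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: onms                  # hypothetical label; only OpenNMS pods may connect
    ports:
    - protocol: TCP
      port: 5432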