9501 Kubernetes Monitoring and Troubleshooting (#291)
* Addressed the review comments

Addressed the review comments

* Addressed review comments

Addressed review comments

* Added need help

Added need help

* udpate

* Update log-explorer-gui.md

* Update log-explorer-gui.md

* Update log-explorer-gui.md

* Update log-explorer-gui.md

* Update kubernetes-solution-installation.md

* update

* Feedback Implemented

* Lab Ordering Changes and Lab 1 as read only lab

Lab Ordering Changes and Lab 1 as read only lab

* Lab Restructring

Lab Restructring

* Initial Checkin for OCW-24

Initial Checkin for OCW-24

* numbering fixes in manifest file

numbering fixes in manifest file

* updating the manifest json file

updating the manifest json file

* Update loganalytics-connect-kubernetes-cluster.md

* Lab 2 Interactive Analysis

* few minor updates

few minor updates

* Introduction, Lab 2 & 3 changes

Introduction, Lab 2 & 3 changes

* Update interactive-analytics-and-troubleshooting.md

* Update with latest screenshots

Update with latest screenshots

* removing the support lab

removing the support lab

* Update interactive-analytics-and-troubleshooting.md
vikrredd authored Aug 20, 2024
1 parent 1057d87 commit dc249e7
Showing 33 changed files with 196 additions and 85 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# Kubernetes Monitoring Solution Deployment

## Introduction

In this lab, you'll connect to the existing Kubernetes Cluster for Monitoring & Log Collection.

Estimated Time: 15 minutes

### Objectives

In this lab, you will see step-by-step instructions to:

- Connect to the existing Kubernetes Cluster


## Task 1: Connect to the existing Kubernetes Cluster
<<<GIF TO BE UPLOADED>>>


**Congratulations!** In this lab, you have successfully completed the following tasks:
- Connect to the existing Kubernetes Cluster

You may now proceed to the [next lab](#next).

## Acknowledgements
* **Author** - Samarthya Sahu, OCI Logging Analytics
* **Contributors** - Vikram Reddy, Santhosh Kumar Vuda, OCI Logging Analytics
* **Last Updated By/Date** - Samarthya Sahu, Aug, 2023
Original file line number Diff line number Diff line change
Expand Up @@ -10,17 +10,15 @@ Estimated Time: 15 minutes

In this lab, you will see step-by-step instructions to:

- Troubleshoot Kubernetes Workload specific issues
- Troubleshoot Kubernetes Scheduling issues
- Troubleshoot Container specific issues
- Troubleshoot Kubernetes Workload specific issues.
- Troubleshoot Kubernetes Scheduling issues in workload.


## Task 1: Understanding and troubleshooting ‘issues’ in workloads: in view logs/insights
## Task 1: Understanding and troubleshooting application-specific issues in a workload
In this task, we will review the pod(s) that have logged events due to a liveness probe failure.
A liveness probe in Kubernetes is a diagnostic check that verifies whether a container is running and functioning correctly. If a container fails its liveness probe repeatedly, the kubelet restarts the container. You can read more about the different types of probes [here](https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/).

To simulate these events, we have configured a liveness probe that performs a simple **cat** command on a directory in the container which is not present.
Thus resulting in the warning events.
To simulate these events, we have configured a liveness probe that performs a simple **cat** command on a directory which is not present in the container. The probe therefore fails every time, which results in the warning events.
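
The probe setup described above can be sketched as a Pod manifest built in Python and printed as JSON. This is a minimal illustration, not the lab's actual manifest: the pod name `liveness-demo`, the `busybox` image, the path `/tmp/does-not-exist`, and the probe timings are all assumptions chosen for the example.

```python
import json


def build_failing_liveness_pod(namespace: str = "demo-livenessprobe") -> dict:
    """Return a Pod manifest whose liveness probe is expected to fail.

    All names below (pod name, image, probed path) are hypothetical and
    only illustrate the idea described in the lab text.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "liveness-demo", "namespace": namespace},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": "busybox",
                    # Keep the container alive so that only the probe fails.
                    "args": ["/bin/sh", "-c", "sleep 3600"],
                    "livenessProbe": {
                        # cat exits non-zero because the path does not exist,
                        # so the kubelet records the probe as failed.
                        "exec": {"command": ["cat", "/tmp/does-not-exist"]},
                        "initialDelaySeconds": 5,
                        "periodSeconds": 5,
                    },
                }
            ]
        },
    }


manifest = build_failing_liveness_pod()
print(json.dumps(manifest, indent=2))
```

Because the probed path never exists, each probe run returns a non-zero exit code, which is the condition that produces the warning events reviewed in this task.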

1. In the **Workload** tab, click on the **Namespaces** filter and select the namespace **demo-livenessprobe**.
![filter-ns-demo-livenessprobe](images/filter-ns-demo-livenessprobe.png)
Expand All @@ -35,13 +33,13 @@ In this lab, you will see step-by-step instructions to:
6. An **Events** pop-up window will be displayed. Click on the expand icon next to the **demo-livenessprobe** namespace.
![events-popup-ns-demo-livenessprobe](images/events-popup-ns-demo-livenessprobe.png)
7. The detailed information of the events will be displayed. The information includes:
- Type of the event, e.g., Warning, Normal, Failed, etc.
- Reason due to which the event is logged. For our use case the container has failed liveness probe and Kubernetes treats it as **Unhealthy**.
- Message of the event which provides the important insight on what caused the liveness probe event failure. For our use case the message states that liveness probe failed as it could not open the directory.
- **Type** of the event, e.g., Warning, Normal, Failed, etc.
- **Reason** for which the event is logged. In our use case, the container has failed its liveness probe and Kubernetes records the reason as **Unhealthy**.
- **Message** of the event, which provides insight into what caused the liveness probe failure. In our use case, the message states that the liveness probe failed because it could not open the directory.

![expand-events-ns-demo-livenessprobe](images/expand-events-ns-demo-livenessprobe.png)
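
The Type, Reason, and Message fields shown in the pop-up can also be summarized programmatically. The sketch below is only an illustration: the event records are made-up samples, not output captured from the lab cluster, and the field names `type`, `reason`, and `message` are assumed to mirror the columns described above.

```python
from collections import Counter

# Illustrative sample only; these records were not captured from the lab cluster.
sample_events = [
    {"type": "Normal", "reason": "Pulled",
     "message": "Container image already present on machine"},
    {"type": "Warning", "reason": "Unhealthy",
     "message": "Liveness probe failed: cat: can't open '/tmp/does-not-exist'"},
    {"type": "Warning", "reason": "Unhealthy",
     "message": "Liveness probe failed: cat: can't open '/tmp/does-not-exist'"},
]


def warning_reasons(events):
    """Count Warning events per reason, ignoring Normal events."""
    return Counter(e["reason"] for e in events if e["type"] == "Warning")


counts = warning_reasons(sample_events)
for reason, n in counts.most_common():
    print(f"{reason}: {n}")
```

Grouping by reason first and reading the message second mirrors the order used in the pop-up: the reason tells you which kind of failure occurred, and the message explains why.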

## Task 2: Understanding and troubleshooting ‘pod/application’ issues: Pod logs
## Task 2: Understanding and troubleshooting scheduling-specific issues in a workload
In this task, we will review the pod(s) that have logged events due to scheduling problems.
The Kubernetes scheduler selects an optimal node to run newly created or unscheduled pods.
You can read more about the Kubernetes scheduler [here](https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/).
Expand All @@ -58,26 +56,25 @@ In this lab, you will see step-by-step instructions to:
4. The details of the **Workload** such as **Namespace**, **Name**, **Status** & **Age** will be displayed.
![expand-workload-detail-demo-scheduling](images/expand-workload-detail-demo-scheduling.png)
5. The **Pods by Workload** section will display red polygon(s) corresponding to the number of pods in a failed state.
![ods-by-workloads-ns-demo-scheduling.png](images/pods-by-workloads-ns-demo-scheduling.png)
![pods-by-workloads-ns-demo-scheduling.png](images/pods-by-workloads-ns-demo-scheduling.png)
6. Scroll down to the events section. You will now see the events specific to the namespace **demo-scheduling**.
![events-section-demo-scheduling](images/events-section-demo-scheduling.png)
7. Click on the expand icon in the events section.
![expand-events-section-workloads-tab](images/expand-events-section-workloads-tab-ns-demo-scheduling.png)
8. An **Events** pop-up window will be displayed. Click on the expand icon next to the **demo-scheduling** namespace.
![events-popup-ns-demo-scheduling](images/events-popup-ns-demo-scheduling.png)
9. The detailed information of the events will be displayed. The information includes:
- Type of the event, e.g., Warning, Normal, Failed, etc.
- Reason due to which the event is logged. For our use case the pod has failed to schedule due to insufficient resources.
- Message of the event which provides the important insight on what prevented the Kubernetes Scheduler from scheduling the pod. For our use case the message states that Kubernetes has failed to schedule a pod due to insufficient cpu.
- **Type** of the event, e.g., Warning, Normal, Failed, etc.
- **Reason** for which the event is logged. In our use case, the pod could not be scheduled due to insufficient resources, and Kubernetes records the reason as **FailedScheduling**.
- **Message** of the event, which provides insight into what prevented the Kubernetes scheduler from scheduling the pod. In our use case, the message states that Kubernetes failed to schedule a pod due to insufficient CPU.
![expand-events-ns-demo-scheduling](images/expand-events-ns-demo-scheduling.png)
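
To see why a pod can remain unscheduled for lack of CPU, the following sketch compares a pod's CPU request with the free CPU on each node. It is a deliberately simplified model under stated assumptions, not the kube-scheduler's actual algorithm: the node names and numbers are hypothetical, and only CPU is considered.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def fits(pod_request: str, node_allocatable: str, node_requested: str) -> bool:
    """Return True if the node's unrequested CPU covers the pod's request."""
    free = parse_cpu(node_allocatable) - parse_cpu(node_requested)
    return parse_cpu(pod_request) <= free


# Hypothetical cluster state: node name -> (allocatable CPU, CPU already requested).
nodes = {"node-a": ("2", "1800m"), "node-b": ("2", "1500m")}
pod_cpu_request = "1"  # one full core

schedulable = [
    name for name, (allocatable, requested) in nodes.items()
    if fits(pod_cpu_request, allocatable, requested)
]
print(schedulable or "no node has enough free CPU for this pod")
```

In this example neither node has a full core left unrequested, so the list of candidate nodes is empty. That is the situation the scheduling event in this task reports.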

## Task 3: Understanding and correlating metrics in ‘analyze’ view

## Task 3: Exercise
In the **Workload** tab, click on the **Namespaces** filter, select the namespace **demo-volumemount**, and investigate the issue using the steps from the previous tasks.

**Congratulations!** In this lab, you have successfully completed the following tasks:
- Troubleshot Kubernetes Workload specific issues
- Troubleshot Kubernetes Scheduling issues
- Troubleshot Container specific issues
- Troubleshot Kubernetes Workload specific issues.
- Troubleshot Kubernetes Scheduling issues in workload.

You may now proceed to the [next lab](#next).

33 changes: 17 additions & 16 deletions logging-analytics/oke-monitoring-la/introduction/introduction.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,9 @@
# Logging Analytics Overview
## Introduction

> Note : To be updated with ocw-24 introduction.
Kubernetes provides a highly robust and extremely customizable platform for managing, automatically deploying and scaling containerized workloads. Building a monitoring and troubleshooting system for this entire environment is a very challenging task. Oracle Cloud Infrastructure (OCI) Logging Analytics bridges this monitoring gap by providing a one-click end-to-end Kubernetes monitoring solution for the underlying infrastructure, Kubernetes platform and cloud-native applications.

This live lab will cover setting up end-to-end monitoring solution for a sample Kubernetes cluster (OKE cluster) which has [MuShop] (https://oracle-quickstart.github.io/oci-cloudnative/) (a cloud-native reference application of several Oracle Cloud services) deployed. It also takes you through various visualizations and perform analytics over the collected data from different perspectives.
This live lab covers setting up end-to-end monitoring for a sample Kubernetes cluster (OKE) that runs workloads simulating various production issues. It also takes you through the steps for troubleshooting the issues in the Kubernetes cluster. Finally, you will visualize the data collected from the OKE cluster through dashboards.

Estimated Workshop Time: 1 hour 30 minutes

Expand All @@ -16,10 +14,10 @@ Watch the video below for a quick walk-through of the lab.

In this workshop, you will learn how to:

* Install OCI Kubernetes Monitoring Solution to collect Kubernetes & Linux System logs, application/container logs and Kubernetes Objects logs.
* Set up Management Agent to collect Kubernetes metrics and reporting them to OCI Monitoring.
* Connect to the existing Kubernetes cluster to collect monitoring data and logs, such as Kubernetes & Linux System logs, application/container logs and Kubernetes Objects logs.
* Understand & visualize the Kubernetes Cluster topology.
![kubernetes-cluster-topology](images/kubernetes-cluster-topology.png)
* Understand the data model of telemetry collected by Kubernetes Monitoring Solution.
* Review the log data for specific MuShop logs in Log Explorer.
* Visualize the data collected from the OKE Cluster through Dashboards like below.
- ### Kubernetes Cluster Summary

Expand All @@ -31,27 +29,30 @@ In this workshop, you will learn how to:

![kubernetes-pods](images/kubernetes-pods.png)
- ### Kubernetes Workloads
![kubernetes-workloads](images/kubernetes-workloads.png)
* Perform advanced analytics to correlate Infrastructure (LBaaS) and K8S Platform telemetry.

![kubernetes-workloads](images/kubernetes-workloads.png)


### Prerequisites

This lab assumes you have:

* Oracle.com SSO account
* Understanding of Logging Analytics concepts
* Understanding of Kubernetes/OKE concepts and helm
* Familiarity with OCI cloud shell and OCI Console
* Oracle.com SSO account.
* Understanding of Logging Analytics concepts.
* Understanding of Kubernetes/OKE concepts.
* Familiarity with OCI Console.


## Learn More

* [Monitor Kubernetes and OKE clusters with OCI Logging Analytics](https://docs.oracle.com/en/solutions/kubernetes-oke-logging-analytics/index.html)
* [MuShop](https://oracle-quickstart.github.io/oci-cloudnative/)

* [Kubernetes Solution](https://docs.oracle.com/en-us/iaas/logging-analytics/doc/kubernetes-solution.html)

* [Kubernetes Workloads](https://kubernetes.io/docs/concepts/workloads/)


## Acknowledgements
* **Author** - Vikram Reddy, OCI Logging Analytics
* **Contributors** - Vikram Reddy, Santhosh Kumar Vuda, OCI Logging Analytics
* **Last Updated By/Date** - Vikram Reddy, Aug, 2023
* **Contributors** - Vikram Reddy, Heena Rahangdale, OCI Logging Analytics
* **Last Updated By/Date** - Vikram Reddy, Aug, 2024
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,15 @@

## Introduction

In this lab, you will use different features of Logging Analytics features to troubleshoot the issues/problems in Kubernetes Cluster (To Be Updated)
In this lab, you will visualize the data collected from the OKE Cluster through Dashboards.

Estimated Time: 15 minutes
### About
In this lab, we will explore:

- Dashboards: A data visualization tool that gathers real-time data from the various tiers of Kubernetes Cluster.
- Widgets: A component that displays the real-time data.

Estimated Time: 10 minutes

### Objectives

Expand All @@ -14,11 +20,78 @@ In this lab, you will see step-by-step instructions to:
- Visualize data in the Log Explorer


## Task 1: Pre-imported dashboards
## Task 1: Visualize the Pre-imported dashboards

In the connect cluster flow, the solution creates dashboards for the target Kubernetes cluster. These dashboards are available in the **CW24\_Logging\_Analytics** compartment for this exercise.
> **Note** : For a quick refresher on the connect cluster flow, [review Lab 1](?lab=log-explorer-gui).
1. To navigate to the Dashboards page, use one of the following methods.
- From Navigation Menu ![navigation-menu](images/navigation-menu.png) > **Observability & Management** > **Logging Analytics** > **Dashboards**.

OR

- You can use the direct link to land on the **Dashboards** page.
```
<copy>
https://cloud.oracle.com/loganalytics/dashboards?region=us-phoenix-1
</copy>
```
2. The **Dashboards** page will be displayed.
![dashboards-home](images/dashboards-home.png)

3. Switch to the compartment **CW24\_Logging\_Analytics**.
- From the **Compartment** dropdown select the compartment **CW24\_Logging\_Analytics**.
![cw24-la-compartment](images/cw24-la-compartment.png)
- All the Dashboards in the Compartment **CW24\_Logging\_Analytics** will be displayed.
![cw-la-dashboards](images/cw-la-dashboards.png)

4. Click on the **Kubernetes Cluster Summary** dashboard. It will take a few seconds for the dashboard widgets to load.
> **Important tip** : Observe the dashboard widgets & values once they are loaded.
![kubernetes-cluster-summary](images/kubernetes-cluster-summary.png)

5. Select the OKE cluster whose data you want to visualize in the dashboard.
- Click on the **Scope Filter** button.
![scope-filter-kubernetes-cluster-summary](images/scope-filter-kubernetes-cluster-summary.png)
- A scope filter panel will be displayed.
![scope-filter-panel-kubernetes-cluster-summary.png](images/scope-filter-panel-kubernetes-cluster-summary.png)
- Select the **oke-cw24** cluster in the **Kubernetes Cluster** field.
![oke-cw24-kubernetes-cluster-field](images/oke-cw24-kubernetes-cluster-field.png)

6. You should now see all the widgets displaying data specific to your OKE Cluster.
> **Important tip** : Observe the dashboard widgets & values once they are loaded. Did you notice the change?
![kubernetes-cluster-summary-widgets](images/kubernetes-cluster-summary-widgets.png)

7. Scroll down to the **Container Logs** widget in the dashboard.
![container-logs-kubernetes-cluster-summary](images/container-logs-kubernetes-cluster-summary.png)

8. Click on the View Query icon to view the query used to populate the data in the widget.

![view-query-button-kubernetes-cluster-summary](images/view-query-button-kubernetes-cluster-summary.png)
![query-of-container-logs-kubernetes-cluster-summary](images/query-of-container-logs-kubernetes-cluster-summary.png)

After viewing the query, click on the **Close** button.

9. **Exercise**: Repeat steps 7 & 8 for the **Events** widget.





## Task 2: Drill-down to log explorer
1. Click on the Punch Out Icon on the Events widget.
![events-logs-punch-out](images/events-logs-punch-out.png)

2. This will take you to the **Pie Chart view** of Log Explorer in context of Kubernetes Cluster Name.
![kubernetes-events-logs-pie-chart-view](images/kubernetes-events-logs-pie-chart-view.png)

3. **Exercise**: Explore the different visualizations on the Log Explorer page. Read more about [how to use Log Explorer for analyzing and visualizing logs in Logging Analytics](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/run-workshop?p210_wid=3887).

![visualization-drop-down](images/visualization-drop-down.png)

4. To navigate back to the Kubernetes Cluster Summary page, click on the **Kubernetes Cluster Summary** as highlighted in the image below.
![kubernetes-cluster-summary-nav-back](images/kubernetes-cluster-summary-nav-back.png)

5. Similarly, you can explore the other widgets in the Kubernetes Cluster Summary dashboard, as well as the other dashboards.

**Congratulations!** In this lab, you have successfully completed the following tasks:
- Visualized Pre-imported dashboards
Original file line number Diff line number Diff line change
Expand Up @@ -15,18 +15,23 @@
{
"title": "Lab 1: Logging Analytics Connect Kubernetes Cluster",
"description": "This live lab outlines the steps required to connect to the Kubernetes Cluster for monitoring",
"filename": "../../connect-cluster-flow/connect-cluster-flow.md"
},
{
"title": "Lab 2: Understanding the topology of the Kubernetes Cluster",
"description": "This live lab outlines the steps for understanding the topology of Kubernetes Cluster",
"filename": "../../loganalytics-connect-kubernetes-cluster/loganalytics-connect-kubernetes-cluster.md"
},
{
"title": "Lab 2: Interactive Analytics and Troubleshooting",
"title": "Lab 3: Interactive Analytics and Troubleshooting",
"description": "This live lab outlines the steps required to troubleshoot the issues occurring in the Kubernetes Cluster",
"filename": "../../interactive-analytics-and-troubleshooting/interactive-analytics-and-troubleshooting.md"
},
{
"title": "Lab 3: Next Steps for more insights",
"title": "Lab 4: Next Steps for more insights",
"description": "This live lab outlines the steps for getting more insights from Kubernetes Monitoring",
"filename": "../../next-steps-for-more-details/next-steps-for-more-details.md"
},
},
{
"title": "Need Help?",
"description": "Solutions to Common Problems and Directions for Receiving Live Help",
Original file line number Diff line number Diff line change
Expand Up @@ -15,27 +15,27 @@
{
"title": "Lab 1: Logging Analytics Connect Kubernetes Cluster",
"description": "This live lab outlines the steps required to connect to the Kubernetes Cluster for monitoring",
"filename": "../../connect-cluster-flow/connect-cluster-flow.md"
},
{
"title": "Lab 2: Understanding the topology of the Kubernetes Cluster",
"description": "This live lab outlines the steps for understanding the topology of Kubernetes Cluster",
"filename": "../../loganalytics-connect-kubernetes-cluster/loganalytics-connect-kubernetes-cluster.md"
},
{
"title": "Lab 2: Interactive Analytics and Troubleshooting",
"title": "Lab 3: Interactive Analytics and Troubleshooting",
"description": "This live lab outlines the steps required to troubleshoot the issues occurring in the Kubernetes Cluster",
"filename": "../../interactive-analytics-and-troubleshooting/interactive-analytics-and-troubleshooting.md"
},
{
"title": "Lab 3: Next Steps for more insights",
"title": "Lab 4: Next Steps for more insights",
"description": "This live lab outlines the steps for getting more insights from Kubernetes Monitoring",
"filename": "../../next-steps-for-more-details/next-steps-for-more-details.md"
},
},
{

"title": "Need Help?",
"filename": "https://oracle-livelabs.github.io/common/labs/need-help/need-help-freetier.md"
},
{

"title": "Oracle CloudWorld 2023 - Support",
"filename": "https://oracle-livelabs.github.io/common/support/ocwsupportlab/ocwsupportlab.md"
}
]
}
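
Since this commit renumbers the labs in the manifest, a small check like the one below could confirm that the `Lab N:` titles stay sequential after such edits. It is a sketch under assumptions: the top-level `tutorials` key is assumed (the surrounding manifest structure is not visible in this diff), and the embedded JSON is a trimmed sample built from the titles shown above rather than the full file.

```python
import json
import re

# Trimmed sample based on the titles in this diff; the "tutorials" key is an
# assumption about the manifest layout, which is not fully visible here.
manifest_text = """
{
  "tutorials": [
    {"title": "Lab 1: Logging Analytics Connect Kubernetes Cluster"},
    {"title": "Lab 2: Understanding the topology of the Kubernetes Cluster"},
    {"title": "Lab 3: Interactive Analytics and Troubleshooting"},
    {"title": "Lab 4: Next Steps for more insights"},
    {"title": "Need Help?"}
  ]
}
"""


def lab_numbers(manifest: dict) -> list:
    """Extract N from every title of the form 'Lab N: ...', in file order."""
    numbers = []
    for entry in manifest.get("tutorials", []):
        match = re.match(r"Lab (\d+):", entry.get("title", ""))
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def numbering_is_sequential(numbers: list) -> bool:
    """True when the labs are numbered 1, 2, 3, ... with no gaps or repeats."""
    return numbers == list(range(1, len(numbers) + 1))


numbers = lab_numbers(json.loads(manifest_text))
print(numbers, numbering_is_sequential(numbers))
```

Entries without a `Lab N:` prefix, such as "Need Help?", are skipped, so only the numbered labs take part in the check.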