Add disk monitoring (#233)
* Update the server-side api

* In theory, add disk stuff to the front end

* Working as dev environment

* shift config to individual views, tweak the CONTRIB docs, and add an example config

* Update the readme

* Update static/main.js to pass eslint, and create a single style entry

* Correct debugging mis-naming

* Replace missing semicolon..

* fix: Compute disk warning state with config.disk_warning_threshold

* feat: Add model class for keeping resource warnings

* feat: Condition to flash warnings now looks at all computed warnings

* chore: Run lint fix

* chore: Run lint fix again

* chore: Address critical and high dependabot flagged packages

* task: remove console log

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Update CONTRIBUTING.md

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Fix typo in CONTRIBUTING.md

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Fix typo in README.md docs

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Fix server extension docs language

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Fix typo regarding disk warning thresholds

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Catch Exception instead of nothing at all

Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>

* Update docs and delete example server config

---------

Co-authored-by: Ian Stuart <Ian.Stuart@ed.ac.uk>
Co-authored-by: Michał Krassowski <5832902+krassowski@users.noreply.github.com>
3 people authored Jul 31, 2024
1 parent 6f15ef9 commit 41d88a2
Showing 21 changed files with 471 additions and 165 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,8 @@ __pycache__/

 # Distribution / packaging
 .Python
+.direnv
+.envrc
 env/
 build/
 develop-eggs/
8 changes: 8 additions & 0 deletions CONTRIBUTING.md
@@ -127,6 +127,14 @@ JupyterLab v3.0.0
 jupyter-resource-usage v0.1.0 enabled OK
 ```

+## Which code creates what content
+
+The stats are created by the server-side code in `jupyter_resource_usage`.
+
+For the JupyterLab 4 / Notebook 7 UIs, the code in `packages/labextension` creates and writes the content for both the statusbar and the topbar.
+
+The topbar is defined in the schema, whilst the contents of the statusbar are driven purely by the labextension code, and labels are defined in the corresponding `*View.tsx` file.
+
 ## pre-commit

 `jupyter-resource-usage` has adopted automatic code formatting so you shouldn't need to worry too much about your code style.
21 changes: 20 additions & 1 deletion README.md
@@ -134,6 +134,23 @@ memory:

 ![Screenshot with CPU and memory](./doc/statusbar-cpu.png)

+### Disk [partition] Usage
+
+`jupyter-resource-usage` can also track disk usage [of a defined partition] and report the `total` and `used` values as part of the `/api/metrics/v1` response.
+
+You enable tracking by setting the `track_disk_usage` trait (disabled by default):
+
+```python
+c = get_config()
+c.ResourceUseDisplay.track_disk_usage = True
+```
+
+The values are from the partition containing the folder in the trait `disk_path` (which defaults to `/home/joyvan`). If this path does not exist, disk usage information is omitted from the display.
+
+Mirroring CPU and Memory, the trait `disk_warning_threshold` signifies when to flag a usage warning, and like the others, it defaults to `0.1` (10% remaining).
+
+![Screenshot with Disk, CPU, and memory](./doc/statusbar_disk.png)
+
 ### Disable Prometheus Metrics

 There is a [known bug](https://github.com/jupyter-server/jupyter-resource-usage/issues/123) with Prometheus metrics which
@@ -157,9 +174,11 @@ render the alternative frontend in the topbar.
 Users can change the label and refresh rate for the alternative frontend using settings
 editor.

+(The vertical bars are included by default, to help separate the three indicators.)
+
 ## Resources Displayed

-Currently the server extension only reports memory usage and CPU usage. Other metrics will be added in the future as needed.
+Currently the server extension reports disk usage, memory usage and CPU usage. Other metrics will be added in the future as needed.

 Memory usage will show the PSS whenever possible (Linux only feature), and default to RSS otherwise.

Binary file modified doc/settings.png
Binary file added doc/statusbar_disk.png
14 changes: 14 additions & 0 deletions jupyter_resource_usage/api.py
@@ -75,6 +75,20 @@ async def get(self):

             metrics.update(cpu_percent=cpu_percent, cpu_count=cpu_count)

+        # Optionally get Disk information
+        if config.track_disk_usage:
+            try:
+                disk_info = psutil.disk_usage(config.disk_path)
+            except Exception:
+                pass
+            else:
+                metrics.update(disk_used=disk_info.used, disk_total=disk_info.total)
+                limits["disk"] = {"disk": disk_info.total}
+                if config.disk_warning_threshold != 0:
+                    limits["disk"]["warn"] = (disk_info.total - disk_info.used) < (
+                        disk_info.total * config.disk_warning_threshold
+                    )
+
         self.write(json.dumps(metrics))

     @run_on_executor
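The warning flag in this hunk reduces to one comparison: the free space is checked against `total * disk_warning_threshold`. Below is a minimal sketch of that arithmetic. It assumes the standard library's `shutil.disk_usage` as a stand-in for `psutil.disk_usage` (both return a tuple with `total` and `used` fields). The helper names `disk_warning` and `disk_limits` are hypothetical and not part of the extension.

```python
import shutil
from typing import Optional


def disk_warning(total: int, used: int, threshold: float) -> bool:
    """Return True when the remaining space is below threshold * total."""
    return (total - used) < (total * threshold)


def disk_limits(path: str, threshold: float = 0.1) -> Optional[dict]:
    """Build a limits-style dict for `path`, or None if it cannot be read."""
    try:
        info = shutil.disk_usage(path)  # stand-in for psutil.disk_usage
    except OSError:
        return None  # e.g. the path does not exist
    limits = {"disk": info.total}
    if threshold != 0:  # a threshold of 0 disables the warning
        limits["warn"] = disk_warning(info.total, info.used, threshold)
    return limits


# A 10 GB partition with the default threshold of 0.1 (warn below 1 GB free):
GB = 1000**3
print(disk_warning(10 * GB, 8 * GB, 0.1))  # 2 GB free → False
print(disk_warning(10 * GB, 95 * GB // 10, 0.1))  # 0.5 GB free → True
```

Returning `None` for an unreadable path mirrors the hunk's behaviour of silently omitting disk information when `disk_usage` raises.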
49 changes: 48 additions & 1 deletion jupyter_resource_usage/config.py
@@ -7,6 +7,7 @@
 from traitlets import Int
 from traitlets import List
 from traitlets import TraitType
+from traitlets import Unicode
 from traitlets import Union
 from traitlets.config import Configurable

@@ -27,7 +28,7 @@ def validate(self, obj, value):
             keys = list(value.keys())
             if "name" in keys:
                 keys.remove("name")
-                if all(key in ["kwargs", "attribute"] for key in keys):
+                if all(key in ["args", "kwargs", "attribute"] for key in keys):
                     return value
         self.error(obj, value)

@@ -37,6 +38,15 @@ class ResourceUseDisplay(Configurable):
     Holds server-side configuration for jupyter-resource-usage
     """

+    # Needs to be defined early, so the metrics can use it.
+    disk_path = Union(
+        trait_types=[Unicode(), Callable()],
+        default_value="/home/joyvan",
+        help="""
+        A path in the partition to be reported on.
+        """,
+    ).tag(config=True)
+
     process_memory_metrics = List(
         trait=PSUtilMetric(),
         default_value=[{"name": "memory_info", "attribute": "rss"}],
@@ -56,6 +66,19 @@ class ResourceUseDisplay(Configurable):
         trait=PSUtilMetric(), default_value=[{"name": "cpu_count"}]
     )

+    process_disk_metrics = List(
+        trait=PSUtilMetric(),
+        default_value=[],
+    )
+
+    system_disk_metrics = List(
+        trait=PSUtilMetric(),
+        default_value=[
+            {"name": "disk_usage", "args": [disk_path], "attribute": "total"},
+            {"name": "disk_usage", "args": [disk_path], "attribute": "used"},
+        ],
+    )
+
     mem_warning_threshold = Float(
         default_value=0.1,
         help="""
@@ -123,6 +146,30 @@ def _mem_limit_default(self):
     def _cpu_limit_default(self):
         return float(os.environ.get("CPU_LIMIT", 0))

+    track_disk_usage = Bool(
+        default_value=False,
+        help="""
+        Set to True in order to enable reporting of disk usage statistics.
+        """,
+    ).tag(config=True)
+
+    @default("disk_path")
+    def _disk_path_default(self):
+        return str(os.environ.get("HOME", "/home/joyvan"))
+
+    disk_warning_threshold = Float(
+        default_value=0.1,
+        help="""
+        Warn user with flashing lights when disk usage is within this fraction
+        total space.
+
+        For example, if total size is 10G, `disk_warning_threshold` is 0.1,
+        we will start warning the user when they use (10 - (10 * 0.1)) G.
+
+        Set to 0 to disable warning.
+        """,
+    ).tag(config=True)
+
     enable_prometheus_metrics = Bool(
         default_value=True,
         help="""
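One detail worth checking in the `system_disk_metrics` default above: inside a class body, the name `disk_path` refers to the trait object itself, not to the value a user later configures. The sketch below reproduces that general Python behaviour with a plain descriptor. `Setting` and `Display` are hypothetical stand-ins, not traitlets classes, and whether the extension resolves the value elsewhere is not shown in this diff.

```python
class Setting:
    """Tiny descriptor standing in for a configurable trait (hypothetical)."""

    def __init__(self, default):
        self.default = default

    def __set_name__(self, owner, name):
        self.name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access yields the descriptor itself
        return getattr(obj, self.name, self.default)

    def __set__(self, obj, value):
        setattr(obj, self.name, value)


class Display:
    disk_path = Setting("/home/joyvan")
    # Evaluated once, while the class body runs: this list captures the
    # Setting object, not the path string.
    captured_args = [disk_path]

    def resolved_args(self):
        # Looking the value up on the instance at call time gives the string.
        return [self.disk_path]


d = Display()
d.disk_path = "/srv/data"
print(type(Display.captured_args[0]).__name__)  # → Setting
print(d.resolved_args())  # → ['/srv/data']
```

If the same applies to the traitlets default, one option would be to resolve the path when the metric is called rather than when the class is defined.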
29 changes: 20 additions & 9 deletions jupyter_resource_usage/metrics.py
@@ -13,10 +13,10 @@ def __init__(self, server_app: ServerApp):
         ]
         self.server_app = server_app

-    def get_process_metric_value(self, process, name, kwargs, attribute=None):
+    def get_process_metric_value(self, process, name, args, kwargs, attribute=None):
         try:
             # psutil.Process methods will either return...
-            metric_value = getattr(process, name)(**kwargs)
+            metric_value = getattr(process, name)(*args, **kwargs)
             if attribute is not None:  # ... a named tuple
                 return getattr(metric_value, attribute)
             else:  # ... or a number
@@ -26,25 +26,28 @@ def get_process_metric_value(self, process, name, kwargs, attribute=None):
         except BaseException:
             return 0

-    def process_metric(self, name, kwargs={}, attribute=None):
+    def process_metric(self, name, args=[], kwargs={}, attribute=None):
         if psutil is None:
             return None
         else:
             current_process = psutil.Process()
             all_processes = [current_process] + current_process.children(recursive=True)

             process_metric_value = lambda process: self.get_process_metric_value(
-                process, name, kwargs, attribute
+                process, name, args, kwargs, attribute
             )

             return sum([process_metric_value(process) for process in all_processes])

-    def system_metric(self, name, kwargs={}, attribute=None):
+    def system_metric(self, name, args=[], kwargs={}, attribute=None):
         if psutil is None:
             return None
         else:
-            # psutil functions will either return...
-            metric_value = getattr(psutil, name)(**kwargs)
+            # psutil functions will either raise an error, or return...
+            try:
+                metric_value = getattr(psutil, name)(*args, **kwargs)
+            except:
+                return None
             if attribute is not None:  # ... a named tuple
                 return getattr(metric_value, attribute)
             else:  # ... or a number
@@ -63,8 +66,11 @@ def get_metric_values(self, metrics, metric_type):
         return metric_values

     def metrics(self, process_metrics, system_metrics):
-        metric_values = self.get_metric_values(process_metrics, "process")
-        metric_values.update(self.get_metric_values(system_metrics, "system"))
+        metric_values = {}
+        if process_metrics:
+            metric_values.update(self.get_metric_values(process_metrics, "process"))
+        if system_metrics:
+            metric_values.update(self.get_metric_values(system_metrics, "system"))

         if any(value is None for value in metric_values.values()):
             return None
@@ -80,3 +86,8 @@ def cpu_metrics(self):
         return self.metrics(
             self.config.process_cpu_metrics, self.config.system_cpu_metrics
         )
+
+    def disk_metrics(self):
+        return self.metrics(
+            self.config.process_disk_metrics, self.config.system_disk_metrics
+        )
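The `args`/`kwargs`/`attribute` dispatch that this file now uses can be sketched outside the extension. The example below is an illustration under assumptions: it calls into the standard library's `shutil` module instead of `psutil`, the function name `call_metric` is hypothetical, and it uses `None` defaults and `except Exception` where the diff uses mutable defaults and a bare `except`.

```python
import shutil
from typing import Any, Optional, Sequence


def call_metric(
    module: Any,
    name: str,
    args: Optional[Sequence] = None,
    kwargs: Optional[dict] = None,
    attribute: Optional[str] = None,
) -> Any:
    """Call module.<name>(*args, **kwargs); return None if the call fails."""
    try:
        value = getattr(module, name)(*(args or ()), **(kwargs or {}))
    except Exception:
        return None
    if attribute is not None:  # the call returned a named tuple
        return getattr(value, attribute)
    return value  # the call returned a plain value


# A metric spec shaped like the entries in system_disk_metrics.
spec = {"name": "disk_usage", "args": ["."], "attribute": "total"}
print(call_metric(shutil, **spec))

# A failing call (missing path) yields None instead of raising.
print(call_metric(shutil, "disk_usage", args=["/no/such/path"], attribute="used"))  # → None
```

Using `None` as the default and substituting an empty tuple or dict inside the function avoids sharing one mutable default object across calls.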
14 changes: 13 additions & 1 deletion jupyter_resource_usage/prometheus.py
@@ -18,7 +18,14 @@ def __init__(self, metricsloader: PSUtilMetricsLoader):
         self.config = metricsloader.config
         self.session_manager = metricsloader.server_app.session_manager

-        gauge_names = ["total_memory", "max_memory", "total_cpu", "max_cpu"]
+        gauge_names = [
+            "total_memory",
+            "max_memory",
+            "total_cpu",
+            "max_cpu",
+            "max_disk",
+            "current_disk",
+        ]
         for name in gauge_names:
             phrase = name + "_usage"
             gauge = Gauge(phrase, "counter for " + phrase.replace("_", " "), [])
@@ -34,6 +41,11 @@ async def __call__(self, *args, **kwargs):
             if cpu_metric_values is not None:
                 self.TOTAL_CPU_USAGE.set(cpu_metric_values["cpu_percent"])
                 self.MAX_CPU_USAGE.set(self.apply_cpu_limit(cpu_metric_values))
+        if self.config.track_disk_usage:
+            disk_metric_values = self.metricsloader.disk_metrics()
+            if disk_metric_values is not None:
+                self.CURRENT_DISK_USAGE.set(disk_metric_values["disk_usage_used"])
+                self.MAX_DISK_USAGE.set(disk_metric_values["disk_usage_total"])

     def apply_memory_limit(self, memory_metric_values) -> Optional[int]:
         if memory_metric_values is None:
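The new gauges follow the naming pattern visible in this hunk: each entry in `gauge_names` gets a `_usage` suffix, and the values read from `disk_metrics()` are keyed as `disk_usage_total` and `disk_usage_used`. Below is a small sketch of that naming, without `prometheus_client`. The upper-cased attribute names and the `<name>_<attribute>` key format are inferred from identifiers in the diff (`MAX_DISK_USAGE`, `disk_usage_used`), not from code shown here, and the helper names are hypothetical.

```python
def gauge_attributes(gauge_names):
    """Map each gauge name to (attribute name, description)."""
    result = {}
    for name in gauge_names:
        phrase = name + "_usage"
        result[name] = (phrase.upper(), "counter for " + phrase.replace("_", " "))
    return result


def metric_key(spec):
    """Build the result key for a metric spec such as those in the config."""
    if spec.get("attribute") is not None:
        return spec["name"] + "_" + spec["attribute"]
    return spec["name"]


print(gauge_attributes(["max_disk", "current_disk"]))
print(metric_key({"name": "disk_usage", "attribute": "used"}))  # → disk_usage_used
```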