Telemetry and Event Logging #233

Closed
wants to merge 19 commits into from
11 changes: 2 additions & 9 deletions .gitignore
@@ -1,20 +1,13 @@
MANIFEST
docs/source/operators/events
build
dist
_build
docs/man/*.gz
docs/source/api/generated
docs/source/config.rst
docs/gh-pages
notebook/i18n/*/LC_MESSAGES/*.mo
notebook/i18n/*/LC_MESSAGES/nbjs.json
notebook/static/components
notebook/static/style/*.min.css*
notebook/static/*/js/built/
notebook/static/*/built/
notebook/static/built/
notebook/static/*/js/main.min.js*
notebook/static/lab/*bundle.js
docs/source/events
node_modules
*.py[co]
__pycache__
3 changes: 3 additions & 0 deletions MANIFEST.in
@@ -5,6 +5,9 @@ include setupbase.py
include Dockerfile
graft tools

# Event Schemas
graft jupyter_server/event-schemas

# Documentation
graft docs
exclude docs/\#*
3 changes: 2 additions & 1 deletion docs/doc-requirements.txt
@@ -8,4 +8,5 @@ prometheus_client
sphinxcontrib_github_alt
sphinxcontrib-openapi
sphinxemoji
git+https://github.com/pandas-dev/pydata-sphinx-theme.git@master
git+https://github.com/pandas-dev/pydata-sphinx-theme.git@master
jupyter_telemetry_sphinxext
3 changes: 2 additions & 1 deletion docs/environment.yml
@@ -13,4 +13,5 @@ dependencies:
- sphinxcontrib_github_alt
- sphinxcontrib-openapi
- sphinxemoji
- git+https://github.com/pandas-dev/pydata-sphinx-theme.git@master
- git+https://github.com/pandas-dev/pydata-sphinx-theme.git@master
- sphinx-jsonschema
13 changes: 13 additions & 0 deletions docs/source/_static/theme_overrides.css
@@ -0,0 +1,13 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {

.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from overriding
this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}

.wy-table-responsive {
overflow: visible !important;
}
}
14 changes: 12 additions & 2 deletions docs/source/conf.py
@@ -70,8 +70,7 @@
'sphinx.ext.mathjax',
'IPython.sphinxext.ipython_console_highlighting',
'sphinxcontrib_github_alt',
'sphinxcontrib.openapi',
'sphinxemoji.sphinxemoji'
'jupyter_telemetry_sphinxext'
]

# Add any paths that contain templates here, relative to this directory.
@@ -208,6 +207,12 @@
# since it is needed to properly generate _static in the build directory
html_static_path = ['_static']

html_context = {
'css_files': [
'_static/theme_overrides.css', # override wide tables in RTD theme
],
}

# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
@@ -371,3 +376,8 @@

# import before any doc is built, so _ is guaranteed to be injected
import jupyter_server.transutils

# Jupyter telemetry configuration values.
jupyter_telemetry_schema_source = "../jupyter_server/event-schemas" # Path is relative to conf.py
jupyter_telemetry_schema_output = "source/operators/events" # Path is relative to conf.py
jupyter_telemetry_index_title = "Telemetry Event Schemas" # Title of the index page that lists all found schemas.
61 changes: 61 additions & 0 deletions docs/source/eventlog.rst
@@ -0,0 +1,61 @@
Eventlogging and Telemetry
==========================

The Notebook Server can be configured to record structured events from a running server using Jupyter's `Telemetry System`_. Events are emitted as JSON data, and the types of events that the Notebook Server emits are defined and validated by the `JSON schemas`_ listed below_.


.. _logging: https://docs.python.org/3/library/logging.html
.. _`Telemetry System`: https://github.com/jupyter/telemetry
.. _`JSON schemas`: https://json-schema.org/

Emitting Server Events
----------------------

Event logging is handled by the server's ``EventLog`` object, which leverages Python's standard logging_ library to emit, filter, and collect event data.

To begin recording events, you'll need to set two configuration options:

1. ``handlers``: tells the EventLog *where* to route your events. This trait is a list of Python logging handlers, each of which sends events to a destination such as a file or a stream.
2. ``allowed_schemas``: tells the EventLog *which* events should be recorded. No events are emitted by default; every event you want recorded must have its schema listed here.

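Here's a basic example for emitting events from the ``contents`` service:

.. code-block:: python

    import logging

    # Route events to a local file.
    c.EventLog.handlers = [
        logging.FileHandler('event.log'),
    ]

    # Record only events from the ContentsManager schema.
    c.EventLog.allowed_schemas = [
        'eventlogging.jupyter.org/notebook/contentsmanager-actions'
    ]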

The output is a file, ``"event.log"``, with events recorded as JSON data.
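
A minimal sketch for reading the recorded events back out of ``event.log``. It assumes each event is written as one JSON object per line and that the schema ID is stored under a ``__schema__`` key; both are assumptions about the EventLog output format, and ``read_events`` is a hypothetical helper, not part of Jupyter Server.

```python
import json
from typing import Iterator, Optional


def read_events(path: str, schema: Optional[str] = None) -> Iterator[dict]:
    """Yield events from a JSON-lines log file, optionally filtered by schema ID.

    Assumes one JSON object per line and a ``__schema__`` key that holds the
    schema ID (an assumption about the EventLog output format).
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            if schema is None or event.get("__schema__") == schema:
                yield event
```

For example, ``read_events("event.log", schema="eventlogging.jupyter.org/notebook/contentsmanager-actions")`` would yield only the ContentsManager events, provided the key assumption above holds.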

``eventlog`` endpoint
---------------------

The Notebook Server provides a public REST endpoint for external applications to validate and log events
through the Server's Event Log.

To log events, send a ``POST`` request to the ``/api/eventlog`` endpoint. The body of the request should be a
JSON object and must contain the following keys:

1. ``schema``: the event's schema ID.
2. ``version``: the version of the event's schema.
3. ``event``: the event data in JSON format.

For an event to be validated and recorded through this endpoint, its schema must be listed in the ``allowed_schemas`` trait described above.
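
A client-side sketch of such a request using only the standard library. It assumes the server runs at ``http://localhost:8888`` and accepts token authentication through an ``Authorization: token <token>`` header; the base URL, the token placeholder, and the ``build_eventlog_request`` helper are illustrative, not part of the server's API.

```python
import json
import urllib.request


def build_eventlog_request(base_url: str, token: str, schema: str,
                           version: int, event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/eventlog endpoint."""
    body = {"schema": schema, "version": version, "event": event}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/eventlog",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Token auth header format is an assumption about the deployment.
            "Authorization": f"token {token}",
        },
        method="POST",
    )


req = build_eventlog_request(
    "http://localhost:8888",
    "<your-token>",
    schema="eventlogging.jupyter.org/notebook/contentsmanager-actions",
    version=1,
    event={"action": "get", "path": "notebooks/example.ipynb"},
)
```

Passing ``req`` to ``urllib.request.urlopen`` sends it; if the server answers with an error status, ``urlopen`` raises ``urllib.error.HTTPError``.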

.. _below:


Server Event schemas
--------------------

.. toctree::
:maxdepth: 2

events/index
3 changes: 2 additions & 1 deletion docs/source/operators/index.rst
@@ -12,4 +12,5 @@ These pages are targeted at people using, configuring, and/or deploying multiple
configuring-extensions
migrate-from-nbserver
public-server
security
security
telemetry
61 changes: 61 additions & 0 deletions docs/source/operators/telemetry.rst
@@ -0,0 +1,61 @@
Telemetry and Eventlogging
==========================

Jupyter Server can be configured to record structured events from a running server using Jupyter's `Telemetry System`_. Events are emitted as JSON data, and the types of events that the Server emits are defined and validated by the `JSON schemas`_ listed below_.


.. _logging: https://docs.python.org/3/library/logging.html
.. _`Telemetry System`: https://github.com/jupyter/telemetry
.. _`JSON schemas`: https://json-schema.org/

Emitting Server Events
----------------------

Event logging is handled by the server's ``EventLog`` object, which leverages Python's standard logging_ library to emit, filter, and collect event data.

To begin recording events, you'll need to set two configuration options:

1. ``handlers``: tells the EventLog *where* to route your events. This trait is a list of Python logging handlers, each of which sends events to a destination such as a file or a stream.
2. ``allowed_schemas``: tells the EventLog *which* events should be recorded. No events are emitted by default; every event you want recorded must have its schema listed here.

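Here's a basic example for emitting events from the ``contents`` service:

.. code-block:: python

    import logging

    # Route events to a local file.
    c.EventLog.handlers = [
        logging.FileHandler('event.log'),
    ]

    # Record only events from the ContentsManager schema.
    c.EventLog.allowed_schemas = [
        'eventlogging.jupyter.org/notebook/contentsmanager-actions'
    ]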

The output is a file, ``"event.log"``, with events recorded as JSON data.
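
Because ``handlers`` accepts Python logging handlers, a custom ``logging.Handler`` subclass can capture events in memory, for example in tests. This sketch assumes the EventLog delivers each event to its handlers as a standard ``logging`` record; ``InMemoryHandler`` is a hypothetical name, and the demo drives a plain logger rather than the EventLog itself.

```python
import logging
from typing import List


class InMemoryHandler(logging.Handler):
    """Collect formatted log records in a list instead of writing them out."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.records: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # self.format() applies whatever formatter is attached to this
        # handler (by default, just the message text).
        self.records.append(self.format(record))


# Standalone demo with a plain logger.
handler = InMemoryHandler()
demo_log = logging.getLogger("eventlog-demo")
demo_log.setLevel(logging.INFO)
demo_log.propagate = False  # keep demo output away from the root logger
demo_log.addHandler(handler)
demo_log.info('{"action": "get", "path": "example.ipynb"}')
print(handler.records)

# In a config file it would be listed like any other handler:
# c.EventLog.handlers = [InMemoryHandler()]
```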

``eventlog`` endpoint
---------------------

Jupyter Server provides a public REST endpoint for external applications to validate and log events
through the Server's Event Log.

To log events, send a ``POST`` request to the ``/api/eventlog`` endpoint. The body of the request should be a
JSON object and must contain the following keys:

1. ``schema``: the event's schema ID.
2. ``version``: the version of the event's schema.
3. ``event``: the event data in JSON format.

For an event to be validated and recorded through this endpoint, its schema must be listed in the ``allowed_schemas`` trait described above.
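
A hypothetical client-side pre-check that mirrors the rules above before a request is sent: it reports missing required keys and a schema absent from a given allow list. Treating ``event`` as a JSON object (a Python ``dict``) is an assumption. This is not the server's validation code, which still validates the event against its schema.

```python
from typing import Iterable, List

REQUIRED_KEYS = ("schema", "version", "event")


def check_event_payload(body: dict, allowed_schemas: Iterable[str]) -> List[str]:
    """Return a list of problems with an /api/eventlog request body (empty if none)."""
    problems = [f"missing key: {key!r}" for key in REQUIRED_KEYS if key not in body]
    if "schema" in body and body["schema"] not in set(allowed_schemas):
        problems.append(f"schema not in allowed_schemas: {body['schema']!r}")
    if "event" in body and not isinstance(body["event"], dict):
        problems.append("'event' must be a JSON object")
    return problems


allowed = ["eventlogging.jupyter.org/notebook/contentsmanager-actions"]
ok = {
    "schema": "eventlogging.jupyter.org/notebook/contentsmanager-actions",
    "version": 1,
    "event": {"action": "get", "path": "example.ipynb"},
}
print(check_event_payload(ok, allowed))                  # []
print(check_event_payload({"schema": "other"}, allowed))
```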

.. _below:


Server Event schemas
--------------------

.. toctree::
:maxdepth: 2

events/index
2 changes: 1 addition & 1 deletion docs/source/other/full-config.rst
@@ -897,7 +897,7 @@ FileContentsManager.root_dir : Unicode

No description

NotebookNotary.algorithm : 'md5'|'sha3_384'|'sha3_512'|'sha256'|'sha1'|'blake2s'|'sha3_256'|'sha3_224'|'sha384'|'sha512'|'blake2b'|'sha224'
NotebookNotary.algorithm : 'sha1'|'sha3_224'|'blake2s'|'sha384'|'sha224'|'sha3_256'|'sha3_384'|'sha3_512'|'sha512'|'sha256'|'md5'|'blake2b'
Default: ``'sha256'``

The hashing algorithm used to sign notebooks.
4 changes: 4 additions & 0 deletions jupyter_server/base/handlers.py
@@ -204,6 +204,10 @@ def jinja_template_vars(self):
"""User-supplied values to supply to jinja templates."""
return self.settings.get('jinja_template_vars', {})

@property
def eventlog(self):
return self.settings.get('eventlog')

#---------------------------------------------------------------
# URLs
#---------------------------------------------------------------
83 changes: 83 additions & 0 deletions jupyter_server/event-schemas/contentsmanager-actions/v1.yaml
@@ -0,0 +1,83 @@
"$id": eventlogging.jupyter.org/notebook/contentsmanager-actions
version: 1
title: Contents Manager activities
personal-data: true
description: |
  Record actions on files via the ContentsManager REST API.

  The notebook ContentsManager REST API is used by all frontends to retrieve,
  save, list, delete and perform other actions on notebooks, directories,
  and other files through the UI. This is pluggable - the default acts on
  the file system, but can be replaced with a different ContentsManager
  implementation - to work on S3, Postgres, other object stores, etc.
  The events get recorded regardless of the ContentsManager implementation
  being used.

  Limitations:

  1. This does not record all filesystem access, just the ones that happen
     explicitly via the notebook server's REST API. Users can (and often do)
     trivially access the filesystem in many other ways (such as `open()` calls
     in their code), so this is almost never a complete record.
  2. As with all events recorded by the notebook server, users most likely
     have the ability to modify the code of the notebook server. Unless other
     security measures are in place, these events should be treated as user
     controlled and not used in high security areas.
  3. Events are only recorded when an action succeeds.
type: object
required:
  - action
  - path
properties:
  action:
    enum:
      - get
      - create
      - save
      - upload
      - rename
      - copy
      - delete
    category: unrestricted
    description: |
      Action performed by the ContentsManager API.

      This is a required field.

      Possible values:

      1. get
         Get contents of a particular file, or list contents of a directory.

      2. create
         Create a new directory or file at 'path'. Currently, the name of the
         file or directory is auto-generated by the ContentsManager implementation.

      3. save
         Save a file at path with contents from the client

      4. upload
         Upload a file at given path with contents from the client

      5. rename
         Rename a file or directory from value in source_path to
         value in path.

      6. copy
         Copy a file or directory from value in source_path to
         value in path.

      7. delete
         Delete a file or empty directory at given path
  path:
    category: personally-identifiable-information
    type: string
    description: |
      Logical path on which the operation was performed.

      This is a required field.
  source_path:
    category: personally-identifiable-information
    type: string
    description: |
      Source path of an operation when action is 'copy' or 'rename'
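
A standard-library sketch of what a conforming ``contentsmanager-actions`` event looks like, checked against the ``required`` fields, the ``action`` enum, and the string-typed path fields of the schema above. The constants are transcribed by hand from the YAML and ``check_contents_event`` is a hypothetical helper; full validation would go through a JSON Schema validator instead.

```python
from typing import List

# Transcribed by hand from the contentsmanager-actions v1 schema.
REQUIRED = ("action", "path")
ACTIONS = {"get", "create", "save", "upload", "rename", "copy", "delete"}


def check_contents_event(event: dict) -> List[str]:
    """Return constraint violations for a contentsmanager-actions event."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in event]
    if "action" in event and event["action"] not in ACTIONS:
        problems.append(f"unknown action: {event['action']!r}")
    for key in ("path", "source_path"):
        if key in event and not isinstance(event[key], str):
            problems.append(f"{key} must be a string")
    return problems


rename_event = {"action": "rename", "path": "new.ipynb", "source_path": "old.ipynb"}
print(check_contents_event(rename_event))  # []
print(check_contents_event({"action": "move"}))
```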
23 changes: 21 additions & 2 deletions jupyter_server/serverapp.py
@@ -33,6 +33,8 @@
import warnings
import webbrowser
import urllib
from ruamel.yaml import YAML
from glob import glob

from types import ModuleType
from base64 import encodebytes
@@ -99,10 +101,19 @@
)
from ipython_genutils import py3compat
from jupyter_core.paths import jupyter_runtime_dir, jupyter_path
from jupyter_telemetry.eventlog import EventLog

from jupyter_server._sysinfo import get_sys_info

from ._tz import utcnow, utcfromtimestamp
from .utils import url_path_join, check_pid, url_escape, urljoin, pathname2url
from .utils import (
url_path_join,
check_pid,
url_escape,
urljoin,
pathname2url,
get_schema_files
)

from jupyter_server.extension.serverextension import (
ServerExtensionApp,
@@ -279,7 +290,8 @@ def init_settings(self, jupyter_app, kernel_manager, contents_manager,
server_root_dir=root_dir,
jinja2_env=env,
terminals_available=False, # Set later if terminals are available
serverapp=self
serverapp=self,
eventlog=jupyter_app.eventlog
)

# allow custom overrides for the tornado web app.
@@ -1758,6 +1770,11 @@ def _init_asyncio_patch():
# WindowsProactorEventLoopPolicy is not compatible with tornado 6
# fallback to the pre-3.8 default of Selector
asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())
def init_eventlog(self):
self.eventlog = EventLog(parent=self)
# Register schemas for notebook services.
for file_path in get_schema_files():
self.eventlog.register_schema_file(file_path)

@catch_config_error
def initialize(self, argv=None, find_extensions=True, new_httpserver=True):
@@ -1788,10 +1805,12 @@ def initialize(self, argv=None, find_extensions=True, new_httpserver=True):
self.init_server_extensions()
# Initialize all components of the ServerApp.
self.init_logging()
self.init_eventlog()
if self._dispatching:
return
self.init_configurables()
self.init_components()
self.init_eventlog()
self.init_webapp()
if new_httpserver:
self.init_httpserver()