This repository has been archived by the owner on Nov 5, 2022. It is now read-only.
Glenn K. Lockwood edited this page May 24, 2017 · 14 revisions

pytokio

Getting Started

The examples subdirectory will contain Jupyter notebooks that demonstrate the basics of using the pytokio API to retrieve data. Because pytokio uses existing data sources, these notebooks must be run on a system that has access to the data source(s) you want. Specifically,

  • NERSC's Lustre data stored in HDF5 is saved to the NERSC Global File System (NGF), so accessing it requires a system with an NGF mount (e.g., https://jupyter.nersc.gov/).
  • The job accounting and diameter data rely on access to the Slurm accounting database via the sacct command. Notebooks that use them must therefore be run from a system where sacct -j 12345 returns the correct job information for job 12345.
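Before opening a notebook, it can help to confirm that the current system meets both requirements above. The following is a minimal sketch, not part of pytokio: the function names are hypothetical, the HDF5 directory is a placeholder you must replace with the real NGF location, and job 12345 is simply the example job ID used above.

```python
import os
import shutil
import subprocess


def hdf5_dir_readable(path):
    """Return True if `path` is an existing, readable directory."""
    return os.path.isdir(path) and os.access(path, os.R_OK)


def have_sacct():
    """Return True if the sacct executable is found on PATH."""
    return shutil.which("sacct") is not None


def sacct_job_visible(jobid, timeout=30):
    """Return True if `sacct -j <jobid>` succeeds and prints at least one record."""
    if not have_sacct():
        return False
    try:
        result = subprocess.run(
            ["sacct", "-j", str(jobid), "--noheader", "--parsable2"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


def check_data_sources(hdf5_dir, jobid):
    """Collect the availability of each data source into one dictionary."""
    return {
        "hdf5_dir": hdf5_dir_readable(hdf5_dir),
        "sacct_installed": have_sacct(),
        "sacct_job_visible": sacct_job_visible(jobid),
    }


if __name__ == "__main__":
    # Placeholder path (assumption): substitute the actual NGF directory.
    for name, ok in check_data_sources("/path/to/ngf/hdf5", 12345).items():
        print("{:20s} {}".format(name, "ok" if ok else "NOT AVAILABLE"))
```

This check only confirms that sacct returns some record for the job; whether that record is correct still has to be verified by hand.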

Design

[Figure: pytokio high-level architecture (pytoktio2-high-level-architecture.png)]

API Conventions

At present, the pytokio API is not stable and is subject to change. That said, the basic API calls that load a dataset as a DataFrame are unlikely to change soon: that functionality already exists, and development effort is focused on adding new functionality rather than refining API naming conventions.

In brief,

  • stateful connectors are classes that wrap other stateful access mechanisms (MySQL, Elasticsearch, etc.) and are not meant to be managed directly by pytokio users
  • stateless interfaces (getters, e.g., hdf5.get_dataframe_from_time_range()) are the most convenient way to get data of interest; they may or may not use the stateful connectors under the hood
  • summarize_ API calls may be backend-specific and return only high-level summary metrics (scalar quantities) that are easier to feed directly into large-scale correlation analyses
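The three conventions above can be illustrated with a toy example. This is a sketch of the pattern only, not pytokio's actual code: every class and function name here is hypothetical, an in-memory SQLite database stands in for a stateful backend such as MySQL, and a list of dictionaries stands in for the DataFrame a real getter would return, so that the example needs only the standard library.

```python
import sqlite3

# Made-up (timestamp, bytes_read) samples used to populate the toy backend.
_SAMPLES = [(0, 100), (5, 250), (10, 50), (15, 400)]


class ToyConnector:
    """Stateful connector: owns a live database connection."""

    def __init__(self, samples):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE io (ts INTEGER, bytes_read INTEGER)")
        self._conn.executemany("INSERT INTO io VALUES (?, ?)", samples)

    def query(self, start, end):
        """Return (ts, bytes_read) tuples with start <= ts < end."""
        cursor = self._conn.execute(
            "SELECT ts, bytes_read FROM io WHERE ts >= ? AND ts < ? ORDER BY ts",
            (start, end),
        )
        return cursor.fetchall()

    def close(self):
        self._conn.close()


def get_rows_from_time_range(start, end):
    """Stateless getter: creates, uses, and closes the connector on each call."""
    connector = ToyConnector(_SAMPLES)
    try:
        return [
            {"ts": ts, "bytes_read": nbytes}
            for ts, nbytes in connector.query(start, end)
        ]
    finally:
        connector.close()


def summarize_rows(rows):
    """summarize_-style call: reduce the rows to a few scalar metrics."""
    values = [row["bytes_read"] for row in rows]
    return {
        "num_samples": len(values),
        "total_bytes_read": sum(values),
        "max_bytes_read": max(values) if values else 0,
    }


if __name__ == "__main__":
    rows = get_rows_from_time_range(0, 15)
    print(summarize_rows(rows))
```

The caller never touches ToyConnector directly: the getter hides the connection's lifetime, and the summarize function reduces the result to scalars that are easy to tabulate across many jobs.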

Contributing

Please don't commit directly to master. Instead, make a branch of the form yourname/yournewfeature and issue a pull request so that a second party can review your changes.
