Home
The examples subdirectory will contain Jupyter notebooks that demonstrate the basics of using the pytokio API to retrieve data. Because pytokio uses existing data sources, these notebooks must be run on a system that has access to the data source(s) you want. Specifically:
- NERSC's Lustre data stored in HDF5 is saved to the NERSC Global File System, so accessing it requires an NGF mount (e.g., from https://jupyter.nersc.gov/)
- The job accounting and diameter data rely on Slurm accounting database access via the `sacct` command. Thus, they must be gathered from a system that can run `sacct -j 12345` and get the correct job information for job `12345` (see the sketch after this list).
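A quick way to check whether a system meets the `sacct` requirement is to query a job ID you know exists. The snippet below is a minimal sketch using Python's standard `subprocess` module; the job ID `12345` and the `--format` fields are placeholders for illustration only.

```python
import subprocess

# Illustrative sanity check: can this system query the Slurm accounting
# database?  Replace "12345" with a real job ID from your site.
jobid = "12345"
try:
    result = subprocess.run(
        ["sacct", "-j", jobid, "--format=JobID,JobName,Elapsed", "--parsable2"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)
except (FileNotFoundError, subprocess.CalledProcessError) as err:
    print("sacct is unavailable or the query failed:", err)
```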
![pytokio high-level architecture](pytoktio2-high-level-architecture.png)
At present, the pytokio API is not stable and is subject to change. That said, the basic API calls to load a dataset as a DataFrame are unlikely to change rapidly: that functionality already exists, and development effort is focused on new functionality rather than on refining API naming conventions.
In brief:
- stateful connectors are classes that wrap other stateful access mechanisms (MySQL, Elasticsearch, etc.) and are not meant to be managed directly by pytokio users
- stateless interfaces (getters, e.g., `hdf5.get_dataframe_from_time_range()`) are the most convenient way to get data of interest; they may or may not use the stateful API under the hood
- `summarize_` API calls may be backend-specific and produce only high-level summary metrics (scalar quantities) that are easier to feed directly into large-scale correlation metrics
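For example, the HDF5 getter mentioned above can be called with a time range to produce a pandas DataFrame. The snippet below is a minimal sketch, not a definitive recipe: the file system name (`cscratch`) and dataset name (`datatargets/readbytes`) are assumptions for illustration, and the exact argument list should be checked against the current code since the API is still evolving.

```python
import datetime
import tokio.tools.hdf5

# Minimal sketch: pull one day of Lustre read traffic as a pandas DataFrame.
# The file system name ("cscratch") and dataset name ("datatargets/readbytes")
# are illustrative; consult the current pytokio documentation for the exact
# signature, since the API is not yet stable.
end = datetime.datetime.now()
start = end - datetime.timedelta(days=1)

df = tokio.tools.hdf5.get_dataframe_from_time_range(
    "cscratch",               # file system of interest (site-specific)
    "datatargets/readbytes",  # HDF5 dataset to extract (illustrative)
    start,
    end,
)
print(df.describe())
```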
Please don't commit directly to master. Instead, make a branch of the form `yourname/yournewfeature` and issue a pull request so that a second party can review your changes.