The existing databrowser functionality in numbers also needs to be updated in the context of the above -- that will make it much more general. In general, the data browser provides a filetree + tabbed viewer interface to an … The databrowser also manages scripts in toolbars to automate repeated tasks. We need to continue rethinking that in the context of the evolving overall numbers design, relative to the "notebook" concept vs. our enhanced "terminal" design, etc. A key principle is that you should be able to directly …

A critical efficiency issue is that managing many small individual files in actual fs space is painful, so automatically using zip would be very useful. golang/go#61232 explains why not … So zipfs should be built in, and probably be the default way of saving / loading datafs images. vfs is used for zipfs and seems like the key package to use as a basis for datafs.
-
GPU integration via
-
Jax notes
Some key points:
-
can start using the
-
Indexes on demand
New idea: `Indexed` indexes are nil by default, and … Including:
Data Dir has:
Data leaf has:
Tables will now have `Indexed` columns instead of bare tensors, so that they can literally point to the Data items. This allows easy integration of heterogeneous lengths and orders of data types. The nil index in `tensor.Indexed` means that columns can automatically use the dir table index if they don't have one themselves, when passed into … When combining Dir tensors into the table, can we automatically set indexes to make them all compatible? Add row needs to work properly for all this -- should be doable. A Scalar is 1D with rows = 1, so it is easy to grow. Might need some special access methods for Scalars?
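The nil-index fallback can be sketched as follows. The `Indexed` and `Table` types here are hypothetical simplifications (plain `float64` values, one shared table index). They are not the actual `tensor.Indexed` API. A column whose `Indexes` slice is nil reads through the table's index, and a column with its own index uses that instead.

```go
package main

import "fmt"

// Indexed is a hypothetical stand-in for a column: raw values plus an
// optional row index. A nil Indexes slice means "no index of my own".
type Indexed struct {
	Values  []float64
	Indexes []int
}

// Table is a hypothetical table of Indexed columns; its Indexes is the
// shared index used by any column that has no index of its own.
type Table struct {
	Indexes []int
	Columns map[string]*Indexed
}

// rowIndex maps logical row i of column c to a physical index, falling
// back to the table's index when the column's own index is nil.
func (t *Table) rowIndex(c *Indexed, i int) int {
	switch {
	case c.Indexes != nil:
		return c.Indexes[i]
	case t.Indexes != nil:
		return t.Indexes[i]
	default:
		return i
	}
}

// Value reads logical row i of the named column.
func (t *Table) Value(col string, i int) float64 {
	c := t.Columns[col]
	return c.Values[t.rowIndex(c, i)]
}

func main() {
	t := &Table{
		Indexes: []int{2, 0, 1}, // shared order for the whole table
		Columns: map[string]*Indexed{
			"sse": {Values: []float64{0.3, 0.1, 0.2}},                       // nil index: uses table's
			"err": {Values: []float64{9, 1, 0, 5}, Indexes: []int{3, 1, 2}}, // own index, different length
		},
	}
	for i := range t.Indexes {
		fmt.Println(t.Value("sse", i), t.Value("err", i))
	}
}
```

The "err" column has four raw values but exposes only three rows through its own index. This is one way columns of different underlying lengths can still line up in one table.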
-
A key component for exploratory data analysis (e.g., as in a Jupyter notebook) is a shared namespace to grab and store data. In the existing emer simulations, the estats package provides this functionality, to allow different methods to access float and tensor variables in a global shared namespace.
As far as I can tell, the guts of Jupyter are powered by IPython, which has various "magic" commands prefixed by `%` that manage the global namespace and perform standard shell-like functionality, similar to cosh (see https://ipython.readthedocs.io/en/stable/interactive/magics.html). The `%who` and `%whos` magic commands list the global variables.

I couldn't find much info about people experiencing conflicts in this global namespace, but it seems inevitable that a single monolithic space is going to be bad. Here's something from another related project: https://discourse.julialang.org/t/notebooks-need-modules-i-e-multiple-separate-global-namespaces/68541
Thus, a potentially intuitive and flexible solution is to adopt the filesystem metaphor for organizing the global data variable space, by creating an fs implementation that allows you to read / write data variables as "files" in different directories, to better organize and avoid conflicts.
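Here is a minimal sketch of the directory metaphor. It assumes a hypothetical `Dir` type that stores both values and nested directories in one `map[string]any`, so the same variable name (here `sse`) can live in separate directories without conflict. `Set` creates intermediate directories as needed. A real implementation would also need to satisfy the `io/fs` interfaces, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"strings"
)

// Dir is a hypothetical datafs directory: each item is either a value
// or a nested *Dir.
type Dir struct {
	Items map[string]any
}

func NewDir() *Dir { return &Dir{Items: map[string]any{}} }

// split turns "/epoch/sse" into ["epoch", "sse"].
func split(p string) []string {
	return strings.Split(strings.Trim(p, "/"), "/")
}

// Set stores v at path p, creating intermediate directories as needed.
// Note: a non-directory item in the way is silently replaced by a Dir.
func (d *Dir) Set(p string, v any) {
	parts := split(p)
	for _, name := range parts[:len(parts)-1] {
		sub, ok := d.Items[name].(*Dir)
		if !ok {
			sub = NewDir()
			d.Items[name] = sub
		}
		d = sub
	}
	d.Items[parts[len(parts)-1]] = v
}

// Lookup returns the item at path p, reporting whether it exists.
func (d *Dir) Lookup(p string) (any, bool) {
	parts := split(p)
	for _, name := range parts[:len(parts)-1] {
		sub, ok := d.Items[name].(*Dir)
		if !ok {
			return nil, false
		}
		d = sub
	}
	v, ok := d.Items[parts[len(parts)-1]]
	return v, ok
}

func main() {
	root := NewDir()
	root.Set("/epoch/sse", 0.25)
	root.Set("/trial/sse", 0.5) // same name, different directory: no conflict
	v, _ := root.Lookup("/trial/sse")
	fmt.Println(v)
}
```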
Another issue with the global data space is handling type safety properly. In Python this is not an issue, but in Go, we want to maintain type safety.
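One hedged way to keep type safety on top of an untyped `map[string]any` store is a generic accessor that checks the stored type at run time and returns an error on a mismatch. Go does not allow methods to declare their own type parameters. A typed getter therefore has to be a package-level generic function that takes the store as an argument, e.g. `Get[float32](ds, path)`, rather than a method. `Store`, `Get`, and `Set` below are hypothetical names.

```go
package main

import "fmt"

// Store is a hypothetical flat data store keyed by path.
type Store struct {
	items map[string]any
}

func NewStore() *Store { return &Store{items: map[string]any{}} }

// Set infers T from v, so callers never spell out the type parameter.
func Set[T any](s *Store, path string, v T) {
	s.items[path] = v
}

// Get returns the value at path if it exists and has type T; otherwise it
// returns an error instead of panicking on a failed type assertion.
func Get[T any](s *Store, path string) (T, error) {
	var zero T
	v, ok := s.items[path]
	if !ok {
		return zero, fmt.Errorf("%s: not found", path)
	}
	t, ok := v.(T)
	if !ok {
		return zero, fmt.Errorf("%s: stored %T, requested %T", path, v, zero)
	}
	return t, nil
}

func main() {
	ds := NewStore()
	Set(ds, "/epoch/sse/mean", float32(0.25))
	myf, err := Get[float32](ds, "/epoch/sse/mean")
	fmt.Println(myf, err)
	_, err = Get[float64](ds, "/epoch/sse/mean") // wrong type: error, not panic
	fmt.Println(err)
}
```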
One solution is to use automatic filename extensions, e.g., `.float32`, `.tensor32`, etc., to label the data, and, somehow, when you access it, you get back a variable of the proper type. I'm not exactly sure if this can be pulled off properly with generics. In estats we just have a bunch of different maps for each different type, and separate named accessors.

We could at least have the equivalent of `map[string]any` to back the storage of data elements, and have explicit generic type args for reading and writing, e.g., `myf := ds.Get[float32]("/epoch/sse/mean")`. `.Set` at least would not require using the type parameter.

Per above, we would want to add simple accessors for variables -- definitely don't want to have to deal with io `Read` / `Write` at this level. But supporting the `fs` interface in general would allow things like the `filetree` and `FilePicker` (properly updated to use the `fs` package instead of `os`) to work for browsing and accessing the variables. The generic `Read` / `Write` interface for all variables would be handy for directly storing data to actual os files.

metadata
Each data "file" can have metadata (`map[string]any`) that would be key for various use-cases, as illustrated below. The `fs.FileInfo` interface specifies a `Sys() any` method that could potentially be used to return this.

emer sim logging example
One interesting test case for this is to replace the existing elog functionality in emergent with a corresponding file structure at the different time scales, modes:
This is more flexible, and it more elegantly solves the perennial problem of systematically representing the different stats like `mean`, `sd`, etc. In the current `emer` sims, we very often redundantly store `estat` values in stats and then read those into the log -- here we just have the one canonical location for every value, and the logging part is automatic based on these values and their metadata (e.g., we can "hide" intermediate values that don't need to be saved).

We could also use metadata to deal with the ever-present ordering problem, for example by having an automatic `OrderAdded` int counter that tracks the order in which you add items to a directory (alternatively, we could try to use `fs.FileInfo.ModTime()` for this, tracking the exact time when a variable is added, and sorting by that). The metadata would, however, be critical for setting the plotting hints in terms of fixed or floating min / max ranges and other variables.
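Here is a minimal sketch of both ideas, built on a hypothetical `itemInfo` type that implements `fs.FileInfo` and whose `Sys()` method returns the item's metadata map. A directory stamps each new item with an `OrderAdded` counter, and listing sorts by that counter instead of by name. The `FixMin` / `Min` plotting-hint keys are made up for illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"time"
)

// Metadata is the per-item map[string]any described above.
type Metadata map[string]any

// itemInfo is a hypothetical fs.FileInfo for a datafs item; Sys returns
// the item's metadata.
type itemInfo struct {
	name    string
	modTime time.Time
	meta    Metadata
}

func (fi *itemInfo) Name() string       { return fi.name }
func (fi *itemInfo) Size() int64        { return 0 }
func (fi *itemInfo) Mode() fs.FileMode  { return 0o444 }
func (fi *itemInfo) ModTime() time.Time { return fi.modTime }
func (fi *itemInfo) IsDir() bool        { return false }
func (fi *itemInfo) Sys() any           { return fi.meta }

// Compile-time check that itemInfo satisfies fs.FileInfo.
var _ fs.FileInfo = (*itemInfo)(nil)

// dir keeps items in a map (so insertion order is lost) plus a counter.
type dir struct {
	items map[string]*itemInfo
	next  int
}

func newDir() *dir { return &dir{items: map[string]*itemInfo{}} }

// add stamps each new item with an OrderAdded counter in its metadata.
func (d *dir) add(name string, meta Metadata) {
	if meta == nil {
		meta = Metadata{}
	}
	meta["OrderAdded"] = d.next
	d.next++
	d.items[name] = &itemInfo{name: name, modTime: time.Now(), meta: meta}
}

// names recovers the order items were added from each FileInfo's Sys().
func (d *dir) names() []string {
	infos := make([]fs.FileInfo, 0, len(d.items))
	for _, it := range d.items {
		infos = append(infos, it)
	}
	order := func(fi fs.FileInfo) int { return fi.Sys().(Metadata)["OrderAdded"].(int) }
	sort.Slice(infos, func(i, j int) bool { return order(infos[i]) < order(infos[j]) })
	out := make([]string, len(infos))
	for i, fi := range infos {
		out[i] = fi.Name()
	}
	return out
}

func main() {
	d := newDir()
	d.add("Trial", nil)
	d.add("SSE", Metadata{"FixMin": true, "Min": 0.0}) // hypothetical plot hints
	d.add("Err", nil)
	fmt.Println(d.names()) // [Trial SSE Err]
}
```

A counter avoids relying on `ModTime()` resolution. Two items added within the same clock tick would otherwise tie.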
It would be easy to write standard aggregation algorithms that just iterate over variables in a given "directory" and populate corresponding variables at the next level up. One could automatically generate all standard aggregates (mean, sd, sem, etc.) and use `Func` filter calls to specify which values actually get saved to actual log files and / or plotted.

The plot and log `table`s can be efficiently updated (using `plan`) from the current data fs state, so adding and removing items is automatic and efficient. There is just one function that turns a given fs directory path into a corresponding table with appropriate metadata filtering, etc.

A simple `cp` command could be used to duplicate and save a given snapshot of log data for subsequent comparison, etc.

We also want to ensure that it is very easy to use the `String()` method on enums to make path names. The standard `path.Join` only accepts `string` arguments, so this needs a version of `path` that calls `String()` automatically. Then you can, e.g., use the `etime` enums to avoid the limitations of dealing with string path names: `path.Join(root, etime.Train, etime.Trial)` etc.

Implementation
All of this would be implemented in underlying Go, presumably in `core` as a `datafs` package similar to and building on the `tensor` package, and should be very compact and simple to write. The as-yet-unnamed numbers scripting language #324 can provide a syntactically simplified interface to it, tbd.

At least all the standard shell commands (`cd`, `ls`, `cp`, etc.) must work directly to navigate the data fs. This will be a good "opportunity" to implement all of them in terms of the underlying `fs` functionality instead of calling out to `/bin/ls` etc. Hopefully we can find an existing Go package that does that?