Memmap shuffling #216
base: main
… small custom dataset
Some notes:
Profiling has been completed on this branch: https://github.com/Lewington-pitsos/SAELens/tree/memmap_profiling. The overall outcome is that the old script consumes around 26 CPU seconds and 3 GPU seconds. This means that with no pinning we can achieve the following speeds:
More thorough profiling would be required to get a clearer picture of the tradeoff between pinning and not pinning.
memmap-pinning.txt - run with memmap and pinning as implemented on … Note that the …
By the way, I'm still not 100% sure what would be required to make this PR mergeable.
Huh, pinning slowed it down? I feel like something's wrong here. Will read up and play with the code a bunch.
Well, it led to 10x the CPU time but 1/2 the GPU time, so possibly? I wasn't sure if a thorough comparison of pinning vs. non-pinning was in scope for this PR.
I think pinning maybe only helps when it's facilitating asynchronous work (i.e., when we're also doing SAE training), which you weren't doing here? So maybe this is expected? I'm inclined to say sorting this out is out of scope for this PR, and I'll probably merge once I've checked that training an SAE with this kind of shuffling, as opposed to our last approach, isn't worse (for non-cached activation training).
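One plausible reading of the numbers above can be sketched in PyTorch. This is an illustration, not SAELens code, and the helper name transfer_batch is hypothetical. pin_memory() makes an extra CPU-side copy into page-locked memory, which costs CPU time. The payoff is that a later .to(device, non_blocking=True) can run asynchronously. That only helps if the GPU has other work, such as an SAE training step, to overlap with.

```python
import torch


def transfer_batch(batch: torch.Tensor, device: torch.device, pin: bool) -> torch.Tensor:
    """Hypothetical helper: copy a CPU activation batch to `device`.

    With pin=True on a CUDA device, the batch is first copied into page-locked
    (pinned) host memory so the host-to-device copy can be issued
    asynchronously. Pinning is skipped on CPU-only machines.
    """
    use_pinning = pin and device.type == "cuda"
    if use_pinning:
        batch = batch.pin_memory()  # extra CPU work: copy into page-locked memory
    # non_blocking only lets the copy overlap with other work when the source is pinned
    return batch.to(device, non_blocking=use_pinning)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cpu_batch = torch.randn(8, 4)
moved = transfer_batch(cpu_batch, device, pin=True)
# An SAE forward/backward pass launched here could overlap with the copy above.
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the async copy before reading `moved`
```

If nothing is launched between the copy and the point where the data is used, the pinned path simply pays for the extra CPU copy. That would be consistent with, though not proof of, the higher CPU time observed.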
I actually had no idea what pinning even was before this PR, so I feel underqualified to make a final judgement on it in the context of a system designed to do bleeding-edge machine learning training XD
I'm fairly happy with this PR but want to spend a bit of time on it. Sorry for the delay here. Writing the shuffling utility would keep us moving if you've got time.
No stress, fam. I'm working on a bit of side research at the moment on a possible metric for measuring SAE quality, and I'll ping this thread if I have time to work on shuffling. Basically, I would make a shuffler that achieves approximately the same level of randomness as we had before (but on disk).
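Below is a minimal numpy sketch of what such an on-disk shuffler might look like. It assumes each buffer is a 2-D array saved as a .npy file and that two buffers fit in RAM at once. On each pass it shuffles random pairs of buffers together in place, and more passes mix rows across more buffers. The function names are hypothetical. This does not describe how the previous in-memory shuffling worked.

```python
import os
import tempfile

import numpy as np


def shuffle_buffer_pair(path_a: str, path_b: str, rng: np.random.Generator) -> None:
    """Hypothetical helper: shuffle the rows of two .npy buffers together, in place."""
    a = np.load(path_a, mmap_mode="r+")
    b = np.load(path_b, mmap_mode="r+")
    combined = np.concatenate([a, b])  # loads both buffers into RAM
    combined = combined[rng.permutation(len(combined))]
    a[:] = combined[: len(a)]  # write the shuffled rows back through the memmaps
    b[:] = combined[len(a) :]
    a.flush()
    b.flush()


def shuffle_buffers_on_disk(paths: list, n_passes: int, seed: int = 0) -> None:
    """Hypothetical helper: mix rows across buffers by shuffling random pairs."""
    rng = np.random.default_rng(seed)
    for _ in range(n_passes):
        order = rng.permutation(len(paths))
        # Pair buffers up in a random order; with an odd count, one sits out this pass.
        for i in range(0, len(order) - 1, 2):
            shuffle_buffer_pair(paths[order[i]], paths[order[i + 1]], rng)


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for k in range(4):
        path = os.path.join(tmp, f"buffer_{k}.npy")
        np.save(path, np.full((100, 8), k, dtype=np.float32))  # every row of buffer k is k
        paths.append(path)
    shuffle_buffers_on_disk(paths, n_passes=3)
    shuffled = [np.load(p) for p in paths]  # plain in-memory copies
```

Pairwise passes keep peak memory at two buffers. The tradeoff is that a row needs several passes before it can end up in a distant buffer.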
Description
A fork of the improved_io branch (https://github.com/jbloomAus/SAELens/tree/improved_io). The overall intention is to load cached activations in the ActivationsStore via memmap and to add code that shuffles those buffers on disk using the CacheActivationsRunner. At this stage the shuffling is probably not sufficient, and it hasn't been altered since improved_io.
This feature probably isn't complete until we have a script that generates and shuffles buffers while training at the same time, with the training code making use of those buffers.
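The memmap approach can be illustrated with numpy, along with the float32/float16 pitfall fixed in test_cache_activations_runner_saving. This is a minimal sketch that assumes each buffer is a headerless raw file of shape (n_rows, d_in) whose row count is inferred from the file size. write_buffer and read_buffer are hypothetical helpers, not SAELens functions.

```python
import os
import tempfile

import numpy as np


def write_buffer(path: str, acts: np.ndarray) -> None:
    """Hypothetical helper: write activations to a headerless memmap file."""
    mm = np.memmap(path, dtype=acts.dtype, mode="w+", shape=acts.shape)
    mm[:] = acts
    mm.flush()
    del mm  # release the mapping


def read_buffer(path: str, d_in: int, dtype=np.float16) -> np.memmap:
    """Hypothetical helper: map a buffer read-only, inferring rows from the file size.

    The dtype must match the dtype the buffer was written with; otherwise the
    inferred row count (and every value) is wrong.
    """
    n_rows = os.path.getsize(path) // (d_in * np.dtype(dtype).itemsize)
    return np.memmap(path, dtype=dtype, mode="r", shape=(n_rows, d_in))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffer_0.bin")
    acts = np.random.default_rng(0).standard_normal((1000, 16)).astype(np.float32)

    # Mismatch: written as float32 (4 bytes/value), read back as float16 (2 bytes/value).
    write_buffer(path, acts)
    wrong = read_buffer(path, d_in=16, dtype=np.float16)
    wrong_shape = wrong.shape  # twice as many rows as were written
    del wrong

    # Consistent dtype on both sides.
    write_buffer(path, acts.astype(np.float16))
    right = read_buffer(path, d_in=16, dtype=np.float16)
    right_shape = right.shape
    first_row = np.array(right[0])  # copy out before the file is deleted
    del right
```

Because only the slices actually indexed are paged in from disk, a store built this way could serve batches from buffers much larger than RAM.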
- Fix test_cache_activations_runner_saving (a memmap was being initialized as float32 and then read as float16, causing the loaded memmap to appear twice as large as expected)
- ActivationsStore and CacheActivationsRunner
- ActivationsStore.get_batch now uses the new memmap strategy instead of the old dataloader get_batch functionality
NOTE: with the new next_batch functionality we EITHER assume some other process is creating buffers in the cache for us, OR generate activations on the fly without ever caching them. At some point we will want the second option to build a cache as it goes, but IMO supporting this will be very cumbersome and should be left for future work.
Type of change
Please delete options that are not relevant.
You have tested formatting, typing and unit tests (acceptance tests not currently in use)
make check-ci to check format and linting. (You can run make format to format code if needed.)
Performance Check.
If you have implemented a training change, please indicate precisely how performance changes with respect to the following metrics:
Please link to wandb dashboards with a control and test group.