
Save cached latents as caching progresses #38

Open
SrGonao opened this issue Nov 6, 2024 · 3 comments
Comments

@SrGonao
Collaborator

SrGonao commented Nov 6, 2024

This is a complicated change. At the moment, we hold all feature activations in RAM while caching, which becomes problematic when dealing with millions of tokens.
I think the way we want to do this is to use something like Hugging Face datasets.
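
For illustration, a minimal sketch of the Hugging Face `datasets` idea: flush each in-memory buffer to disk as a shard and concatenate the shards at the end. The function names and the `locations`/`activations` column layout here are assumptions, not the repo's actual code.

```python
import numpy as np
from datasets import Dataset, concatenate_datasets, load_from_disk

def flush_shard(locations: np.ndarray, activations: np.ndarray, path: str) -> None:
    # Write one in-memory buffer of (location, activation) rows to disk,
    # freeing RAM before the next chunk of tokens is processed.
    Dataset.from_dict({
        "locations": locations.tolist(),
        "activations": activations.tolist(),
    }).save_to_disk(path)

def merge_shards(paths: list[str]) -> Dataset:
    # Reload every shard and concatenate into a single dataset at the end.
    return concatenate_datasets([load_from_disk(p) for p in paths])
```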

@kernel-loophole

@SrGonao I would love to work on this. Can you provide more details about it?

@SrGonao
Collaborator Author

SrGonao commented Nov 12, 2024

Currently, we do feature caching by keeping all the activations in memory before saving them (https://github.com/EleutherAI/sae-auto-interp/blob/v0.2/sae_auto_interp/features/cache.py#L208-L242). We could instead save them every X tokens and merge the shards at the end. This would allow people to do longer runs where the feature activations don't all fit in memory.
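
A rough sketch of that flush-every-X-tokens idea. The `ChunkedCache` class, the `save_every` threshold, and the shard layout are all hypothetical, not what cache.py actually does:

```python
import os
import numpy as np

class ChunkedCache:
    """Hypothetical cache that spills activations to disk every `save_every` tokens."""

    def __init__(self, out_dir: str, save_every: int = 1_000_000):
        self.out_dir = out_dir
        self.save_every = save_every
        self.buffer: list[np.ndarray] = []
        self.tokens_seen = 0
        self.shard_idx = 0
        os.makedirs(out_dir, exist_ok=True)

    def add(self, activations: np.ndarray) -> None:
        # Append one batch of activations; flush once the token budget is hit.
        self.buffer.append(activations)
        self.tokens_seen += len(activations)
        if self.tokens_seen >= self.save_every:
            self.flush()

    def flush(self) -> None:
        # Write the current buffer as a numbered shard and free the RAM.
        if not self.buffer:
            return
        shard = np.concatenate(self.buffer)
        np.save(os.path.join(self.out_dir, f"shard_{self.shard_idx:05d}.npy"), shard)
        self.buffer, self.tokens_seen = [], 0
        self.shard_idx += 1

    def merge(self) -> np.ndarray:
        # Merge all shards at the end of the run, as suggested above.
        self.flush()
        names = sorted(f for f in os.listdir(self.out_dir) if f.endswith(".npy"))
        return np.concatenate([np.load(os.path.join(self.out_dir, n)) for n in names])
```

Zero-padding the shard index keeps lexicographic sorting correct during the merge. A simple test would be to run this over a small example and check that the merged result matches a single all-in-memory pass.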

@kernel-loophole

kernel-loophole commented Nov 12, 2024

Okay, great. I will look into that. How can I test this approach?
