@xzuyn xzuyn released this 13 Jul 16:38
ce50869
  • CPU or disk activation offloading, plus a "hybrid" offloading method: up to x MB of activations stay on GPU, the next y MB move to CPU, and anything past that moves to disk. activations overflow to the fastest tier that still has room.
  • unsloth's gradient accumulation fix
  • rouge, google_bleu, and meteor evaluation metrics
  • overall code cleanup
  • reworked dataset config method
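
A rough sketch of the tier selection behind the "hybrid" offloading described above, in case it helps picture the overflow order. This is not the release's actual code: `TieredPlacer`, `gpu_budget_mb`, `cpu_budget_mb`, and `place` are hypothetical names, and the sketch only decides where an activation would go; it does not move any tensors.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class TieredPlacer:
    """Hypothetical helper: picks a storage tier for each saved activation."""

    gpu_budget_mb: float  # "x" in the note above (assumed name)
    cpu_budget_mb: float  # "y" in the note above (assumed name)
    gpu_used: int = 0     # bytes currently assigned to GPU
    cpu_used: int = 0     # bytes currently assigned to CPU

    def place(self, nbytes: int) -> str:
        """Return "gpu", "cpu", or "disk" for an activation of nbytes."""
        # Fastest tier first: stay on GPU while the GPU budget has room.
        if self.gpu_used + nbytes <= self.gpu_budget_mb * MB:
            self.gpu_used += nbytes
            return "gpu"
        # Next fastest: spill to CPU memory while the CPU budget has room.
        if self.cpu_used + nbytes <= self.cpu_budget_mb * MB:
            self.cpu_used += nbytes
            return "cpu"
        # Everything else overflows to disk.
        return "disk"


placer = TieredPlacer(gpu_budget_mb=2, cpu_budget_mb=3)
placements = [placer.place(1 * MB) for _ in range(7)]
print(placements)
# ['gpu', 'gpu', 'cpu', 'cpu', 'cpu', 'disk', 'disk']
```

A real implementation would also need to free budget when activations are consumed in the backward pass, which this sketch leaves out.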