Releases: helpmefindaname/transformer-smaller-training-vocab

0.2.1

17 Mar 00:27
2a2928f

What's Changed

  • Fix saving of reduced models: when a model is saved while it is still reduced, the vocab size is now properly set in the config, allowing the user to load the model again, whether or not the context manager is still active (see the sketch after this list). @helpmefindaname in #4
  • If the embeddings are frozen (not trainable), the reduced embeddings will also be frozen. @helpmefindaname in #4
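
A minimal sketch of the fixed behaviour (the model name, the checkpoint path and passing texts as a plain list are placeholder assumptions; the reduce_train_vocab call follows the 0.2.0 example below):

  from transformers import AutoModel, AutoTokenizer
  from transformer_smaller_training_vocab import reduce_train_vocab

  model = AutoModel.from_pretrained("bert-base-uncased")
  tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

  with reduce_train_vocab(model=model, tokenizer=tokenizer, texts=["some training text"]):
      # ... training happens here ...
      # saving while reduced now writes the reduced vocab size to the config
      model.save_pretrained("reduced-checkpoint")
      tokenizer.save_pretrained("reduced-checkpoint")

  # the checkpoint can be loaded again, whether or not the context manager is still active
  reloaded = AutoModel.from_pretrained("reduced-checkpoint")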

Full Changelog: 0.2.0...0.2.1

0.2.0

19 Feb 03:23
350bf07

What's Changed

Introduces an optional parameter optimizer for reduce_train_vocab, which can be used to modify the optimizer's parameter groups so that the old embedding parameters are exchanged for the new, reduced ones. An example usage is the following:

  from torch.optim import Adam
  from transformer_smaller_training_vocab import get_texts_from_dataset, reduce_train_vocab

  model = ...
  tokenizer = ...
  optimizer = Adam(model.parameters(), lr=...)
  ...

  with reduce_train_vocab(model=model, tokenizer=tokenizer, texts=get_texts_from_dataset(raw_datasets, key="text"), optimizer=optimizer):
      train_with_optimizer(model, tokenizer, optimizer)

  save_model()  # save the model at the end so it contains the full vocab again

Full Changelog: 0.1.8...0.2.0

0.1.8

05 Feb 20:58
4cad4de

What's Changed

  • Lowered the dependency requirements to transformers 4.1, torch 1.8 and datasets 2.0, as the package was previously too restrictive

Full Changelog: 0.1.7...0.1.8

0.1.7

06 Jan 13:45
b8e6d92

Full Changelog: 0.1.6...0.1.7

0.1.0

06 Jan 12:13
86629a0

Initial setup, hello world!

So far, support has been added for fast tokenizers, BertTokenizer, RobertaTokenizer and XLMRobertaTokenizer.
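
A minimal usage sketch with one of the supported tokenizers (the model name, the placeholder texts and passing texts as a plain list are assumptions; the call signature follows the examples in the later releases above):

  from transformers import AutoModel, BertTokenizerFast
  from transformer_smaller_training_vocab import reduce_train_vocab

  tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
  model = AutoModel.from_pretrained("bert-base-uncased")

  texts = ["example training sentence", "another training sentence"]

  # inside the context manager the embeddings only cover tokens that occur in `texts`
  with reduce_train_vocab(model=model, tokenizer=tokenizer, texts=texts):
      ...  # run training here

  # after the context manager exits, the full vocabulary is restored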