
Questions on the KD process on CIFAR10 dataset #6

Open
zkf85 opened this issue Jul 29, 2020 · 2 comments


zkf85 commented Jul 29, 2020

Hi there,

Your work is great! I have some questions about the knowledge distillation (KD) process on the CIFAR10 dataset in the experiments section of your paper.

  1. How many CIFAR10-like images did you generate in order to reach the accuracies in Table 1 of your paper? We have tried with 3000 or 10000 generated images (with DI, ResNet34, alpha_f = 10) using vanilla KD to distill from ResNet34 to ResNet18 and only reached 25% or 55% validation accuracy.

  2. We encountered problems when trying ADI.
    The description of Table 1 says: "for ADI, we generate one new batch of images every 50 KD iterations and merge the newly generated images into the existing set of generated images". Could you please explain this in more detail? Does "50 KD iterations" mean 50 KD epochs? Does "one new batch of images" mean a batch of, say, 256 images that gets merged into the existing generated dataset? Does the KD process have to pause and wait for the batch-generation process every 50 KD iterations (epochs, if I understand correctly)?
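For context, by "vanilla KD" I mean Hinton-style distillation with softened teacher targets. A minimal sketch of that loss (function names and the temperature T = 4.0 are just illustrative choices, not taken from the paper):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss for one sample:
    KL(teacher_soft || student_soft) at temperature T, scaled by T^2."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student's soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

In practice this soft-target term is usually combined with a standard cross-entropy term on hard labels when labels are available; with purely generated images only the soft-target term applies.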

Thanks

pamolchanov (Collaborator) commented

Answering your questions:

  1. For these experiments we generated 1000 batches of size 256 in total. DI/ADI generate an entire batch of data at once. Most likely your hyperparameters are not correct, which is why the results differ.

  2. ADI is implemented in the same manner as DI, with the difference that the student model is also considered. "50 KD iterations" means 50 update steps of the optimizer. Each epoch has roughly 195 updates; we generate batches with DI or ADI at update numbers 0, 49, 99, and 149, so 4 batches are generated per epoch. All newly generated batches are added to the pool. On the remaining updates (191 of them), where no batch generation happens, we randomly select a batch from the pool. Total KD training runs for 250 epochs, which leads to 1000 generated batches in the end. The longer we train, the more data there is in the pool. Initially we start with 50 batches pre-generated with DI; if ADI is used, these are replaced with every new batch. We apply a random translation of ±2 px to every image when we load a batch from the pool.
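The schedule above can be sketched in plain Python (names and structure are my own; placeholder tuples stand in for real image batches, the generation steps are hardcoded to 0/49/99/149, and the ±2 px translation on loading is only noted in a comment):

```python
import random

GEN_STEPS = {0, 49, 99, 149}  # optimizer updates at which a new batch is generated

def run_schedule(epochs=250, updates_per_epoch=195, initial_di_batches=50, seed=0):
    """Simulate the batch pool over KD training.

    A real run would synthesize image batches with DI/ADI at the GEN_STEPS
    updates and apply a random +-2 px translation when loading from the pool.
    """
    rng = random.Random(seed)
    pool = [("DI", i) for i in range(initial_di_batches)]  # pre-generated DI batches
    generated = 0
    for epoch in range(epochs):
        for step in range(updates_per_epoch):
            if step in GEN_STEPS:
                pool.append(("ADI", epoch, step))  # generate one new batch, add to pool
                generated += 1
            else:
                batch = rng.choice(pool)  # KD update on a randomly chosen pool batch
    return generated, len(pool)
```

With these numbers, 4 batches per epoch over 250 epochs gives the 1000 generated batches mentioned above. One simplification: here the initial 50 DI batches simply stay in the pool, whereas the answer describes replacing them with new batches when ADI is used.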

We agree that sharing the code for CIFAR KD would be helpful and will try to do it ASAP.

shannonjryan commented

Hi @pamolchanov, just wanted to check in and see if you still plan on releasing additional code for CIFAR KD (or ImageNet, for that matter)?

Thanks for sharing your research!
