Why use a batch size of 1 when computing the gradients?

Thank you for proposing LESS, which is great work.

However, I am wondering why the authors use a batch size of 1 when computing the gradients in get_info.py. What would happen if we set batch_size > 1?
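To make the question concrete, here is a minimal sketch of my current understanding. It uses a toy one-parameter linear model in plain Python, not the code in get_info.py, and every name in it is hypothetical. With a mean-reduced loss, a single backward pass over a batch gives the average of the per-example gradients, so the individual gradients cannot be recovered from it. If per-example gradients are what is needed, is that the reason for using a batch size of 1?

```python
# Toy model: prediction = w * x, squared-error loss per example.
# All names are hypothetical and unrelated to get_info.py.

def per_example_grad(w, x, y):
    # Gradient of (w * x - y) ** 2 with respect to w for one example.
    return 2.0 * (w * x - y) * x

def batch_grad(w, xs, ys):
    # Gradient of the mean-reduced loss over the whole batch.
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0]
ys = [2.0, 0.0, 1.0]

# "Batch size 1": one gradient per example.
individual = [per_example_grad(w, x, y) for x, y in zip(xs, ys)]

# "Batch size 3": a single gradient, the mean of the three above.
batched = batch_grad(w, xs, ys)

print(individual)  # three distinct per-example gradients
print(batched)     # one averaged gradient
```

In this sketch the batched gradient is only the mean of the three per-example values, so examples with opposite-signed gradients partly cancel and can no longer be told apart.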

Thanks in advance for your reply.