Thank you for proposing LESS; it is great work. However, I am wondering why the authors use a batch size of 1 when computing the gradients in get_info.py. What would happen if we set batch_size > 1? Thanks in advance for your reply.
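
To make the question concrete, here is a toy sketch of what I have in mind. It is not code from the LESS repository: the 1-D linear model, the squared-error loss, and the function names are all hypothetical, and I am assuming the loss is averaged over the batch. My understanding, which may be wrong, is that a single backward pass over a batch of size > 1 would then yield only the mean of the per-example gradients rather than each example's own gradient.

```python
# Toy illustration only: a 1-D linear model y_hat = w * x with squared-error loss.
# None of these names come from get_info.py; they are made up for this question.

def per_example_grad(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w for a single example."""
    return 2.0 * (w * x - y) * x

def batch_mean_grad(w, batch):
    """Gradient of the mean loss over a batch, i.e. the average per-example gradient."""
    return sum(per_example_grad(w, x, y) for x, y in batch) / len(batch)

w = 0.5
batch = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]  # (x, y) pairs

# Batch size 1: one gradient per example.
individual = [per_example_grad(w, x, y) for x, y in batch]

# Batch size 3 with a mean-reduced loss: a single averaged gradient.
averaged = batch_mean_grad(w, batch)

print(individual)  # [-3.0, 0.0, 9.0]
print(averaged)    # 2.0
```

In this toy case the three per-example gradients differ in sign and magnitude, but the batched version collapses them into one number. Is this the reason for batch size 1, or is there another consideration?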