Hi, I noticed that details such as gradient accumulation seem to differ between the base accelerator and the DeepSpeed accelerator for preference model training, which causes errors. What are the actual gradient accumulation steps and batch size used during training?
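For context, my understanding is that the effective (global) batch size per optimizer step is the per-device micro-batch size times the gradient accumulation steps times the number of devices; if the two accelerators disagree on any of these, the effective batch size changes. A minimal sketch of that relationship (names are my own, not from the codebase):

```python
def effective_batch_size(micro_batch_size: int,
                         grad_accum_steps: int,
                         world_size: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return micro_batch_size * grad_accum_steps * world_size

# e.g. 4 samples per device, 8 accumulation steps, 2 devices
print(effective_batch_size(4, 8, 2))  # -> 64
```

So I'd like to confirm which combination of these three values the training run actually uses.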
Thanks in advance