The calculate_offload_device_map function no longer exists. CPU offloading is too slow, so I want to use the GPU to speed up quantization.