Hi @thakur-nandan, @nreimers
I am fine-tuning either the cross-encoder/ms-marco-electra-base or the cross-encoder/ms-marco-MiniLM-L-12-v2 model on other IR collections (trec-covid or NQ), but the fine-tuned model's scores are lower than the zero-shot scores. I wonder whether this is due to a domain shift in the custom datasets, or whether I am doing the training wrong? I am using the sentence-transformers CrossEncoder API for training.
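For reference, this is roughly my training setup (a minimal sketch; the toy examples, label construction, and hyperparameter values below are placeholders, not my exact script):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Start from the zero-shot checkpoint I am comparing against
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2", num_labels=1)

# Toy (query, passage, relevance) examples standing in for the real collection
train_samples = [
    InputExample(texts=["what causes covid-19", "COVID-19 is caused by SARS-CoV-2 ..."], label=1.0),
    InputExample(texts=["what causes covid-19", "The Nile is the longest river ..."], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# With num_labels=1, CrossEncoder.fit trains with BCEWithLogitsLoss on these labels
model.fit(
    train_dataloader=train_dataloader,
    epochs=1,
    warmup_steps=100,            # placeholder value
    output_path="ce-finetuned",  # placeholder output directory
)
```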
Since these pre-trained models were trained with particular settings (hyperparameters, model architecture, and loss), are the models sensitive to those settings during fine-tuning as well?