GDINO - separate BERT into a standalone encoder stage for easy quantization #21

@rhysdg

Description

  • One thing we do know for certain is that onnxruntime now has BERT support via its transformer optimizer; it'd be great to pull BERT out into its own stage and target it for either mixed precision or quantization.
  • The rest of the network can then become a self-contained optimisation problem, or remain in torch eager execution under an AMP context.
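
A rough sketch of how the two stages above might be wired up, assuming the BERT text encoder has already been exported to ONNX on its own. The file layout, the helper names (`bert_stage_paths`, `optimize_and_quantize`, `run_rest_with_amp`) and the `rest_of_model(image, text_features)` signature are hypothetical; `num_heads=12` and `hidden_size=768` assume a bert-base-sized encoder and should be checked against the actual checkpoint.

```python
from pathlib import Path


def bert_stage_paths(workdir: str, stem: str = "gdino_bert") -> dict:
    """Hypothetical file layout for the standalone text-encoder stage."""
    root = Path(workdir)
    return {
        "fp32": root / f"{stem}.onnx",           # plain export of the BERT stage
        "optimized": root / f"{stem}.opt.onnx",  # after the transformer optimizer
        "int8": root / f"{stem}.int8.onnx",      # after dynamic quantization
    }


def optimize_and_quantize(paths: dict, num_heads: int = 12,
                          hidden_size: int = 768, use_fp16: bool = False) -> Path:
    """Fuse BERT subgraphs, then either cast to fp16 or quantize weights to int8."""
    # Imported lazily so this module loads even without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic
    from onnxruntime.transformers import optimizer

    opt_model = optimizer.optimize_model(
        str(paths["fp32"]),
        model_type="bert",
        num_heads=num_heads,      # assumption: bert-base sized encoder
        hidden_size=hidden_size,  # assumption: bert-base sized encoder
    )
    if use_fp16:
        # Mixed-precision route: keep the fused graph, cast weights to fp16.
        opt_model.convert_float_to_float16()
        opt_model.save_model_to_file(str(paths["optimized"]))
        return paths["optimized"]

    # Quantization route: save the fused graph, then quantize weights to int8.
    opt_model.save_model_to_file(str(paths["optimized"]))
    quantize_dynamic(str(paths["optimized"]), str(paths["int8"]),
                     weight_type=QuantType.QInt8)
    return paths["int8"]


def run_rest_with_amp(rest_of_model, image, text_features, device_type: str = "cuda"):
    """Run the remaining network in torch eager mode under an AMP context."""
    import torch

    # rest_of_model is a hypothetical callable taking the image and the
    # text features produced by the standalone BERT stage.
    with torch.no_grad(), torch.autocast(device_type=device_type, dtype=torch.float16):
        return rest_of_model(image, text_features)
```

Keeping the encoder as its own ONNX file means the fp16 and int8 variants can be swapped in and compared without touching the rest of the pipeline.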

Labels

enhancement (New feature or request)
