Hello,
First, I would like to thank the authors of this paper for releasing their source code.
Is there a plan to apply the same approach with a Universal Transformer as the base architecture? Would the adaptive computation time (ACT) mechanism transfer to other tasks?
And more importantly, if this new transformer can be used, do you think the gain would be noticeable?