
More information about the specific implementation of relative attention bias. #181

Open
buaaliyi opened this issue Feb 5, 2025 · 2 comments

Comments

buaaliyi commented Feb 5, 2025

Hi @jiaqizhai

There are some previous issues,

#148
#36

which mention the specific implementation of relative attention bias.
For the exact rab_{p, t}(i, j) setting in this particular codebase, the relative timespan between tokens i and j is computed from timestamp[i] and timestamp[j+1] (not timestamp[j]), and the relative positional gap between tokens i and j is computed as N - (j - i) (not j - i, as in the Google text-to-text transformers source codebase). This rab implementation seems unusual to me: what advantage does it provide?
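For clarity, here is a minimal sketch of how I read that indexing; the function name and the boundary handling for the last column are my own assumptions, not the repo's code:

```python
import torch

def rab_indices(timestamps: torch.Tensor) -> tuple:
    """Hypothetical sketch of the indexing described above.

    timestamps: [N] event timestamps of one sequence.
    Returns a positional-gap matrix and a timespan matrix, both [N, N].
    """
    N = timestamps.size(0)
    i = torch.arange(N).unsqueeze(1)  # query positions, column vector
    j = torch.arange(N).unsqueeze(0)  # key positions, row vector

    # Positional gap computed as N - (j - i), rather than the plain
    # (j - i) used in the T5 text-to-text transformers codebase.
    pos_gap = N - (j - i)

    # Timespan between token i and token j computed from timestamp[i]
    # and timestamp[j + 1]; the last column falls back to timestamp[N - 1]
    # (a boundary assumption on my part).
    ts_next = torch.cat([timestamps[1:], timestamps[-1:]])
    time_span = timestamps.unsqueeze(1) - ts_next.unsqueeze(0)

    return pos_gap, time_span
```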

Thank you for more information about the implementation details.


buaaliyi commented Feb 5, 2025

In addition, what about the rab for the ranking task: is it the same as in the retrieval task above, or not?

jiaqizhai (Contributor) commented

Hi,

The rab timestamp setting we used for public datasets (retrieval) is exactly what we discussed in #148.

For the interleaved ranking setting, you can either reuse the same code or just use j - i; these two should perform similarly.
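One way to see why the two conventions should behave similarly (my own reading, not the repo code): N - (j - i) is just a fixed affine relabeling of (j - i), so a learned bias table ends up representing the same set of offsets under permuted indices. A minimal sketch:

```python
import torch

def rel_pos_index(N: int, convention: str = "n_minus") -> torch.Tensor:
    """Hypothetical illustration of the two positional-gap conventions.

    "n_minus" -> N - (j - i), values in [1, 2N - 1] (this codebase's style).
    "t5"      -> j - i, values in [-(N - 1), N - 1] (T5-style signed offset).
    The two are related by a fixed affine map, so a learned embedding
    table can represent the same biases under either indexing.
    """
    i = torch.arange(N).unsqueeze(1)
    j = torch.arange(N).unsqueeze(0)
    gap = j - i
    return N - gap if convention == "n_minus" else gap

print(rel_pos_index(4))        # N - (j - i)
print(rel_pos_index(4, "t5"))  # plain j - i
```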
