
Reason using 2 sets of attention weights? #7

@valtheval

Description

Hello RETAIN team,

Great job! Thank you for sharing it.
Do you have an explanation for why you use two sets of attention weights (visits and variables) instead of only one for the variables?
With a single set you could still get a visit contribution by aggregating the variable weights of each visit, for instance taking their average or sum, as in the sketch below.
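A toy sketch of that aggregation (the weights and the `visit_ids` mapping are made up for illustration):

```python
import numpy as np

# Toy example: 5 codes spread over 3 visits; the attention weights are made up.
code_weights = np.array([0.05, 0.10, 0.15, 0.30, 0.40])  # one weight per code
visit_ids    = np.array([0, 0, 1, 1, 2])                 # visit index of each code

# Sum the code-level weights within each visit to get a visit contribution.
visit_contrib = np.zeros(visit_ids.max() + 1)
np.add.at(visit_contrib, visit_ids, code_weights)
print(visit_contrib)  # -> [0.15 0.45 0.4]
```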
Thanks in advance for your help.


mp2893 (Owner) commented on Jul 17, 2020

Hi Valtheval,

Thanks for taking an interest in our work.
It's an interesting question; a few other researchers have asked me the same thing.
You can totally do what you suggested (i.e. use only code-level attention, then aggregate the weights).
For example, you could use a single LSTM to encode one flat sequence of codes (no visits, just a sequence of codes) and apply attention on top of it. But this way you lose the visit-level information (i.e. which codes belong to the same visit).
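Roughly, that flat-sequence baseline could look like this (an illustrative PyTorch sketch, not code from this repo; the class name and all dimensions are placeholders):

```python
import torch
import torch.nn as nn

class FlatCodeAttention(nn.Module):
    """One LSTM over a flat code sequence, with code-level attention on top."""

    def __init__(self, num_codes=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_codes, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)  # scores one attention weight per code
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, codes):                    # codes: (batch, seq_len) int indices
        h, _ = self.lstm(self.embed(codes))      # (batch, seq_len, hidden_dim)
        a = torch.softmax(self.attn(h), dim=1)   # code-level attention weights
        context = (a * h).sum(dim=1)             # weighted sum over the whole sequence
        return torch.sigmoid(self.out(context))  # e.g. a binary prediction
```

Notice there is no notion of a visit anywhere in this model, which is exactly the information that gets lost.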

The more interesting alternative would be using the RETAIN architecture but removing the visit-level attention component. This way, you still tell the model which codes belong to the same visit. I am actually quite curious how this would turn out :)
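For concreteness, that variant might look roughly like this (again an illustrative PyTorch sketch, not the code in this repo; it keeps only the variable-level beta attention, drops the visit-level alpha, and ignores padding for brevity):

```python
import torch
import torch.nn as nn

class RetainBetaOnly(nn.Module):
    """RETAIN-style model with only variable-level (beta) attention."""

    def __init__(self, num_codes=1000, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_codes, emb_dim)
        self.rnn_beta = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.w_beta = nn.Linear(hidden_dim, emb_dim)
        self.out = nn.Linear(emb_dim, 1)

    def forward(self, visits):                    # visits: (batch, n_visits, codes_per_visit)
        v = self.embed(visits).sum(dim=2)         # sum code embeddings within each visit
        g, _ = self.rnn_beta(torch.flip(v, [1]))  # run the RNN in reverse time, as in RETAIN
        beta = torch.tanh(self.w_beta(torch.flip(g, [1])))  # per-visit, per-dimension weights
        context = (beta * v).sum(dim=1)           # no alpha: plain sum over visits
        return torch.sigmoid(self.out(context))
```

The model still knows which codes share a visit (they are summed into one visit embedding before the RNN), but every visit now contributes with equal overall weight.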
If you happen to run this experiment, please share your results with everyone.

Best,
Ed

