Multi-task settings #48

Open

Mahmedturk opened this issue Jun 16, 2019 · 6 comments
@Mahmedturk

Hi @nreimers

For the multi-task framework, does it always have to be POS and chunking, or can it be any sequence labelling tasks?

@nreimers

Hi. You can of course use it for any sequence tagging task. The POS and chunking code is just an example, which you can modify for your use case.

@Mahmedturk

I am really confused about the architecture of the multi-task framework, as there is no diagram in the original paper. Could you please explain which layers are being shared? In the example you have given, is POS at the lower level because it appears first in the constructor? Does each task have its own BiLSTM-CRF task-specific network, and which of the layers are shared? Can you show this graphically?

@nreimers

In the Train_MultiTask.py example, the POS and chunking networks both share the embedding layer and one LSTM layer. If you change the params like this:

```python
params = {'classifier': ['CRF'], 'LSTM-Size': [100, 50], 'dropout': (0.25, 0.25)}
```

then both networks would share two stacked LSTM layers, the first with 100 recurrent units and the second with 50.

In that file, only the CRF is task specific.

If you run the code, the model architecture is also printed. Shared layers are named shared_..., while task-specific layers are named POS_... and chunking_....

The order in the datasets dict doesn't matter.
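
For intuition, here is a minimal Keras sketch of that sharing pattern. This is not the repository's actual BiLSTM class: the vocabulary size and tag-set sizes are made-up placeholders, and plain softmax output layers stand in for the task-specific CRF classifiers.

```python
# Hypothetical illustration of the Train_MultiTask.py sharing scheme:
# shared embedding + shared LSTM stack, with only the heads task-specific.
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

tokens = Input(shape=(None,), dtype='int32', name='tokens')

# Shared layers, analogous to the shared_... layers in the printed architecture.
shared = Embedding(input_dim=10000, output_dim=100, name='shared_embedding')(tokens)
shared = Bidirectional(LSTM(100, return_sequences=True), name='shared_LSTM_1')(shared)
shared = Bidirectional(LSTM(50, return_sequences=True), name='shared_LSTM_2')(shared)

# Task-specific heads; the repository uses CRF layers here, softmax is a stand-in.
pos_out = TimeDistributed(Dense(45, activation='softmax'), name='POS_output')(shared)
chunk_out = TimeDistributed(Dense(23, activation='softmax'), name='chunking_output')(shared)

# One model per task; everything below the heads is updated by both tasks.
pos_model = Model(tokens, pos_out)
chunk_model = Model(tokens, chunk_out)
```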

In Train_MultiTask_Different_Levels.py, the POS classifier uses the output from the first LSTM layer, while chunking gets an additional task-specific LSTM layer:

```python
params = {'classifier': ['CRF'], 'LSTM-Size': [100], 'dropout': (0.25, 0.25),
          'customClassifier': {'unidep_pos': ['Softmax'], 'conll2000_chunking': [('LSTM', 50), 'CRF']}}
```

Both networks have one shared LSTM (per 'LSTM-Size'); POS then applies a 'Softmax' classifier on top of that shared LSTM layer, while chunking instead uses a task-specific LSTM with 50 units followed by a CRF.
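
Again as a hypothetical Keras sketch (same placeholder sizes as above, softmax standing in for the CRF), the different-levels wiring looks roughly like this:

```python
# Hypothetical illustration of Train_MultiTask_Different_Levels.py.
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

tokens = Input(shape=(None,), dtype='int32', name='tokens')

# One shared LSTM layer, per 'LSTM-Size': [100].
shared = Embedding(input_dim=10000, output_dim=100, name='shared_embedding')(tokens)
shared = Bidirectional(LSTM(100, return_sequences=True), name='shared_LSTM')(shared)

# POS ('unidep_pos': ['Softmax']): classifier sits directly on the shared layer.
pos_out = TimeDistributed(Dense(45, activation='softmax'), name='POS_output')(shared)

# Chunking ('conll2000_chunking': [('LSTM', 50), 'CRF']): an extra
# task-specific LSTM with 50 units, then the classifier.
chunk_hidden = Bidirectional(LSTM(50, return_sequences=True), name='chunking_LSTM')(shared)
chunk_out = TimeDistributed(Dense(23, activation='softmax'), name='chunking_output')(chunk_hidden)

pos_model = Model(tokens, pos_out)
chunk_model = Model(tokens, chunk_out)
```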

@Mahmedturk

OK, thanks for the detailed answer. Could you also explain the difference between Train_MultiTask.py and Train_MultiTask_Different_Levels.py?

@nreimers

nreimers commented Jul 5, 2019

Have a look at this paper (Søgaard & Goldberg, 2016, "Deep multi-task learning with low level tasks supervised at lower layers"):
https://www.aclweb.org/anthology/P16-2038

Train_MultiTask_Different_Levels.py implements the ideas from that paper.

@Mahmedturk

Thanks for the link.
