Training doesn't start #3

Open · v44s opened this issue May 2, 2018 · 0 comments

v44s commented May 2, 2018

I am training on the default dataset in multi-text mode, but training doesn't seem to progress at all. On running, the code prints the following and then just waits without proceeding further. The training log shows received_first_batch as False, so I suspect something is wrong with the part of the code that supplies the batches.

Using cuDNN version 6021 on context None
Mapped name None to device cuda2: GeForce GTX 1080 Ti (0000:84:00.0)
INFO:main:Model options:
{'additional_excludes': OrderedDict([('es.fr_en', [])]),
'alpha_c': OrderedDict([('es.fr_en', 0.0)]),
'att_dim': 1200,
'attend_merge_act': 'tanh',
'attend_merge_op': 'mean',
'batch_sizes': OrderedDict([('es.fr_en', 80)]),
'bokeh_port': 3333,
'cgs': ['es.fr_en'],
'dec_embed_sizes': OrderedDict([('en', 620)]),
'dec_nhids': OrderedDict([('en', 1000)]),
'dec_rnn_type': 'gru_cond_mCG',
'decay_c': OrderedDict([('es.fr_en', 0.0)]),
'drop_input': OrderedDict([('es.fr_en', 0.0)]),
'dropout': 1.0,
'enc_embed_sizes': OrderedDict([('es', 620), ('fr', 620)]),
'enc_nhids': OrderedDict([('es', 1000), ('fr', 1000)]),
'exclude_encs': OrderedDict([('es', False), ('fr', False)]),
'finish_after': 2000000,
'finit_act': 'tanh',
'finit_code_dim': 500,
'finit_mid_dim': 600,
'hook_samples': 2,
'incremental_dump': True,
'init_merge_act': 'tanh',
'init_merge_op': 'mean',
'lctxproj_act': 'tanh',
'ldecoder_act': 'tanh',
'learning_rate': 0.0002,
'lencoder_act': 'tanh',
'load_accumulators': True,
'log_prob_bs': 10,
'log_prob_freq': 2000,
'log_prob_sets': OrderedDict([('es.fr_en', {'fr': 'data/dev/newstest2011.fr.tok.bpe20k', 'en': 'data/dev/newstest2011.en.tok.bpe20k', 'es': 'data/dev/newstest2011.es.tok.bpe20k'})]),
'min_seq_lens': OrderedDict([('es.fr_en', 0)]),
'multi_latent': True,
'num_decs': 1,
'num_encs': 2,
'plot': False,
'readout_dim': 1000,
'reload': True,
'representation_act': 'linear',
'representation_dim': 1200,
'sampling_freq': 17,
'save_accumulators': True,
'save_freq': 5000,
'saveto': 'esfr2en_mSrc',
'schedule': OrderedDict([('es.fr_en', 1)]),
'seq_len': 50,
'sort_k_batches': 12,
'src_datas': OrderedDict([('es.fr_en', {'fr': 'data/europarl-v7.esfr-en.fr.tok.bpe20k', 'es': 'data/europarl-v7.esfr-en.es.tok.bpe20k'})]),
'src_eos_idxs': OrderedDict([('es', 0), ('fr', 0), ('es.fr', 0)]),
'src_vocab_sizes': OrderedDict([('es', 20624), ('fr', 20335)]),
'src_vocabs': OrderedDict([('es', 'data/europarl-v7.es-en.es.tok.bpe20k.vocab.pkl'), ('fr', 'data/europarl-v7.fr-en.fr.tok.bpe20k.vocab.pkl')]),
'step_clipping': 1,
'step_rule': 'uAdam',
'stream': 'multiCG_stream',
'take_last': True,
'trg_datas': OrderedDict([('es.fr_en', {'en': 'data/europarl-v7.esfr-en.en.tok.bpe20k'})]),
'trg_eos_idxs': OrderedDict([('en', 0)]),
'trg_vocab_sizes': OrderedDict([('en', 20212)]),
'trg_vocabs': OrderedDict([('en', 'data/europarl-v7.fr-en.en.tok.bpe20k.vocab.pkl')]),
'unk_id': 1,
'val_burn_in': 1,
'weight_noise_ff': False,
'weight_noise_rec': False,
'weight_scale': 0.01}
INFO:mcg.stream:Building training stream for cg:[es.fr_en]
INFO:mcg.stream: ... src:[es] - [data/europarl-v7.esfr-en.es.tok.bpe20k]
INFO:mcg.stream: ... src:[fr] - [data/europarl-v7.esfr-en.fr.tok.bpe20k]
INFO:mcg.stream: ... trg:[en] - [data/europarl-v7.esfr-en.en.tok.bpe20k]
INFO:mcg.stream:Building logprob stream for cg:[es.fr_en]
INFO:mcg.stream: ... src:[es] - [data/dev/newstest2011.es.tok.bpe20k]
INFO:mcg.stream: ... src:[fr] - [data/dev/newstest2011.fr.tok.bpe20k]
INFO:mcg.stream: ... trg:[en] - [data/dev/newstest2011.en.tok.bpe20k]
INFO:mcg.models: Encoder-Decoder: building training models
INFO:mcg.models: MultiEncoder: building training models
INFO:mcg.models: ... MultiSourceEncoder [es.fr] building training models
INFO:mcg.models: ... BidirectionalEncoder [es] building training models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: ... BidirectionalEncoder [fr] building training models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: MultiDecoder: building training models
INFO:mcg.models: ... using initializer merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models: ... using post-context merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models: ... ... using [gru_cond_mCG_mSrc] layer
INFO:mcg.models: Encoder-Decoder: building sampling models
INFO:mcg.models: MultiEncoder: building sampling models
INFO:mcg.models: ... MultiSourceEncoder [es.fr] building sampling models
INFO:mcg.models: ... BidirectionalEncoder [es] building sampling models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: ... BidirectionalEncoder [fr] building sampling models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: MultiDecoder: building sampling models
INFO:mcg.models: ... using initializer merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models:Building f_init for CG[es.fr-en]...
INFO:mcg.models: ... using post-context merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models: ... ... using [gru_cond_mCG_mSrc] layer
INFO:mcg.models:Building f_next for decoder[es.fr-en]..
INFO:mcg.models:Parameter shapes for computation graph[es.fr_en]
INFO:mcg.models: (1000,) : 9
INFO:mcg.models: (1000, 1000) : 6
INFO:mcg.models: (620, 1000) : 6
INFO:mcg.models: (2000,) : 5
INFO:mcg.models: (620, 2000) : 5
INFO:mcg.models: (1000, 2000) : 5
INFO:mcg.models: (1200, 1200) : 3
INFO:mcg.models: (1200,) : 3
INFO:mcg.models: (2000, 1200) : 2
INFO:mcg.models: (1200, 1000) : 2
INFO:mcg.models: (500, 600) : 1
INFO:mcg.models: (1200, 1) : 1
INFO:mcg.models: (20212,) : 1
INFO:mcg.models: (1000, 20212) : 1
INFO:mcg.models: (20624, 620) : 1
INFO:mcg.models: (20212, 620) : 1
INFO:mcg.models: (620, 1200) : 1
INFO:mcg.models: (20335, 620) : 1
INFO:mcg.models: (1200, 2000) : 1
INFO:mcg.models: (1200, 500) : 1
INFO:mcg.models: (1000, 1200) : 1
INFO:mcg.models: (600, 1000) : 1
INFO:mcg.models:Total number of parameters for computation graph[es.fr_en]: 58
INFO:mcg.models:Parameter names for computation graph[es.fr_en]:
INFO:mcg.models: (20212,) : ff_logit_en_b
INFO:mcg.models: (1000, 20212) : ff_logit_en_W
INFO:mcg.models: (1000,) : ff_logit_ctx_en_b
INFO:mcg.models: (1200, 1000) : ff_logit_ctx_en_W
INFO:mcg.models: (1200,) : ctx_embedder_fr_b
INFO:mcg.models: (2000, 1200) : ctx_embedder_fr_W
INFO:mcg.models: (1000, 1000) : encoder_r_fr_Ux
INFO:mcg.models: (1000, 2000) : encoder_r_fr_U
INFO:mcg.models: (20335, 620) : Wemb_fr
INFO:mcg.models: (1000,) : encoder_r_fr_bx
INFO:mcg.models: (620, 1000) : encoder_r_fr_Wx
INFO:mcg.models: (2000,) : encoder_r_fr_b
INFO:mcg.models: (620, 2000) : encoder_r_fr_W
INFO:mcg.models: (1000, 1000) : encoder_fr_Ux
INFO:mcg.models: (1000, 2000) : encoder_fr_U
INFO:mcg.models: (1000,) : encoder_fr_bx
INFO:mcg.models: (620, 1000) : encoder_fr_Wx
INFO:mcg.models: (2000,) : encoder_fr_b
INFO:mcg.models: (620, 2000) : encoder_fr_W
INFO:mcg.models: (1200,) : ctx_embedder_es_b
INFO:mcg.models: (2000, 1200) : ctx_embedder_es_W
INFO:mcg.models: (1000, 1000) : encoder_r_es_Ux
INFO:mcg.models: (1000, 2000) : encoder_r_es_U
INFO:mcg.models: (20624, 620) : Wemb_es
INFO:mcg.models: (1000,) : encoder_r_es_bx
INFO:mcg.models: (620, 1000) : encoder_r_es_Wx
INFO:mcg.models: (2000,) : encoder_r_es_b
INFO:mcg.models: (620, 2000) : encoder_r_es_W
INFO:mcg.models: (1000, 1000) : encoder_es_Ux
INFO:mcg.models: (1000, 2000) : encoder_es_U
INFO:mcg.models: (1000,) : encoder_es_bx
INFO:mcg.models: (620, 1000) : encoder_es_Wx
INFO:mcg.models: (2000,) : encoder_es_b
INFO:mcg.models: (620, 2000) : encoder_es_W
INFO:mcg.models: (1200,) : decoder_en_b_att
INFO:mcg.models: (1200, 1200) : decoder_en_Le_att
INFO:mcg.models: (1000, 1000) : decoder_en_Ux
INFO:mcg.models: (1200, 1000) : decoder_en_Wcx
INFO:mcg.models: (1200, 2000) : decoder_en_Wc
INFO:mcg.models: (1200, 1200) : decoder_en_Wp_att
INFO:mcg.models: (1200, 1) : decoder_en_U_att
INFO:mcg.models: (1200, 1200) : decoder_en_Ld_att
INFO:mcg.models: (1000, 1200) : decoder_en_Wd_dec
INFO:mcg.models: (1000, 2000) : decoder_en_U
INFO:mcg.models: (20212, 620) : Wemb_dec_en
INFO:mcg.models: (1000,) : ff_init_en_c
INFO:mcg.models: (600, 1000) : ff_init_en_U
INFO:mcg.models: (500, 600) : ff_init_en_U_shared
INFO:mcg.models: (1200, 500) : ff_init_en_W_shared
INFO:mcg.models: (620, 1200) : decoder_en_Wi_dec
INFO:mcg.models: (1000,) : decoder_en_bx
INFO:mcg.models: (620, 1000) : decoder_en_Wx
INFO:mcg.models: (2000,) : decoder_en_b
INFO:mcg.models: (620, 2000) : decoder_en_W
INFO:mcg.models: (1000,) : ff_logit_prev_en_b
INFO:mcg.models: (620, 1000) : ff_logit_prev_en_W
INFO:mcg.models: (1000,) : ff_logit_lstm_en_b
INFO:mcg.models: (1000, 1000) : ff_logit_lstm_en_W
INFO:mcg.models:Total number of parameters for computation graph[es.fr_en]: 58
INFO:mcg.models:Total number of excluded parameters for CG[es.fr_en]: [0]
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_ctx_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_ctx_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_fr_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_fr_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: Wemb_fr
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_es_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_es_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: Wemb_es
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_b_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Le_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wcx
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wc
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wp_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_U_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Ld_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wd_dec
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: Wemb_dec_en
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_c
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_U_shared
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_W_shared
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wi_dec
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_prev_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_prev_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_lstm_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_lstm_en_W
INFO:mcg.models:Total number of parameters will be trained for CG[es.fr_en]: [58]
INFO:mcg.algorithm:Initializing the training algorithm [es.fr_en]
INFO:mcg.algorithm:...computing gradient
INFO:mcg.algorithm:...clipping gradients
INFO:mcg.algorithm:...building optimizer
INFO:mcg.algorithm: took: 65.878868103 seconds
/home1/debajyoty/codes/dl4mt-multi-src.old/mcg/models.py:1123: UserWarning: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 6 is not part of the computational graph needed to compute the outputs: src_selector.
To make this warning into an error, you can pass the parameter on_unused_input='raise' to theano.function. To disable it completely, use on_unused_input='ignore'.
outputs=cost, on_unused_input='warn')
/home1/debajyoty/codes/dl4mt-multi-src.old/mcg/models.py:1123: UserWarning: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 7 is not part of the computational graph needed to compute the outputs: trg_selector.
To make this warning into an error, you can pass the parameter on_unused_input='raise' to theano.function. To disable it completely, use on_unused_input='ignore'.
outputs=cost, on_unused_input='warn')
INFO:mcg.algorithm:Entered the main loop


BEFORE FIRST EPOCH

Training status:
batch_interrupt_received: False
epoch_interrupt_received: False
epoch_started: True
epochs_done: 0
iterations_done: 0
received_first_batch: False
training_started: True
Log records from the iteration 0:
time_initialization: 1.69277191162e-05
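
I assume the two UserWarnings about src_selector and trg_selector are benign: with a single computation graph there is nothing to select between, so those inputs never feed the cost, and theano.function is deliberately compiled with on_unused_input='warn'. As a repo-independent illustration, the same warning can be reproduced with a minimal Theano function that takes an input its output does not use:

import theano
import theano.tensor as T

x = T.vector('x')
selector = T.scalar('src_selector')  # stands in for the unused selector input

# Compiling with an input the output does not depend on triggers the same
# UserWarning seen in the log above; 'warn' keeps it non-fatal.
f = theano.function([x, selector], x.sum(), on_unused_input='warn')
print(f([1.0, 2.0], 0.0))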
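To check my suspicion about the batch supplier, something like the sketch below should tell whether the data pipeline itself is at fault, by building the training stream on its own and pulling a few batches from it outside the main loop. The helper name get_tr_stream and the assumption that it returns a dict keyed by CG name are my guesses about this repo's layout, not confirmed against mcg/stream.py:

import itertools

# Hypothetical import: check mcg/stream.py for the actual builder name.
from mcg.stream import get_tr_stream

def check_stream(config):
    """Pull a few batches from the training stream built from `config`,
    the same options dict printed under "Model options" above."""
    streams = get_tr_stream(config)       # assumed: {cg_name: stream}
    stream = streams['es.fr_en']
    # If this loop hangs or raises, the data pipeline is at fault rather
    # than the Theano graph or the optimizer; if it prints shapes, the
    # problem is elsewhere.
    for batch in itertools.islice(stream.get_epoch_iterator(), 3):
        print([getattr(x, 'shape', len(x)) for x in batch])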
