I am training on the default data set provided in multi-text mode, but training does not seem to progress at all. The code prints the following and then just waits without proceeding further. The training log shows `received_first_batch: False`, so I suspect something is wrong with the part of the code that supplies batches.
Using cuDNN version 6021 on context None
Mapped name None to device cuda2: GeForce GTX 1080 Ti (0000:84:00.0)
INFO:main:Model options:
{'additional_excludes': OrderedDict([('es.fr_en', [])]),
'alpha_c': OrderedDict([('es.fr_en', 0.0)]),
'att_dim': 1200,
'attend_merge_act': 'tanh',
'attend_merge_op': 'mean',
'batch_sizes': OrderedDict([('es.fr_en', 80)]),
'bokeh_port': 3333,
'cgs': ['es.fr_en'],
'dec_embed_sizes': OrderedDict([('en', 620)]),
'dec_nhids': OrderedDict([('en', 1000)]),
'dec_rnn_type': 'gru_cond_mCG',
'decay_c': OrderedDict([('es.fr_en', 0.0)]),
'drop_input': OrderedDict([('es.fr_en', 0.0)]),
'dropout': 1.0,
'enc_embed_sizes': OrderedDict([('es', 620), ('fr', 620)]),
'enc_nhids': OrderedDict([('es', 1000), ('fr', 1000)]),
'exclude_encs': OrderedDict([('es', False), ('fr', False)]),
'finish_after': 2000000,
'finit_act': 'tanh',
'finit_code_dim': 500,
'finit_mid_dim': 600,
'hook_samples': 2,
'incremental_dump': True,
'init_merge_act': 'tanh',
'init_merge_op': 'mean',
'lctxproj_act': 'tanh',
'ldecoder_act': 'tanh',
'learning_rate': 0.0002,
'lencoder_act': 'tanh',
'load_accumulators': True,
'log_prob_bs': 10,
'log_prob_freq': 2000,
'log_prob_sets': OrderedDict([('es.fr_en', {'fr': 'data/dev/newstest2011.fr.tok.bpe20k', 'en': 'data/dev/newstest2011.en.tok.bpe20k', 'es': 'data/dev/newstest2011.es.tok.bpe20k'})]),
'min_seq_lens': OrderedDict([('es.fr_en', 0)]),
'multi_latent': True,
'num_decs': 1,
'num_encs': 2,
'plot': False,
'readout_dim': 1000,
'reload': True,
'representation_act': 'linear',
'representation_dim': 1200,
'sampling_freq': 17,
'save_accumulators': True,
'save_freq': 5000,
'saveto': 'esfr2en_mSrc',
'schedule': OrderedDict([('es.fr_en', 1)]),
'seq_len': 50,
'sort_k_batches': 12,
'src_datas': OrderedDict([('es.fr_en', {'fr': 'data/europarl-v7.esfr-en.fr.tok.bpe20k', 'es': 'data/europarl-v7.esfr-en.es.tok.bpe20k'})]),
'src_eos_idxs': OrderedDict([('es', 0), ('fr', 0), ('es.fr', 0)]),
'src_vocab_sizes': OrderedDict([('es', 20624), ('fr', 20335)]),
'src_vocabs': OrderedDict([('es', 'data/europarl-v7.es-en.es.tok.bpe20k.vocab.pkl'), ('fr', 'data/europarl-v7.fr-en.fr.tok.bpe20k.vocab.pkl')]),
'step_clipping': 1,
'step_rule': 'uAdam',
'stream': 'multiCG_stream',
'take_last': True,
'trg_datas': OrderedDict([('es.fr_en', {'en': 'data/europarl-v7.esfr-en.en.tok.bpe20k'})]),
'trg_eos_idxs': OrderedDict([('en', 0)]),
'trg_vocab_sizes': OrderedDict([('en', 20212)]),
'trg_vocabs': OrderedDict([('en', 'data/europarl-v7.fr-en.en.tok.bpe20k.vocab.pkl')]),
'unk_id': 1,
'val_burn_in': 1,
'weight_noise_ff': False,
'weight_noise_rec': False,
'weight_scale': 0.01}
INFO:mcg.stream:Building training stream for cg:[es.fr_en]
INFO:mcg.stream: ... src:[es] - [data/europarl-v7.esfr-en.es.tok.bpe20k]
INFO:mcg.stream: ... src:[fr] - [data/europarl-v7.esfr-en.fr.tok.bpe20k]
INFO:mcg.stream: ... trg:[en] - [data/europarl-v7.esfr-en.en.tok.bpe20k]
INFO:mcg.stream:Building logprob stream for cg:[es.fr_en]
INFO:mcg.stream: ... src:[es] - [data/dev/newstest2011.es.tok.bpe20k]
INFO:mcg.stream: ... src:[fr] - [data/dev/newstest2011.fr.tok.bpe20k]
INFO:mcg.stream: ... trg:[en] - [data/dev/newstest2011.en.tok.bpe20k]
INFO:mcg.models: Encoder-Decoder: building training models
INFO:mcg.models: MultiEncoder: building training models
INFO:mcg.models: ... MultiSourceEncoder [es.fr] building training models
INFO:mcg.models: ... BidirectionalEncoder [es] building training models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: ... BidirectionalEncoder [fr] building training models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: MultiDecoder: building training models
INFO:mcg.models: ... using initializer merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models: ... using post-context merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models: ... ... using [gru_cond_mCG_mSrc] layer
INFO:mcg.models: Encoder-Decoder: building sampling models
INFO:mcg.models: MultiEncoder: building sampling models
INFO:mcg.models: ... MultiSourceEncoder [es.fr] building sampling models
INFO:mcg.models: ... BidirectionalEncoder [es] building sampling models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: ... BidirectionalEncoder [fr] building sampling models
INFO:mcg.models: ... ... using [gru] layer
INFO:mcg.models: MultiDecoder: building sampling models
INFO:mcg.models: ... using initializer merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models:Building f_init for CG[es.fr-en]...
INFO:mcg.models: ... using post-context merger [mean] for encoders: ['es', 'fr']
INFO:mcg.models: ... ... using [gru_cond_mCG_mSrc] layer
INFO:mcg.models:Building f_next for decoder[es.fr-en]..
INFO:mcg.models:Parameter shapes for computation graph[es.fr_en]
INFO:mcg.models: (1000,) : 9
INFO:mcg.models: (1000, 1000) : 6
INFO:mcg.models: (620, 1000) : 6
INFO:mcg.models: (2000,) : 5
INFO:mcg.models: (620, 2000) : 5
INFO:mcg.models: (1000, 2000) : 5
INFO:mcg.models: (1200, 1200) : 3
INFO:mcg.models: (1200,) : 3
INFO:mcg.models: (2000, 1200) : 2
INFO:mcg.models: (1200, 1000) : 2
INFO:mcg.models: (500, 600) : 1
INFO:mcg.models: (1200, 1) : 1
INFO:mcg.models: (20212,) : 1
INFO:mcg.models: (1000, 20212) : 1
INFO:mcg.models: (20624, 620) : 1
INFO:mcg.models: (20212, 620) : 1
INFO:mcg.models: (620, 1200) : 1
INFO:mcg.models: (20335, 620) : 1
INFO:mcg.models: (1200, 2000) : 1
INFO:mcg.models: (1200, 500) : 1
INFO:mcg.models: (1000, 1200) : 1
INFO:mcg.models: (600, 1000) : 1
INFO:mcg.models:Total number of parameters for computation graph[es.fr_en]: 58
INFO:mcg.models:Parameter names for computation graph[es.fr_en]:
INFO:mcg.models: (20212,) : ff_logit_en_b
INFO:mcg.models: (1000, 20212) : ff_logit_en_W
INFO:mcg.models: (1000,) : ff_logit_ctx_en_b
INFO:mcg.models: (1200, 1000) : ff_logit_ctx_en_W
INFO:mcg.models: (1200,) : ctx_embedder_fr_b
INFO:mcg.models: (2000, 1200) : ctx_embedder_fr_W
INFO:mcg.models: (1000, 1000) : encoder_r_fr_Ux
INFO:mcg.models: (1000, 2000) : encoder_r_fr_U
INFO:mcg.models: (20335, 620) : Wemb_fr
INFO:mcg.models: (1000,) : encoder_r_fr_bx
INFO:mcg.models: (620, 1000) : encoder_r_fr_Wx
INFO:mcg.models: (2000,) : encoder_r_fr_b
INFO:mcg.models: (620, 2000) : encoder_r_fr_W
INFO:mcg.models: (1000, 1000) : encoder_fr_Ux
INFO:mcg.models: (1000, 2000) : encoder_fr_U
INFO:mcg.models: (1000,) : encoder_fr_bx
INFO:mcg.models: (620, 1000) : encoder_fr_Wx
INFO:mcg.models: (2000,) : encoder_fr_b
INFO:mcg.models: (620, 2000) : encoder_fr_W
INFO:mcg.models: (1200,) : ctx_embedder_es_b
INFO:mcg.models: (2000, 1200) : ctx_embedder_es_W
INFO:mcg.models: (1000, 1000) : encoder_r_es_Ux
INFO:mcg.models: (1000, 2000) : encoder_r_es_U
INFO:mcg.models: (20624, 620) : Wemb_es
INFO:mcg.models: (1000,) : encoder_r_es_bx
INFO:mcg.models: (620, 1000) : encoder_r_es_Wx
INFO:mcg.models: (2000,) : encoder_r_es_b
INFO:mcg.models: (620, 2000) : encoder_r_es_W
INFO:mcg.models: (1000, 1000) : encoder_es_Ux
INFO:mcg.models: (1000, 2000) : encoder_es_U
INFO:mcg.models: (1000,) : encoder_es_bx
INFO:mcg.models: (620, 1000) : encoder_es_Wx
INFO:mcg.models: (2000,) : encoder_es_b
INFO:mcg.models: (620, 2000) : encoder_es_W
INFO:mcg.models: (1200,) : decoder_en_b_att
INFO:mcg.models: (1200, 1200) : decoder_en_Le_att
INFO:mcg.models: (1000, 1000) : decoder_en_Ux
INFO:mcg.models: (1200, 1000) : decoder_en_Wcx
INFO:mcg.models: (1200, 2000) : decoder_en_Wc
INFO:mcg.models: (1200, 1200) : decoder_en_Wp_att
INFO:mcg.models: (1200, 1) : decoder_en_U_att
INFO:mcg.models: (1200, 1200) : decoder_en_Ld_att
INFO:mcg.models: (1000, 1200) : decoder_en_Wd_dec
INFO:mcg.models: (1000, 2000) : decoder_en_U
INFO:mcg.models: (20212, 620) : Wemb_dec_en
INFO:mcg.models: (1000,) : ff_init_en_c
INFO:mcg.models: (600, 1000) : ff_init_en_U
INFO:mcg.models: (500, 600) : ff_init_en_U_shared
INFO:mcg.models: (1200, 500) : ff_init_en_W_shared
INFO:mcg.models: (620, 1200) : decoder_en_Wi_dec
INFO:mcg.models: (1000,) : decoder_en_bx
INFO:mcg.models: (620, 1000) : decoder_en_Wx
INFO:mcg.models: (2000,) : decoder_en_b
INFO:mcg.models: (620, 2000) : decoder_en_W
INFO:mcg.models: (1000,) : ff_logit_prev_en_b
INFO:mcg.models: (620, 1000) : ff_logit_prev_en_W
INFO:mcg.models: (1000,) : ff_logit_lstm_en_b
INFO:mcg.models: (1000, 1000) : ff_logit_lstm_en_W
INFO:mcg.models:Total number of parameters for computation graph[es.fr_en]: 58
INFO:mcg.models:Total number of excluded parameters for CG[es.fr_en]: [0]
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_ctx_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_ctx_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_fr_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_fr_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: Wemb_fr
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_fr_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_fr_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_es_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ctx_embedder_es_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: Wemb_es
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_r_es_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: encoder_es_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_b_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Le_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Ux
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wcx
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wc
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wp_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_U_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Ld_att
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wd_dec
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: Wemb_dec_en
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_c
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_U
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_U_shared
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_init_en_W_shared
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wi_dec
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_bx
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_Wx
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: decoder_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_prev_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_prev_en_W
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_lstm_en_b
INFO:mcg.models:Training parameter from CG[es.fr_en]: ff_logit_lstm_en_W
INFO:mcg.models:Total number of parameters will be trained for CG[es.fr_en]: [58]
INFO:mcg.algorithm:Initializing the training algorithm [es.fr_en]
INFO:mcg.algorithm:...computing gradient
INFO:mcg.algorithm:...clipping gradients
INFO:mcg.algorithm:...building optimizer
INFO:mcg.algorithm: took: 65.878868103 seconds
/home1/debajyoty/codes/dl4mt-multi-src.old/mcg/models.py:1123: UserWarning: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 6 is not part of the computational graph needed to compute the outputs: src_selector.
To make this warning into an error, you can pass the parameter on_unused_input='raise' to theano.function. To disable it completely, use on_unused_input='ignore'.
outputs=cost, on_unused_input='warn')
/home1/debajyoty/codes/dl4mt-multi-src.old/mcg/models.py:1123: UserWarning: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 7 is not part of the computational graph needed to compute the outputs: trg_selector.
To make this warning into an error, you can pass the parameter on_unused_input='raise' to theano.function. To disable it completely, use on_unused_input='ignore'.
outputs=cost, on_unused_input='warn')
INFO:mcg.algorithm:Entered the main loop
BEFORE FIRST EPOCH
Training status:
batch_interrupt_received: False
epoch_interrupt_received: False
epoch_started: True
epochs_done: 0
iterations_done: 0
received_first_batch: False
training_started: True
Log records from the iteration 0:
time_initialization: 1.69277191162e-05
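Since `received_first_batch` stays `False` while the main loop has already started, one way to narrow this down is to pull a batch from the training stream directly, outside the main loop, and see whether it ever yields anything. Below is a minimal sketch assuming the standard Fuel `DataStream` API that this Blocks-based code uses; the `get_tr_stream` helper, the `config` dict, and the `'es.fr_en'` key are hypothetical stand-ins for however `multiCG_stream` actually constructs the stream in this repo:

```python
# Hypothetical check: fetch one batch straight from the training stream.
# If next() blocks here, the hang is in the stream itself (e.g. data paths,
# seq_len/min_seq_lens filtering, or the sorting/shuffling transformers),
# not in the training algorithm. Names below are assumptions, not
# necessarily the repo's real API.
from mcg.stream import get_tr_stream  # hypothetical import path

tr_stream = get_tr_stream(config)['es.fr_en']          # config = options printed above
iterator = tr_stream.get_epoch_iterator(as_dict=True)  # standard Fuel API

batch = next(iterator)  # hangs here if the stream never produces a batch
for name, arr in batch.items():
    print(name, getattr(arr, 'shape', type(arr)))
```

If that call hangs too, the next things to check would be that every file listed under `src_datas` and `trg_datas` exists and is non-empty, and that the `seq_len`/`min_seq_lens` filters are not discarding every example.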