- Encoder FLOPs(30s): 96,238,430,720, params: 85,709,704
- Feature info: using fbank feature, cmvn, dither, online speed perturb
- Training info: train_conformer_bidecoder_large.yaml, kernel size 31, lr 0.002, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
- Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30 (score combination sketched after the table below)
- Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
- LM-tgmed: 3-gram.pruned.1e-7.arpa.gz
- LM-tglarge: 3-gram.arpa.gz
- LM-fglarge: 4-gram.arpa.gz
decoding mode | test clean | test other |
---|---|---|
ctc prefix beam search | 2.96 | 7.14 |
attention rescoring | 2.66 | 6.53 |
LM-tgmed + attention rescoring | 2.78 | 6.32 |
LM-tglarge + attention rescoring | 2.68 | 6.10 |
LM-fglarge + attention rescoring | 2.65 | 5.98 |
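
The `ctc_weight` and `reverse weight` above control how attention rescoring combines scores: each hypothesis from CTC prefix beam search is scored by the left-to-right and right-to-left decoders of the bidecoder model, blended by `reverse_weight`, and the hypothesis's CTC score is mixed back in with `ctc_weight`. A minimal sketch of that combination (names are illustrative, not WeNet's exact API):

```python
def attention_rescore(hyps, l2r_scores, r2l_scores,
                      ctc_weight=0.3, reverse_weight=0.5):
    """hyps: list of (token_ids, ctc_log_score) from CTC prefix beam search;
    the two score lists hold each hypothesis's log probability under the
    left-to-right and right-to-left attention decoders."""
    best, best_score = None, float("-inf")
    for (tokens, ctc_score), l2r, r2l in zip(hyps, l2r_scores, r2l_scores):
        # Blend the two decoder directions, then mix the CTC score back in.
        att = (1.0 - reverse_weight) * l2r + reverse_weight * r2l
        score = att + ctc_weight * ctc_score
        if score > best_score:
            best, best_score = tokens, score
    return best, best_score
```
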
- Encoder info:
- SM12, reduce_idx 5, recover_idx 11, conv1d, batch_norm, syncbn
- encoder_dim 512, output_size 512, head 8, ffn_dim 512*4=2048
- Encoder FLOPs(30s): 82,283,704,832, params: 85,984,648
- Feature info:
- using fbank feature, cmvn, dither, online speed perturb, spec_aug
- Training info:
- train_squeezeformer_bidecoder_large.yaml, kernel size 31
- batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
- AdamW, lr 8e-4, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0 (schedule sketched after the table below)
- Decoding info:
- ctc_weight 0.3, reverse weight 0.5, average_num 30
decoding mode | dev clean | dev other | test clean | test other |
---|---|---|---|---|
ctc greedy search | 2.62 | 6.80 | 2.92 | 6.77 |
ctc prefix beam search | 2.60 | 6.79 | 2.90 | 6.79 |
attention decoder | 3.06 | 6.90 | 3.38 | 6.82 |
attention rescoring | 2.33 | 6.29 | 2.57 | 6.22 |
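
`NoamHold` with `warmup 0.2, hold 0.3, lr_decay 1.0` is the three-phase schedule from the Squeezeformer recipe: linear warmup to the peak lr over the first 20% of steps, a constant hold for the next 30%, then polynomial decay. A sketch of the shape, modeled on NeMo's NoamHoldAnnealing (WeNet's implementation may differ in details such as the floor lr):

```python
def noam_hold_lr(step, max_steps, peak_lr=8e-4,
                 warmup_ratio=0.2, hold_ratio=0.3,
                 decay_rate=1.0, min_lr=1e-5):
    warmup_steps = int(warmup_ratio * max_steps)
    hold_steps = int(hold_ratio * max_steps)
    if step <= warmup_steps:                    # phase 1: linear warmup
        return peak_lr * step / max(1, warmup_steps)
    if step <= warmup_steps + hold_steps:       # phase 2: hold at peak
        return peak_lr
    # phase 3: polynomial decay; equals peak_lr right at the end of the hold
    lr = peak_lr * (warmup_steps / (step - hold_steps)) ** decay_rate
    return max(lr, min_lr)
```
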
- Encoder FLOPs(30s): 34,085,088,512, params: 34,761,608
- Feature info: using fbank feature, cmvn, dither, online speed perturb
- Training info: train_conformer.yaml, kernel size 31, lr 0.004, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
- Decoding info: ctc_weight 0.5, average_num 30 (checkpoint averaging sketched after the table below)
- Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
- LM-fglarge: 4-gram.arpa.gz
decoding mode | test clean | test other |
---|---|---|
ctc greedy search | 3.51 | 9.57 |
ctc prefix beam search | 3.51 | 9.56 |
attention decoder | 3.05 | 8.36 |
attention rescoring | 3.18 | 8.72 |
attention rescoring (beam 50) | 3.12 | 8.55 |
LM-fglarge + attention rescoring | 3.09 | 7.40 |
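
`average_num 30` means the decoding model is not a single checkpoint but the element-wise average of 30 saved checkpoints (WeNet ships this as `wenet/bin/average_model.py`, typically picking the best epochs by validation loss). A minimal sketch of the idea, with hypothetical paths:

```python
import torch

def average_checkpoints(paths):
    """Element-wise average of the tensors in several saved state dicts."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# Hypothetical checkpoint names for a 120-epoch run.
avg = average_checkpoints([f"exp/conformer/{n}.pt" for n in range(90, 120)])
torch.save(avg, "exp/conformer/avg_30.pt")
```
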
- Encoder FLOPs(30s): 34,085,088,512, params: 34,761,608
- Feature info: using fbank feature, cmvn, dither, online speed perturb
- Training info: train_squeezeformer.yaml, kernel size 31,
- batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
- AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
- Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30
decoding mode | dev clean | dev other | test clean | test other |
---|---|---|---|---|
ctc greedy search | 3.49 | 9.59 | 3.66 | 9.59 |
ctc prefix beam search | 3.49 | 9.61 | 3.66 | 9.55 |
attention decoder | 3.52 | 9.04 | 3.85 | 8.97 |
attention rescoring | 3.10 | 8.91 | 3.29 | 8.81 |
- Encoder info:
- SM12, reduce_idx 5, recover_idx 11, conv2d, w/o syncbn
- encoder_dim 256, output_size 256, head 4, ffn_dim 256*4=1024
- Encoder FLOPs(30s): 21,158,877,440, params: 22,219,912
- Feature info:
- using fbank feature, cmvn, dither, online speed perturb
- Training info:
- train_squeezeformer.yaml, kernel size 31
- batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
- AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
- Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30
decoding mode | dev clean | dev other | test clean | test other |
---|---|---|---|---|
ctc greedy search | 3.49 | 9.24 | 3.51 | 9.28 |
ctc prefix beam search | 3.44 | 9.23 | 3.51 | 9.25 |
attention decoder | 3.59 | 8.74 | 3.75 | 8.70 |
attention rescoring | 2.97 | 8.48 | 3.07 | 8.44 |
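
The `Encoder FLOPs(30s)` figures are forward-pass FLOPs for a single 30-second utterance: at a 10 ms frame shift that is 3000 fbank frames of dim 80, of which roughly 750 reach the encoder after the usual 4x convolutional subsampling. A sketch of how such a number can be reproduced with PyTorch's built-in FLOP counter, using a plain `nn.TransformerEncoder` as a stand-in for the real encoder (so the printed number will not match the table):

```python
import torch
from torch.utils.flop_counter import FlopCounterMode  # PyTorch >= 2.0

# Stand-in with this block's dimensions: d_model 256, 4 heads, ffn 1024.
layer = torch.nn.TransformerEncoderLayer(
    d_model=256, nhead=4, dim_feedforward=1024, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=12)

# 30 s at a 10 ms shift = 3000 frames; ~750 after 4x subsampling.
x = torch.randn(1, 750, 256)
with FlopCounterMode(display=False) as counter:
    encoder(x)
print(f"{counter.get_total_flops():,} FLOPs, "
      f"{sum(p.numel() for p in encoder.parameters()):,} params")
```
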
- Encoder info:
- SM12, reduce_idx 5, recover_idx 11, conv2d, w/o syncbn
- encoder_dim 256, output_size 256, head 4, ffn_dim 256*8=2048
- Encoder FLOPs(30s): 28,230,473,984, params: 34,827,400
- Feature info: using fbank feature, cmvn, dither, online speed perturb
- Training info:
- train_squeezeformer.yaml, kernel size 31
- batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
- AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
- Decoding info:
- ctc_weight 0.3, reverse weight 0.5, average_num 30
decoding mode | dev clean | dev other | test clean | test other |
---|---|---|---|---|
ctc greedy search | 3.34 | 9.01 | 3.47 | 8.85 |
ctc prefix beam search | 3.33 | 9.02 | 3.46 | 8.81 |
attention decoder | 3.64 | 8.62 | 3.91 | 8.33 |
attention rescoring | 2.89 | 8.34 | 3.10 | 8.03 |
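
For reference, `ctc prefix beam search` differs from greedy search by tracking, for every prefix, the probability of ending in blank vs. non-blank, so that repeated labels and blanks merge correctly before the beam is pruned. A compact log-domain sketch of the textbook algorithm (WeNet's batched implementation is more involved):

```python
import math
from collections import defaultdict

def logsumexp(*xs):
    m = max(xs)
    return -math.inf if m == -math.inf else m + math.log(
        sum(math.exp(x - m) for x in xs))

def ctc_prefix_beam_search(log_probs, beam_size=10, blank=0):
    """log_probs: iterable of per-frame log posteriors (length-V sequences).
    Each beam entry maps a prefix (tuple of ids) to the pair
    (log P(prefix, ends in blank), log P(prefix, ends in non-blank))."""
    beam = {(): (0.0, -math.inf)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (-math.inf, -math.inf))
        for prefix, (pb, pnb) in beam.items():
            for v, lp in enumerate(frame):
                if v == blank:      # blank: prefix unchanged, now ends in blank
                    npb, npnb = nxt[prefix]
                    nxt[prefix] = (logsumexp(npb, pb + lp, pnb + lp), npnb)
                elif prefix and v == prefix[-1]:
                    npb, npnb = nxt[prefix]   # repeat merges into the same prefix
                    nxt[prefix] = (npb, logsumexp(npnb, pnb + lp))
                    ext = prefix + (v,)       # or extends it, but only via a blank
                    epb, epnb = nxt[ext]
                    nxt[ext] = (epb, logsumexp(epnb, pb + lp))
                else:               # new label extends the prefix
                    ext = prefix + (v,)
                    epb, epnb = nxt[ext]
                    nxt[ext] = (epb, logsumexp(epnb, pb + lp, pnb + lp))
        beam = dict(sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]),
                           reverse=True)[:beam_size])
    return [(p, logsumexp(pb, pnb)) for p, (pb, pnb) in beam.items()]
```
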
- Encoder info:
- SM12, reduce_idx 5, recover_idx 11, conv1d, w/o syncbn
- encoder_dim 328, output_size 256, head 4, ffn_dim 328*4=1312
- Encoder FLOPs(30s): 34,103,960,008, params: 35,678,352
- Feature info:
- using fbank feature, cmvn, dither, online speed perturb
- Training info:
- train_squeezeformer.yaml, kernel size 31
- batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
- AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
- Decoding info:
- ctc_weight 0.3, reverse weight 0.5, average_num 30
decoding mode | dev clean | dev other | test clean | test other |
---|---|---|---|---|
ctc greedy search | 3.20 | 8.46 | 3.30 | 8.58 |
ctc prefix beam search | 3.18 | 8.44 | 3.30 | 8.55 |
attention decoder | 3.38 | 8.31 | 3.89 | 8.32 |
attention rescoring | 2.81 | 7.86 | 2.96 | 7.91 |
- Feature info: using fbank feature, cmvn, dither, no speed perturb
- Training info: train_u2++_conformer.yaml, lr 0.001, batch size 24, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
- Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30
- Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
test clean
decoding mode / chunk size | full | 16 |
---|---|---|
ctc prefix beam search | 3.76 | 4.54 |
attention rescoring | 3.32 | 3.80 |
test other
decoding mode / chunk size | full | 16 |
---|---|---|
ctc prefix beam search | 9.50 | 11.52 |
attention rescoring | 8.67 | 10.38 |
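
In the streaming tables, `full` and `16` are decoding chunk sizes: `full` lets the encoder attend over the whole utterance, while `16` restricts each step to 16 encoder output frames, which is why the chunked column trails the full-context one. With the standard 4x subsampling and 10 ms frame shift, a chunk of 16 corresponds to 640 ms of audio per step; the arithmetic as a small helper (assumed defaults):

```python
def chunk_ms(decoding_chunk_size, subsampling_rate=4, frame_shift_ms=10):
    """Audio consumed per decoding step: each encoder output frame covers
    subsampling_rate input frames of frame_shift_ms milliseconds each."""
    return decoding_chunk_size * subsampling_rate * frame_shift_ms

assert chunk_ms(16) == 640  # chunk 16 -> 16 * 4 * 10 ms = 640 ms of audio
```
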
- Encoder info:
- SM12, reduce_idx 5, recover_idx 11, conv1d, layer_norm, do_rel_shift false
- encoder_dim 256, output_size 256, head 4, ffn_dim 256*8=2048
- Encoder FLOPs(30s): 28,230,473,984, params: 34,827,400
- Feature info:
- using fbank feature, cmvn, dither, online speed perturb
- Training info:
- train_squeezeformer.yaml, kernel size 31
- batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
- AdamW, lr 1e-3, NoamHold, warmup 0.1, hold 0.4, lr_decay 1.0
- Decoding info:
- ctc_weight 0.3, reverse weight 0.5, average_num 30
test clean
decoding mode / chunk size | full | 16 |
---|---|---|
ctc prefix beam search | 3.81 | 4.59 |
attention rescoring | 3.36 | 3.93 |
test other
decoding mode / chunk size | full | 16 |
---|---|---|
ctc prefix beam search | 9.12 | 11.17 |
attention rescoring | 8.43 | 10.21 |
- Feature info: using fbank feature, cmvn, dither, speed perturb
- Training info: train_unified_conformer.yaml, lr 0.001, batch size 10, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
- Decoding info: ctc_weight 0.5, average_num 30
- Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
- LM-tgmed: 3-gram.pruned.1e-7.arpa.gz
- LM-tglarge: 3-gram.arpa.gz
- LM-fglarge: 4-gram.arpa.gz
test clean
decoding mode / chunk size | full | 16 |
---|---|---|
ctc prefix beam search | 4.26 | 5.00 |
attention decoder | 3.05 | 3.44 |
attention rescoring | 3.72 | 4.10 |
attention rescoring (beam 50) | 3.57 | 3.95 |
LM-tgmed + attention rescoring | 3.56 | 4.02 |
LM-tglarge + attention rescoring | 3.40 | 3.82 |
LM-fglarge + attention rescoring | 3.38 | 3.74 |
test other
decoding mode / chunk size | full | 16 |
---|---|---|
ctc prefix beam search | 10.87 | 12.87 |
attention decoder | 9.07 | 10.44 |
attention rescoring | 9.74 | 11.61 |
attention rescoring (beam 50) | 9.34 | 11.13 |
LM-tgmed + attention rescoring | 8.78 | 10.26 |
LM-tglarge + attention rescoring | 8.34 | 9.74 |
LM-fglarge + attention rescoring | 8.17 | 9.44 |
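
The `LM-*` rows add the official LibriSpeech n-gram models listed above. WeNet applies them through its WFST runtime (the ARPA LM is compiled into a TLG decoding graph) rather than in Python; purely to illustrate what n-gram rescoring of an n-best list does, here is a shallow-rescoring sketch using `kenlm`, with a hypothetical `lm_weight` (decompress the .arpa.gz first):

```python
import kenlm  # pip install kenlm

lm = kenlm.Model("4-gram.arpa")  # LM-fglarge, decompressed

def lm_rescore(nbest, lm_weight=0.5):
    """nbest: list of (text, acoustic_log_score) pairs. kenlm returns
    log10 probabilities; lm_weight trades the two scores off."""
    return max(nbest,
               key=lambda h: h[1] + lm_weight * lm.score(h[0], bos=True, eos=True))
```
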