# Performance Record

## Conformer Bidecoder Result (large)

* Encoder FLOPs(30s): 96,238,430,720, params: 85,709,704
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info: train_conformer_bidecoder_large.yaml, kernel size 31, lr 0.002, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30 (see the rescoring sketch after the table below)
* Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
* LM-tgmed: 3-gram.pruned.1e-7.arpa.gz
* LM-tglarge: 3-gram.arpa.gz
* LM-fglarge: 4-gram.arpa.gz
| decoding mode                    | test clean | test other |
|----------------------------------|------------|------------|
| ctc prefix beam search           | 2.96       | 7.14       |
| attention rescoring              | 2.66       | 6.53       |
| LM-tgmed + attention rescoring   | 2.78       | 6.32       |
| LM-tglarge + attention rescoring | 2.68       | 6.10       |
| LM-fglarge + attention rescoring | 2.65       | 5.98       |
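
The "attention rescoring" rows rerank the CTC n-best list with the attention decoder (both directions when a right-to-left decoder is trained). As a rough, hedged illustration of how `ctc_weight` and `reverse_weight` could enter the combined score (not the toolkit's exact code; `Hyp` and its fields are assumed names):

```python
# Hypothetical sketch of attention rescoring over a CTC n-best list.
# Hyp and its fields are illustrative names, not WeNet's actual API.
from dataclasses import dataclass

@dataclass
class Hyp:
    tokens: list          # token ids from CTC prefix beam search
    ctc_score: float      # log score from CTC prefix beam search
    att_score: float      # left-to-right attention-decoder log score
    r_att_score: float    # right-to-left decoder log score (bidecoder/U2++ only)

def attention_rescore(hyps, ctc_weight=0.3, reverse_weight=0.5):
    """Return the hypothesis with the best combined score."""
    def combined(h):
        att = (1 - reverse_weight) * h.att_score + reverse_weight * h.r_att_score
        return att + ctc_weight * h.ctc_score
    return max(hyps, key=combined)
```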

## SqueezeFormer Result (U2++ Large, FFN:2048)

* Encoder info:
  * SM12, reduce_idx 5, recover_idx 11, conv1d, batch_norm, syncbn
  * encoder_dim 512, output_size 512, head 8, ffn_dim 512*4=2048
  * Encoder FLOPs(30s): 82,283,704,832, params: 85,984,648
* Feature info:
  * using fbank feature, cmvn, dither, online speed perturb, spec_aug
* Training info:
  * train_squeezeformer_bidecoder_large.yaml, kernel size 31
  * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
  * adamw, lr 8e-4, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0 (see the LR-schedule sketch after the table below)
* Decoding info:
  * ctc_weight 0.3, reverse_weight 0.5, average_num 30
| decoding mode          | dev clean | dev other | test clean | test other |
|------------------------|-----------|-----------|------------|------------|
| ctc greedy search      | 2.62      | 6.80      | 2.92       | 6.77       |
| ctc prefix beam search | 2.60      | 6.79      | 2.90       | 6.79       |
| attention decoder      | 3.06      | 6.90      | 3.38       | 6.82       |
| attention rescoring    | 2.33      | 6.29      | 2.57       | 6.22       |
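
The SqueezeFormer runs use AdamW with a NoamHold-style schedule: warmup, then a hold phase, then decay. The exact schedule is defined by the toolkit and config; the sketch below only illustrates the assumed warmup/hold/anneal shape, reading warmup 0.2 and hold 0.3 as fractions of total training steps (an assumption), with all names (`peak_lr`, `warmup_ratio`, `hold_ratio`, `decay_rate`) illustrative:

```python
# Assumed sketch of a warmup-hold-anneal learning-rate schedule (NoamHold-like).
# Parameter names and the exact decay form are illustrative, not the toolkit's.
def noam_hold_lr(step, total_steps, peak_lr=8e-4,
                 warmup_ratio=0.2, hold_ratio=0.3, decay_rate=1.0):
    warmup_steps = int(total_steps * warmup_ratio)
    hold_steps = int(total_steps * hold_ratio)
    if step < warmup_steps:
        # linear warmup to the peak learning rate
        return peak_lr * step / max(1, warmup_steps)
    if step < warmup_steps + hold_steps:
        # hold the peak learning rate
        return peak_lr
    # anneal after the hold phase (Noam-style inverse-power decay)
    return peak_lr * ((warmup_steps + hold_steps) / max(1, step)) ** decay_rate
```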

## Conformer Result

* Encoder FLOPs(30s): 34,085,088,512, params: 34,761,608
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info: train_conformer.yaml, kernel size 31, lr 0.004, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 30 (see the checkpoint-averaging sketch after the table below)
* Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
* LM-fglarge: 4-gram.arpa.gz
| decoding mode                    | test clean | test other |
|----------------------------------|------------|------------|
| ctc greedy search                | 3.51       | 9.57       |
| ctc prefix beam search           | 3.51       | 9.56       |
| attention decoder                | 3.05       | 8.36       |
| attention rescoring              | 3.18       | 8.72       |
| attention rescoring (beam 50)    | 3.12       | 8.55       |
| LM-fglarge + attention rescoring | 3.09       | 7.40       |
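
`average_num 30` means decoding uses a model whose weights are the element-wise average of 30 saved checkpoints. WeNet ships its own averaging script for this step; the snippet below is only a minimal sketch of the idea, with checkpoint paths and naming as assumptions:

```python
# Hypothetical sketch of averaging model checkpoints before decoding.
# Checkpoint paths and naming are assumptions; WeNet provides its own
# average_model script.
import torch

def average_checkpoints(paths):
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# e.g. average the last 30 epoch checkpoints (file names are illustrative):
# avg_state = average_checkpoints([f"exp/conformer/{e}.pt" for e in range(90, 120)])
# torch.save(avg_state, "exp/conformer/avg_30.pt")
```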

## Conformer Result (12 layers, FFN:2048)

* Encoder FLOPs(30s): 34,085,088,512, params: 34,761,608 (see the parameter-count sketch after the table below)
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info:
  * train_squeezeformer.yaml, kernel size 31
  * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
  * AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30
| decoding mode          | dev clean | dev other | test clean | test other |
|------------------------|-----------|-----------|------------|------------|
| ctc greedy search      | 3.49      | 9.59      | 3.66       | 9.59       |
| ctc prefix beam search | 3.49      | 9.61      | 3.66       | 9.55       |
| attention decoder      | 3.52      | 9.04      | 3.85       | 8.97       |
| attention rescoring    | 3.10      | 8.91      | 3.29       | 8.81       |
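
Each section reports encoder FLOPs for a 30 s input together with the encoder parameter count. The FLOPs figure depends on the profiler used; the parameter count can be checked with a one-liner over the encoder module (the `model.encoder` attribute name is an assumption):

```python
# Hypothetical sketch of counting encoder parameters with PyTorch.
# `model.encoder` is an assumed attribute name for the ASR encoder module.
import torch

def count_params(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# e.g. print(count_params(model.encoder))  # expected to match the figure above
```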

## SqueezeFormer Result (SM12, FFN:1024)

* Encoder info:
  * SM12, reduce_idx 5, recover_idx 11, conv2d, w/o syncbn
  * encoder_dim 256, output_size 256, head 4, ffn_dim 256*4=1024
  * Encoder FLOPs(30s): 21,158,877,440, params: 22,219,912
* Feature info:
  * using fbank feature, cmvn, dither, online speed perturb
* Training info:
  * train_squeezeformer.yaml, kernel size 31
  * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
  * adamw, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30 (a CTC greedy-search sketch follows the table below)
| decoding mode          | dev clean | dev other | test clean | test other |
|------------------------|-----------|-----------|------------|------------|
| ctc greedy search      | 3.49      | 9.24      | 3.51       | 9.28       |
| ctc prefix beam search | 3.44      | 9.23      | 3.51       | 9.25       |
| attention decoder      | 3.59      | 8.74      | 3.75       | 8.70       |
| attention rescoring    | 2.97      | 8.48      | 3.07       | 8.44       |
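
The "ctc greedy search" rows take the per-frame argmax of the CTC posterior, collapse repeated tokens, and drop blanks. A minimal sketch, assuming blank id 0 and a `(time, vocab)` matrix of log-probabilities:

```python
# Minimal sketch of CTC greedy decoding, assuming blank id 0 and a
# log_probs array of shape (time, vocab). Not the toolkit's exact code.
import numpy as np

def ctc_greedy_search(log_probs: np.ndarray, blank_id: int = 0):
    best = log_probs.argmax(axis=-1)          # best token per frame
    tokens, prev = [], blank_id
    for t in best:
        if t != blank_id and t != prev:       # collapse repeats, drop blanks
            tokens.append(int(t))
        prev = t
    return tokens
```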

## SqueezeFormer Result (SM12, FFN:2048)

* Encoder info:
  * SM12, reduce_idx 5, recover_idx 11, conv2d, w/o syncbn
  * encoder_dim 256, output_size 256, head 4, ffn_dim 256*8=2048
  * Encoder FLOPs(30s): 28,230,473,984, params: 34,827,400
* Feature info: using fbank feature, cmvn, dither, online speed perturb (see the feature-extraction sketch after the table below)
* Training info:
  * train_squeezeformer.yaml, kernel size 31
  * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
  * adamw, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info:
  * ctc_weight 0.3, reverse_weight 0.5, average_num 30
| decoding mode          | dev clean | dev other | test clean | test other |
|------------------------|-----------|-----------|------------|------------|
| ctc greedy search      | 3.34      | 9.01      | 3.47       | 8.85       |
| ctc prefix beam search | 3.33      | 9.02      | 3.46       | 8.81       |
| attention decoder      | 3.64      | 8.62      | 3.91       | 8.33       |
| attention rescoring    | 2.89      | 8.34      | 3.10       | 8.03       |
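
The feature front end in all these experiments is fbank + CMVN with dithering (speed perturb / spec_aug are applied online during training). Below is a hedged sketch using torchaudio's Kaldi-compatible fbank; the 80 mel bins and the per-utterance CMVN are simplifying assumptions (WeNet applies global CMVN statistics computed on the training set):

```python
# Illustrative fbank + CMVN front end, not WeNet's exact pipeline
# (WeNet uses global CMVN statistics estimated on the training data).
import torchaudio

def compute_fbank(wav_path: str, num_mel_bins: int = 80, dither: float = 0.1):
    waveform, sample_rate = torchaudio.load(wav_path)
    waveform = waveform * (1 << 15)  # Kaldi-style 16-bit scaling
    feats = torchaudio.compliance.kaldi.fbank(
        waveform,
        num_mel_bins=num_mel_bins,
        frame_length=25,
        frame_shift=10,
        dither=dither,
        energy_floor=0.0,
        sample_frequency=sample_rate,
    )
    # per-utterance CMVN, for illustration only
    return (feats - feats.mean(dim=0)) / (feats.std(dim=0) + 1e-8)
```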

## SqueezeFormer Result (SM12, FFN:1312)

* Encoder info:
  * SM12, reduce_idx 5, recover_idx 11, conv1d, w/o syncbn
  * encoder_dim 328, output_size 256, head 4, ffn_dim 328*4=1312
  * Encoder FLOPs(30s): 34,103,960,008, params: 35,678,352
* Feature info:
  * using fbank feature, cmvn, dither, online speed perturb
* Training info:
  * train_squeezeformer.yaml, kernel size 31
  * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
  * adamw, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info:
  * ctc_weight 0.3, reverse_weight 0.5, average_num 30
| decoding mode          | dev clean | dev other | test clean | test other |
|------------------------|-----------|-----------|------------|------------|
| ctc greedy search      | 3.20      | 8.46      | 3.30       | 8.58       |
| ctc prefix beam search | 3.18      | 8.44      | 3.30       | 8.55       |
| attention decoder      | 3.38      | 8.31      | 3.89       | 8.32       |
| attention rescoring    | 2.81      | 7.86      | 2.96       | 7.91       |
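
All figures in these tables are word error rates (WER, %) on the LibriSpeech dev/test partitions. For reference, a minimal sketch of the underlying edit-distance computation (WeNet ships its own scoring tools):

```python
# Minimal WER sketch: word-level Levenshtein distance between reference and
# hypothesis, divided by the reference length. Illustration only.
def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(r)][len(h)] / max(1, len(r))
```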

## Conformer U2++ Result

* Feature info: using fbank feature, cmvn, no speed perturb, dither
* Training info: train_u2++_conformer.yaml, lr 0.001, batch size 24, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
* Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30
* Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0

### test clean

| decoding mode          | full | 16   |
|------------------------|------|------|
| ctc prefix beam search | 3.76 | 4.54 |
| attention rescoring    | 3.32 | 3.80 |

### test other

| decoding mode          | full | 16    |
|------------------------|------|-------|
| ctc prefix beam search | 9.50 | 11.52 |
| attention rescoring    | 8.67 | 10.38 |
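
In the streaming tables, "full" is full-context decoding and "16" is decoding with a chunk size of 16: attention is restricted so each frame only sees frames up to the end of its own chunk (the U2/U2++ models are trained with dynamic chunk sizes, so one model supports both modes). The mask below is a minimal illustration of the idea, not the toolkit's exact implementation:

```python
# Illustrative chunk-based attention mask for streaming decoding: with chunk
# size 16 each frame may attend to all frames up to the end of its own chunk;
# "full" corresponds to unrestricted (full-context) attention.
import torch

def chunk_attention_mask(num_frames: int, chunk_size: int) -> torch.Tensor:
    """Boolean mask of shape (num_frames, num_frames); True = may attend."""
    idx = torch.arange(num_frames)
    chunk_end = (idx // chunk_size + 1) * chunk_size   # end of each frame's chunk
    return idx.unsqueeze(0) < chunk_end.unsqueeze(1)   # [query, key] mask

# full-context decoding is equivalent to chunk_size >= num_frames
```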

## SqueezeFormer Result (U2++, FFN:2048)

* Encoder info:
  * SM12, reduce_idx 5, recover_idx 11, conv1d, layer_norm, do_rel_shift false
  * encoder_dim 256, output_size 256, head 4, ffn_dim 256*8=2048
  * Encoder FLOPs(30s): 28,230,473,984, params: 34,827,400
* Feature info:
  * using fbank feature, cmvn, dither, online speed perturb
* Training info:
  * train_squeezeformer.yaml, kernel size 31
  * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
  * adamw, lr 1e-3, NoamHold, warmup 0.1, hold 0.4, lr_decay 1.0
* Decoding info:
  * ctc_weight 0.3, reverse_weight 0.5, average_num 30

### test clean

| decoding mode          | full | 16   |
|------------------------|------|------|
| ctc prefix beam search | 3.81 | 4.59 |
| attention rescoring    | 3.36 | 3.93 |

### test other

| decoding mode          | full | 16    |
|------------------------|------|-------|
| ctc prefix beam search | 9.12 | 11.17 |
| attention rescoring    | 8.43 | 10.21 |

## Conformer U2 Result

* Feature info: using fbank feature, cmvn, speed perturb, dither
* Training info: train_unified_conformer.yaml, lr 0.001, batch size 10, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
* Decoding info: ctc_weight 0.5, average_num 30
* Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
* LM-tgmed: 3-gram.pruned.1e-7.arpa.gz
* LM-tglarge: 3-gram.arpa.gz
* LM-fglarge: 4-gram.arpa.gz

### test clean

| decoding mode                    | full | 16   |
|----------------------------------|------|------|
| ctc prefix beam search           | 4.26 | 5.00 |
| attention decoder                | 3.05 | 3.44 |
| attention rescoring              | 3.72 | 4.10 |
| attention rescoring (beam 50)    | 3.57 | 3.95 |
| LM-tgmed + attention rescoring   | 3.56 | 4.02 |
| LM-tglarge + attention rescoring | 3.40 | 3.82 |
| LM-fglarge + attention rescoring | 3.38 | 3.74 |

### test other

| decoding mode                    | full  | 16    |
|----------------------------------|-------|-------|
| ctc prefix beam search           | 10.87 | 12.87 |
| attention decoder                | 9.07  | 10.44 |
| attention rescoring              | 9.74  | 11.61 |
| attention rescoring (beam 50)    | 9.34  | 11.13 |
| LM-tgmed + attention rescoring   | 8.78  | 10.26 |
| LM-tglarge + attention rescoring | 8.34  | 9.74  |
| LM-fglarge + attention rescoring | 8.17  | 9.44  |