Commit e0dbec3
Fix encoding of multiple preceding spaces in BPE tokenizer. (#945)
* Fix encoding of multiple preceding spaces in BPE tokenizer.
* Add test
---------
Co-authored-by: rasbt <[email protected]>1 parent 90e0f3c commit e0dbec3
2 files changed
+13
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
609 | 609 | | |
610 | 610 | | |
611 | 611 | | |
612 | | - | |
613 | 612 | | |
614 | 613 | | |
| 614 | + | |
615 | 615 | | |
616 | 616 | | |
617 | 617 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
238 | 238 | | |
239 | 239 | | |
240 | 240 | | |
241 | | - | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
0 commit comments