Commit 3aab51d

Merge rust-bitcoin#205: Implement error correction
76d0dae fuzz: add fuzztests that try to correct bech32 and codex32 errors (Andrew Poelstra)
383f788 correction: support erasures (Andrew Poelstra)
2e1b7be implement error correction (Andrew Poelstra)
6c24f98 primitives: introduce the Berlekamp-Massey algorithm for computing linear shift registers (Andrew Poelstra)
fc903d6 field: require TryInto<Base> for ExtensionField (Andrew Poelstra)
4dfe325 field: add ability to multiply by integers (Andrew Poelstra)
74ec75f bech32: use correct generator exponents (Andrew Poelstra)

Pull request description:

This implements the core algorithms for error correction. In principle this exposes an API which is sufficient for somebody to implement error correction (of both substitutions and erasures). In practice the API is unlikely to be super usable because:

* We yield error locations as indices from the *end* of the string rather than from the beginning (which we do because the error correction logic doesn't know the original string or even its length);
* We similarly require the user indicate the location of erasures as indices from the end of the string;
* We yield errors as GF32 offsets to be added to the current character in the string, rather than as correct characters (again, we do this because we don't know the string).
* There is a situation in which we detectably cannot correct the string, but we yield some "corrections" anyway (to detect this case, we need to notice if the error iterator ends "early" for a technical definition of "early"; this is not too hard but there's an API question about whether the iterator should be yielding a `Result` or what).
* We don't have a way for the user to signal erasures other than providing a valid bech32 character and then later telling the correction logic that the location is an erasure. We should be able to parse `?`s or something.
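
The limitations listed above can be worked around on the caller's side. The following is a minimal sketch, not part of this commit or of the crate's API: `parse_erasures` and `apply_corrections` are hypothetical helpers, and the sketch assumes lowercase ASCII input and that addition in GF32 is XOR of the 5-bit character values. It turns `?` placeholders into erasure indices counted from the end of the string, and applies `(index from end, offset)` pairs to recover characters.

```rust
/// The bech32 character set; a character's position is its 5-bit GF32 value.
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";

/// Maps a lowercase bech32 character to its 5-bit value.
fn char_to_fe(c: u8) -> Option<u8> {
    CHARSET.iter().position(|&x| x == c).map(|p| p as u8)
}

/// Hypothetical helper: replaces each `?` with `q` (the zero element) and
/// returns the patched string plus the erasure positions, counted from the
/// end of the string. Assumes ASCII input.
fn parse_erasures(s: &str) -> (String, Vec<usize>) {
    let len = s.len();
    let mut erasures = Vec::new();
    let patched: String = s
        .bytes()
        .enumerate()
        .map(|(i, b)| {
            if b == b'?' {
                erasures.push(len - 1 - i);
                'q'
            } else {
                b as char
            }
        })
        .collect();
    (patched, erasures)
}

/// Hypothetical helper: applies `(index_from_end, offset)` corrections.
/// Returns `None` if an index is out of range or points at a character
/// outside the bech32 character set (for example inside the HRP).
fn apply_corrections(s: &str, corrections: &[(usize, u8)]) -> Option<String> {
    let mut bytes = s.as_bytes().to_vec();
    let len = bytes.len();
    for &(idx_from_end, offset) in corrections {
        // Convert "index from the end" into an index from the start.
        let pos = len.checked_sub(idx_from_end + 1)?;
        let fe = char_to_fe(bytes[pos])?;
        // Adding the offset in GF32 is a bitwise XOR of the 5-bit values.
        bytes[pos] = CHARSET[usize::from(fe ^ (offset & 0x1f))];
    }
    String::from_utf8(bytes).ok()
}
```

With these helpers, a caller would pass the erasure list to the correction logic and feed the yielded `(index, offset)` pairs to `apply_corrections`. An offset of zero (`q`) leaves the character unchanged.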
There is also some missing functionality:

* We should be able to correct "burst errors" where if the user indicates a long string of erasures all in a row, we should be able to correct up to checksum-length-many of them. (But if there are other errors, we then won't detect them, so I'm unsure what the UX should look like..)
* Eventually we ought to have a "list decoder" which not only provides a unique best correction if one exists, but always provides a list of "plausible" corrections that the user would then need to check against the blockchain. This would involve a totally different error correction algorithm and I don't intend to do it in the next several years, but throwing it out there anyway.

The next PR will be an "error correction API" PR. I would like some guidance from users on what this API should look like.

ACKs for top commit:
  clarkmoody:
    ACK 76d0dae

Tree-SHA512: 83c6e0a261475bfcf23bff0c7911714f4e366222a67881638818ee991dfe7900e8b38ece872a89ddcfa91cb15b89bd90b0d38d3ae87d2d079bda81c8ed4805e3
2 parents 3f98190 + 76d0dae commit 3aab51d

15 files changed (+1143 -13 lines changed)

.github/workflows/fuzz.yml

+1 -1
@@ -10,7 +10,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        fuzz_target: [decode_rnd, encode_decode, parse_hrp]
+        fuzz_target: [berlekamp_massey, correct_bech32, correct_codex32, decode_rnd, encode_decode, parse_hrp]
     steps:
       - name: Install test dependencies
         run: sudo apt-get update -y && sudo apt-get install -y binutils-dev libunwind8-dev libcurl4-openssl-dev libelf-dev libdw-dev cmake gcc libiberty-dev

fuzz/Cargo.toml

+12
@@ -17,6 +17,18 @@ bech32 = { path = ".." }
 [workspace]
 members = ["."]
 
+[[bin]]
+name = "berlekamp_massey"
+path = "fuzz_targets/berlekamp_massey.rs"
+
+[[bin]]
+name = "correct_bech32"
+path = "fuzz_targets/correct_bech32.rs"
+
+[[bin]]
+name = "correct_codex32"
+path = "fuzz_targets/correct_codex32.rs"
+
 [[bin]]
 name = "decode_rnd"
 path = "fuzz_targets/decode_rnd.rs"

fuzz/fuzz_targets/berlekamp_massey.rs

+58
@@ -0,0 +1,58 @@
+use bech32::primitives::LfsrIter;
+use bech32::Fe32;
+use honggfuzz::fuzz;
+
+fn do_test(data: &[u8]) {
+    for ch in data {
+        if *ch >= 32 {
+            return;
+        }
+    }
+    if data.is_empty() || data.len() > 1_000 {
+        return;
+    }
+
+    let mut iv = Vec::with_capacity(data.len());
+    for ch in data {
+        iv.push(Fe32::try_from(*ch).unwrap());
+    }
+
+    for (i, d) in LfsrIter::berlekamp_massey(&iv).take(data.len()).enumerate() {
+        assert_eq!(data[i], d.to_u8());
+    }
+}
+
+fn main() {
+    loop {
+        fuzz!(|data| {
+            do_test(data);
+        });
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    fn extend_vec_from_hex(hex: &str, out: &mut Vec<u8>) {
+        let mut b = 0;
+        for (idx, c) in hex.as_bytes().iter().filter(|&&c| c != b'\n').enumerate() {
+            b <<= 4;
+            match *c {
+                b'A'..=b'F' => b |= c - b'A' + 10,
+                b'a'..=b'f' => b |= c - b'a' + 10,
+                b'0'..=b'9' => b |= c - b'0',
+                _ => panic!("Bad hex"),
+            }
+            if (idx & 1) == 1 {
+                out.push(b);
+                b = 0;
+            }
+        }
+    }
+
+    #[test]
+    fn duplicate_crash() {
+        let mut a = Vec::new();
+        extend_vec_from_hex("00", &mut a);
+        super::do_test(&a);
+    }
+}
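
The fuzz target above checks one property: the shift register returned by Berlekamp-Massey must reproduce the input sequence it was built from. A minimal, self-contained sketch of the same property follows. It is not the crate's `LfsrIter` implementation; as a simplifying assumption it works over GF(2), where field arithmetic reduces to XOR and AND, whereas the fuzz target works over GF32. The function names are illustrative only.

```rust
/// Berlekamp-Massey over GF(2). `s` must contain only 0s and 1s.
/// Returns the connection polynomial `c` (with `c[0] == 1`) of a shortest
/// LFSR satisfying s[i] = c[1]&s[i-1] ^ ... ^ c[L]&s[i-L] for all i >= L,
/// where L = c.len() - 1.
fn berlekamp_massey_gf2(s: &[u8]) -> Vec<u8> {
    let n = s.len();
    let mut c = vec![0u8; n + 1]; // current connection polynomial
    let mut b = vec![0u8; n + 1]; // copy of `c` from before the last length change
    c[0] = 1;
    b[0] = 1;
    let mut l = 0; // current register length
    let mut m = 1; // steps since `b` was last updated
    for i in 0..n {
        // Discrepancy between s[i] and the register's prediction.
        let mut d = s[i];
        for j in 1..=l {
            d ^= c[j] & s[i - j];
        }
        if d == 0 {
            m += 1;
            continue;
        }
        // Cancel the discrepancy by adding a shifted copy of `b`.
        let prev = c.clone();
        for j in 0..=(n - m) {
            c[j + m] ^= b[j];
        }
        if 2 * l <= i {
            // The register must grow to stay consistent with the input.
            l = i + 1 - l;
            b = prev;
            m = 1;
        } else {
            m += 1;
        }
    }
    c.truncate(l + 1);
    c
}

/// Runs the register described by `c`, seeded with the first L values of
/// `seed`, and returns the first `n` output values.
fn lfsr_extend(c: &[u8], seed: &[u8], n: usize) -> Vec<u8> {
    let l = c.len() - 1;
    let mut out = seed[..l].to_vec();
    while out.len() < n {
        let i = out.len();
        let mut next = 0u8;
        for j in 1..=l {
            next ^= c[j] & out[i - j];
        }
        out.push(next);
    }
    out.truncate(n);
    out
}
```

For the input `1, 1, 0, 1, 1, 0` the sketch finds the length-2 register with connection polynomial `[1, 1, 1]`, meaning each term is the XOR of the two before it.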

fuzz/fuzz_targets/correct_bech32.rs

+112
@@ -0,0 +1,112 @@
+use std::collections::HashMap;
+
+use bech32::primitives::correction::CorrectableError as _;
+use bech32::primitives::decode::CheckedHrpstring;
+use bech32::{Bech32, Fe32};
+use honggfuzz::fuzz;
+
+// coinbase output of block 862290
+static CORRECT: &[u8; 62] = b"bc1qwzrryqr3ja8w7hnja2spmkgfdcgvqwp5swz4af4ngsjecfz0w0pqud7k38";
+
+fn do_test(data: &[u8]) {
+    if data.is_empty() || data.len() % 2 == 1 {
+        return;
+    }
+
+    let mut any_actual_errors = false;
+    let mut e2t = 0;
+    let mut erasures = Vec::with_capacity(CORRECT.len());
+    // Start with a correct string
+    let mut hrpstring = *CORRECT;
+    // ..then mangle it
+    let mut errors = HashMap::with_capacity(data.len() / 2);
+    for sl in data.chunks_exact(2) {
+        let idx = usize::from(sl[0]) & 0x7f;
+        if idx >= CORRECT.len() - 3 {
+            return;
+        }
+        let offs = match Fe32::try_from(sl[1]) {
+            Ok(fe) => fe,
+            Err(_) => return,
+        };
+
+        hrpstring[idx + 3] =
+            (Fe32::from_char(hrpstring[idx + 3].into()).unwrap() + offs).to_char() as u8;
+
+        if errors.insert(CORRECT.len() - (idx + 3) - 1, offs).is_some() {
+            return;
+        }
+        if sl[0] & 0x80 == 0x80 {
+            // We might push "dummy" errors which are erasures that aren't actually wrong.
+            // If we do this too many times, we'll exceed the singleton bound so correction
+            // will fail, but as long as we're within the bound everything should "work",
+            // in the sense that there will be no crashes and the error corrector will
+            // just yield an error with value Q.
+            erasures.push(CORRECT.len() - (idx + 3) - 1);
+            e2t += 1;
+            if offs != Fe32::Q {
+                any_actual_errors = true;
+            }
+        } else if offs != Fe32::Q {
+            any_actual_errors = true;
+            e2t += 2;
+        }
+    }
+    // We need _some_ errors.
+    if !any_actual_errors {
+        return;
+    }
+
+    let s = unsafe { core::str::from_utf8_unchecked(&hrpstring) };
+    let mut correct_ctx = CheckedHrpstring::new::<Bech32>(s)
+        .unwrap_err()
+        .correction_context::<Bech32>()
+        .unwrap();
+
+    correct_ctx.add_erasures(&erasures);
+
+    let iter = correct_ctx.bch_errors();
+    if e2t <= 3 {
+        for (idx, fe) in iter.unwrap() {
+            assert_eq!(errors.remove(&idx), Some(fe));
+        }
+        for val in errors.values() {
+            assert_eq!(*val, Fe32::Q);
+        }
+    }
+}
+
+fn main() {
+    loop {
+        fuzz!(|data| {
+            do_test(data);
+        });
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    fn extend_vec_from_hex(hex: &str, out: &mut Vec<u8>) {
+        let mut b = 0;
+        for (idx, c) in hex.as_bytes().iter().filter(|&&c| c != b'\n').enumerate() {
+            b <<= 4;
+            match *c {
+                b'A'..=b'F' => b |= c - b'A' + 10,
+                b'a'..=b'f' => b |= c - b'a' + 10,
+                b'0'..=b'9' => b |= c - b'0',
+                _ => panic!("Bad hex"),
+            }
+            if (idx & 1) == 1 {
+                out.push(b);
+                b = 0;
+            }
+        }
+    }
+
+    #[test]
+    fn duplicate_crash() {
+        let mut a = Vec::new();
+        extend_vec_from_hex("04010008", &mut a);
+        super::do_test(&a);
+    }
+}
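
The way the correction fuzz targets interpret their input is easy to miss in the diff. Each two-byte chunk encodes one mutation: the low 7 bits of the first byte are a position after the 3-character prefix, the high bit marks the position as an erasure, and the second byte is a GF32 offset. The sketch below is a standalone restatement of that decoding, with a hypothetical `Mutation` type and `decode_mutations` function that are not part of the commit. It omits the string mangling and the "no actual errors" early return, and it returns `None` wherever the targets bail out.

```rust
/// One mutation decoded from a two-byte chunk of fuzz input (illustrative type).
#[derive(Debug, PartialEq, Eq)]
struct Mutation {
    /// Position counted from the end of the string.
    idx_from_end: usize,
    /// 5-bit GF32 offset added to the character at that position.
    offset: u8,
    /// Whether the position is also reported to the corrector as an erasure.
    erasure: bool,
}

/// Decodes fuzz input into mutations plus the `e2t` budget the targets compute
/// (1 per erasure, 2 per non-erasure with a nonzero offset).
/// `prefix_len` is the HRP plus separator length (3 for "bc1" and "ms1").
fn decode_mutations(
    data: &[u8],
    string_len: usize,
    prefix_len: usize,
) -> Option<(Vec<Mutation>, usize)> {
    if data.is_empty() || data.len() % 2 == 1 {
        return None;
    }
    let mut out: Vec<Mutation> = Vec::new();
    let mut e2t = 0;
    for sl in data.chunks_exact(2) {
        let idx = usize::from(sl[0] & 0x7f);
        // Reject positions past the end and offsets that are not 5-bit values.
        if idx >= string_len.saturating_sub(prefix_len) || sl[1] >= 32 {
            return None;
        }
        let idx_from_end = string_len - (idx + prefix_len) - 1;
        // The targets give up when the same position is mutated twice.
        if out.iter().any(|m| m.idx_from_end == idx_from_end) {
            return None;
        }
        let erasure = sl[0] & 0x80 != 0;
        if erasure {
            e2t += 1;
        } else if sl[1] != 0 {
            e2t += 2;
        }
        out.push(Mutation { idx_from_end, offset: sl[1], erasure });
    }
    Some((out, e2t))
}
```

Decoding the regression input `04010008` this way gives two plain substitutions and `e2t == 4`. That exceeds the bech32 threshold of 3, so that test case only exercises the no-crash path.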

fuzz/fuzz_targets/correct_codex32.rs

+137
@@ -0,0 +1,137 @@
+use std::collections::HashMap;
+
+use bech32::primitives::correction::CorrectableError as _;
+use bech32::primitives::decode::CheckedHrpstring;
+use bech32::{Checksum, Fe1024, Fe32};
+use honggfuzz::fuzz;
+
+/// The codex32 checksum algorithm, defined in BIP-93.
+///
+/// Used in this fuzztest because it can correct up to 4 errors, vs bech32 which
+/// can correct only 1. Should exhibit more interesting behavior.
+#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
+pub enum Codex32 {}
+
+impl Checksum for Codex32 {
+    type MidstateRepr = u128;
+    type CorrectionField = Fe1024;
+    const ROOT_GENERATOR: Self::CorrectionField = Fe1024::new([Fe32::_9, Fe32::_9]);
+    const ROOT_EXPONENTS: core::ops::RangeInclusive<usize> = 9..=16;
+
+    const CHECKSUM_LENGTH: usize = 13;
+    const CODE_LENGTH: usize = 93;
+    // Copied from BIP-93
+    const GENERATOR_SH: [u128; 5] = [
+        0x19dc500ce73fde210,
+        0x1bfae00def77fe529,
+        0x1fbd920fffe7bee52,
+        0x1739640bdeee3fdad,
+        0x07729a039cfc75f5a,
+    ];
+    const TARGET_RESIDUE: u128 = 0x10ce0795c2fd1e62a;
+}
+
+static CORRECT: &[u8; 48] = b"ms10testsxxxxxxxxxxxxxxxxxxxxxxxxxx4nzvca9cmczlw";
+
+fn do_test(data: &[u8]) {
+    if data.is_empty() || data.len() % 2 == 1 {
+        return;
+    }
+
+    let mut any_actual_errors = false;
+    let mut e2t = 0;
+    let mut erasures = Vec::with_capacity(CORRECT.len());
+    // Start with a correct string
+    let mut hrpstring = *CORRECT;
+    // ..then mangle it
+    let mut errors = HashMap::with_capacity(data.len() / 2);
+    for sl in data.chunks_exact(2) {
+        let idx = usize::from(sl[0]) & 0x7f;
+        if idx >= CORRECT.len() - 3 {
+            return;
+        }
+        let offs = match Fe32::try_from(sl[1]) {
+            Ok(fe) => fe,
+            Err(_) => return,
+        };
+
+        hrpstring[idx + 3] =
+            (Fe32::from_char(hrpstring[idx + 3].into()).unwrap() + offs).to_char() as u8;
+
+        if errors.insert(CORRECT.len() - (idx + 3) - 1, offs).is_some() {
+            return;
+        }
+        if sl[0] & 0x80 == 0x80 {
+            // We might push "dummy" errors which are erasures that aren't actually wrong.
+            // If we do this too many times, we'll exceed the singleton bound so correction
+            // will fail, but as long as we're within the bound everything should "work",
+            // in the sense that there will be no crashes and the error corrector will
+            // just yield an error with value Q.
+            erasures.push(CORRECT.len() - (idx + 3) - 1);
+            e2t += 1;
+            if offs != Fe32::Q {
+                any_actual_errors = true;
+            }
+        } else if offs != Fe32::Q {
+            any_actual_errors = true;
+            e2t += 2;
+        }
+    }
+    // We need _some_ errors.
+    if !any_actual_errors {
+        return;
+    }
+
+    let s = unsafe { core::str::from_utf8_unchecked(&hrpstring) };
+    let mut correct_ctx = CheckedHrpstring::new::<Codex32>(s)
+        .unwrap_err()
+        .correction_context::<Codex32>()
+        .unwrap();
+
+    correct_ctx.add_erasures(&erasures);
+
+    let iter = correct_ctx.bch_errors();
+    if e2t <= 8 {
+        for (idx, fe) in iter.unwrap() {
+            assert_eq!(errors.remove(&idx), Some(fe));
+        }
+        for val in errors.values() {
+            assert_eq!(*val, Fe32::Q);
+        }
+    }
+}
+
+fn main() {
+    loop {
+        fuzz!(|data| {
+            do_test(data);
+        });
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    fn extend_vec_from_hex(hex: &str, out: &mut Vec<u8>) {
+        let mut b = 0;
+        for (idx, c) in hex.as_bytes().iter().filter(|&&c| c != b'\n').enumerate() {
+            b <<= 4;
+            match *c {
+                b'A'..=b'F' => b |= c - b'A' + 10,
+                b'a'..=b'f' => b |= c - b'a' + 10,
+                b'0'..=b'9' => b |= c - b'0',
+                _ => panic!("Bad hex"),
+            }
+            if (idx & 1) == 1 {
+                out.push(b);
+                b = 0;
+            }
+        }
+    }
+
+    #[test]
+    fn duplicate_crash() {
+        let mut a = Vec::new();
+        extend_vec_from_hex("8c00a10091039e0185008000831f8e0f", &mut a);
+        super::do_test(&a);
+    }
+}
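
The thresholds `e2t <= 3` (bech32) and `e2t <= 8` (codex32) in the two correction targets match the usual BCH decoding bound: a code whose generator has `r` consecutive roots can correct `e` erasures and `t` substitution errors whenever `e + 2t <= r`. The sketch below spells this out. Reading `r` as the length of `ROOT_EXPONENTS` is an interpretation of the diff rather than something the commit states, and the helper names are illustrative. The bech32 range in the usage note is a placeholder of length 3, since the actual Bech32 exponents are not shown here.

```rust
use std::ops::RangeInclusive;

/// Number of consecutive generator roots described by an exponent range.
fn num_roots(root_exponents: &RangeInclusive<usize>) -> usize {
    root_exponents.end() - root_exponents.start() + 1
}

/// True when `erasures + 2 * errors` fits within the number of consecutive
/// roots, which is the condition the fuzz targets track as `e2t`.
fn within_bch_bound(
    erasures: usize,
    errors: usize,
    root_exponents: &RangeInclusive<usize>,
) -> bool {
    erasures + 2 * errors <= num_roots(root_exponents)
}
```

With the codex32 range `9..=16` there are 8 roots, so 4 errors, or 8 erasures, or 2 erasures plus 3 errors are within the bound, while 5 errors are not. With any range of 3 exponents, such as the placeholder `1..=3`, a single error is correctable and two are not, which agrees with the doc comment on `Codex32` above.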

src/lib.rs

+1 -1
@@ -105,7 +105,7 @@
 //!     type MidstateRepr = u128;
 //!     type CorrectionField = bech32::primitives::gf32_ext::Fe32Ext<2>;
 //!     const ROOT_GENERATOR: Self::CorrectionField = Fe1024::new([Fe32::_9, Fe32::_9]);
-//!     const ROOT_EXPONENTS: core::ops::RangeInclusive<usize> = 77..=84;
+//!     const ROOT_EXPONENTS: core::ops::RangeInclusive<usize> = 9..=16;
 //!
 //!     const CHECKSUM_LENGTH: usize = 13;
 //!     const CODE_LENGTH: usize = 93;
