sort: scientific notation with sign is incorrectly parsed in general numeric sort
Component
sort
Description
GNU sort correctly parses scientific notation such as "1e-5" (which equals 0.00001) in general numeric sort mode (-g).
It parses general numeric values with the C standard library function strtold, which handles scientific notation:
static int
general_numcompare (char const *sa, char const *sb)
{
  char *ea;
  char *eb;
  long double a = strtold (sa, &ea);
  long double b = strtold (sb, &eb);
  ...
}
However, in uutils sort, there is an off-by-one error in get_leading_gen when checking for a digit after the exponent sign.
if let Some(&(_, &next_char)) = char_indices.peek() {
    if (next_char == b'+' || next_char == b'-')
        && matches!(
            char_indices.peek_nth(2), // should be peek_nth(1)
            Some((_, c)) if c.is_ascii_digit()
        )
    {
        // Consume the sign. The following digits will be consumed by the main loop.
        char_indices.next();
        had_e_notation = true;
        continue;
    }
    if next_char.is_ascii_digit() {
        had_e_notation = true;
        continue;
    }
}
For "1e-5", when processing 'e', peek() (equivalent to peek_nth(0)) returns '-'. The code should use peek_nth(1) to check the next character '5', but it incorrectly uses peek_nth(2), which returns None. As a result, the exponent is not recognized and the number is parsed incorrectly.
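The offsets described above can be illustrated with a minimal, self-contained sketch. It is not the uutils code: `sign_then_digit` is a hypothetical helper, and a plain byte slice stands in for the peekable iterator, with `rest.get(n)` playing the role of `peek_nth(n)` on the bytes that follow the 'e'.

```rust
/// Hypothetical stand-in for the lookahead in get_leading_gen.
/// `rest` holds the bytes that follow the 'e'; `rest.get(n)` mimics `peek_nth(n)`.
fn sign_then_digit(rest: &[u8], digit_offset: usize) -> bool {
    // Offset 0 must be the exponent sign.
    matches!(rest.first().copied(), Some(b'+' | b'-'))
        // The digit is looked up at the caller-supplied offset.
        && matches!(rest.get(digit_offset), Some(c) if c.is_ascii_digit())
}

fn main() {
    // What remains of "1e-5" once "1e" has been consumed.
    let rest = b"-5";
    println!("offset 2 (current):  {}", sign_then_digit(rest, 2));
    println!("offset 1 (proposed): {}", sign_then_digit(rest, 1));
}
```

With offset 2 the lookup runs past the end of "-5" and the check fails; with offset 1 it lands on '5' and the check succeeds.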
Test / Reproduction Steps
# GNU
$ printf '1\n1e-5' | sort -g
1e-5
1
# uutils
$ printf '1\n1e-5' | coreutils sort -g
1
1e-5
Impact
Scientific notation with signed exponents is parsed incorrectly, which produces the wrong sort order for numeric data.
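The expected order can be checked independently of either sort implementation by comparing the parsed values. The sketch below is only an illustration: `sort_by_value` is a hypothetical helper, it assumes every line parses as a number, and Rust's `f64` parser is not `strtold`, so the accepted syntax and precision differ from what GNU sort uses.

```rust
/// Hypothetical helper: orders lines by their parsed numeric value.
/// Assumption: every line is a valid f64 literal (a real -g sort must
/// also handle non-numeric input, which this sketch does not).
fn sort_by_value(lines: &mut [&str]) {
    lines.sort_by(|a, b| {
        let x: f64 = a.parse().expect("numeric line");
        let y: f64 = b.parse().expect("numeric line");
        x.total_cmp(&y)
    });
}

fn main() {
    let mut lines = vec!["1", "1e-5"];
    sort_by_value(&mut lines);
    for line in &lines {
        println!("{line}");
    }
}
```

Since 1e-5 is 0.00001, it sorts before 1, which matches the GNU output in the reproduction steps.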
Recommendations
Change peek_nth(2) to peek_nth(1) to correctly check the digit after the exponent sign.