sort: scientific notation with sign is incorrectly parsed in general numeric sort #10317

@sylvestre

Component

sort

Description

GNU sort correctly parses scientific notation such as "1e-5" (which equals 0.00001) in general numeric sort mode (-g).

It parses general numeric values with the C standard library function strtold, which accepts signed exponents:

static int
general_numcompare (char const *sa, char const *sb)
{
  char *ea;
  char *eb;
  long double a = strtold (sa, &ea);
  long double b = strtold (sb, &eb);
  ...
}

However, in uutils sort, there is an off-by-one error in get_leading_gen when checking for a digit after the exponent sign.

if let Some(&(_, &next_char)) = char_indices.peek() {
    if (next_char == b'+' || next_char == b'-')
        && matches!(
            char_indices.peek_nth(2), // should be peek_nth(1)
            Some((_, c)) if c.is_ascii_digit()
        )
    {
        // Consume the sign. The following digits will be consumed by the main loop.
        char_indices.next();
        had_e_notation = true;
        continue;
    }
    if next_char.is_ascii_digit() {
        had_e_notation = true;
        continue;
    }
}

For "1e-5", once the parser has consumed 'e', peek() (equivalent to peek_nth(0)) returns '-'. The digit check should therefore use peek_nth(1) to look at the next character, '5'. Instead it uses peek_nth(2), which points one position past the '5' and returns None because the input ends there. As a result, the exponent is not recognized and the number is parsed incorrectly.
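The lookahead offsets can be checked in isolation. The following is a minimal sketch, not the uutils implementation: `signed_exponent_follows` and its `digit_offset` parameter are hypothetical names, and plain slice indexing stands in for `peek_nth` so the example needs only the standard library.

```rust
/// Hypothetical helper (not the uutils code): reports whether the bytes
/// after an `e`/`E` at index `e_pos` look like a signed exponent, using
/// `digit_offset` as the lookahead distance for the digit check.
///
/// Offsets are relative to the first byte after `e`, mirroring `peek_nth`:
/// offset 0 is the sign, offset 1 is the first exponent digit.
fn signed_exponent_follows(input: &[u8], e_pos: usize, digit_offset: usize) -> bool {
    // Everything after the `e`; `None` if `e_pos` is out of range.
    let rest = match input.get(e_pos + 1..) {
        Some(rest) => rest,
        None => return false,
    };
    // Same role as `peek()` in the issue's snippet: the sign character.
    let has_sign = matches!(rest.first().copied(), Some(b'+') | Some(b'-'));
    // Same role as `peek_nth(digit_offset)`: the character tested for a digit.
    let has_digit = matches!(rest.get(digit_offset), Some(c) if c.is_ascii_digit());
    has_sign && has_digit
}
```

With the input `b"1e-5"` and `e_pos = 1`, an offset of 1 finds the digit '5', while an offset of 2 runs off the end of the input and the check fails. An input such as `b"1e-55"` still passes with offset 2, because that offset happens to land on the second exponent digit.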

Test / Reproduction Steps

# GNU
$ printf '1\n1e-5' | sort -g
1e-5
1

# uutils
$ printf '1\n1e-5' | coreutils sort -g
1
1e-5

Impact

Scientific notation with signed exponents is incorrectly parsed and causes wrong sort order for numeric data.

Recommendations

Change peek_nth(2) to peek_nth(1) to correctly check the digit after the exponent sign.
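A regression check for the fix could compare the output of sort -g against a reference ordering. The sketch below is only an illustration under stated assumptions: `reference_general_numeric_sort` is a hypothetical helper, and Rust's standard `str::parse::<f64>` stands in for strtold. It handles only lines that are entirely numeric and does not model how GNU sort treats numeric prefixes or non-numeric lines.

```rust
/// Hypothetical reference ordering for whole-line numeric inputs.
/// Returns `None` if any line is not a complete floating-point literal,
/// since this sketch does not model GNU sort's handling of such lines.
fn reference_general_numeric_sort(lines: &[&str]) -> Option<Vec<String>> {
    let mut keyed: Vec<(f64, &str)> = Vec::with_capacity(lines.len());
    for line in lines {
        // `parse::<f64>` accepts signed exponents such as "1e-5" and "1e+2".
        let value: f64 = line.trim().parse().ok()?;
        keyed.push((value, *line));
    }
    // Stable sort on the parsed value; `total_cmp` gives a total order on f64.
    keyed.sort_by(|a, b| a.0.total_cmp(&b.0));
    Some(keyed.into_iter().map(|(_, line)| line.to_string()).collect())
}
```

For the reproduction input, `reference_general_numeric_sort(&["1", "1e-5"])` yields the lines in the order `1e-5`, `1`, matching the GNU output shown above.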
