Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comprehensive unit tests to cover edge cases for Delta log path parsing #408

Closed
wants to merge 4 commits into from

Conversation

hackintoshrao
Copy link
Contributor

This PR introduces new unit tests to validate the Delta Lake log path parsing logic corner cases thoroughly:

  1. test_invalid_version_and_part_lengths:

    • Verifies correct handling of invalid version lengths (VERSION_LEN)
    • Tests parsing of invalid multipart checkpoint lengths (MULTIPART_PART_LEN)
    • Ensures proper error handling for invalid UUID lengths (UUID_PART_LEN)
  2. test_empty_filename:

    • Confirms that URLs ending with a slash (resulting in empty filenames)
      are correctly rejected
  3. test_ok_none_and_unknown_cases:

    • Validates handling of non-numeric version prefixes (Ok(None) cases)
    • Verifies correct parsing of missing file extensions (Ok(None) cases)
    • Ensures proper handling of empty file extensions (Ok(Some) with Unknown type)

These tests significantly improve coverage of edge cases and error
conditions in log path parsing.

This commit introduces two new unit tests to enhance the robustness
of our Delta Lake log path parsing:

1. test_invalid_version_and_part_lengths:
   - Validates correct handling of invalid version lengths (VERSION_LEN)
   - Checks parsing of invalid multipart checkpoint lengths (MULTIPART_PART_LEN)
   - Ensures proper error handling for invalid UUID lengths (UUID_PART_LEN)

2. test_empty_filename:
   - Verifies that attempts to parse URLs ending with a slash (resulting
     in empty filenames) are correctly rejected

These tests ensure that our parsing logic correctly identifies and
rejects invalid log paths, maintaining the integrity of our Delta
Lake log processing.
test_ok_none_and_unknown_cases:
   - Validates handling of non-numeric version prefixes (Ok(None) cases)
   - Verifies correct parsing of missing file extensions (Ok(None) cases)
   - Ensures proper handling of empty file extensions (Ok(Some) with Unknown type)
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 80.76923% with 20 lines in your changes missing coverage. Please review.

Project coverage is 77.86%. Comparing base (025aba8) to head (362372a).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/path.rs 80.76% 9 Missing and 11 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
- Coverage   78.34%   77.86%   -0.48%     
==========================================
  Files          49       49              
  Lines       10282    10360      +78     
  Branches    10282    10360      +78     
==========================================
+ Hits         8055     8067      +12     
- Misses       1775     1833      +58     
- Partials      452      460       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this!

This PR definitely adds some good missed cases, but I think most of the newly added cases were already covered by existing tests.

Suggest to add the missed cases to their respective existing tests (in the same style), rather than making entirely new tests. And maybe update the comments and expect_err calls of those existing cases, since they aren't as nicely described as what this PR does.

kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
kernel/src/path.rs Outdated Show resolved Hide resolved
This commit enhances the Delta Lake log path parsing tests by adding
explicit boundary test cases:

1. UUID length validation in test_uuid_checkpoint_patterns:
   - Tests UUID with exactly 35 characters (one too short)
   - Uses realistic UUID format: "3a0d65cd-4056-49b8-937b-95f9e3ee90e"
   - Makes length requirements clearer for future maintainers

2. File extension handling in test_unknown_invalid_patterns:
   - Tests file with no extension (returns Ok(None))
   - Tests file with empty extension "." (returns Ok(Some) with Unknown type)
   - Clarifies the distinction between these two cases

These boundary tests improve test coverage and make the requirements
more explicit in the test suite, while maintaining the existing test
structure and style.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants