-
Notifications
You must be signed in to change notification settings - Fork 8
Feat/smart parsing #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Warning Rate limit exceeded@Sandip124 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 51 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughA large static dictionary for Nepali month name variants in the date parser was replaced with a fuzzy matching system using sequence alignment. The parser now uses this matcher to identify months from input strings. Comprehensive tests for various month name spellings were added, and the test project was updated to target .NET 9.0. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SmartDateParser
participant NepaliMonthMatcher
User->>SmartDateParser: Parse(input)
SmartDateParser->>NepaliMonthMatcher: FindBestMatch(monthName)
NepaliMonthMatcher-->>SmartDateParser: Month number (if match)
SmartDateParser-->>User: NepaliDate (year, month, day) or error
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/NepDate/SmartDateParser.cs (3)
149-149
: Consider making the similarity threshold configurable.The hardcoded threshold of 0.4 seems quite low and could potentially match unintended month names. Consider:
- Making this a configurable parameter or constant
- Using a higher default threshold (e.g., 0.6) to reduce false positives
- Documenting why 0.4 was chosen
+ private const double DefaultMonthMatchingThreshold = 0.6; + private static bool TryParseMonthNameFormat(string input, out NepaliDate result) { // ... - int? monthNumber = NepaliMonthMatcher.FindBestMatch(word, 0.4); + int? monthNumber = NepaliMonthMatcher.FindBestMatch(word, DefaultMonthMatchingThreshold);
405-432
: Reduce redundancy in the canonical dictionary.The canonical dictionary still contains multiple variations for each month (e.g., "asar", "ashad" for month 3), which partially defeats the purpose of fuzzy matching. Consider keeping only one canonical form per month and relying on the fuzzy matching algorithm to handle variations.
private static readonly Dictionary<string, int> CanonicalMonthNames = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase) { - { "baisakh", 1 }, - { "jestha", 2 }, - { "asar", 3 },{ "ashad", 3 }, - { "shrawan", 4 },{ "saun", 4 }, - { "bhadra", 5 }, - { "ashwin", 6 },{ "ashoj", 6 },{"aswayuja",6}, - { "kartik", 7 }, - { "mangsir", 8 }, - { "poush", 9 }, - { "magh", 10 }, - { "falgun", 11 }, - { "chaitra", 12 }, + { "baisakh", 1 }, + { "jestha", 2 }, + { "asar", 3 }, + { "shrawan", 4 }, + { "bhadra", 5 }, + { "ashwin", 6 }, + { "kartik", 7 }, + { "mangsir", 8 }, + { "poush", 9 }, + { "magh", 10 }, + { "falgun", 11 }, + { "chaitra", 12 }, // Keep Nepali Unicode names as they are distinct { "बैशाख", 1 }, // ... (rest of Unicode names) };
440-468
: Consider adding caching for improved performance.The fuzzy matching algorithm has O(m*n) complexity for each comparison and iterates through all canonical month names. For applications that parse many dates, consider implementing a cache for previously matched month names.
+ private static readonly Dictionary<string, int?> _matchCache = + new Dictionary<string, int?>(StringComparer.OrdinalIgnoreCase); + private const int MaxCacheSize = 1000; + public static int? FindBestMatch(string input, double threshold = 0.6) { if (string.IsNullOrWhiteSpace(input)) return null; string normalizedInput = input; + // Check cache first + if (_matchCache.TryGetValue(normalizedInput, out var cachedResult)) + return cachedResult; // Try exact match first for performance if (CanonicalMonthNames.TryGetValue(normalizedInput, out int exactMatch)) + { + CacheResult(normalizedInput, exactMatch); return exactMatch; + } // ... rest of the method ... + CacheResult(normalizedInput, bestMatch); return bestMatch; } + + private static void CacheResult(string input, int? result) + { + if (_matchCache.Count >= MaxCacheSize) + _matchCache.Clear(); // Simple cache eviction strategy + _matchCache[input] = result; + }tests/NepDate.Tests/Core/SmartDateParserTests.cs (2)
323-323
: Review test case validity for extreme variations.The test case "chaitanya" for month 12 (Chaitra) seems like it might be testing beyond reasonable misspellings, as "Chaitanya" is typically a person's name. Consider whether such extreme variations should match.
- [InlineData("chaitanya", 12)]
325-329
: Consider adding tests for other date format patterns.The current test only covers the "15 {monthName} 2080" pattern. Consider adding tests for other common patterns to ensure comprehensive coverage.
[Theory] [InlineData("baisakh 15, 2080", 1, 15)] [InlineData("2080 jestha 20", 2, 20)] [InlineData("5th asar 2080", 3, 5)] public void Parse_WithNepaliMonthName_DifferentFormats_ReturnsCorrectDate(string input, int expectedMonth, int expectedDay) { var expectedDate = new NepaliDate(2080, expectedMonth, expectedDay); Assert.Equal(expectedDate, SmartDateParser.Parse(input)); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/NepDate/SmartDateParser.cs
(3 hunks)tests/NepDate.Tests/Core/SmartDateParserTests.cs
(1 hunks)tests/NepDate.Tests/NepDate.Tests.csproj
(1 hunks)
🔇 Additional comments (1)
tests/NepDate.Tests/Core/SmartDateParserTests.cs (1)
178-329
: Comprehensive test coverage for fuzzy month matching!The parameterized test provides excellent coverage of month name variations. The test data effectively validates the fuzzy matching algorithm's ability to handle common misspellings and transliteration differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/NepDate/NepaliMonthMatcher.cs (2)
13-40
: Consider improving dictionary formatting for better readabilityThe dictionary initialization contains multiple entries on single lines, which reduces readability. Consider placing each entry on its own line for better maintainability.
- { "asar", 3 },{ "ashad", 3 }, - { "shrawan", 4 },{ "saun", 4 }, + { "asar", 3 }, + { "ashad", 3 }, + { "shrawan", 4 }, + { "saun", 4 },
10-142
: Consider monitoring threshold effectiveness in productionThe fuzzy matching implementation using Needleman-Wunsch algorithm is a good architectural choice. Consider:
- Monitoring: Track false positives/negatives in production to tune the default threshold value (0.6)
- Performance: If this becomes a hot path, consider caching similarity scores for frequently queried strings
- Extensibility: The scoring parameters could be made configurable if different use cases require different sensitivity levels
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/NepDate/NepaliMonthMatcher.cs
(1 hunks)src/NepDate/SmartDateParser.cs
(3 hunks)tests/NepDate.Tests/Core/SmartDateParserTests.cs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/NepDate/SmartDateParser.cs
- tests/NepDate.Tests/Core/SmartDateParserTests.cs
🔇 Additional comments (1)
src/NepDate/NepaliMonthMatcher.cs (1)
78-139
: Well-implemented sequence alignment algorithmThe Needleman-Wunsch algorithm implementation is correct with proper:
- Dynamic programming approach
- Score normalization to [0, 1] range
- Case-insensitive character comparison
- Edge case handling for null/empty strings
Description
Replaced dictionary-based month name matching with Needleman-Wunsch sequence alignment algorithm for intelligent fuzzy matching in Nepali date parser
The previous implementation relied on a massive dictionary containing hundreds of manually maintained month name variations and which was to fix the misspelling but there may be lot of combination of character for each name and writing that for each month will be very tedious and not good approach in my opinion so i use simple matching algorithm to solve it which aims to match the combination of the month names.
This will not match the names which are too different that the month name. for our case
'vesak' and 'beshak' was not matching at all. I think it is relevant because the word is too different.
Under the hood the algorithm uses a Dynamic Programming approach to solve the matching problem and now the parser intelligently matches the month name making it more adaptative matching engine rather than using static dictionary lookups
Summary by CodeRabbit