Description
The keyword_token function uses unsafe { str::from_utf8_unchecked(word) } to convert a byte slice (&[u8]) into a string slice (&str) without validating whether the input is valid UTF-8. This introduces undefined behavior (UB) if the word parameter contains invalid UTF-8 bytes. The absence of validation makes the function unsound.
Problems:
This function is pub, so I assume callers can control the word argument, which causes the following problems.
Undefined Behavior on Invalid UTF-8:
unsafe { str::from_utf8_unchecked(word) } assumes that the word slice is valid UTF-8. If this assumption is violated, undefined behavior occurs immediately.
The function does not verify that word is valid UTF-8 before invoking the unsafe conversion.
No Safety Contract:
The function is not marked as unsafe, nor does it document the requirement that the word input must be valid UTF-8. This makes it easy for callers to misuse the function by passing invalid inputs.
Potential Exploitation:
If word is derived from untrusted or external input, it could contain invalid UTF-8. This could lead to crashes, memory corruption, or other unpredictable behavior.
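To make the failure mode concrete, here is a minimal sketch of how arbitrary bytes can be checked before they are treated as a string. The helper name first_invalid_offset is hypothetical and not part of the crate; it only relies on std::str::from_utf8 and Utf8Error::valid_up_to from the standard library.

```rust
use std::str;

/// Hypothetical helper (not part of the crate): returns the byte offset of
/// the first invalid UTF-8 sequence in `bytes`, or `None` if all of it is valid.
pub fn first_invalid_offset(bytes: &[u8]) -> Option<usize> {
    match str::from_utf8(bytes) {
        // The whole slice is valid UTF-8.
        Ok(_) => None,
        // `valid_up_to` is the length of the longest valid prefix.
        Err(e) => Some(e.valid_up_to()),
    }
}
```

A byte such as 0xff can never appear in valid UTF-8, so a slice containing it is exactly the kind of input that from_utf8_unchecked must never see.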
Suggestion
Mark this function as unsafe and document its safety requirements.
Alternatively, add a check in the function body, e.g. use from_utf8 instead.
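The first option might look roughly like the sketch below. TokenType, KEYWORDS, and the name keyword_token_unchecked are hypothetical stand-ins for illustration only; the crate's real types and lookup table may differ.

```rust
use std::str;

/// Hypothetical stand-in for the crate's token type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenType {
    Select,
    From,
}

/// Hypothetical, heavily shortened keyword table.
const KEYWORDS: &[(&str, TokenType)] = &[
    ("SELECT", TokenType::Select),
    ("FROM", TokenType::From),
];

/// Looks up `word` in the keyword table, ignoring ASCII case.
///
/// # Safety
///
/// `word` must be valid UTF-8. Passing any other bytes is undefined behavior.
pub unsafe fn keyword_token_unchecked(word: &[u8]) -> Option<TokenType> {
    // SAFETY: the caller guarantees that `word` is valid UTF-8.
    let word = unsafe { str::from_utf8_unchecked(word) };
    KEYWORDS
        .iter()
        .find(|(kw, _)| kw.eq_ignore_ascii_case(word))
        .map(|&(_, tok)| tok)
}
```

With this shape, every call site has to write an unsafe block, which makes the UTF-8 requirement visible to callers instead of hiding it inside a safe pub function.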
Additional Context:
Unsafe code should only be used when safety invariants are strictly guaranteed. The current implementation assumes that the word input is always valid UTF-8, but this is not enforced or documented, making the function unsound. By switching to std::str::from_utf8, the function can remain safe and robust while handling invalid input gracefully.
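A minimal sketch of the checked alternative, assuming the function returns an Option and that invalid UTF-8 can simply be treated as "not a keyword". TokenType, KEYWORDS, and the name keyword_token_checked are hypothetical stand-ins, not the crate's actual definitions.

```rust
use std::str;

/// Hypothetical stand-in for the crate's token type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenType {
    Select,
    From,
}

/// Hypothetical, heavily shortened keyword table.
const KEYWORDS: &[(&str, TokenType)] = &[
    ("SELECT", TokenType::Select),
    ("FROM", TokenType::From),
];

/// Safe variant: validates the bytes first and returns `None` for input
/// that is not valid UTF-8, since such input cannot match any keyword.
pub fn keyword_token_checked(word: &[u8]) -> Option<TokenType> {
    let word = str::from_utf8(word).ok()?;
    KEYWORDS
        .iter()
        .find(|(kw, _)| kw.eq_ignore_ascii_case(word))
        .map(|&(_, tok)| tok)
}
```

The validation costs one pass over the bytes. Because keywords are short, that cost is likely small, though this has not been measured here.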
Location: libsql/vendored/sqlite3-parser/src/dialect/mod.rs, line 69 in 9241b00