IsValidUTF8() can reach behind UBound throwing an error #46

@TeWeBu

In this function, there are several occurrences of tests like this:

If i + 2 > UBound(byteData) Or byteData(i + 1) < 128 Or byteData(i + 1) > 191 Or _
   byteData(i + 2) < 128 Or byteData(i + 2) > 191 Then

In this case, if "i + 2 > UBound(byteData)" evaluates to True, the If statement raises an error. VBA's Or operator does not short-circuit, so "byteData(i + 2) < 128" is still evaluated and the index is out of bounds.
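The behaviour can be reproduced in isolation. The following is only a minimal sketch: the routine name DemoOrDoesNotShortCircuit and the sample bytes are made up for illustration and are not part of the library. It evaluates the same kind of expression on a truncated three-byte sequence and traps the resulting error instead of halting.

```vba
Option Explicit

' Hypothetical demo routine (not part of the library).
' Shows that Or evaluates its right-hand operand even when the
' left-hand operand is already True.
Public Sub DemoOrDoesNotShortCircuit()
    Dim byteData(0 To 1) As Byte
    Dim i As Long
    Dim truncated As Boolean

    byteData(0) = 226   ' lead byte of a three-byte sequence
    byteData(1) = 130   ' one continuation byte, then the data ends
    i = 0

    On Error Resume Next
    ' The left operand is True (2 > 1), but byteData(i + 2) is still read.
    truncated = (i + 2 > UBound(byteData)) Or (byteData(i + 2) < 128)
    ' A non-zero error number here means the out-of-bounds read happened.
    Debug.Print "Err.Number = " & Err.Number
    On Error GoTo 0
End Sub
```

Running the routine from the Immediate window should report a non-zero error number (in VBA, reading past the end of an array normally gives run-time error 9, "Subscript out of range").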

To fix this, I think it would be necessary to move the bounds check into its own If branch, so that the array is only read once the index is known to be valid, like so:

Private Function IsValidUTF8(byteData() As Byte) As Boolean
Dim i As Long
Dim ANSIfile As Boolean

i = 0
ANSIfile = True
While i <= UBound(byteData)
    If byteData(i) < 128 Then
        ' Single-byte character (ASCII)
        i = i + 1
    Else
        ANSIfile = False
        If byteData(i) >= 192 And byteData(i) <= 223 Then
            ' Two-byte sequence
            If i + 1 > UBound(byteData) Then ' <--------------------- fixes the error
                IsValidUTF8 = False
                Exit Function
            ElseIf byteData(i + 1) < 128 Or byteData(i + 1) > 191 Then
                IsValidUTF8 = False
                Exit Function
            End If
            i = i + 2
        ElseIf byteData(i) >= 224 And byteData(i) <= 239 Then
            ' Three-byte sequence
            If i + 2 > UBound(byteData) Then ' <--------------------- fixes the error
                IsValidUTF8 = False
                Exit Function
            ElseIf byteData(i + 1) < 128 Or byteData(i + 1) > 191 Or _
               byteData(i + 2) < 128 Or byteData(i + 2) > 191 Then
                IsValidUTF8 = False
                Exit Function
            End If
            i = i + 3
        ElseIf byteData(i) >= 240 And byteData(i) <= 247 Then
            ' Four-byte sequence
            If i + 3 > UBound(byteData) Then ' <--------------------- fixes the error
                IsValidUTF8 = False
                Exit Function
            ElseIf byteData(i + 1) < 128 Or byteData(i + 1) > 191 Or _
               byteData(i + 2) < 128 Or byteData(i + 2) > 191 Or byteData(i + 3) < 128 Or byteData(i + 3) > 191 Then
                IsValidUTF8 = False
                Exit Function
            End If
            i = i + 4
        Else
            ' Invalid start byte
            IsValidUTF8 = False
            Exit Function
        End If
    End If
Wend
IsValidUTF8 = Not ANSIfile

End Function

I do not know if there is a better solution, but it works in my project.
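One possible alternative, given only as a sketch and not as the library's actual fix, is to move the bounds check and the range check into a small helper. Each continuation-byte test then becomes a single call that is safe for any index. The names IsContinuationByte and DemoHelper are hypothetical.

```vba
Option Explicit

' Hypothetical helper (not part of the library).
' Returns True only if idx lies inside the array and byteData(idx)
' is a UTF-8 continuation byte (128..191).
Private Function IsContinuationByte(byteData() As Byte, ByVal idx As Long) As Boolean
    ' Out of bounds: leave the result at its default, False.
    If idx < LBound(byteData) Or idx > UBound(byteData) Then Exit Function
    IsContinuationByte = (byteData(idx) >= 128 And byteData(idx) <= 191)
End Function

' Hypothetical demo: a three-byte lead byte followed by only one
' continuation byte, i.e. a truncated sequence.
Public Sub DemoHelper()
    Dim byteData(0 To 1) As Byte
    Dim i As Long

    byteData(0) = 226
    byteData(1) = 130
    i = 0

    ' Both calls are evaluated (And does not short-circuit either),
    ' but neither call reads outside the array.
    If Not (IsContinuationByte(byteData, i + 1) And _
            IsContinuationByte(byteData, i + 2)) Then
        Debug.Print "Invalid or truncated three-byte sequence"
    End If
End Sub
```

With such a helper, the three-byte branch of IsValidUTF8 could use the same If Not (...) test before setting IsValidUTF8 = False. The nested If / ElseIf version above behaves the same way, so the choice is mainly one of readability.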

Thank you for this nice library! Easy to work with. :-)

Labels: Accept, fixed (The issue has been corrected)