Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[owasp-modsecurity compatibility] hexDecode method #1253

Open
tty2 opened this issue Dec 18, 2024 · 13 comments · May be fixed by #1275
Open

[owasp-modsecurity compatibility] hexDecode method #1253

tty2 opened this issue Dec 18, 2024 · 13 comments · May be fixed by #1275
Assignees

Comments

@tty2
Copy link
Contributor

tty2 commented Dec 18, 2024

Summary

coraza doesn't implement method hexDecode
owasp-modsecurity has this method

Basic example

https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v3.x)#hexdecode

Motivation

OWASP modsecurity compatibility

@fzipi
Copy link
Member

fzipi commented Dec 18, 2024

Thanks for reporting! Are you up for a PR?

@tty2
Copy link
Contributor Author

tty2 commented Dec 18, 2024

Hi
I'm gonna take it to implement soon.

@tty2
Copy link
Contributor Author

tty2 commented Dec 20, 2024

Could you help me to understand the approach used in the tests for hexDecode?
https://github.com/corazawaf/coraza/blob/main/internal/transformations/testdata/hexDecode.json

@jcchavezs I see you are an author of these tests, you are probably more in the context. And thanks you've done it in advance!

My current assumption that coraza tends to "best-effort" approach (opposite to "fail fast" approach). In that case it explains everything.
As example this test case:

 {
    "type": "tfn",
    "name": "hexDecode",
    "input": "01234567890a0",
    "output": "\\x01#Eg\\x89\\x0a",
    "ret": 1
 }

From RFC 4648:
"The encoding process represents 8-bit groups (octets) of input bits as output strings of 2 encoded characters." => the valid lenght of the input should be even. In this case it is odd, 13 characters.
As I understand the offered logic is pretty simple: from the expected output I can guess we just trim the input and process the rest. Right?

The other case is more tricky:

  {
    "name": "hexDecode",
    "type": "tfn",
    "ret": 1,
    "input": "01234567890a0z01234567890a",
    "output": "\\x01#Eg\\x89\\x0a#\\x01#Eg\\x89\\x0a"
  }

We have even number of characters at the begging but with the invalid symbol inside ("z" in this case). With removing invalid symbol only we'll get to the situation from the previous case (odd number of symbols). Which logic we should follow here?
I guess find the next valid pair and follow, right?
The most interesting part is how do we get "#" from "0z". Is that some specific behavior? I couldn't find any mentioning about it in RFC. Could you clarify this part, please?

Thanks in advance

@tty2
Copy link
Contributor Author

tty2 commented Dec 30, 2024

@fzipi @jcchavezs
could you take a look at the comment above when you have time please?

@tty2 tty2 linked a pull request Jan 6, 2025 that will close this issue
4 tasks
@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

Yes, sorry for the delay. And thanks for the followup! 💪

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

This is what happens in modsecurity:

  • v2:
 /* hexEncode */

 static int msre_fn_hexEncode_execute(apr_pool_t *mptmp, unsigned char *input,
     long int input_len, char **rval, long int *rval_len)
 {
     *rval = bytes2hex(mptmp, input, input_len);
     *rval_len = strlen(*rval);

     return 1;
 }

And bytes2hex function:

 /**
  * Converts a series of bytes into its hexadecimal
  * representation.
  */
 char *bytes2hex(apr_pool_t *pool, unsigned char *data, int len) {
     static const unsigned char b2hex[] = "0123456789abcdef";
     char *hex = NULL;
     int i, j;

     hex = apr_palloc(pool, (len * 2) + 1);
     if (hex == NULL) return NULL;

     j = 0;
     for(i = 0; i < len; i++) {
         hex[j++] = b2hex[data[i] >> 4];
         hex[j++] = b2hex[data[i] & 0x0f];
     }
     hex[j] = 0;

     return hex;
 }
  • v3:
 bool HexEncode::transform(std::string &value, const Transaction *trans) const {
     if (value.empty()) return false;

     std::stringstream result;
     for (const auto c : value) {
         unsigned int ii = (unsigned char)c;
         result << std::setw(2) << std::setfill('0') << std::hex << ii;
     }

     value = result.str();
     return true;
 }

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

I would say v3's approach is clear, and the proper implementation.

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

So this leaves now with the decode

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

  • v2:
 int hex2bytes_inplace(unsigned char *data, int len) {
     unsigned char *d = data;
     int i, count = 0;

     if ((data == NULL)||(len == 0)) return 0;

     for(i = 0; i <= len - 2; i += 2) {
         *d++ = x2c(&data[i]);
         count++;
     }
     *d = '\0';

     return count;
 }

with x2c:

 static unsigned char x2c(unsigned char *what) {
     register unsigned char digit;

     digit = (what[0] >= 'A' ? ((what[0] & 0xdf) - 'A') + 10 : (what[0] - '0'));
     digit *= 16;
     digit += (what[1] >= 'A' ? ((what[1] & 0xdf) - 'A') + 10 : (what[1] - '0'));

     return digit;
 }
  • v3:
 static inline int inplace(std::string &value) {
     if (value.empty()) return false;

     const auto len = value.length();
     auto d = reinterpret_cast<unsigned char *>(value.data());
     const auto *data = d;

     for (int i = 0; i <= len - 2; i += 2) {
         *d++ = utils::string::x2c(&data[i]);
     }

     *d = '\0';

     value.resize(d - data);
     return true;
 }

and

 /**
  * Converts a single hexadecimal digit into a decimal value.
  */
 inline unsigned char xsingle2c(const unsigned char *what) {
     unsigned char digit;

     digit = (what[0] >= 'A' ? ((what[0] & 0xdf) - 'A') + 10 : (what[0] - '0'));

     return digit;
 }
 inline unsigned char x2c(const unsigned char *what) {
     unsigned char digit;

     digit = xsingle2c(what);
     digit *= 16;
     digit += xsingle2c(what+1);

     return digit;
 }

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

So clearly, it is not checking if the received char is valid.

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

@fzipi
Copy link
Member

fzipi commented Jan 9, 2025

So I think this deserves a modsecurity issue so we align, but the test is broken as you said. I think we should return error if there is a char not in [0-9a-f]. It clearly cannot be "generated with the same algorithm for encoding" as the documentation claims.

@tty2
Copy link
Contributor Author

tty2 commented Jan 9, 2025

Thanks @fzipi
so it leads us to std lib hex.Decode then
I'll update the PR soon, and tests too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants