Skip to content

MultiByteToWideChar implementation is not threadsafe #7869

@brakhane

Description

@brakhane

Description
The current simple MultiByteToWideChar implementation for non-Windows systems uses setlocale to set LC_ALL=en_US.UTF-8, and later restores the old version. This is problematic for two reasons:

First, the value returned by setlocale can point to internal memory so the return value when called with nullptr might become invalid if other setlocale calls happened in between. See SEI ENV34-C.

To demonstrate the problem, consider the following program:

#include <stdio.h>
#include <locale.h>

int main() {
  const char *x1, *x2;

  x1 = setlocale(LC_ALL, NULL);
  setlocale(LC_ALL, "en_US.UTF-8");
  x2 = setlocale(LC_ALL, NULL);
  setlocale(LC_ALL, "en_US.UTF-8");
  setlocale(LC_ALL, x1);
  setlocale(LC_ALL, x2);

}

when compiled with -fsanitize-address and LC_ALL being empty, there will be warnings about heap-use-after-free on the later setlocale calls.

This can actually happen when a program embeds dxc dxcCompiler->Compile gets called in multiple threads and one compilation finishes before the other; both will call MultiByteToWideChar which triggers the problem.

A fix for this particular problem would be to use strdup to copy the value returned by setlocale(LC_ALL, nullptr); this will get rid of the heap-use-after-free issue.

However, that would still cause threads to potentially "reset" LC_ALL to "en_US.UTF-8", consider the following:

  1. Thread 1 starts a Compile, Compile calls MultiByteToWideChar and stores the current LC_ALL locale, let's assume this is currently "C" and sets LC_ALL to en_US.UTF-8
  2. Thread 2 starts a compile as well, it will now look up the current value of LC_ALL, which is en_US.UTF-8 and store that.
  3. Thread 1 finishes and restores LC_ALL to C
  4. Thread 2 finishes and incorrectly "restores" LC_ALL to en_US.UTF-8

A way around this might be to only store LC_ALL the very first time and always restore it to that value; however, that would interact badly with programs who embed DXC and later on change the locale.

I can prepare a PR that uses strdup to fix the heap-use-after-free, but I'm not quite sure how to handle the second problem.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugBug, regression, crashneeds-triageAwaiting triage

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions