-
-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement some of MSVC's intrinsics, for ImportC. #16372
base: master
Are you sure you want to change the base?
Conversation
*remainder = cast(int) (dividend % divisor); | ||
return cast(int) (dividend / divisor); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to provide an asm implementation here? ditto throughout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For _div64
specifically, yes, as compilers will generate the 64-bit encoding of div
, which won't cause a #DE error on overflow of the 32-bit quotient, so a change of behaviour.
For everything else, generally yeah, either because it can't be represented without asm, or to ensure that the generated code behaves identically to MSVC's code, or for performance where it's relevant (like 128-bit multiplication).
{ | ||
if (__ctfe) | ||
{ | ||
/* This is an amalgamation of core.int128.divmod and core.int128.neg. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not simply forward to that implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would rely on linking with DRuntime, which would rule out BetterC compatibility.
compiler/src/dmd/cparse.d
Outdated
auto s = new AST.Import(Loc.initial, null, Id.importc_builtins, null, false); | ||
wrap.push(s); | ||
wrap.push(new AST.Import(Loc.initial, null, Id.importc_builtins, null, false)); | ||
wrap.push(new AST.Import(Loc.initial, null, Id.importc_msvc_builtins, null, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could mscv builtins be publicly imported by importc builtins instead? That would remove the need to alter the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye, they can be; good shout. (Though, the __importc_msvc_builtins.d
file had to be renamed to __builtins_msvc.d
, to match the module name.)
2cd0117
to
e3d934c
Compare
e3d934c
to
ba2e427
Compare
I've flipped this back to a draft PR as currently the functions in the |
@@ -0,0 +1,4 @@ | |||
ImportC, when targeting the Microsoft C runtime, supports a subset of the intrinsics recognised by the MSVC compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation in the opening comment in the PR is far superior to this readme. Merge it in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've copied the right part—better now?
|
||
static inline unsigned char __readgsbyte(unsigned int Offset) | ||
{ | ||
return *(unsigned char __seg_gs *) (_import_c_msvc_ptr_int) Offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImportC does not recognize __seg_gs or __seg_fs, so I don't see how this can work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that GDC delegated to GCC for C files, but I was mistaken. I've replaced this with inline asm in D, as is already done for DMD.
compiler/src/dmd/cparse.d
Outdated
if (idx.length > 2 && idx[0] == '_' && idx[1] == '_') // leading double underscore | ||
importBuiltins = true; // probably one of those compiler extensions | ||
if (idx.length > 1 && idx[0] == '_') // leading underscore | ||
importBuiltins = true; // maybe one of those compiler extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned about nearly always importing 10,000 lines of code for Windows compilations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, it's not great. How about what I've pushed now?
We have a lazily-initialised StringTable
of all the MSVC intrinsic names that we have implemented.
If we're targeting the Microsoft C runtime, and we come across an identifier with a leading underscore, then we check if that identifier is a known MSVC intrinsic, and if it is we import the MSVC builtins and we stop checking future identifiers.
That way, the cost of the import is paid only when one of the intrinsics is actually going to be used.
The downside there is we have to remember to update that StringTable
if more MSVC intrinsics are implemented, and we do pay the minor cost of hashing and looking up identifiers with leading underscores.
ba2e427
to
27fe297
Compare
This is still a draft as I'm unable to get the compiler to actually generate code for the Presently, neither the This is effectively how unsigned long long UnsignedMultiply128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);
#define UnsignedMultiply128 _umul128
unsigned long long UnsignedMultiply128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct); which will expand to But even if the Windows headers didn't do that, we still need code to be generated for ( Right now I'm stuck, because I've yet to figure out a good way to get the The best I've come up with thus far is very hacky, and would require the same changes to be made in LDC and GDC (unless I'm mistaken), which is: in In short, my question is: is there an existing way to make it so the Any suggestions would be appreciated. |
27fe297
to
5878242
Compare
I suggest having the user do the #include line as the first line in his C code. Or have the user pass it in with the dmd switch that passes switches to the C preprocessor. Will that work? |
5878242
to
f9ec5af
Compare
If that works for you, it works for me. #include <importc_msvc_builtins.h>
#include <windows.h> I've added some code to supply Though, the issue of these builtins only working when the Is there an existing way to force codegen for an imported module, when the |
8949057
to
480a37b
Compare
480a37b
to
fc28c1b
Compare
Okay; a separate PR adding a mechanism that this PR can use to force codegen for the Via something like this in if (token.value == TOK._import) // import declaration extension
{
auto a = parseImport();
if (a && a.length)
/*+*/ {
/*+*/ auto imp = (*a)[0].isImport();
/*+*/ imp.forceCodegen = imp.id == Id.builtins_msvc;
symbols.append(a);
/*+*/ }
return;
} |
#elif defined(__GNUC__) | ||
#define __assume(expression) do {if (!(expression)) {__builtin_unreachable();}} while (0) | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this is meant to be public, FYI I think importC would discard this macro, whereas the other definitions (clang) are simple enough to be exposed as templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also __assume
is supposed to be an optimiser hint, not an assert, which is what this implementation does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(unreachable is a compile-time barrier, not an assert)
I'll take your word for the clang and !clang!gnu paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(unreachable is a compile-time barrier, not an assert)
I know that. assert
is (often) defined as do { if (!cond) { /* print stuff*/ ; __builtin_unreachable();}} while (0)
. This is almost identical (modulo the print stuff).
Assume should be #define assume(x) x
, i.e. a no-op.
Did you try converting these functions to templates? |
private void llvm_arm_hint(int) @safe pure nothrow @nogc; | ||
} | ||
} | ||
else version (GNU) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to consider both ldc and gdc here, it's really appreciated
I was just wondering - GCC doesn't support MSVC. So are these GNU version blocks dead on Mingw targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to consider both ldc and gdc here, it's really appreciated
:)
So are these GNU version blocks dead on Mingw targets?
I think so. Unless GDC defines CRuntime_Microsoft
when targeting MinGW?
I don't really know the story as far as GDC's support of Windows goes. So, I just implemented the GDC versions to save some other poor sod the hassle of doing it later it on.
I don't think there's anything preventing this implementation from working when targeting MinGW or Cygwin, if that's wanted?
I did—but no dice, unfortunately, even with all the affected functions being templates, the simplest possible usage still fails to link: // windows_h_c.c:
#include <importc_msvc_builtins.h>
#include <windows.h>
// windows_h.d:
pragma(lib, "MinCore");
import windows_h_c;
|
Just have the user add the #include line. |
Perhaps I'm misunderstanding you, but is that not what the current implementation is doing? There, the user |
fc28c1b
to
045fe87
Compare
This PR implements all but three of the MSVC intrinsics listed here: https://web.archive.org/web/20240412171516/https://learn.microsoft.com/en-ie/cpp/intrinsics/alphabetical-listing-of-intrinsic-functions?view=msvc-170, and a handful of undocumented intrinsics.
The three unimplemented intrinsics are
_AddressOfReturnAddress
,__getcallerseflags
, and_ReturnAddress
, because they need compiler support for their implementation.The implemented intrinsics are listed in this expando.
__assume
__umulh
__mulh
_umul128
_mul128
__emul
__emulu
_div128
_udiv128
_div64
_udiv64
__cpuid
__cpuidex
_cvt_ftoi_fast
_cvt_ftoll_fast
_cvt_ftoui_fast
_cvt_ftoull_fast
_cvt_dtoi_fast
_cvt_dtoll_fast
_cvt_dtoui_fast
_cvt_dtoull_fast
_cvt_ftoi_sat
_cvt_ftoll_sat
_cvt_ftoui_sat
_cvt_ftoull_sat
_cvt_dtoi_sat
_cvt_dtoll_sat
_cvt_dtoui_sat
_cvt_dtoull_sat
_cvt_ftoi_sent
_cvt_ftoll_sent
_cvt_ftoui_sent
_cvt_ftoull_sent
_cvt_dtoi_sent
_cvt_dtoui_sent
_cvt_dtoll_sent
_cvt_dtoull_sent
__readgsbyte
__readgsword
__readgsdword
__readgsqword
__writegsbyte
__writegsword
__writegsdword
__writegsqword
__addgsbyte
__addgsword
__addgsdword
__addgsqword
__incgsbyte
__incgsword
__incgsdword
__incgsqword
__readfsbyte
__readfsword
__readfsdword
__readfsqword
__writefsbyte
__writefsword
__writefsdword
__writefsqword
__addfsbyte
__addfsword
__addfsdword
__incfsbyte
__incfsword
__incfsdword
__debugbreak
__fastfail
__faststorefence
_disable
_enable
_interlockedadd [Undocumented]
_interlockedadd64 [Undocumented]
_InterlockedAdd
_InterlockedAdd_acq
_InterlockedAdd_rel
_InterlockedAdd_nf
_InterlockedAdd64
_InterlockedAdd64_acq
_InterlockedAdd64_rel
_InterlockedAdd64_nf
_InterlockedAddLargeStatistic
_InterlockedAnd
_InterlockedAnd8
_InterlockedAnd16
_interlockedand64 [Undocumented]
_InterlockedAnd_acq
_InterlockedAnd_rel
_InterlockedAnd_nf
_InterlockedAnd8_acq
_InterlockedAnd8_rel
_InterlockedAnd8_nf
_InterlockedAnd16_acq
_InterlockedAnd16_rel
_InterlockedAnd16_nf
_InterlockedAnd64_acq
_InterlockedAnd64_rel
_InterlockedAnd64_nf
_InterlockedAnd64
_InterlockedAnd_np
_InterlockedAnd8_np
_InterlockedAnd16_np
_InterlockedAnd64_np
_InterlockedAnd64_HLEAcquire
_InterlockedAnd64_HLERelease
_InterlockedAnd_HLEAcquire
_InterlockedAnd_HLERelease
_interlockedbittestandreset
_interlockedbittestandreset64
_interlockedbittestandreset_HLEAcquire
_interlockedbittestandreset_HLERelease
_interlockedbittestandreset64_HLEAcquire
_interlockedbittestandreset64_HLERelease
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandreset64_acq
_interlockedbittestandreset64_rel
_interlockedbittestandreset64_nf
_interlockedbittestandset
_interlockedbittestandset64
_interlockedbittestandset_HLEAcquire
_interlockedbittestandset_HLERelease
_interlockedbittestandset64_HLEAcquire
_interlockedbittestandset64_HLERelease
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
_interlockedbittestandset64_acq
_interlockedbittestandset64_rel
_interlockedbittestandset64_nf
_InterlockedCompareExchange
_InterlockedCompareExchange8
_InterlockedCompareExchange16
_InterlockedCompareExchange64
_InterlockedCompareExchange_HLEAcquire
_InterlockedCompareExchange_HLERelease
_InterlockedCompareExchange64_HLEAcquire
_InterlockedCompareExchange64_HLERelease
_InterlockedCompareExchange_np
_InterlockedCompareExchange16_np
_InterlockedCompareExchange64_np
_InterlockedCompareExchange_acq
_InterlockedCompareExchange_rel
_InterlockedCompareExchange_nf
_InterlockedCompareExchange8_acq
_InterlockedCompareExchange8_rel
_InterlockedCompareExchange8_nf
_InterlockedCompareExchange16_acq
_InterlockedCompareExchange16_rel
_InterlockedCompareExchange16_nf
_InterlockedCompareExchange64_acq
_InterlockedCompareExchange64_rel
_InterlockedCompareExchange64_nf
_InterlockedCompareExchange128
_InterlockedCompareExchange128_np
_InterlockedCompareExchange128_acq
_InterlockedCompareExchange128_rel
_InterlockedCompareExchange128_nf
_InterlockedCompareExchangePointer
_InterlockedCompareExchangePointer_HLEAcquire
_InterlockedCompareExchangePointer_HLERelease
_InterlockedCompareExchangePointer_np
_InterlockedCompareExchangePointer_acq
_InterlockedCompareExchangePointer_rel
_InterlockedCompareExchangePointer_nf
_InterlockedDecrement
_InterlockedDecrement16
_interlockeddecrement64
_InterlockedDecrement64
_InterlockedDecrement_acq
_InterlockedDecrement_rel
_InterlockedDecrement_nf
_InterlockedDecrement16_acq
_InterlockedDecrement16_rel
_InterlockedDecrement16_nf
_InterlockedDecrement64_acq
_InterlockedDecrement64_rel
_InterlockedDecrement64_nf
_InterlockedExchange
_InterlockedExchange8
_InterlockedExchange16
_interlockedexchange64
_InterlockedExchange64
_InterlockedExchange_HLEAcquire
_InterlockedExchange_HLERelease
_InterlockedExchange64_HLEAcquire
_InterlockedExchange64_HLERelease
_InterlockedExchange_acq
_InterlockedExchange_rel
_InterlockedExchange_nf
_InterlockedExchange8_acq
_InterlockedExchange8_rel
_InterlockedExchange8_nf
_InterlockedExchange16_acq
_InterlockedExchange16_rel
_InterlockedExchange16_nf
_InterlockedExchange64_acq
_InterlockedExchange64_rel
_InterlockedExchange64_nf
_InterlockedExchangeAdd
_InterlockedExchangeAdd8
_InterlockedExchangeAdd16
_interlockedexchangeadd64 [Undocumented]
_InterlockedExchangeAdd64
_InterlockedExchangeAdd_HLEAcquire
_InterlockedExchangeAdd_HLERelease
_InterlockedExchangeAdd64_HLEAcquire
_InterlockedExchangeAdd64_HLERelease
_InterlockedExchangeAdd_acq
_InterlockedExchangeAdd_rel
_InterlockedExchangeAdd_nf
_InterlockedExchangeAdd8_acq
_InterlockedExchangeAdd8_rel
_InterlockedExchangeAdd8_nf
_InterlockedExchangeAdd16_acq
_InterlockedExchangeAdd16_rel
_InterlockedExchangeAdd16_nf
_InterlockedExchangeAdd64_acq
_InterlockedExchangeAdd64_rel
_InterlockedExchangeAdd64_nf
_InterlockedExchangePointer
_InterlockedExchangePointer_HLEAcquire
_InterlockedExchangePointer_HLERelease
_InterlockedExchangePointer_acq
_InterlockedExchangePointer_rel
_InterlockedExchangePointer_nf
_InterlockedIncrement
_InterlockedIncrement16
_interlockedincrement64 [Undocumented]
_InterlockedIncrement64
_InterlockedIncrement_acq
_InterlockedIncrement_rel
_InterlockedIncrement_nf
_InterlockedIncrement16_acq
_InterlockedIncrement16_rel
_InterlockedIncrement16_nf
_InterlockedIncrement64_acq
_InterlockedIncrement64_rel
_InterlockedIncrement64_nf
_InterlockedOr
_InterlockedOr8
_InterlockedOr16
_interlockedor64 [Undocumented]
_InterlockedOr_acq
_InterlockedOr_rel
_InterlockedOr_nf
_InterlockedOr8_acq
_InterlockedOr8_rel
_InterlockedOr8_nf
_InterlockedOr16_acq
_InterlockedOr16_rel
_InterlockedOr16_nf
_InterlockedOr64_acq
_InterlockedOr64_rel
_InterlockedOr64_nf
_InterlockedOr64
_InterlockedOr_np
_InterlockedOr8_np
_InterlockedOr16_np
_InterlockedOr64_np
_InterlockedOr64_HLEAcquire
_InterlockedOr64_HLERelease
_InterlockedOr_HLEAcquire
_InterlockedOr_HLERelease
_InterlockedXor
_InterlockedXor8
_InterlockedXor16
_interlockedxor64 [Undocumented]
_InterlockedXor_acq
_InterlockedXor_rel
_InterlockedXor_nf
_InterlockedXor8_acq
_InterlockedXor8_rel
_InterlockedXor8_nf
_InterlockedXor16_acq
_InterlockedXor16_rel
_InterlockedXor16_nf
_InterlockedXor64_acq
_InterlockedXor64_rel
_InterlockedXor64_nf
_InterlockedXor64
_InterlockedXor_np
_InterlockedXor8_np
_InterlockedXor16_np
_InterlockedXor64_np
_InterlockedXor64_HLEAcquire
_InterlockedXor64_HLERelease
_InterlockedXor_HLEAcquire
_InterlockedXor_HLERelease
__inbyte
__inword
__indword
__outbyte
__outword
__outdword
__inbytestring
__inwordstring
__indwordstring
__outbytestring
__outwordstring
__outdwordstring
__int2c
__invlpg
__lidt
__ll_lshift
__ll_rshift
__ull_rshift
__lzcnt16
__lzcnt
__lzcnt64
_mm_cvtsi64x_ss
_mm_cvtss_si64x
_mm_cvttss_si64x
_mm_extract_si64
_mm_extracti_si64
_mm_insert_si64
_mm_inserti_si64
_mm_stream_sd
_mm_stream_ss
_mm_stream_si64x
__movsb
__movsw
__movsd
__movsq
__noop
__nop
__popcnt16
__popcnt
__popcnt64
__rdtsc
__rdtscp
__readcr0
__readcr2
__readcr3
__readcr4
__readcr8
__readdr
__readeflags
__readmsr
__readpmc
__segmentlimit
__shiftleft128
__shiftright128
__sidt
__stosb
__stosw
__stosd
__stosq
__svm_clgi
__svm_invlpga
__svm_skinit
__svm_stgi
__svm_vmload
__svm_vmrun
__svm_vmsave
__ud2
__vmx_off
__vmx_on
__vmx_vmclear
__vmx_vmlaunch
__vmx_vmptrld
__vmx_vmptrst
__vmx_vmread
__vmx_vmresume
__vmx_vmwrite
__wbinvd
__writecr0
__writecr2
__writecr3
__writecr4
__writecr8
__writedr
__writeeflags
__writemsr
_ReadBarrier
_WriteBarrier
_ReadWriteBarrier
_BitScanForward
_BitScanReverse
_BitScanForward64
_BitScanReverse64
_bittest
_bittestandcomplement
_bittestandreset
_bittestandset
_bittest64
_bittestandcomplement64
_bittestandreset64
_bittestandset64
_byteswap_uint64
_byteswap_ulong
_byteswap_ushort
_lrotr
_lrotl
_rotr
_rotl
_rotr64
_rotl64
_rotr16
_rotl16
_rotr8
_rotl8
These implementations aim to be as compatible with the MSVC intrinsics as is possible—adhering to Hyrum's Law.
Separate implementations are provided for DMD, LDC, and GDC, and for x86, x86-64, AArch64, and ARM.
Almost all the intrinsics are implemented in D, except for
__assume
which is a C macro.Every intrinsic, where possible, has a CTFE-compatible code-path.
It all compiles with or without DIP1000 being enabled.
Care has been taken to ensure that none of the implementations rely on DRuntime, so that these work in BetterC.
Regarding the
_cvt_
family of functions: by default MSVC will generate code that uses SSE2 instructions, even for 32-bit targets, which means that for 32-bit targets the_cvt_
functions will use SSE2.This is contrary to DMD's usual behaviour of using x87 for 32-bit Windows.
For their reimplementations, I've used SSE2 anyway for 32-bit targets for DMD, as doing otherwise would constitute a change in behaviour, as x87 FP-exceptions are different from SIMD FP-exceptions (and, I think the SSE2 and x87 versions return different results).
The oldest of the
_cvt_
functions was introduced in the May of 2021, so I think it's almost certain that any code using them will be targeting at-least SSE2 anyway.Additionally, I've written a program that tests that the
_cvt_
implementations return identical results to the MSVC implementations for all float values, and for ~402,653,184 double values (except for_cvt_ftoi_fast
and_cvt_ftoi_sent
on 32-bit targets, as they cause an internal compiler error in MSVC); it also tests thatctfeX86RoundLongToFloat
andctfeX86RoundFloatToLong
produce the same results as the hardware.It relies on Phobos and MSVC, so I don't really know what to do with it other than link to it here: https://github.com/just-harry/float-fuzzing-for-msvc-intrinsics
A few of the intrinsics can be used only in kernel-mode, so unittests have been omitted for them, as I don't think we have any infrastructure in place for testing in kernel-mode.
Some of the intrinsics terminate the program, so their unittests have been wrapped in a
version (none)
, others rely on specific compiler optimisations, so they too have been wrapped in aversion (none)
.I've split the intrinsics up into a few dozen commits to try and alleviate the whole 13,000-lines-of-codes-all-at-once thing.
(I staged them after-the-fact, so a few braces may be out-of-place in the intermediate commits.)
The intrinsics have been placed in their own files, separate from the existing ImportC builtin files, to try and avoid crowding the builtin files.
Currently, the ImportC builtins are imported conditionally, based on some heuristics.
One of those heuristics is if any identifier beginning with two underscores is used – I've changed that one to instead trigger on a single leading underscore as many of MSVC's intrinsics begin with only one underscore.
One notable header that can be successfully included by ImportC, with these intrinsics implemented, is
windows.h
.P.S. If this is outside the purview of ImportC: that's fine, I'll just publish this as a library instead.