Simplify RID Model #260

richlander commented Apr 3, 2022

This plan is intended to:

  • Freeze the RID graph.
  • Adopt stable RIDs (no version numbers).
  • Limit runtimes.json to being used by NuGet only (by default).
  • Hosts adopt a simple non-extensible model.
  • Contractualize what linux-x64 means (same for Arm64 and x86).
  • Define linux-x64 vs source-build.

More generally, we waste a significant amount of time talking about RIDs every release. This proposal is intended to simplify the most problematic aspect of the RID topic. Once that is resolved, we can think about taking on other RID challenges.


For Red Hat, it makes sense to accept and support `linux-x64` NuGet assets, but not to produce or offer them. Instead, Red Hat would want to produce `rhel-x64` runtime pack assets. It's easiest to describe this design-point in terms of concrete scenarios.

**NuGet packages** -- NuGet authors will typically value breadth and efficiency. For example, a package like `SkiaSharp` might target `linux-x64`, but not `rhel-x64`. There is no technical reason for that package to include distro-specific assets. Also, if the author produced a `rhel-x64` asset, they would need to consider producing an `ubuntu-x64` and other similar assets for other distros, and that's not scalable. The expectation is that the `rhel-x64` .NET supports `linux-x64` NuGet assets, enabling NuGet authors to target a minimal set of RIDs.

I feel like this document generally, and this section specifically, is making some assumptions that I don't think are justified. There are absolutely scenarios where distro-specific assets need to be shipped in a NuGet package.

As an example of this, take a look at https://www.nuget.org/packages/LibGit2Sharp.NativeBinaries/2.0.306

LibGit2Sharp is a managed wrapper around the native libgit2 library and has to accommodate the native dependencies of libgit2. One of the dependencies is OpenSSL, which varies wildly among all the distros that .NET supports.

In order to have a reasonable chance of having a native binary that just works on all platforms, I have had to spend a lot of time understanding the RID graph and shipping enough distro-specific binaries to cover all the supported Linux distros.

I will point out that libgit2 has recently changed how they are binding to OpenSSL which has let me simplify things down to more of an "ideal" situation (see the newer package), but that was largely out of my control since I'm not a libgit2 maintainer.

If the ability to ship distro-specific assets in a NuGet package and have the proper one selected had not been a feature that I could count on, then I would not have been able to support LibGit2Sharp on Linux at all for the past several years. Maybe I'm misunderstanding something, but it seems like this proposal is removing this feature.


Mind you, libgit2/libgit2sharp#1714 changed the model used by libgit2sharp. AFAIK, the change basically makes libgit2sharp try to load all native libgit2 libraries one by one until one of them can be loaded successfully (i.e., one that links to a version of OpenSSL available on the system). The actual name (and RID) is not too relevant in this scenario.

I feel like your use-case would still work if nuget did not "know" about distro-specific RIDs.


This is great feedback. We discussed it.

There are three models:

  • Don't support specific distros as a general concept (the current proposal).
  • Fully support specific distros as a general concept (some variation on status-quo).
  • Adopt a model that is a variation on the Python manylinux wheel plan, including support for OpenSSL.

I like what @omajid is proposing. It's the same approach we use for OpenSSL. We don't use RIDs. Also, the current proposal works for the vast majority of cases and is a massive simplification. I'm hoping we can keep it.


libgit2sharp was suffering from the rid scalability issue: it was unable to deal with unknown rids.

Using NativeLibrary.SetDllImportResolver it now tries the different so files. The code is here: https://github.com/libgit2/libgit2sharp/blob/97bee65fd296f1c7dd2d1d64581c170f45b584e1/LibGit2Sharp/Core/NativeMethods.cs#L99-L121

This way, the library can have its own logic for picking so files. It's not limited to/by rids.
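As a rough illustration of the approach described above (not the actual LibGit2Sharp code, which is linked in the comment), a library could register a resolver that tries several builds of the same native library in order. The library name `mylib`, the file names, and the `NativeLoader` class are hypothetical; only `NativeLibrary.SetDllImportResolver` and `NativeLibrary.TryLoad` are real .NET APIs.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

internal static class NativeLoader
{
    // Hypothetical file names: a real package would list the builds it actually ships.
    private static readonly string[] Candidates =
    {
        "libmylib-openssl3.so",
        "libmylib-openssl1.1.so",
    };

    // Call once, before the first P/Invoke into "mylib".
    public static void Register() =>
        NativeLibrary.SetDllImportResolver(typeof(NativeLoader).Assembly, Resolve);

    internal static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // Returning IntPtr.Zero tells the runtime to use its default resolution.
        if (libraryName != "mylib")
            return IntPtr.Zero;

        // Try each candidate build; the first one whose dependencies resolve wins.
        foreach (string candidate in Candidates)
        {
            if (NativeLibrary.TryLoad(candidate, assembly, searchPath, out IntPtr handle))
                return handle;
        }

        return IntPtr.Zero;
    }
}
```

With this shape, the selection logic lives in the library rather than in the RID graph, which is the trade-off discussed in the replies that follow.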


Yes, this logic was added, but I consider it a fallback solution, and not something desirable to rely on as the primary logic, for a couple of reasons:

  • It only works if the entire runtimes folder is shipped as part of the application, which is only going to be true if you use one of the cross-platform publishing options.
  • Having to attempt to load every binary until you find one that works seems like it would have a negative impact on startup perf.

If libraries no longer can assume that the best native asset is selected for them by the framework/NuGet, then that means they have to manually ensure that all binaries are included in the publish output to be able to select from them at runtime. That can pretty massively increase the size of the output. Using LibGit2Sharp.NativeBinaries 2.0.306 as an example, that means going from having a single ~1MB .so file in your output to having 9 different copies, and that's assuming I'd have some way to know to only copy the Linux binaries and not the entire runtimes folder.


For example, add a property to the csproj that enables the user to choose the supported OpenSSL versions, and include .so files based on that.

The problem does not end with OpenSSL. The same problem exists for many other libraries.

I believe that the only truly scalable option for the Linux ecosystem is to have building from source as an option (NuGet/Home#9631). Building from source is capable of producing a binary that works for the specific configuration of your system without the package maintainer pre-building it for you.


> The problem does not end with OpenSSL. The same problem exists for many other libraries.

Yes, and this was the main point of the feedback I was trying to raise here. The proposal currently assumes that there is never a reason to care about the specific distro you're running on, but that is demonstrably not true.

This all comes down to there being a disconnect between the .NET ecosystem expectations of sharing pre-compiled binaries, and the Linux ecosystem that expects you to share source and compile it on your own system. Shipping a Linux binary in a package is always going to be an uphill battle, so I would like to see the .NET side of things evolve in a way that makes it easier, not harder.


tmds commented Apr 8, 2022


Notice that it does not have support for probing for OpenSSL 3 yet so it is going to break once distros switch to OpenSSL 3 only. dotnet/runtime has probing for OpenSSL3 that adds a whole new level of shimming and complexity.

Let's assume OpenSSL 3 won't be supported by a libgit2 shimming (immediately) and you want to solve the issue using distro rids.

Fedora 36 should use the OpenSSL 3 build.

Fedora 37 should be using the OpenSSL 3 build also. That doesn't work automagically today, because the rid graph doesn't describe a supports relationship between Fedora 37 and Fedora 36.

For your package to remain maintainable, this needs to be addressed (and preferably not by bloating the graph but by making the rid logic smarter).

I've only mentioned Fedora here, but OpenSSL 3 will also be adopted by Ubuntu and Debian, so you need to manage this problem for them too.

Your package now looks like this:

linux-x64 -> supports OpenSSL 1, 2, 3 and picks using `NativeLibrary`
fedora.36 -> OpenSSL 3
ubuntu.22.04 -> OpenSSL 3
debian.12 -> OpenSSL 3
...

Note that you are including the distro specific assets solely for the purpose of rid-based trimming. They bloat your package and require you to track in what version a distro adopts a new OpenSSL version.

> The proposal currently assumes that there is never a reason to care about the specific distro you're running on

NativeLibrary can be used, and should be used to make linux-x64 work across a range of distros.

What is lost, is the ability to trim against a rid.


@bording I hope this example helped illustrate how unscalable the rid mechanism is?

And, if it should stay, inheritance between successive distro versions is a must-have, which is now missing.


> And, if it should stay, inheritance between successive distro versions is a must-have, which is now missing.

To elaborate. The rids as we have them are not scalable. The alternative to the simple model that is proposed here, is to have semantic rids. That means rids follow a <distro>.<version>-<arch> naming scheme. Information is derived from that instead of hard coding it in the json file (e.g. Fedora 36 -> Fedora 35). The json file should then only contain non-derivable information (like what distros are glibc/musl based).

This is a much more complex solution.
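To make the "semantic rids" idea concrete, here is a minimal sketch of deriving a fallback from the RID name itself rather than from a hard-coded graph. It assumes RIDs of the form `<distro>.<version>-<arch>`, and it takes the portable RID (the non-derivable glibc/musl information) as a parameter. `SemanticRid`, `TryParse`, and `PickAsset` are hypothetical names, not an existing .NET or NuGet API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

internal static class SemanticRid
{
    // Splits "fedora.37-x64" into ("fedora", 37.0, "x64").
    // Returns false for RIDs that carry no version, such as "linux-x64".
    public static bool TryParse(string rid, out string distro, out Version version, out string arch)
    {
        distro = arch = "";
        version = new Version(0, 0);

        int dot = rid.IndexOf('.');
        int dash = rid.LastIndexOf('-');
        if (dot < 0 || dash < dot)
            return false;

        string text = rid[(dot + 1)..dash];
        if (!text.Contains('.'))
            text += ".0"; // System.Version needs at least major.minor

        if (!Version.TryParse(text, out Version? parsed))
            return false;

        distro = rid[..dot];
        arch = rid[(dash + 1)..];
        version = parsed;
        return true;
    }

    // Picks the newest asset for the same distro and architecture that is not
    // newer than the host; otherwise falls back to the portable RID if present.
    public static string? PickAsset(string hostRid, IEnumerable<string> assetRids, string portableRid)
    {
        bool hostParsed = TryParse(hostRid, out string hostDistro, out Version hostVersion, out string hostArch);
        string? best = null;
        Version? bestVersion = null;
        bool portableAvailable = false;

        foreach (string asset in assetRids)
        {
            if (asset == portableRid)
            {
                portableAvailable = true;
                continue;
            }

            if (hostParsed
                && TryParse(asset, out string distro, out Version version, out string arch)
                && distro == hostDistro && arch == hostArch
                && version <= hostVersion
                && (bestVersion is null || version > bestVersion))
            {
                best = asset;
                bestVersion = version;
            }
        }

        return best ?? (portableAvailable ? portableRid : null);
    }
}
```

Under these assumptions, a Fedora 37 host would select a `fedora.36-x64` asset without the graph having to list Fedora 37. The sketch ignores cases a real design would need to handle, such as distros that are compatible with one another.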


## Minimum CRT/libc version

This scheme doesn't explicitly define a CRT/libc version. Instead, we will define a minimum CRT/libc version, per major .NET version. That minimum CRT/libc version will form a contract and should be documented, per .NET version. Practically, it defines the oldest OS version that you can run an app on, for a given .NET version.

Using the C runtime version to define the OS version makes sense on Linux. It is the best approximation of the OS version to use for determining binary compatibility (of binaries written in C/C++).

Using the C runtime version for this purpose makes less sense on Windows and macOS. Windows and macOS have well-defined OS versions. Both the Windows (Windows SDK) and macOS (Xcode) native toolchains are oriented around using the OS version for this purpose (WINVER on Windows and -mmacos-version-min on macOS). We should use the OS version for determining binary compatibility on Windows and macOS.


Do you want me to add that text, or adapt its meaning into the current text?


I think adapting the meaning into the current text would look better. I would lead with the minimum OS version as a documented contract, per .NET version. The libc version is a Linux-specific approximation of the OS version across distros for the purposes of this contract.


tmds commented Sep 6, 2022

I think the main impact of the design is that package maintainers must change from using distro-rids for separating native libraries to group them under a portable rid (e.g. linux-x64) and use NativeLibrary to load the appropriate native library.

(In practice) we'll also lose the ability to trim for a distro-specific rid. This came for free because the native assets were organized by distro-rid. If the package maintainer needs to maintain a portable rid (linux-x64) which includes native assets for all platforms, there isn't much in it for them to add these assets a second time under the distro rid (bloating the package) solely for the purpose of trimming. And trimming for the distro-specific rid will only work on an SDK that knows the rid.


bording commented Sep 6, 2022

> I think the main impact of the design is that package maintainers must change from using distro-rids for separating native libraries to group them under a portable rid (e.g. linux-x64) and use NativeLibrary to load the appropriate native library.

The problem I see with saying package maintainers have to use this approach is that it is overlooking how publish output changes if you specify a rid or not.

A portable publish will copy all of the rid folders into the output, but publishing with -r linux-x64 will copy just the content of the package's linux-x64 folder into the main publish output folder.

That makes writing code that can use NativeLibrary to search and find the right binary much more complicated because you can't always assume there is a linux-x64 folder in the project output to be searching in.


tmds commented Sep 6, 2022

My comment was to highlight the complexity that gets pushed towards the package maintainers.

Note that if your library has different native libraries for glibc Linux distros, in order to make -r linux-x64 portable (that is: work across that range of distros), you already need to use NativeLibrary to select the appropriate native library.

With this proposal, in the case the user doesn't specify a rid, the library now also needs to use NativeLibrary because host will no longer do the distro rid-graph based selection.


bording commented Sep 6, 2022

> My comment was to highlight the complexity that gets pushed towards the package maintainers.

👍 And that is what I've been somewhat trying to push back on through my comments here. Or at least acknowledgement that this is understood and documented appropriately on how to author packages in this more complicated way.

> Note that if your library has different native libraries for glibc Linux distros, in order to make -r linux-x64 portable (that is: work across that range of distros), you already need to use NativeLibrary to select the appropriate native library.
>
> With this proposal, in the case the user doesn't specify a rid, the library now also needs to use NativeLibrary because host will no longer do the distro rid-graph based selection.

And my previous comment was trying to point out that writing this NativeLibrary code is much harder with this proposal because of needing to handle two different cases, which I don't think I've seen acknowledged/understood on this PR yet.


tmds commented Sep 7, 2022

> And my previous comment was trying to point out that writing this NativeLibrary code is much harder with this proposal because of needing to handle two different cases, which I don't think I've seen acknowledged/understood on this PR yet.

It depends.

If you know the names of the libraries, you can pass them to NativeLibrary and it will probe the paths that have the native libraries. Then the code is the same.

If you want to iterate yourself the directories being searched to discover at runtime what native libraries are available (e.g. based on a naming pattern), then there is no way to get these directories. They could be exposed through a property on NativeLibrary for example.


bording commented Sep 7, 2022

> It depends.
>
> If you know the names of the libraries, you can pass them to NativeLibrary and it will probe the paths that have the native libraries. Then the code is the same.
>
> If you want to iterate yourself the directories being searched to discover at runtime what native libraries are available (e.g. based on a naming pattern), then there is no way to get these directories. They could be exposed through a property on NativeLibrary for example.

To make sure everyone is talking about the same thing, I want to be clear and describe the scenario. We are talking about a NuGet package that has a distro-specific native dependency.

Currently, the way to do that is to ship a package with as many distro-specific RIDs as needed. For example, the runtimes folder might look something like:

+---alpine-x64
|   \---native
|           libgit2-106a5f2.so
|
+---alpine.3.9-x64
|   \---native
|           libgit2-106a5f2.so
|
+---debian-arm64
|   \---native
|           libgit2-106a5f2.so
|
+---debian.9-x64
|   \---native
|           libgit2-106a5f2.so
|
+---fedora-x64
|   \---native
|           libgit2-106a5f2.so
|
+---linux-x64
|   \---native
|           libgit2-106a5f2.so
|
+---osx
|   \---native
|           libgit2-106a5f2.dylib
|
+---rhel-x64
|   \---native
|           libgit2-106a5f2.so
|
+---ubuntu.16.04-arm64
|   \---native
|           libgit2-106a5f2.so
|
+---ubuntu.18.04-x64
|   \---native
|           libgit2-106a5f2.so
|
+---win-x64
|   \---native
|           git2-106a5f2.dll
|           git2-106a5f2.pdb
|
\---win-x86
    \---native
            git2-106a5f2.dll
            git2-106a5f2.pdb

This is tricky to get right and does require ongoing work to ensure the binaries shipped are comprehensive. It also requires that the RID graph is maintained and accurate.


The proposed alternative here seems to be something like the following instead:

+---linux-x64
|   \---native
|       +---alpine-x64
|       |       libgit2-106a5f2.so
|       |
|       +---debian-x64
|       |       libgit2-106a5f2.so
|       |
|       +---fedora-x64
|       |       libgit2-106a5f2.so
|       |
|       +---rhel-x64
|       |       libgit2-106a5f2.so
|       |
|       \---ubuntu-x64
|               libgit2-106a5f2.so
|
+---osx
|   \---native
|           libgit2-106a5f2.dylib
|
+---win-x64
|   \---native
|           git2-106a5f2.dll
|           git2-106a5f2.pdb
|
\---win-x86
    \---native
            git2-106a5f2.dll
            git2-106a5f2.pdb

With this sort of package layout, it is now on the author to write code that can search inside the runtimes\linux-x64 folder and try to load each binary with NativeLibrary to find one that works. This sort of code would vaguely look something like this.

As things currently stand, I see some problems with this proposal, and it actually isn't possible to author a package like this with the way NuGet works right now.

Problems:

  1. Differences between portable and RID-specific publish output

When you build a project or publish it without a RID specified (dotnet publish), the publish output gets a copy of the package's entire runtimes folder. Code written to use NativeLibrary will need to expect this folder structure when searching for the correct Linux binary to load, for example subdirectories of runtimes\linux-x64\native.

However, as soon as you publish with a RID specified, the folder structure is collapsed and the contents of the native folder are now top-level in the publish output. Using the proposed folder structure above, with something like dotnet publish -r win-x64, that results in just the git2-106a5f2.dll file being in the publish output.

What happens with dotnet publish -r linux-x64? You might assume that it would copy the contents of the native folder and maintain the folder structure defined in the package.

If this were true, how are you supposed to author the NativeLibrary loading code that in one case needs to look for binaries in runtimes\linux-x64\native and in another instance needs to be looking for files\folders in the root directory? Is there some way at runtime to know if you're running from RID-specific published content or not? The only thing I can think of at the moment would be to check and see if a runtimes folder exists or not, but that also has an ongoing maintenance burden of having to maintain the list of distro folders that might be in the application root to search.

However, NuGet does not actually seem to work this way currently.

  2. NuGet won't honor the proposed folder structure

When you publish with -r linux-x64, instead of just copying the contents of runtimes\linux-x64\native into the publish output, you get an error:

error NETSDK1152: Found multiple publish output files with the same relative path

It is complaining about seeing more than one copy of the binary as if it's trying to put all of them into the publish folder instead of maintaining the defined folder structure.

If you instead try to put the distro subfolders directly under runtimes\linux-x64 and not inside the native folder, that makes things worse. NuGet now completely ignores the linux-x64 folder and it doesn't show up inside publish output at all.

You could potentially work around the NETSDK1152 error by naming every single binary differently, but that's not always under your control. It also doesn't solve the NativeLibrary problem, and in some ways seems to make it worse.


Ultimately, my goal with all of this is to ensure that whatever change ends up happening is done with an awareness that work needs to be done to continue to support packages that need to have distro-specific native binaries. The burden can't be shifted entirely to package maintainers, and in fact the way things currently work makes that nearly impossible.
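The "check whether a runtimes folder exists" idea raised under problem 1 could be sketched roughly as follows. This is only an illustration of the two layouts described above (a portable publish keeps `runtimes/<rid>/native`, while a RID-specific publish flattens native assets into the application directory). `NativeAssetLocator` and `CandidateDirectories` are hypothetical names, and the sketch does not address the NETSDK1152 error from problem 2.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

internal static class NativeAssetLocator
{
    // Returns the directories that may hold native assets, most specific first.
    public static IReadOnlyList<string> CandidateDirectories(string appDirectory, string portableRid)
    {
        var result = new List<string>();

        // Portable publish: the package's runtimes folder is copied as-is.
        string ridNative = Path.Combine(appDirectory, "runtimes", portableRid, "native");
        if (Directory.Exists(ridNative))
            result.Add(ridNative);

        // RID-specific publish: native assets sit next to the application.
        result.Add(appDirectory);
        return result;
    }
}
```

A caller would typically pass `AppContext.BaseDirectory` and a portable RID such as `linux-x64`, then probe each returned directory for the binaries it knows about.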


jkotas commented Sep 7, 2022

> The proposed alternative here seems to be something like the following instead:

The proposed alternative here is the following structure instead:

+---linux-x64
|   \---native
|       +---linux-musl-x64
|       |       libgit2-106a5f2.so
|       |
|       +---libgit2-106a5f2.so
|
+---osx
|   \---native
|           libgit2-106a5f2.dylib
|
+---win-x64
|   \---native
|           git2-106a5f2.dll
|           git2-106a5f2.pdb
|
\---win-x86
    \---native
            git2-106a5f2.dll
            git2-106a5f2.pdb

where libgit2-106a5f2.so is built in a prescribed environment with specific glibc and musl versions so that the binary runs on all Linux versions supported by the .NET runtime that the NuGet package is targeting. It is a direct equivalent of the approach used by Python to solve this problem: https://peps.python.org/pep-0600/ .


bording commented Sep 7, 2022

@jkotas I don't see how that is meaningfully different from what I had in my post. And it doesn't solve any of the problems I described.

This isn't just about glibc vs musl. This is about any sort of native dependency that is different per distro. For example, OpenSSL.


jkotas commented Sep 7, 2022

The assumption in this proposal is that it is just about glibc vs. musl for the vast majority of native libraries bundled into packages. The Python solution is based on the same assumption.

For the OpenSSL 2 vs. 3 problem, you can build one version of the .so that dynamically binds to OpenSSL 2 or 3. It is what the .NET runtime itself is doing. libgit2 mirrored that solution as well (https://github.com/libgit2/libgit2/blob/main/src/libgit2/streams/openssl_dynamic.c). Or you can build two versions of the .so, like libgit2_openssl2.so and libgit2_openssl3.so, and implement a loader that loads the right one at runtime.

If you have a complex library with many different dependencies, you can do a custom loader that picks up the right flavor for a given distro. Yes, it is complex. The assumption is that only a small fraction of packages would need to do something like this.


tmds commented Sep 8, 2022

This code works both when publishing with or without a rid.

IntPtr handle = IntPtr.Zero;
foreach (var nativeLibrary in new[] { "mylib-opensslv3.so", "mylib-opensslv2.so", "mylib-opensslv1.so" })
{
    // Stop at the first library that loads; handle stays IntPtr.Zero if none does.
    if (NativeLibrary.TryLoad(nativeLibrary, typeof(SomeType).Assembly, DllImportSearchPath.ApplicationDirectory, out handle))
        break;
}

> This sort of code would vaguely look something like this.

If there are use-cases like this, which require looking at the included native libraries, the directories could be made available through an API.

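IntPtr handle = IntPtr.Zero;
// NativeLibrary.NativeLibraryDirectories is the API being suggested here; it does not exist today.
foreach (var nativeLibraryDir in NativeLibrary.NativeLibraryDirectories)
{
    foreach (var nativeLibrary in Directory.GetFiles(nativeLibraryDir, "mylib-*.so"))
    {
        if (NativeLibrary.TryLoad(nativeLibrary, typeof(SomeType).Assembly, DllImportSearchPath.ApplicationDirectory, out handle))
            break;
    }
    // A library loaded; stop scanning the remaining directories.
    if (handle != IntPtr.Zero)
        break;
}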

The downsides to this design are:

  • NativeLibrary handling is now needed when there are multiple native libraries for a single portable rid.
  • Trimming on a non-portable rid is no longer possible.

The upside of the NativeLibrary handling is that libraries will work beyond the known rids of the runtime graph.

> The assumption is that only a small fraction of packages would need to do something like this.

Are there some numbers that tell us how many packages on nuget.org have native libraries for non-portable rids?


jkotas commented Sep 8, 2022

> Are there some numbers that tell us how many packages on nuget.org have native libraries for non-portable rids?

Yes, that would be useful data to have, together with the usage numbers.

I believe that people tend to stay away from packages with native dependencies on Linux today since they are broken too often. For example, we had to drop the dependency on LibGit2Sharp in source link (dotnet/sourcelink#288) and reimplement it in C#. One of the reasons was that the distro-specific libgit2 builds caused too many problems.

trungnt2910 commented

This issue seems to be focused on relieving the complexity of having to manage various Linux distros and their versions.

What about other OSes that are completely different from Linux in ways that are not only related to native library ABI, but also managed system library code?

dotnet/runtime#90695, which seems to be related to this issue, is currently breaking managed library builds for Haiku, since the official .NET SDK is not aware of Haiku and the build is now configured to ignore the runtimes.json in the repository.

Unlike different Linux distros like Ubuntu and Alpine, which can use the same IL code as any linux-x64 in most cases, Haiku is as different from Linux as FreeBSD is when it comes to managed libraries, so wouldn't it make sense to still allow new RIDs for new OS support, but not for new Linux distros?


agocke commented Aug 20, 2023

The intent of this proposal is to move away from OS flavor- and version-specific targeting. This is most notable for Linux because there are so many versions and flavors, but it's equally true for Windows and Mac, where you will no longer be able to target versions individually.

However, different OSes are still classified as different portable RIDs. A portable RID is essentially <baseOS>[-optional-libc]-arch. Since Haiku is a different base OS, it would be eligible for its own. It would just take a community member or members willing to port the runtime to Haiku and maintain compatibility. Since it is not one of the officially supported OSes, Microsoft would not provide support directly. This is similar to the status of FreeBSD.


trungnt2910 commented Aug 20, 2023

> It would just take a community member or members willing to port the runtime to Haiku and maintain compatibility.

dotnet/runtime#86391 (comment)

It is being done here, but other members of dotnet are questioning whether a RID for Haiku should be added.

> Since Haiku is a different base OS, it would be eligible for its own.

Does this mean that, unlike what dotnet/runtime#90695 states, the list of RIDs should not be frozen for good, and should just be less volatile because version numbers and distro flavors are no longer being updated?
