[question] Many conan packages in one code repository #17194

Open
1 task done
mmmfarrell opened this issue Oct 21, 2024 · 8 comments

@mmmfarrell

What is your question?

Currently our company uses 1 repo per conan package. However, this has led to a lot of pain in several scenarios, including: developing a change across several packages, testing a change in downstream packages, and integrating changes downstream. We believe many of our pain points would be resolved by putting many of our conan packages into a single repo. We'd then build out local development scripts and CI scripts to allow developers to quickly make changes across many packages and build and test them.

Now this sounds a lot like the intention of the Conan 1 workspace tool, so maybe your answer will be "wait for conan2 workspace". Is there a development / release timeline for workspace in conan2?

But more generally, has anyone attempted building out such a monorepo and all of the scripting required for local development and for CI? If so, any lessons learned that I should be aware of?

I'm early on in the prototyping phase and am happy with a lot of how it's shaping up, but running into a few issues around ensuring that multiple developers can submit and merge PRs in parallel for multiple changes in the same conan package. One specific issue is that if packages depend on specific versions of the other packages from the same repo, this would lead to a linear merge queue due to merge conflicts caused by version bumps every time someone edits anything in a package. The alternative seems to be for all packages to depend on the latest version of the other packages in the same repo. However, this necessitates usage of lockfiles to prevent the ground from shifting under a developer when a new PR merges and therefore a new latest is pushed to the conan remote. Lockfiles, however, seem to lead to a linear merge queue due to merge conflicts caused by changes every time two developers edit the same package (essentially the same problem as depending on version numbers).

My goal is to get to a monorepo composed of N conan packages, where multiple developers can be developing and merging PRs for the same package(s) at the same time. Any insights to help me get here would be appreciated!

Have you read the CONTRIBUTING guide?

  • I've read the CONTRIBUTING guide
@memsharded memsharded self-assigned this Oct 21, 2024
@memsharded
Member

Hi @mmmfarrell

Thanks for your question.

I'd like to start first saying that unfortunately there are no silver bullets. All approaches have pros and cons, and the mono-repo like approach has some advantages, but it also has its own challenges.

I'm early on in the prototyping phase and am happy with a lot of how it's shaping up, but running into a few issues around ensuring that multiple developers can submit and merge PRs in parallel for multiple changes in the same conan package. One specific issue is that if packages depend on specific versions of the other packages from the same repo, this would lead to a linear merge queue due to merge conflicts caused by version bumps every time someone edits anything in a package. The alternative seems to be for all packages to depend on the latest version of the other packages in the same repo. However, this necessitates usage of lockfiles to prevent the ground from shifting under a developer when a new PR merges and therefore a new latest is pushed to the conan remote. Lockfiles, however, seem to lead to a linear merge queue due to merge conflicts caused by changes every time two developers edit the same package (essentially the same problem as depending on version numbers).

This is what we call the problem of CI at scale. It is actually a bit beyond what a package manager is, but due to the demand, Conan 2 incorporated some better tools for it. The good news is that we are finalizing the first step of a new "CI tutorial" that might help with this. Please check conan-io/docs#3799; even though it is still a PR, it would be good if you could read it.

About the workspace feature, we have already resumed work on it. Our goal is to have it released (maybe not fully complete, but at least the core functionality) before the end of the year.

My goal is to get to a monorepo composed of N conan packages, where multiple developers can be developing and merging PRs for the same package(s) at the same time. Any insights to help me get here would be appreciated!

This would mean the mono-repo approach, which in essence is a bit opposed to the package-based development paradigm. If you put everything in a mono-repo, then it would be great to learn and understand what pains or problems you would be trying to solve using a package manager.

@mmmfarrell
Author

Thanks @memsharded

I really appreciate the quick response and link to the docs PR, I'll read through it today and post any questions/comments to the PR directly.

Agreed that my question is mostly about development and CI at scale and not necessarily in scope for a package manager, but definitely helpful when example end-to-end workflows are documented to help out others that will need to implement similar workflows. I'm glad to see conan is getting more opinionated (or at least documented) in these areas.

This would mean the mono-repo approach, which in essence is a bit opposed to the package-based development paradigm. If you put everything in a mono-repo, then it would be great to learn and understand what pains or problems you would be trying to solve using a package manager.

This is a great point and something I should have called out in my original post. For this mono-repo approach, I also believe the "right answer" is to drop all of the subpackages and instead only ship the entirety of the mono-repo as a single conan package. However, given our current division into N repos + packages, build times would be too long if we simply pulled them all together (a different problem to solve), and downstream users are used to pulling in only their N small conan package dependencies instead of one large conan dependency. I am also trying to minimally disrupt current workflows. So in the mono-repo, conan would effectively be used as a build cache locally and to provide granular packages for downstream users to depend on. Obviously not what conan was designed for, but meant to provide an incremental move towards a more traditional mono-repo setup.

@memsharded
Member

Thanks for your feedback.

That makes sense; it is a valid aim and problem to solve. Indeed the workspace feature can help with this, but there are still some large challenges regarding the build system. I guess that you are using or planning to use CMake? Because it has no way of simultaneously representing a single project build tree and using find_package() to define dependencies, and that is probably the biggest challenge for the workspace.

That means that making Conan act as a build cache for local development within a mono-repo is not possible, even with a fully finalized workspace feature. It would only be possible for external dependencies to that mono-repo.

So probably I'd try to move somehow with the following guidelines:

  • Try to find a balance, not everything needs to go into the mono repo, not everything needs to be an isolated package.
  • A hybrid solution, detecting some "clusters" of highly connected components and making them a single package, and leaving independent packages for other more decoupled dependencies
  • For the clusters, it is probably better to start with a single package for the whole cluster, and avoid over-linking by using components. There will be some cost in build performance (having to sometimes download and unzip more artifacts than strictly necessary), but it might be a good trade-off because of the simplicity of maintaining the recipes, creating the packages, etc.
  • Work on the modularity and componentization of the project when applicable. Sometimes it is a matter of giving a package clearer, better-defined interfaces along with its unit tests, so it can really be developed as an independent unit. The most common cause for having to work on several packages simultaneously is the lack of a clear interface and of trust in the tests to cover the functionality under development, which leads to relying on the consumers of the package to act as "tests" of that package. Sometimes there are great benefits in being able to focus on just one package to debug, fix things, work on features, etc. Not always, but considering a "package" a "unit of development" is a good way to avoid too large an increase in the cognitive overhead of having to work on a much larger piece of the whole project.

I'll annotate this ticket also to let you know when the workspace feature starts to be released.

@mmmfarrell
Author

@memsharded very helpful feedback!

Because it has no way of simultaneously representing a single project build tree and using find_package() to define dependencies, and that is probably the biggest challenge for the workspace.
That means that making Conan act as a build cache for local development within a mono-repo is not possible, even with a fully finalized workspace feature. It would only be possible for external dependencies to that mono-repo.

I should dive into the details to understand better, but how does conan editable mode work? Is it not effectively doing this?

The prototype monorepo I've created to date has build scripts that look at which package you're trying to build and which packages have changes since their version in the conan.lock file (cached build), then sets the necessary packages into conan editable mode based on the build graph. This has been working pretty well locally, so curious how this is different than what you describe.
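
A minimal sketch of the selection step described above, under stated assumptions: the dependency graph is already available as a plain dictionary, and the set of changed packages has already been computed (for example by comparing the working tree against the versions in conan.lock). The package names and the helper names `closure` and `select_editables` are hypothetical and not part of Conan; the sketch only shows the graph logic, not the actual calls that put packages into editable mode.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package -> direct dependencies.
GRAPH: Dict[str, List[str]] = {
    "app": ["net", "log"],
    "net": ["core"],
    "core": [],
    "log": [],
}


def closure(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Return root plus everything it depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def select_editables(
    graph: Dict[str, List[str]], target: str, changed: Set[str]
) -> Set[str]:
    """Packages in the target's graph that must be built from local sources.

    A package qualifies if it changed itself or if anything it depends on
    changed; everything else can be taken from the cache.
    """
    return {
        pkg
        for pkg in closure(graph, target)
        if closure(graph, pkg) & changed
    }


if __name__ == "__main__":
    # "core" changed, so "core" and everything above it needs a local build.
    print(sorted(select_editables(GRAPH, "app", {"core"})))  # ['app', 'core', 'net']
```

In this toy graph, "log" is untouched by the change and is not above it, so it could still be consumed as a cached binary.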

@memsharded
Member

I should dive into the details to understand better, but how does conan editable mode work? Is it not effectively doing this?

Conan editable is mostly re-defining the location of a given package to use it from the user folder location instead of using it from the Conan cache. It relies on a correct definition of the "local" folder layout in the layout() method to describe where the headers and compiled libraries are in the user local folder.

But the "consumer" code would still be using find_package() to locate the dependency. This is quite incompatible with a CMake project with subfolders and add_subdirectory(), which is quite necessary to have a good single-project development experience. You can see an example in https://www.youtube.com/live/VzUJQw89U7o?si=0XVUDMmPm2CSzwGY&t=2773 how it is possible to define a single VS project by adding sub-projects with a couple of clicks, but it is still a manual process, and for very specific tools and setup.

The prototype monorepo I've created to date has build scripts that look at which package you're trying to build and which packages have changes since their version in the conan.lock file (cached build), then sets the necessary packages into conan editable mode based on the build graph. This has been working pretty well locally, so curious how this is different than what you describe.

Interesting. That sounds good, and I think this is definitely within the scope of the new workspace feature, that will have ways to allow users to define which packages are in editable mode. If you are interested this is the ticket you want to track, hopefully we will start releasing some initial experimental features: #15992

@Artalus

Artalus commented Oct 22, 2024

My previous company resorted to monorepo as well, but that was waaay before I introduced Conan in there - so there we were mostly using it for 3rdparty deps, and then relying on regular CMake calls.
My current company has collapsed into a mono-repo after Conan usage was already established, I believe (happened right before me joining). Here we use conan editable add to line down every single package present in the monorepo, and then use regular conan install + cmake --build on the topmost "products" - or on every transitive package of ours. The CMakeLists are full of find_package() calls both to locate our own packages and 3rdparty deps; all paths are provided from Conan. When doing a clean rebuild for a product you do have to build its "in-house" dependencies from source - but since those are in editable mode, their build directories are preserved, which speeds up further rebuilds; and you are free to edit their sources as part of your changes on the topmost project.

The trick is that we use versions only for said topmost products, and everything else uses 0.0.1 as a version (could've used any other string like current instead, I believe). So you would have a forest of dependencies like product_foo/1.2.3 -> somewrapper/0.0.1 -> stringutils/0.0.1 , product_bar/2.3.4 -> otherwrapper/0.0.1, product_baz -> timeutils/0.0.1. I really am not sure how the scheme would work if we'd needed to release and maintain versioning for any of the transitive packages.
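
As a rough illustration of the "every package in editable mode" setup described above, the sketch below walks a repository and builds one `conan editable add <folder>` command per recipe. It assumes the Conan 2 command form, where the path is the only required argument, and it assumes each package lives in its own folder with a conanfile.py. The function name `editable_add_commands` is hypothetical. The commands are only printed here; running them is left to the caller.

```python
from pathlib import Path
from typing import List


def editable_add_commands(repo_root: Path) -> List[List[str]]:
    """Build one `conan editable add <folder>` command per recipe found."""
    commands: List[List[str]] = []
    for recipe in sorted(repo_root.rglob("conanfile.py")):
        # Skip test_package recipes: they are small consumer projects used to
        # test a package, not packages of their own.
        if "test_package" in recipe.relative_to(repo_root).parts:
            continue
        commands.append(["conan", "editable", "add", str(recipe.parent)])
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for command in editable_add_commands(Path(".")):
        print(" ".join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)` once the printed output looks right.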


Also, just to clarify - do you intend to eventually start releasing actual Conan packages? Or just use Conan as a convenience tool to get binaries in one place? Asking since in both of my cases there was never a point for us to do that, so your workflows might differ. In the old place, the final product was just a hardware running our middleware. In the current place, we ship the cmake --installed libraries and tools - but they get encapsulated in other wrappers down the "pipeline", before the consumers can use them.

@mmmfarrell
Author

@Artalus thank you for sharing!

When doing a clean rebuild for a product you do have to build its "in-house" dependencies from source - but since those are in editable mode, their build directories are preserved, which speeds up further rebuilds; and you are free to edit their sources as part of your changes on the topmost project.

Have you tried using ccache or any other caching tool to speed up these builds? For the size of my project, if someone were to do a full clean rebuild (delete build directories for all packages) and force everything to recompile via editable mode, it'd take way too long. I'm trying to resolve this issue in a few ways: 1) using conan cache + lock file for all of the packages (even the transitive packages as you call it) 2) build script selectively puts packages into editable mode vs pulling from conan cache based on the files that have been changed + the conan graph 3) ccache.

Based on what you described, it sounds like the transitive packages are never pushed to / pulled from the conan cache. Is that accurate? Otherwise how do you prevent the ground from moving under a developer when someone merges new changes, a new stringutils/0.0.1 gets pushed, but the developer did not rebase or rebranch off of master?

I really am not sure how the scheme would work if we'd needed to release and maintain versioning for any of the transitive packages.

At the moment, I do need to release and maintain versions for the transitive dependencies. I'm not too nervous about maintaining versions b/c we want to maintain versions together in that we care about sets of all of the dependencies that are compatible and tested against one another. So I still think we'd only require N branches (not N * M for each package).

But I definitely agree that versioning the transitive packages is the tricky part. Ideally every change is a version bump, or a new current/latest. But allowing for parallel development and merging is what is difficult. I have some ideas here that I'm prototyping, but nothing solid yet. The short version is to have two branches (a la git flow): developers merge their changes to develop in parallel, and an automated CI pipeline linearly merges develop into master/main. I can report back here if my prototype works out.

Also, just to clarify - do you intend to eventually start releasing actual Conan packages? Or just use Conan as a convenience tool to get binaries in one place?

We will be releasing the conan packages for internal ingestion by other groups/teams across the company. To get to the final products, there are additional layers that get the dependencies from the conan packages and then either repackage the software libraries and tools for external use or build / deploy debians to run on hardware.

@memsharded
Member

But I definitely agree that versioning the transitive packages is the tricky part. Ideally every change is a version bump, or a new current/latest. But allowing for parallel development and merging is what is difficult. I have some ideas here that I'm prototyping, but nothing solid yet. The short version is to have two branches (a la git flow): developers merge their changes to develop

One of the points of recipe revisions is that they do not collide (they are not explicit changes in the recipe), and they are "lazily ordered": they are not ordered at the time of creation, but on the server, at the time of upload. This helps a bit in the case of concurrent PRs to the same package.
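
To make the "no collision, ordered at upload" idea concrete, here is a toy model. It is not Conan's implementation: the hash function, the `ToyRemote` class and the recipe strings are all assumptions for illustration. The point it shows is that two concurrent changes to the same package version get distinct revision identifiers without anyone editing a version field, and that "latest" is decided only by the order in which the uploads reach the server.

```python
import hashlib
from typing import Dict, List


def recipe_revision(recipe_text: str) -> str:
    """Toy revision id derived from the recipe contents (illustrative only)."""
    return hashlib.sha256(recipe_text.encode("utf-8")).hexdigest()[:12]


class ToyRemote:
    """Toy server that orders revisions by arrival, not by creation."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[str]] = {}

    def upload(self, ref: str, recipe_text: str) -> str:
        rev = recipe_revision(recipe_text)
        history = self._revisions.setdefault(ref, [])
        if rev not in history:
            history.append(rev)  # position in the list = upload order
        return rev

    def latest(self, ref: str) -> str:
        return self._revisions[ref][-1]


if __name__ == "__main__":
    remote = ToyRemote()
    # Two developers change the same package version concurrently.
    rev_a = remote.upload("stringutils/0.0.1", "base recipe + change A")
    rev_b = remote.upload("stringutils/0.0.1", "base recipe + change B")
    print(rev_a != rev_b)                               # True
    print(remote.latest("stringutils/0.0.1") == rev_b)  # True
```

Neither developer had to touch a shared version number, so the two changes do not conflict in the recipe itself; a lockfile can then pin whichever revision a given build should use.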

there are additional layers that get the dependencies from the conan packages and then either repackage the software libraries and tools for external use or build / deploy debians to run on hardware.

Recall that there is now the vendor=True feature that can help for distribution of re-packaged things.
