Feature: Ability to update pipelines provisioned by config files #1875

Open
nickchomey opened this issue Oct 2, 2024 · 8 comments
Labels
feature New feature or request

Comments

@nickchomey

nickchomey commented Oct 2, 2024

Feature description

It is much easier/faster to define a pipeline in config files than via the Web UI (too many clicks, and various bugs like buttons sporadically not working) or the HTTP API (which would require setting up scripts to send all the HTTP requests with their varied payloads, etc.). However, file configurations are immutable, so you have to restart the Conduit server in order to change them.

It would be great if we could update the config file and the changes would get reflected immediately.

I don't know what limitations this has compared to the HTTP API, but perhaps fsnotify could be used to monitor the pipelines directory, and maybe even the connectors directory?
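As a rough illustration of the watching side: fsnotify (github.com/fsnotify/fsnotify) delivers filesystem events, but to keep this sketch dependency-free it swaps in a simpler polling approach. It snapshots the SHA-256 of every regular file in a pipelines directory and reports which files were added, modified or removed between two snapshots. The directory name and the `snapshot`/`changed` helpers are hypothetical and not part of Conduit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// snapshot maps each regular file directly inside dir to the SHA-256 of its contents.
func snapshot(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	snap := make(map[string]string)
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue // skip subdirectories, symlinks, etc.
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		sum := sha256.Sum256(data)
		snap[e.Name()] = hex.EncodeToString(sum[:])
	}
	return snap, nil
}

// changed returns the sorted names of files added, modified or removed between two snapshots.
func changed(prev, curr map[string]string) []string {
	var out []string
	for name, h := range curr {
		if old, ok := prev[name]; !ok || old != h {
			out = append(out, name) // new or modified file
		}
	}
	for name := range prev {
		if _, ok := curr[name]; !ok {
			out = append(out, name) // removed file
		}
	}
	sort.Strings(out)
	return out
}

func check(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	dir, err := os.MkdirTemp("", "pipelines") // hypothetical pipelines directory
	check(err)
	defer os.RemoveAll(dir)

	check(os.WriteFile(filepath.Join(dir, "a.yaml"), []byte("version: 2.2"), 0o644))
	before, err := snapshot(dir)
	check(err)

	// Edit one pipeline file and add another.
	check(os.WriteFile(filepath.Join(dir, "a.yaml"), []byte("version: 2.2\n# edited"), 0o644))
	check(os.WriteFile(filepath.Join(dir, "b.yaml"), []byte("version: 2.2"), 0o644))
	after, err := snapshot(dir)
	check(err)

	fmt.Println(changed(before, after)) // [a.yaml b.yaml]
}
```

Hashing contents rather than comparing modification times means re-saving an unchanged file does not trigger a reload. A real implementation would either call `snapshot` on a ticker or replace polling with fsnotify events.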

@nickchomey nickchomey added feature New feature or request triage Needs to be triaged labels Oct 2, 2024
@github-project-automation github-project-automation bot moved this to Triage in Conduit Main Oct 2, 2024
@nickchomey
Author

Also, I've just learned that the web UI was deprecated. All the more reason to allow for something like this!

@lovromazgon
Member

This feature would be great to have, but we currently have higher-priority items on our roadmap. That said, if you want to add this feature, we're happy to assist you with a review!

@lovromazgon lovromazgon removed the triage Needs to be triaged label Oct 7, 2024
@nickchomey
Author

Right now it's more of an annoyance than anything. I'll let you know if it becomes a large enough pain point for me to try to implement this feature!

@gedw99

gedw99 commented Oct 11, 2024

I really think this feature will eventually be needed.

But perhaps it will need to be more like a reconciler design?

So you have a before, an after, and a diff.

Then you need to do the reconciliation.

And that is the part that stops and starts processes, with the config adjusted accordingly.

So my point is that to be able to change the config with an automatic restart, you might have to account for the config diff in order not to break the system.

Kind of like how, with a DB, we have to write a migration script. Does Conduit need a similar thing?

Curious whether other people see it this way or not, though. Happy to be wrong about this because reconciliation is nasty :)

@nickchomey
Author

My thought was that it would just watch the config files and, when a change is detected, reload the config file as if it were a fresh startup of Conduit, with no diffing necessary.

Surely this can be done without restarting Conduit altogether; it could probably leverage whatever the API uses to stop/start/etc.

What do you figure diffing would be needed for?

@gedw99

gedw99 commented Oct 15, 2024

In a production deployment, you will sometimes need the configs in git to be migrated to the config in production.

Doing a diff and reconciliation is a pretty standard way to do it.

In the monorepo we will first try out different strategies to see what we are up against, iterate on deployments from dev to prod, and find out.

@nickchomey
Author

nickchomey commented Oct 16, 2024

Again, while a diff might be standard, is it necessary? Do you see any downsides to what I've proposed above? It seems quite easy to implement.

Moreover, it's not evident that zero-downtime diffing is even possible within Conduit; it may very well just need a restart, as I've proposed. In that case, is there any reason to do a complicated diff rather than a full reload of the pipeline config?

@lovromazgon
Copy link
Member

The way we handle this when Conduit starts is to diff the pipeline Conduit finds in its store (e.g. badger DB) against the pipeline in the config file. The assumption is that the config file is the source of truth: if an entity (source, destination, processor, pipeline) can't be found in the file, we delete it; if a new entity is found, we create it; if an existing one has changed, we update it. We match entities by ID, so, for example, changing the ID of a source in the pipeline config file will result in the existing source being deleted and a new one being created instead. Consequently, the pipeline will start from scratch, because the position state is removed together with the source (as explained here).

That's the same behavior I'd expect to see if we implement hot reloads of pipeline config files. In my opinion, Conduit shouldn't bother with a diff or user-guided reconciliation; that sounds like something that can be handled separately by establishing proper deploy procedures, using git and separate deploy environments to ensure a config works correctly before it hits production. We need to see Conduit as the low-level tool that it is, focused on moving data. So I'm more in favor of detecting file changes and treating them as the new source of truth: Conduit then stops the existing pipeline, applies the changes, and restarts the pipeline.
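The proposed hot-reload step (stop, apply, restart) could look roughly like this. `Runtime` is a hypothetical interface standing in for whatever the HTTP API uses internally to stop and start pipelines, and `recorder` is a fake used only to show the call order. Leaving the pipeline stopped when applying the new config fails is a design choice of this sketch, not something stated above.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical interface for stopping, updating and starting a pipeline.
type Runtime interface {
	Stop(id string) error
	Apply(id string, cfg string) error
	Start(id string) error
}

// reload treats the new file contents as the source of truth:
// stop the running pipeline, apply the new config, then start it again.
func reload(rt Runtime, id, cfg string) error {
	if err := rt.Stop(id); err != nil {
		return fmt.Errorf("stop %s: %w", id, err)
	}
	if err := rt.Apply(id, cfg); err != nil {
		// The pipeline stays stopped so a bad config is not run.
		return fmt.Errorf("apply %s: %w", id, err)
	}
	if err := rt.Start(id); err != nil {
		return fmt.Errorf("start %s: %w", id, err)
	}
	return nil
}

// recorder is a fake Runtime that logs calls and can be told to fail Apply.
type recorder struct {
	calls     []string
	failApply bool
}

func (r *recorder) Stop(id string) error {
	r.calls = append(r.calls, "stop "+id)
	return nil
}

func (r *recorder) Apply(id, cfg string) error {
	r.calls = append(r.calls, "apply "+id)
	if r.failApply {
		return errors.New("invalid config")
	}
	return nil
}

func (r *recorder) Start(id string) error {
	r.calls = append(r.calls, "start "+id)
	return nil
}

func main() {
	rt := &recorder{}
	err := reload(rt, "orders", "version: 2.2")
	fmt.Println(rt.calls, err) // [stop orders apply orders start orders] <nil>
}
```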
