Skip to content

Refactor module loading v2: move writing of module_config.bin to bootstrap #425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dale-wahl
Copy link
Member

I was looking over PR #396, updated it from the master, but ran into some issues. The "do not lazy load" commit (71ba788) causes failures since self.modules is often None. When I dove a bit deeper I became a little unsatisfied with how often ModuleCollector was being initialized and doing all that work. Before we would import the same instantiated object from backends instead of passing it around. I could do some more work here, but I think this PR's method may accomplish the same thing in a more simple fashion.

This PR only writes the module_config.bin in bootstrap and then updating the shared config object.

Both this solution and the other PR mean that the frontend will only have the latest data from module_config.bin if it is started after the backend. I do not believe this is a large problem as module_config.bin is only modified if code is updated (necessitating a restart of both anyway), but worth noting for 4CAT instances not managed by Docker as order becomes important.

As you are well aware, both PRs have an oddity in that all module classes are imported with a version of config that does not have the module configurations. The only place I noticed an effect was in ModuleCollector.expand_datasources() which calls get_options(). That method does often use config, but in expand_datasources we only care about any options returning so there seems to be no issue. I note it in case something happens down the road related to this.

@dale-wahl dale-wahl requested a review from stijn-uva April 16, 2024 10:47
stijn-uva added a commit that referenced this pull request Sep 19, 2024
In spirit, at least
@stijn-uva stijn-uva mentioned this pull request Sep 19, 2024
@dale-wahl
Copy link
Member Author

This better: #455

@dale-wahl dale-wahl closed this Mar 4, 2025
stijn-uva added a commit that referenced this pull request Jun 23, 2025
This is a more radical version of #425 that makes some significant changes to how configuration contexts are used and handled. TL;DR, in order of preference:

* Use `self.config` when in an object context to read settings
* Use the method argument `config` when in a constructor, or static or class method, to read settings
* Use the `config` imported from `webtool` when in a Flask view to read settings
* Use the global `config` variable to read settings otherwise. In principle this should only occur in two places in the code, `bootstrap.py` and Flask's `__init__.py` (there are currently a few other places where relying on injection would be too inconvenient)
* Do not pass around user objects. Instead pass around user-aware `ConfigWrapper`s. Where you need to use `User` objects, pass a `config` as argument to their instructor so that they will read from their own configuration context (if desired). 

The long version:

* Eliminate the passing of user objects as much as possible in favour of heavier usage of `ConfigWrapper`. This means that processors no longer need to be aware of users (with very few exceptions) - I think this is good because users are a front-end concept, not one the back-end should need to care about. Users were only ever really needed in the back-end to make sure the right configuration values were read.
* `ConfigWrapper` is used to create a configuration manager (reader) that reads configuration values taking into account a certain user and optionally a Flask request. This is then used to make sure that the right configuration tags are applied. The wrapping can happen in low-level parts of the code such as the main worker class, and the result can just be passed on to higher level functions such as processor `process()` etc. As long as a processor reads from `self.config` the right values should always be read.
* Because `get_options()`, `validate_query()` and `is_compatible_with()` are class methods they cannot access `self.config`. Instead they now have an argument `config` (replacing `user`) to which a reader or wrapped reader is passed. They then read from that for e.g. checking whether an API key is present or how many items can be processed.
* Instead of each set of views defining their own `config` as before, this is now done in the main Flask app, so that the front-end only ever instantiates a single (wrapped) configuration reader.
* In the backend, the configuration reader is instantiated as early as in `bootstrap.py`, and then passed all the way down to the individual modules where it is wrapped as needed. Thus in the backend there is also only a single reader that is wrapped in various ways as needed.
* `ModuleCollector` now also gets its config reader via injection rather than instantiating its own, to avoid some config managers being aware of module-defined settings and others not (this was the original goal of #425).
* If a processor ever still needs to access the user object, it is available via the `ConfigWrapper().user` property (which only exists for wrapped readers). This should be avoided but is necessary in some places e.g. the Telegram search worker which needs to behave differently when used by an anonymous user.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant