Refactor module loading v2: move writing of module_config.bin to bootstrap #425

dale-wahl · 2024-04-16T10:47:51Z

I was looking over PR #396, updated it from the master, but ran into some issues. The "do not lazy load" commit (71ba788) causes failures since self.modules is often None. When I dove a bit deeper I became a little unsatisfied with how often ModuleCollector was being initialized and doing all that work. Before we would import the same instantiated object from backends instead of passing it around. I could do some more work here, but I think this PR's method may accomplish the same thing in a more simple fashion.

This PR only writes the module_config.bin in bootstrap and then updating the shared config object.

Both this solution and the other PR mean that the frontend will only have the latest data from module_config.bin if it is started after the backend. I do not believe this is a large problem as module_config.bin is only modified if code is updated (necessitating a restart of both anyway), but worth noting for 4CAT instances not managed by Docker as order becomes important.

As you are well aware, both PRs have an oddity in that all module classes are imported with a version of config that does not have the module configurations. The only place I noticed an effect was in ModuleCollector.expand_datasources() which calls get_options(). That method does often use config, but in expand_datasources we only care about any options returning so there seems to be no issue. I note it in case something happens down the road related to this.

In spirit, at least

dale-wahl · 2025-03-04T10:54:17Z

This better: #455

This is a more radical version of #425 that makes some significant changes to how configuration contexts are used and handled. TL;DR, in order of preference: * Use `self.config` when in an object context to read settings * Use the method argument `config` when in a constructor, or static or class method, to read settings * Use the `config` imported from `webtool` when in a Flask view to read settings * Use the global `config` variable to read settings otherwise. In principle this should only occur in two places in the code, `bootstrap.py` and Flask's `__init__.py` (there are currently a few other places where relying on injection would be too inconvenient) * Do not pass around user objects. Instead pass around user-aware `ConfigWrapper`s. Where you need to use `User` objects, pass a `config` as argument to their instructor so that they will read from their own configuration context (if desired). The long version: * Eliminate the passing of user objects as much as possible in favour of heavier usage of `ConfigWrapper`. This means that processors no longer need to be aware of users (with very few exceptions) - I think this is good because users are a front-end concept, not one the back-end should need to care about. Users were only ever really needed in the back-end to make sure the right configuration values were read. * `ConfigWrapper` is used to create a configuration manager (reader) that reads configuration values taking into account a certain user and optionally a Flask request. This is then used to make sure that the right configuration tags are applied. The wrapping can happen in low-level parts of the code such as the main worker class, and the result can just be passed on to higher level functions such as processor `process()` etc. As long as a processor reads from `self.config` the right values should always be read. * Because `get_options()`, `validate_query()` and `is_compatible_with()` are class methods they cannot access `self.config`. Instead they now have an argument `config` (replacing `user`) to which a reader or wrapped reader is passed. They then read from that for e.g. checking whether an API key is present or how many items can be processed. * Instead of each set of views defining their own `config` as before, this is now done in the main Flask app, so that the front-end only ever instantiates a single (wrapped) configuration reader. * In the backend, the configuration reader is instantiated as early as in `bootstrap.py`, and then passed all the way down to the individual modules where it is wrapped as needed. Thus in the backend there is also only a single reader that is wrapped in various ways as needed. * `ModuleCollector` now also gets its config reader via injection rather than instantiating its own, to avoid some config managers being aware of module-defined settings and others not (this was the original goal of #425). * If a processor ever still needs to access the user object, it is available via the `ConfigWrapper().user` property (which only exists for wrapped readers). This should be avoided but is necessary in some places e.g. the Telegram search worker which needs to behave differently when used by an anonymous user.

move writing of module_config.bin to bootstrap

cebc775

dale-wahl requested a review from stijn-uva April 16, 2024 10:47

stijn-uva added a commit that referenced this pull request Sep 19, 2024

Implement #425

bff0a4d

In spirit, at least

stijn-uva mentioned this pull request Sep 19, 2024

Refactor module loader #455

Merged

dale-wahl closed this Mar 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor module loading v2: move writing of module_config.bin to bootstrap #425

Refactor module loading v2: move writing of module_config.bin to bootstrap #425

Uh oh!

dale-wahl commented Apr 16, 2024

Uh oh!

dale-wahl commented Mar 4, 2025

Uh oh!

Uh oh!

Refactor module loading v2: move writing of module_config.bin to bootstrap #425

Refactor module loading v2: move writing of module_config.bin to bootstrap #425

Uh oh!

Conversation

dale-wahl commented Apr 16, 2024

Uh oh!

dale-wahl commented Mar 4, 2025

Uh oh!

Uh oh!