Update config module #59

Merged
jonavellecuerdo merged 2 commits into main from
IN-1679-update-config-module
Mar 18, 2026

@jonavellecuerdo
Contributor

@jonavellecuerdo jonavellecuerdo commented Mar 17, 2026

Purpose and background context

Update the config module to align with some of our current conventions for instantiation, env var access, and logging controls. Though we have not standardized our approaches to managing app configurations, the following properties/features were implemented for DSS:

  • Use a WARNING_ONLY_LOGGERS environment variable to minimize “noise” for select third-party libraries (e.g., [botocore, boto3, smart_open, urllib3]).
  • DSS CLI supports a verbose flag.
  • Config uses property methods to set defaults for optional environment variables and raise OSError for missing required environment variables.
  • Methods configure_logger and configure_sentry are defined in the config module (but not as methods of the Config class itself).
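
A minimal sketch of how these conventions might fit together, not the actual DSS code. Assumptions: WARNING_ONLY_LOGGERS holds a comma-separated list of logger names, the `warning_only_loggers` property name is hypothetical, and the `configure_logger` signature and return message shown here are illustrative only.

```python
import logging
import os


class Config:
    """Expose environment variables through property methods."""

    @property
    def dss_dspace_credentials(self) -> str:
        # Required env var: raise OSError when it is missing or empty.
        value = os.getenv("DSS_DSPACE_CREDENTIALS")
        if not value:
            raise OSError("Env var 'DSS_DSPACE_CREDENTIALS' must be defined")
        return value

    @property
    def warning_only_loggers(self) -> list[str]:
        # Optional env var: defaults to an empty list.
        # Assumption: the value is a comma-separated list of logger names.
        raw = os.getenv("WARNING_ONLY_LOGGERS", "")
        return [name.strip() for name in raw.split(",") if name.strip()]


def configure_logger(
    root_logger: logging.Logger,
    *,
    verbose: bool,
    warning_only_loggers: list[str],
) -> str:
    """Set the root level from the verbose flag and quiet noisy libraries."""
    # DEBUG only when the caller asked for verbose output.
    root_logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Named third-party loggers emit WARNING and above only.
    for name in warning_only_loggers:
        logging.getLogger(name).setLevel(logging.WARNING)
    level_name = logging.getLevelName(root_logger.getEffectiveLevel())
    return f"Logger '{root_logger.name}' configured with level={level_name}"
```

Keeping `configure_logger` as a module-level function, as described above, means the CLI can call it once after parsing the verbose flag, without the Config class needing to know about logging.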

How can a reviewer manually see the effects of these changes?

DSS was run in Dev using a sample OpenCourseWare item submission and a test collection in DSpace 8.

  1. View CloudWatch log stream for a successful ingest: https://mit-dspace.eks.prod.4science.cloud/handle/1721.1/164334
    Note: Ignore the very first logged message with "VERBOSE: True"; it was used for testing and has since been removed.
  2. View CloudWatch log stream for a failed item submission
    ✨ We can now see the logged exceptions from dspace-rest-python more easily
    Note: Recreated the error from IN-1687 by temporarily removing the DSS E-Person from the "Administrator" group in the DSpace 8 instance prior to running DSS.

Includes new or updated dependencies?

YES

Changes expectations for external applications?

YES; to see logs at DEBUG level, users and calling apps must set the --verbose / -v CLI option.

What are the relevant tickets?

  • https://mitlibraries.atlassian.net/browse/IN-1679

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

@jonavellecuerdo jonavellecuerdo force-pushed the IN-1679-update-config-module branch 3 times, most recently from a3b937b to 8e09ec6 Compare March 17, 2026 15:48
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review March 17, 2026 15:49
@jonavellecuerdo jonavellecuerdo requested a review from a team as a code owner March 17, 2026 15:49
@ehanson8 ehanson8 self-assigned this Mar 17, 2026
Contributor

@ehanson8 ehanson8 left a comment

Great work! A few questions and suggestions

Comment on lines -24 to +40
- "--queue", default=CONFIG.INPUT_QUEUE, help="Name of queue to process messages from"
+ "--queue", envvar="INPUT_QUEUE", help="Name of queue to process messages from"
Contributor

Why change this from the Config attribute as the default? I thought our general practice was to only read env vars in the Config class.

Contributor Author

I've seen envvar used more frequently in our more current applications, and this is the syntax for it: https://click.palletsprojects.com/en/stable/options/#values-from-environment-variables (i.e., it accepts a string).

The order of precedence in Click is:

  • A value provided on the command line will override an environment variable or default value.
  • An environment variable will override the default value defined in the option decorator.
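
The resolution order described above can be mirrored in a few lines of plain Python. This only illustrates the precedence and is not Click's implementation; `resolve_option` is a hypothetical helper, and any queue names passed to it are made up.

```python
import os
from typing import Optional


def resolve_option(
    cli_value: Optional[str],
    envvar: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the CLI value if given, else the env var, else the default."""
    # 1. A value passed on the command line always wins.
    if cli_value is not None:
        return cli_value
    # 2. Otherwise fall back to the environment variable, if it is set.
    env_value = os.environ.get(envvar)
    if env_value is not None:
        return env_value
    # 3. Otherwise use the default declared with the option.
    return default
```

With INPUT_QUEUE set in the environment, `resolve_option(None, "INPUT_QUEUE", "fallback")` returns the environment value, while passing an explicit first argument returns that argument instead.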

Contributor

It feels weird to me that we have all the logic in the Config module for processing env vars, and for what to do if they're not there, but then we're accessing them directly through the CLI. Maybe this is a topic better reserved for a larger Config discussion, but I would also appreciate @ghukill's thoughts on this before proceeding.

Contributor Author

For completeness, we could set:

envvar="INPUT_QUEUE"
default=Config.input_queue

Where envvar is insufficient is if you have to do something with the environment variable, like splitting by a delimiter.

Comment on lines -37 to +53
- default=CONFIG.INPUT_QUEUE,
+ envvar="INPUT_QUEUE",
Contributor

Same question as above

Contributor Author

See response #59 (comment)!

def dss_dspace_credentials(self) -> str:
    value = os.getenv("DSS_DSPACE_CREDENTIALS")
    if not value:
        raise OSError("Env var 'DSS_DSPACE_CREDENTIALS' must be defined")
Contributor

What about a check_required_env_vars method like we have in DSC to eliminate some of these checks in the properties?

Contributor Author

Hmm, I'd like to hold off on defining additional methods on the Config class until DataEng can have a more focused discussion on how we can standardize the format of this module (i.e., what methods to include, how to check required env vars).

In the case of development with DSC, I found it a hindrance that check_required_env_vars is called whenever you run the CLI, requiring the user to define all env vars even if only for local development.
In the latest version of python-lambdas-template, the method is defined, but it is not called when executed.

TLDR: Given that this works for now, I propose we move forward as is and discuss a better way to handle environment-specific required env var checking!
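
As a possible starting point for that discussion, here is a rough sketch of a check that is defined on the class but only runs when a caller invokes it explicitly, so local development is not blocked. Everything here is an assumption: the REQUIRED_ENV_VARS contents are illustrative, and the implementations in DSC and python-lambdas-template may look quite different.

```python
import os


class Config:
    # Assumption: illustrative names only, not the real list of required vars.
    REQUIRED_ENV_VARS = ("DSS_DSPACE_CREDENTIALS", "INPUT_QUEUE")

    def check_required_env_vars(self) -> None:
        """Raise OSError naming every required env var that is unset or empty."""
        missing = [name for name in self.REQUIRED_ENV_VARS if not os.getenv(name)]
        if missing:
            message = f"Missing required environment variables: {', '.join(missing)}"
            raise OSError(message)
```

Because nothing calls the method at import or instantiation time, a deployed entry point could opt in to the check while a local run skips it.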

Contributor

@ehanson8 ehanson8 left a comment

We can discuss a more standardized approach to the Config class soon so this is fine for now!

Why these changes are being introduced:
* Prior to these changes, it was difficult to debug errors for
failing ingests due to the setup of the `Config` class and how
it configured logging.

How this addresses that need:
* Use a WARNING_ONLY_LOGGERS environment variable to minimize “noise”
for select third-party libraries (e.g., [botocore, boto3, smart_open, urllib3]).
* Support 'verbose' CLI option
* Access env vars on Config via property methods
* Update how/when Config is instantiated and logger is configured

Side effects of this change:
* This deprecates the 'LOG_FILTER' and 'LOG_LEVEL' environment
variables for DSS, so the ECS Task Definition should also be updated.

Relevant ticket(s):
https://mitlibraries.atlassian.net/browse/IN-1679
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1679-update-config-module branch from f0db14d to 51725aa Compare March 18, 2026 13:19
@jonavellecuerdo jonavellecuerdo merged commit 30fe56e into main Mar 18, 2026
4 checks passed
@jonavellecuerdo jonavellecuerdo deleted the IN-1679-update-config-module branch March 18, 2026 13:30