Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NetCDF Authorization failure #89

Closed
eyvorchuk opened this issue Jun 30, 2021 · 16 comments
Closed

Fix NetCDF Authorization failure #89

eyvorchuk opened this issue Jun 30, 2021 · 16 comments
Labels
bug Something isn't working

Comments

@eyvorchuk
Copy link
Contributor

eyvorchuk commented Jun 30, 2021

When running the wps_full_rvic_demo.ipynb notebook with localhost as the target url, having a long time range (longer than 2-3 years) causes the following issue at some point during the convolution part: owslib.wps.WPSException : {'code': 'NoApplicableCode', 'locator': 'None', 'text': 'Process error: RuntimeError: NetCDF: Authorization failure'}.

Steps to reproduce the behavior:

  1. Start an instance of osprey on one terminal using osprey start.
  2. Start an instance of jupyter on another terminal using jupyter lab.
  3. In the wps_full_rvic_demo.ipynb notebook, go to cell 2 and change get_target_url("osprey") to "http://localhost:5000/wps"
  4. Go to cell 5 and change the run_startdate to 2010-01-01-00 (or a date before that)
  5. Run cells 1-5 and observe the output on the terminal with the osprey instance. At some point (not consistent when), the aforementioned error should occur and the following traceback should be seen in the osprey terminal:
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version="1.0" encoding="utf-8"?><ExceptionReport version="1.0.0" xmlns="http://www.opengis.net/ows/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd"> <Exception exceptionCode="NoApplicableCode" locator="AccessForbidden"> <ExceptionText>Not authorized to access this resource. Missing user authentication.</ExceptionText> </Exception></ExceptionReport>
INFO:close_logger>> Exited safely
Traceback (most recent call last):
  File "/storage/home/eyvorchuk/osprey/osprey/processes/wps_full_rvic.py", line 199, in _handler
    convolution(convolve_config)
  File "/tmp/osprey-venv/lib64/python3.6/site-packages/rvic/convolution.py", line 60, in convolution
    hist_tapes, data_model, rout_var, time_handle, directories
  File "/tmp/osprey-venv/lib64/python3.6/site-packages/rvic/convolution.py", line 337, in convolution_run
    runin = data_model.read(timestamp)
  File "/tmp/osprey-venv/lib64/python3.6/site-packages/rvic/core/read_forcing.py", line 279, in read
    self.current_fhdl.variables[self.time_fld][self.current_tind],
  File "src/netCDF4/_netCDF4.pyx", line 4406, in netCDF4._netCDF4.Variable.__getitem__
  File "src/netCDF4/_netCDF4.pyx", line 5350, in netCDF4._netCDF4.Variable._get
  File "src/netCDF4/_netCDF4.pyx", line 1927, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Authorization failure

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/osprey-venv/lib64/python3.6/site-packages/pywps/app/Process.py", line 250, in _run_process
    self.handler(wps_request, wps_response)  # the user must update the wps_response.
  File "/storage/home/eyvorchuk/osprey/osprey/processes/wps_full_rvic.py", line 201, in _handler
    raise ProcessError(f"{type(e).__name__}: {e}")
pywps.app.exceptions.ProcessError: RuntimeError: NetCDF: Authorization failure
2021-07-16 11:24:02 ERROR: osprey: Process error: method=wps_full_rvic.py._handler, line=201, msg=RuntimeError: NetCDF: Authorization failure

Ideally, the process should be able to run for a long time range with no errors. Since this error does not occur when running RVIC normally, it appears to be an issue within osprey.

@eyvorchuk eyvorchuk added the bug Something isn't working label Jun 30, 2021
@tlvu
Copy link

tlvu commented Jul 30, 2021

RuntimeError: NetCDF: Authorization failure

This looks like our intermittent failure since we upgraded Magpie, see this very similar error Ouranosinc/Magpie#433 (comment)

I would suggest a work-around for the moment to bypass all the front proxies (Nginx, Twitcher) and hit Thredds directly at http://host:8083/<SAME_PATH_AFTER> (port taken from https://github.com/bird-house/birdhouse-deploy/blob/c5f45c9d0c2f450379c874ad435f466284518819/birdhouse/docker-compose.yml#L256)

Quick explanation is since this is intermittent, bigger is your date range, probably you'll access more NetCDF files and more chance one of your access will blow up.

The good news is a fix has been found in PR bird-house/birdhouse-deploy#182 and probably be merged next week when Francis from CRIM is back from vacation.

@tlvu
Copy link

tlvu commented Aug 2, 2021

The PR that should theoretically fix this error has been merged bird-house/birdhouse-deploy#182.

Note the new Magpie has a new unique email constraint that is not backward compatible. Run this query bird-house/birdhouse-deploy#182 (comment) to find all your users email and change any duplicated email before the upgrade. Always good to back /data/magpie_persist before the upgrade if you ever want to rollback.

@nikola-rados
Copy link
Contributor

Thank you for the suggestion @tlvu! I have applied the changes made to birdhouse-deploy but am running into some trouble with magpie:

[SQL: ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email)]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
[2021-08-03 18:48:01,680] ERROR      [MainThread][magpie.db] Database not ready
Traceback (most recent call last):
  File "/usr/local/bin/pserve", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/pyramid/scripts/pserve.py", line 34, in main
    return command.run()
  File "/usr/local/lib/python3.7/site-packages/pyramid/scripts/pserve.py", line 275, in run
    app = loader.get_wsgi_app(app_name, config_vars)
  File "/usr/local/lib/python3.7/site-packages/plaster_pastedeploy/__init__.py", line 129, in get_wsgi_app
    global_conf=defaults,
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 253, in loadapp
    return loadobj(APP, uri, name=name, **kw)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 278, in loadobj
    return context.create()
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 715, in create
    return self.object_type.invoke(self)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 150, in invoke
    **context.local_conf)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/util.py", line 55, in fix_call
    val = callable(*args, **kw)
  File "/usr/local/lib/python3.7/site-packages/paste/urlmap.py", line 31, in urlmap_factory
    app = loader.get_app(app_name, global_conf=global_conf)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 356, in get_app
    name=name, global_conf=global_conf).create()
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 715, in create
    return self.object_type.invoke(self)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 235, in invoke
    filtered = context.next_context.create()
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 715, in create
    return self.object_type.invoke(self)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/loadwsgi.py", line 152, in invoke
    return fix_call(context.object, context.global_conf, **context.local_conf)
  File "/usr/local/lib/python3.7/site-packages/paste/deploy/util.py", line 55, in fix_call
    val = callable(*args, **kw)
  File "/opt/local/src/magpie/magpie/app.py", line 46, in main
    run_database_migration_when_ready(settings)  # cannot pass db session as it might not even exist yet!
  File "/opt/local/src/magpie/magpie/db.py", line 239, in run_database_migration_when_ready
    raise_log("Database not ready", exception=RuntimeError, logger=LOGGER)
  File "/opt/local/src/magpie/magpie/utils.py", line 168, in raise_log
    raise exception(msg)
RuntimeError: Database not ready

We have a password set for magpie but no other settings have been changed off of the defaults. Are there additional items we need to configure to get this change working?

@tlvu
Copy link

tlvu commented Aug 4, 2021

RuntimeError: Database not ready

@nikola-rados I have never seen this error in the many Magpie upgrade in the past. Is your postgres-magpie container running?! Hope you had a backup of your /data/magpie_persist folder before the upgrade.

Here are the few things I would try:

  • try to resume the upgrade:
    • ./pavics-compose.sh up -d
    • or ./pavics-compose.sh restart postgres-magpie magpie
  • try to redo the upgrade again from scratch:
    • delete magpie and postgres-magpie container: ./pavics-compose.sh stop magpie postgres-magpie && ./pavics-compose.sh rm -vf magpie postgres-magpie
    • restore your backup of /data/magpie_persist
    • checkout the previous version before the upgrade: git checkout 1.13.14 (the upgrade is tag 1.14.0)
    • bring PAVICS up in the previous state: ./pavics-compose.sh up -d
    • stop the autodeploy by creating any new files under the checkout
    • perform any tests required to confirm the previous state are working fine to set a baseline for the upgrade
    • checkout the upgraded state: git checkout 1.14.0
    • upgrade: ./pavics-compose.sh up -d
    • check magpie logs: docker logs magpie
    • perform any tests required to confirm the upgrade working fine and compare the test result with your baseline above
    • if all is good, re-enable autodeploy by deleting the previous bogus file created

@nikola-rados
Copy link
Contributor

I've tried to rollback several times but for whatever reason magpie refuses to work. I rolled back to 1.13.12 which is what we were using before I attempted the upgrade, here is what things look like:

[nrados@docker-dev03 birdhouse]$ docker ps -a | grep magpie
b4a251262d77   pavics/twitcher:magpie-3.12.0         "pserve /opt/birdhou…"   About a minute ago   Up About a minute                 8080/tcp, 8443/tcp, 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp, 9001/tcp                                                                                                                                                                                                                                     twitcher
a311641568c0   pavics/magpie:3.12.0                  "/bin/sh -c 'crond -…"   About a minute ago   Restarting (255) 29 seconds ago                                                                                                                                                                                                                                                                                                               magpie
3486c52a9f8a   postgres:9.6                          "docker-entrypoint.s…"   About a minute ago   Up About a minute                 5432/tcp                                                                                                                                                                                                                                                                                                    postgres-magpie

And the logs coming from magpie:

[nrados@docker-dev03 birdhouse]$ docker logs --follow magpie
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:11,306] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:11,558] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:11,558] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:11,558] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:11,558] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:11,621] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:11,622] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:11,645] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:11,645] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:11,683] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:13,722] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:13,722] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:13,722] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:13,723] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:13,723] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:13,723] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:13,723] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:13,935] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:13,935] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:13,935] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:13,935] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:13,995] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:13,995] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:14,016] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:14,016] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:14,030] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:16,097] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:16,315] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:16,316] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:16,316] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:16,316] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:16,378] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:16,378] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:16,399] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:16,399] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:16,414] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:18,683] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:18,894] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:18,894] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:18,894] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:18,894] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:18,955] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:18,956] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:18,975] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:18,975] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:18,988] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:21,748] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:21,748] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:21,749] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:21,749] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:21,749] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:21,749] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:21,749] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:21,975] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:21,975] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:21,975] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:21,975] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:22,039] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:22,039] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:22,061] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:22,061] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:22,076] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:25,638] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:25,639] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:25,639] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:25,639] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:25,639] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:25,639] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:25,639] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:25,856] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:25,856] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:25,856] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:25,856] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:25,918] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:25,918] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:25,939] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:25,939] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:25,953] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:31,035] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:31,243] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:31,243] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:31,243] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:31,244] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:31,305] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:31,305] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:31,328] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:31,328] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:31,341] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:39,631] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:39,852] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:39,853] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:39,853] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:39,853] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:39,916] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:39,916] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:39,937] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:39,937] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:39,951] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:52:54,660] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:52:54,660] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:54,660] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:52:54,660] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:52:54,660] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:52:54,660] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:52:54,661] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:52:54,869] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:52:54,869] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:52:54,869] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:52:54,869] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:52:54,931] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:52:54,931] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:52:54,950] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:52:54,950] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:52:54,964] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:53:22,410] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:53:22,410] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:53:22,410] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:53:22,410] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:53:22,410] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:53:22,411] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:53:22,411] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:53:22,616] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:53:22,616] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:53:22,616] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:53:22,616] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:53:22,677] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:53:22,677] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:53:22,697] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:53:22,697] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:53:22,710] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'
[2021-08-04 18:54:15,774] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_ID (using default: None)
[2021-08-04 18:54:15,774] WARNING    [MainThread][magpie.utils] Constant could not be found: GITHUB_CLIENT_SECRET (using default: None)
[2021-08-04 18:54:15,774] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_HOSTNAME (using default: None)
[2021-08-04 18:54:15,775] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_ID (using default: None)
[2021-08-04 18:54:15,775] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CLIENT_SECRET (using default: None)
[2021-08-04 18:54:15,775] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_CERTIFICATE_FILE (using default: None)
[2021-08-04 18:54:15,775] WARNING    [MainThread][magpie.utils] Constant could not be found: WSO2_SSL_VERIFY (using default: True)
[2021-08-04 18:54:15,984] INFO       [MainThread][magpie.app] Setting up loggers...
[2021-08-04 18:54:15,984] INFO       [MainThread][magpie.app] Looking for db migration requirement...
[2021-08-04 18:54:15,984] INFO       [MainThread][magpie.db] Running database migration (as required)...
[2021-08-04 18:54:15,984] INFO       [MainThread][magpie.db] Using file '/opt/local/src/magpie/config/magpie.ini' for migration.
[2021-08-04 18:54:16,047] WARNING    [MainThread][magpie.utils] Constant could not be found: MAGPIE_DB_URL (using default: None)
[2021-08-04 18:54:16,047] INFO       [MainThread][magpie.db] Using composed settings 'MAGPIE_POSTGRES_<>' for database connection.
[2021-08-04 18:54:16,070] INFO       [MainThread][alembic.runtime.migration] Context impl PostgresqlImpl.
[2021-08-04 18:54:16,070] INFO       [MainThread][alembic.runtime.migration] Will assume transactional DDL.
[2021-08-04 18:54:16,084] ERROR      [MainThread][alembic.util.messaging] Can't locate revision identified by 'dea413e13a8a'
FAILED: Can't locate revision identified by 'dea413e13a8a'

@nikola-rados
Copy link
Contributor

After applying the update it looks like there may be a duplicated email that is causing a problem:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/local/src/magpie/magpie/db.py", line 213, in run_database_migration_when_ready
    run_database_migration(db_session=db_session, container=settings)
  File "/opt/local/src/magpie/magpie/db.py", line 155, in run_database_migration
    alembic.config.main(argv=alembic_args)
  File "/usr/local/lib/python3.7/site-packages/alembic/config.py", line 581, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/usr/local/lib/python3.7/site-packages/alembic/config.py", line 575, in main
    self.run_cmd(cfg, options)
  File "/usr/local/lib/python3.7/site-packages/alembic/config.py", line 555, in run_cmd
    **dict((k, getattr(options, k, None)) for k in kwarg)
  File "/usr/local/lib/python3.7/site-packages/alembic/command.py", line 298, in upgrade
    script.run_env()
  File "/usr/local/lib/python3.7/site-packages/alembic/script/base.py", line 489, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/usr/local/lib/python3.7/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/local/lib/python3.7/site-packages/alembic/util/compat.py", line 184, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/local/src/magpie/config/../magpie/alembic/env.py", line 104, in <module>
    run_migrations_online(config_connection)
  File "/opt/local/src/magpie/config/../magpie/alembic/env.py", line 94, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/usr/local/lib/python3.7/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/usr/local/lib/python3.7/site-packages/alembic/runtime/migration.py", line 522, in run_migrations
    step.migration_fn(**kw)
  File "/opt/local/src/magpie/magpie/alembic/versions/2021-04-19_00c617174e54_unique_user_emails.py", line 19, in upgrade
    op.create_unique_constraint("uq_users_email", "users", ["email"])
  File "<string>", line 8, in create_unique_constraint
  File "<string>", line 3, in create_unique_constraint
  File "/usr/local/lib/python3.7/site-packages/alembic/operations/ops.py", line 439, in create_unique_constraint
    return operations.invoke(op)
  File "/usr/local/lib/python3.7/site-packages/alembic/operations/base.py", line 373, in invoke
    return fn(self, operation)
  File "/usr/local/lib/python3.7/site-packages/alembic/operations/toimpl.py", line 152, in create_constraint
    operation.to_constraint(operations.migration_context)
  File "/usr/local/lib/python3.7/site-packages/alembic/ddl/impl.py", line 245, in add_constraint
    self._exec(schema.AddConstraint(const))
  File "/usr/local/lib/python3.7/site-packages/alembic/ddl/impl.py", line 141, in _exec
    return conn.execute(construct, *multiparams, **params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 72, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1073, in _execute_ddl
    compiled,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) could not create unique index "uq_users_email"
DETAIL:  Key (email)=([email protected]) is duplicated.

@tlvu
Copy link

tlvu commented Aug 5, 2021

@nikola-rados before going any further, please take immediately a backup of /data/magpie_persist by creating a tarball of it.

Then try the command in bird-house/birdhouse-deploy#182 (comment) to find all the duplicate emails and manually update those duplicate email to avoid duplication via direct sql update. Magpie UI do not work anymore at this point you'll have to use direct sql update.

Then resume the upgrade again.

How many custom access control rules and how many users did you add? If not too much, you might as well recreate these rules and users from scratch. Let's keep this option only as the last resort.

Pinging @fmigneault the Magpie developer if there is another way out.

@fmigneault
Copy link

The easiest method is with manual SQL update of the email as @tlvu suggested (or delete the user if it is not needed anymore).
Magpie cannot assume/migrate data in this case because it cannot know which users are important or not.

@tlvu
Copy link

tlvu commented Aug 5, 2021

FYI @nikola-rados SQL to manual update email direct in DB so you do not have to search yourself: Ouranosinc/Magpie#443 (comment)

@tlvu
Copy link

tlvu commented Aug 5, 2021

I'll be on vacation in a few hours. Francis (@fmigneault) please continue helping Nik (@nikola-rados) from PCIC for this Magpie upgrade, thanks.

@nikola-rados
Copy link
Contributor

Ok magpie is back up and running with the update. Now we can see if it fixes our issue here, thanks @tlvu and @fmigneault!

@eyvorchuk
Copy link
Contributor Author

Good news is that the NetCDF Authorization Failure has stopped after updating birdhouse-deploy. Bad news is that when running osprey using our proxy URL as the target, it times out after about four minutes with the following error:

HTTPError: 504 Server Error: Gateway Time-out for url: https://docker-dev03.pcic.uvic.ca/twitcher/ows/proxy/osprey/wps

Also, our docker logs for osprey show the following error, which also occurs when using the port for production:

Traceback (most recent call last):
  File "/tmp/osprey/processes/wps_full_rvic.py", line 199, in _handler
    convolution(convolve_config)
  File "/root/.local/lib/python3.8/site-packages/rvic/convolution.py", line 59, in convolution
    time_handle, hist_tapes = convolution_run(
  File "/root/.local/lib/python3.8/site-packages/rvic/convolution.py", line 337, in convolution_run
    runin = data_model.read(timestamp)
  File "/root/.local/lib/python3.8/site-packages/rvic/core/read_forcing.py", line 320, in read
    temp = self.current_fhdl.variables[fld][self.current_tind]
  File "src/netCDF4/_netCDF4.pyx", line 4406, in netCDF4._netCDF4.Variable.__getitem__
  File "src/netCDF4/_netCDF4.pyx", line 5350, in netCDF4._netCDF4.Variable._get
  File "src/netCDF4/_netCDF4.pyx", line 1927, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP server error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/pywps/app/Process.py", line 250, in _run_process
    self.handler(wps_request, wps_response)  # the user must update the wps_response.
  File "/tmp/osprey/processes/wps_full_rvic.py", line 201, in _handler
    raise ProcessError(f"{type(e).__name__}: {e}")
pywps.app.exceptions.ProcessError: RuntimeError: NetCDF: DAP server error
2021-08-06 18:55:01 ERROR: osprey: Process error: method=wps_full_rvic.py._handler, line=201, msg=RuntimeError: NetCDF: DAP server error

@eyvorchuk
Copy link
Contributor Author

I did another run of osprey last week while monitoring our logs for osprey and twitcher. After the run finishes, osprey shows this:

2021-08-18 18:13:01 DEBUG: osprey: Initializing database connection
2021-08-18 18:13:01,353 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2021-08-18 18:13:01,354 INFO sqlalchemy.engine.Engine SELECT count(*) AS count_1
FROM (SELECT pywps_requests.uuid AS pywps_requests_uuid, pywps_requests.pid AS pywps_requests_pid, pywps_requests.operation AS pywps_requests_operation, pywps_requests.version AS pywps_requests_version, pywps_requests.time_start AS pywps_requests_time_start, pywps_requests.time_end AS pywps_requests_time_end, pywps_requests.identifier AS pywps_requests_identifier, pywps_requests.message AS pywps_requests_message, pywps_requests.percent_done AS pywps_requests_percent_done, pywps_requests.status AS pywps_requests_status
FROM pywps_requests
WHERE pywps_requests.uuid = ?) AS anon_1
2021-08-18 18:13:01,354 INFO sqlalchemy.engine.Engine [cached since 8.944e+04s ago] ('fb9988c8-004e-11ec-b7fc-0242ac12000c',)
2021-08-18 18:13:01,355 INFO sqlalchemy.engine.Engine SELECT pywps_requests.uuid AS pywps_requests_uuid, pywps_requests.pid AS pywps_requests_pid, pywps_requests.operation AS pywps_requests_operation, pywps_requests.version AS pywps_requests_version, pywps_requests.time_start AS pywps_requests_time_start, pywps_requests.time_end AS pywps_requests_time_end, pywps_requests.identifier AS pywps_requests_identifier, pywps_requests.message AS pywps_requests_message, pywps_requests.percent_done AS pywps_requests_percent_done, pywps_requests.status AS pywps_requests_status
FROM pywps_requests
WHERE pywps_requests.uuid = ?
2021-08-18 18:13:01,355 INFO sqlalchemy.engine.Engine [cached since 8.944e+04s ago] ('fb9988c8-004e-11ec-b7fc-0242ac12000c',)
2021-08-18 18:13:01,357 INFO sqlalchemy.engine.Engine UPDATE pywps_requests SET time_end=?, message=?, percent_done=?, status=? WHERE pywps_requests.uuid = ?
2021-08-18 18:13:01,357 INFO sqlalchemy.engine.Engine [cached since 8.944e+04s ago] ('2021-08-18 18:13:01.356406', 'Process error: RuntimeError: NetCDF: DAP server error', 100.0, 5, 'fb9988c8-004e-11ec-b7fc-0242ac12000c')
2021-08-18 18:13:01,357 INFO sqlalchemy.engine.Engine COMMIT
2021-08-18 18:13:01 DEBUG: osprey: _update_status: status=5, clean=True
2021-08-18 18:13:01 DEBUG: osprey: clean workdir: status=5
2021-08-18 18:13:01 INFO: osprey: Removing temporary working directory: /tmp/pywps_process_ftlcy2n_
2021-08-18 18:13:01 DEBUG: osprey: Checking for stored requests
2021-08-18 18:13:01 DEBUG: osprey: Initializing database connection
2021-08-18 18:13:01,362 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2021-08-18 18:13:01,364 INFO sqlalchemy.engine.Engine SELECT pywps_stored_requests.uuid AS pywps_stored_requests_uuid, pywps_stored_requests.request AS pywps_stored_requests_request
FROM pywps_stored_requests
 LIMIT ? OFFSET ?
2021-08-18 18:13:01,364 INFO sqlalchemy.engine.Engine [generated in 0.00019s] (1, 0)
2021-08-18 18:13:01 DEBUG: osprey: No stored request found
2021-08-18 18:13:01 INFO: osprey: Request: getcapabilities

twitcher gives the following info repeatedly during the runtime:

2021-08-18 18:09:01,242 INFO  [TWITCHER:158][waitress-1] 'None' request 'read' permission on '/ows/proxy/thredds/dodsC/datasets/storage/data/projects/comp_support/climate_explorer_data_prep/hydro/sample_data/set4/columbia_vicset2.nc.dods'
2021-08-18 18:09:01,255 INFO  [TWITCHER:60][waitress-1] Using adapter: '<class 'magpie.adapter.MagpieAdapter'>'
2021-08-18 18:09:01,274 INFO  [TWITCHER:60][waitress-1] Using adapter: '<class 'magpie.adapter.MagpieAdapter'>'

Sometimes, the following shows up:

2021-08-18 18:09:01,348 WARNI [waitress.queue:117][MainThread] Task queue depth is 1

This also appears after the server error occurs:

2021-08-18 18:13:01,439 INFO  [waitress:353][waitress-0] Client disconnected while serving /ows/proxy/thredds/dodsC/datasets/storage/data/projects/comp_support/climate_explorer_data_prep/hydro/sample_data/set4/columbia_vicset2.nc.dods

The following also shows up occasionally, but it shows something similar for our other web services, so it's probably normal:

2021-08-18 18:25:01,827 INFO  [TWITCHER:158][waitress-0] 'None' request 'getcapabilities' permission on '/ow

@tlvu I was wondering if any of these logging messages could indicate why we're having a RuntimeError: NetCDF: DAP server error.

@fmigneault
Copy link

2021-08-18 18:09:01,348 WARNI [waitress.queue:117][MainThread] Task queue depth is 1
If memory serves right, this happens when waitress is using all its threads/workers to answer to some requests, and incoming ones after that will have to wait for one to become available, potentially causing timeouts.

If the server can handle more workers, the number could be increased if multiple download / WPS requests are often executed in parallel by multiple users. If not, I would investigate further why waitress doesn't free them after closing the response.

I remember getting many caching slowdown with waitress, amongst other issues.
We replaced it with gunicorn as of 0.5.4
https://github.com/bird-house/twitcher/blob/master/CHANGES.rst#054-2020-10-29
with the command override to pass the INI file:
https://github.com/bird-house/birdhouse-deploy/blob/4e9a94dbcbf1ddd8bb275317206c3376fd7897d3/birdhouse/docker-compose.yml#L354

2021-08-18 18:09:01,242 INFO  [TWITCHER:158][waitress-1] 'None' request 'read' permission on '/ows/proxy/thredds/dodsC/datasets/storage/data/projects/comp_support/climate_explorer_data_prep/hydro/sample_data/set4/columbia_vicset2.nc.dods'

This one only indicates that the "anonymous"/public user is used for the corresponding request.
If the file should be publicly accessible, there is nothing to worry about.

@tlvu
Copy link

tlvu commented Aug 24, 2021

Sorry for the late reply, I was on vacation.

when running osprey using our proxy URL as the target, it times out after about four minutes

That 4 mins timeout seems to be related to this config https://github.com/bird-house/birdhouse-deploy/blob/4e9a94dbcbf1ddd8bb275317206c3376fd7897d3/birdhouse/default.env#L77-L79.

You can bump that value to a more appropriate value. The side-effect of a too long timeout is if the server is indeed crashing or dead, you won't know about it sooner. Yours to decide on your appropriate value.

The note on the config also mention about using async mode. Note there is currently a bug with queue handling in PyWPS that you can not mix sync and async call within the same bird, see bird-house/finch#121. We got around about this problem by having 2 same bird, one dedicated for sync and another one for async (see bird-house/finch#98 (comment)).

General debugging note: the first step to debugging should be to remove the front Nginx and Twitcher proxy and directly hit the WPS and Thredds. This can be achieved by directly hitting the port of the service instead of going through port 443 (httpS). For example Thredds port is 8083 by looking at the docker-compose.yml file (https://github.com/bird-house/birdhouse-deploy/blob/4e9a94dbcbf1ddd8bb275317206c3376fd7897d3/birdhouse/docker-compose.yml#L256).

Note it's also a security risk to expose direct access to all the services so your firewall should only expose 443 to the internet. All those direct service port should only be allowed inside your network.

Hope it helps.

@eyvorchuk
Copy link
Contributor Author

This issue has been fully resolved by increasing the PROXY_READ_TIMEOUT_VALUE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants