Cognition integration provider #302


Merged (96 commits into dev, Jul 4, 2025)
Conversation

andhreljaKern
Contributor

@andhreljaKern andhreljaKern commented May 15, 2025

This is the main PR.

Related PRs

Notes

  • after the merge, double-check that all env vars are correct in the release PR

New repository

Important

Retrieve:

Use dev-setup@cognition-integration-provider to run cognition-integration-provider (bash start -a -b cognition-integration-provider)

Tests

Tests were not developed for this container due to its long-running extraction and transformation tasks.

Affected areas

  • dev-setup, deployment-cognition, deployment-managed-cognition
    • added a new internal container cognition-integration-provider
  • refinery-submodule-model
    • added 2 new cognition schema tables (integrations + access) and individual integrations tables (new integration schema)
  • cognition-task-master
    • added a new "INTEGRATION" task that runs "execute-integration" (delta loads)
  • admin-dashboard
    • added the ability to "assign" integrations to organizations
  • cognition-ui
    • added a new "Integrations" page next to ETL page
  • refinery-ui
    • added info tooltips that label refinery projects as integration created
  • refinery-gateway
    • alembic migrations
  • cognition-gateway
    • routes to cognition-integration-provider (CRUD, Sync, Execute, Check for Updates)
    • added support for creating Cognition projects from Integrations
  • cognition-integration-provider
    • new container with integration extraction/transformation logic
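
For context, the "execute-integration" delta load mentioned above boils down to reprocessing only the items changed since the last sync. A hypothetical sketch (field names invented, not this repo's implementation):

```python
# Hypothetical delta-load filter: keep only items modified since the
# previous sync. The dict fields below are illustrative, not the
# actual schema used by cognition-integration-provider.
from datetime import datetime, timezone

def delta_load(items, last_synced_at):
    """Return the subset of items modified after last_synced_at."""
    return [i for i in items if i["modified_at"] > last_synced_at]

docs = [
    {"id": 1, "modified_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},
    {"id": 2, "modified_at": datetime(2025, 7, 1, tzinfo=timezone.utc)},
]
fresh = delta_load(docs, datetime(2025, 6, 1, tzinfo=timezone.utc))
print([d["id"] for d in fresh])  # [2]
```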

Performance

MP - multiprocessing (# workers)
SP - single-processing
(benchmark screenshot)

@JWittmeyer
Member

JWittmeyer commented Jul 2, 2025

Just started everything fresh and got an error for env var collection:
(error screenshot)

All builds are done, so I'm assuming it's an actual issue; the database looks fine imo

  • resolved
  • ignored for now

Edit (jwittmeyer): starting excluded didn't result in the same error, so maybe something else was up. I'll double-check after the merge with dev.

@JWittmeyer
Member

Checked further on folder permissions for an external user ([email protected])
(screenshot)

Folder with permission:
(screenshot)

Items in folder without permission:
(screenshot)

Not sure how easy it would be to collect an email from that, but it would potentially be needed for the user access.

To double-check, I did the same with my private email, which resulted in the same uuid (so not every user gets a new uuid), so I'm assuming there are some steps involved in the process to get the emails.


Also, I never got an actual invite link via email, but that is a SharePoint issue, not ours.


  • resolved
  • not possible in given time

@JWittmeyer
Member

JWittmeyer commented Jul 3, 2025

Something is problematic when I start unexcluded. I'm not sure what, but in the docker compose setup I always get some kind of database error during execution:

Error 1
2025-07-03 08:37:12 concurrent.futures.process._RemoteTraceback: 
2025-07-03 08:37:12 """
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
2025-07-03 08:37:12     self.engine.dialect.do_commit(self.connection)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit
2025-07-03 08:37:12     dbapi_connection.commit()
2025-07-03 08:37:12 psycopg2.DatabaseError: error with status PGRES_TUPLES_OK and no message from the libpq
2025-07-03 08:37:12 
2025-07-03 08:37:12 The above exception was the direct cause of the following exception:
2025-07-03 08:37:12 
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
2025-07-03 08:37:12     r = call_item.fn(*call_item.args, **call_item.kwargs)
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 112, in __extract
2025-07-03 08:37:12     integration_util.delta_load(
2025-07-03 08:37:12   File "/app/src/util/integration.py", line 99, in delta_load
2025-07-03 08:37:12     integration_record_manager.create(
2025-07-03 08:37:12   File "/app/submodules/model/integration_objects/manager.py", line 143, in create
2025-07-03 08:37:12     general.add(integration_record, with_commit)
2025-07-03 08:37:12   File "/app/submodules/model/business_objects/general.py", line 99, in add
2025-07-03 08:37:12     flush_or_commit(with_commit)
2025-07-03 08:37:12   File "/app/submodules/model/business_objects/general.py", line 142, in flush_or_commit
2025-07-03 08:37:12     session.commit()
2025-07-03 08:37:12   File "<string>", line 2, in commit
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
2025-07-03 08:37:12     self._transaction.commit(_to_root=self.future)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 836, in commit
2025-07-03 08:37:12     trans.commit()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2459, in commit
2025-07-03 08:37:12     self._do_commit()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2649, in _do_commit
2025-07-03 08:37:12     self._connection_commit_impl()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2620, in _connection_commit_impl
2025-07-03 08:37:12     self.connection._commit_impl()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1091, in _commit_impl
2025-07-03 08:37:12     self._handle_dbapi_exception(e, None, None, None, None)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
2025-07-03 08:37:12     util.raise_(
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
2025-07-03 08:37:12     raise exception
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
2025-07-03 08:37:12     self.engine.dialect.do_commit(self.connection)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit
2025-07-03 08:37:12     dbapi_connection.commit()
2025-07-03 08:37:12 sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) error with status PGRES_TUPLES_OK and no message from the libpq
2025-07-03 08:37:12 (Background on this error at: https://sqlalche.me/e/14/4xp6)
2025-07-03 08:37:12 """
2025-07-03 08:37:12 
2025-07-03 08:37:12 The above exception was the direct cause of the following exception:
2025-07-03 08:37:12 
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/app/src/handler/base.py", line 25, in extract
2025-07-03 08:37:12     return handler.extract(integration=integration)
2025-07-03 08:37:12   File "/app/src/handler/manager.py", line 24, in extract
2025-07-03 08:37:12     return sharepoint.extract(integration)
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 177, in extract
2025-07-03 08:37:12     documents = _extract_multi_process(
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 281, in _extract_multi_process
2025-07-03 08:37:12     for document in map(lambda x: x.result(), futures):
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 281, in <lambda>
2025-07-03 08:37:12     for document in map(lambda x: x.result(), futures):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2025-07-03 08:37:12     return self.__get_result()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2025-07-03 08:37:12     raise self._exception
2025-07-03 08:37:12 sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) error with status PGRES_TUPLES_OK and no message from the libpq
2025-07-03 08:37:12 (Background on this error at: https://sqlalche.me/e/14/4xp6)
2025-07-03 08:37:12 
2025-07-03 08:37:12 Exception in thread Thread-4:
2025-07-03 08:37:12 concurrent.futures.process._RemoteTraceback: 
2025-07-03 08:37:12 """
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
2025-07-03 08:37:12     self.engine.dialect.do_commit(self.connection)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit
2025-07-03 08:37:12     dbapi_connection.commit()
2025-07-03 08:37:12 psycopg2.DatabaseError: error with status PGRES_TUPLES_OK and no message from the libpq
2025-07-03 08:37:12 
2025-07-03 08:37:12 The above exception was the direct cause of the following exception:
2025-07-03 08:37:12 
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
2025-07-03 08:37:12     r = call_item.fn(*call_item.args, **call_item.kwargs)
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 112, in __extract
2025-07-03 08:37:12     integration_util.delta_load(
2025-07-03 08:37:12   File "/app/src/util/integration.py", line 99, in delta_load
2025-07-03 08:37:12     integration_record_manager.create(
2025-07-03 08:37:12   File "/app/submodules/model/integration_objects/manager.py", line 143, in create
2025-07-03 08:37:12     general.add(integration_record, with_commit)
2025-07-03 08:37:12   File "/app/submodules/model/business_objects/general.py", line 99, in add
2025-07-03 08:37:12     flush_or_commit(with_commit)
2025-07-03 08:37:12   File "/app/submodules/model/business_objects/general.py", line 142, in flush_or_commit
2025-07-03 08:37:12     session.commit()
2025-07-03 08:37:12   File "<string>", line 2, in commit
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
2025-07-03 08:37:12     self._transaction.commit(_to_root=self.future)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 836, in commit
2025-07-03 08:37:12     trans.commit()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2459, in commit
2025-07-03 08:37:12     self._do_commit()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2649, in _do_commit
2025-07-03 08:37:12     self._connection_commit_impl()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2620, in _connection_commit_impl
2025-07-03 08:37:12     self.connection._commit_impl()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1091, in _commit_impl
2025-07-03 08:37:12     self._handle_dbapi_exception(e, None, None, None, None)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
2025-07-03 08:37:12     util.raise_(
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
2025-07-03 08:37:12     raise exception
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
2025-07-03 08:37:12     self.engine.dialect.do_commit(self.connection)
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit
2025-07-03 08:37:12     dbapi_connection.commit()
2025-07-03 08:37:12 sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) error with status PGRES_TUPLES_OK and no message from the libpq
2025-07-03 08:37:12 (Background on this error at: https://sqlalche.me/e/14/4xp6)
2025-07-03 08:37:12 """
2025-07-03 08:37:12 
2025-07-03 08:37:12 The above exception was the direct cause of the following exception:
2025-07-03 08:37:12 
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/app/src/handler/base.py", line 25, in extract
2025-07-03 08:37:12     return handler.extract(integration=integration)
2025-07-03 08:37:12   File "/app/src/handler/manager.py", line 24, in extract
2025-07-03 08:37:12     return sharepoint.extract(integration)
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 177, in extract
2025-07-03 08:37:12     documents = _extract_multi_process(
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 281, in _extract_multi_process
2025-07-03 08:37:12     for document in map(lambda x: x.result(), futures):
2025-07-03 08:37:12   File "/app/src/handler/custom/sharepoint.py", line 281, in <lambda>
2025-07-03 08:37:12     for document in map(lambda x: x.result(), futures):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2025-07-03 08:37:12     return self.__get_result()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2025-07-03 08:37:12     raise self._exception
2025-07-03 08:37:12 sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) error with status PGRES_TUPLES_OK and no message from the libpq
2025-07-03 08:37:12 (Background on this error at: https://sqlalche.me/e/14/4xp6)
2025-07-03 08:37:12 
2025-07-03 08:37:12 During handling of the above exception, another exception occurred:
2025-07-03 08:37:12 
2025-07-03 08:37:12 Traceback (most recent call last):
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner
2025-07-03 08:37:12     self.run()
2025-07-03 08:37:12   File "/usr/local/lib/python3.9/threading.py", line 917, in run
2025-07-03 08:37:12     self._target(*self._args, **self._kwargs)
2025-07-03 08:37:12   File "/app/src/controller/integrations/manager.py", line 97, in __execute_langchain_integration
2025-07-03 08:37:12     documents, delta_criteria = handler.extract(integration=integration)
2025-07-03 08:37:12   File "/app/src/handler/base.py", line 35, in extract
2025-07-03 08:37:12     raise HTTPException(status_code=500, detail="Integration failed")
2025-07-03 08:37:12 fastapi.exceptions.HTTPException: 500: Integration failed
Error 2
2025-07-03 08:35:45 Traceback (most recent call last):
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
2025-07-03 08:35:45     r = call_item.fn(*call_item.args, **call_item.kwargs)
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 85, in __extract
2025-07-03 08:35:45     extract_kwargs = make_extract_kwargs(integration)
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 48, in make_extract_kwargs
2025-07-03 08:35:45     extract_kwargs["credentials"] = o365_util.get_credentials(
2025-07-03 08:35:45   File "/app/src/util/o365.py", line 106, in get_credentials
2025-07-03 08:35:45     certificate_passphrase = env_vars_db_bo.get_by_name_and_org_id(
2025-07-03 08:35:45   File "/app/submodules/model/cognition_objects/environment_variable.py", line 57, in get_by_name_and_org_id
2025-07-03 08:35:45     session.query(CognitionEnvironmentVariable)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2823, in first
2025-07-03 08:35:45     return self.limit(1)._iter().first()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1452, in first
2025-07-03 08:35:45     return self._only_one_row(
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 559, in _only_one_row
2025-07-03 08:35:45     row = onerow(hard_close=True)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1340, in _fetchone_impl
2025-07-03 08:35:45     return self._real_result._fetchone_impl(hard_close=hard_close)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1743, in _fetchone_impl
2025-07-03 08:35:45     row = next(self.iterator, _NO_ROW)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 147, in chunks
2025-07-03 08:35:45     fetch = cursor._raw_all_rows()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 393, in _raw_all_rows
2025-07-03 08:35:45     return [make_row(row) for row in rows]
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 393, in <listcomp>
2025-07-03 08:35:45     return [make_row(row) for row in rows]
2025-07-03 08:35:45 RuntimeError: number of values in row (20) differ from number of column processors (9)
2025-07-03 08:35:45 """
2025-07-03 08:35:45 
2025-07-03 08:35:45 The above exception was the direct cause of the following exception:
2025-07-03 08:35:45 
2025-07-03 08:35:45 Traceback (most recent call last):
2025-07-03 08:35:45   File "/app/src/handler/base.py", line 25, in extract
2025-07-03 08:35:45     return handler.extract(integration=integration)
2025-07-03 08:35:45   File "/app/src/handler/manager.py", line 24, in extract
2025-07-03 08:35:45     return sharepoint.extract(integration)
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 177, in extract
2025-07-03 08:35:45     documents = _extract_multi_process(
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 281, in _extract_multi_process
2025-07-03 08:35:45     for document in map(lambda x: x.result(), futures):
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 281, in <lambda>
2025-07-03 08:35:45     for document in map(lambda x: x.result(), futures):
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2025-07-03 08:35:45     return self.__get_result()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2025-07-03 08:35:45     raise self._exception
2025-07-03 08:35:45 RuntimeError: number of values in row (20) differ from number of column processors (9)
2025-07-03 08:35:45 
2025-07-03 08:35:45 Exception in thread Thread-2:
2025-07-03 08:35:45 concurrent.futures.process._RemoteTraceback: 
2025-07-03 08:35:45 """
2025-07-03 08:35:45 Traceback (most recent call last):
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
2025-07-03 08:35:45     r = call_item.fn(*call_item.args, **call_item.kwargs)
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 85, in __extract
2025-07-03 08:35:45     extract_kwargs = make_extract_kwargs(integration)
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 48, in make_extract_kwargs
2025-07-03 08:35:45     extract_kwargs["credentials"] = o365_util.get_credentials(
2025-07-03 08:35:45   File "/app/src/util/o365.py", line 106, in get_credentials
2025-07-03 08:35:45     certificate_passphrase = env_vars_db_bo.get_by_name_and_org_id(
2025-07-03 08:35:45   File "/app/submodules/model/cognition_objects/environment_variable.py", line 57, in get_by_name_and_org_id
2025-07-03 08:35:45     session.query(CognitionEnvironmentVariable)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2823, in first
2025-07-03 08:35:45     return self.limit(1)._iter().first()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1452, in first
2025-07-03 08:35:45     return self._only_one_row(
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 559, in _only_one_row
2025-07-03 08:35:45     row = onerow(hard_close=True)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1340, in _fetchone_impl
2025-07-03 08:35:45     return self._real_result._fetchone_impl(hard_close=hard_close)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 1743, in _fetchone_impl
2025-07-03 08:35:45     row = next(self.iterator, _NO_ROW)
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 147, in chunks
2025-07-03 08:35:45     fetch = cursor._raw_all_rows()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 393, in _raw_all_rows
2025-07-03 08:35:45     return [make_row(row) for row in rows]
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/result.py", line 393, in <listcomp>
2025-07-03 08:35:45     return [make_row(row) for row in rows]
2025-07-03 08:35:45 RuntimeError: number of values in row (20) differ from number of column processors (9)
2025-07-03 08:35:45 """
2025-07-03 08:35:45 
2025-07-03 08:35:45 The above exception was the direct cause of the following exception:
2025-07-03 08:35:45 
2025-07-03 08:35:45 Traceback (most recent call last):
2025-07-03 08:35:45   File "/app/src/handler/base.py", line 25, in extract
2025-07-03 08:35:45     return handler.extract(integration=integration)
2025-07-03 08:35:45   File "/app/src/handler/manager.py", line 24, in extract
2025-07-03 08:35:45     return sharepoint.extract(integration)
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 177, in extract
2025-07-03 08:35:45     documents = _extract_multi_process(
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 281, in _extract_multi_process
2025-07-03 08:35:45     for document in map(lambda x: x.result(), futures):
2025-07-03 08:35:45   File "/app/src/handler/custom/sharepoint.py", line 281, in <lambda>
2025-07-03 08:35:45     for document in map(lambda x: x.result(), futures):
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2025-07-03 08:35:45     return self.__get_result()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2025-07-03 08:35:45     raise self._exception
2025-07-03 08:35:45 RuntimeError: number of values in row (20) differ from number of column processors (9)
2025-07-03 08:35:45 
2025-07-03 08:35:45 During handling of the above exception, another exception occurred:
2025-07-03 08:35:45 
2025-07-03 08:35:45 Traceback (most recent call last):
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner
2025-07-03 08:35:45     self.run()
2025-07-03 08:35:45   File "/usr/local/lib/python3.9/threading.py", line 917, in run
2025-07-03 08:35:45     self._target(*self._args, **self._kwargs)
2025-07-03 08:35:45   File "/app/src/controller/integrations/manager.py", line 97, in __execute_langchain_integration
2025-07-03 08:35:45     documents, delta_criteria = handler.extract(integration=integration)
2025-07-03 08:35:45   File "/app/src/handler/base.py", line 35, in extract
2025-07-03 08:35:45     raise HTTPException(status_code=500, detail="Integration failed")
2025-07-03 08:35:45 fastapi.exceptions.HTTPException: 500: Integration failed

The PGRES_TUPLES_OK error seems to be related to the process pool interacting badly with the session object (at least according to ChatGPT :D)

GPT Answer

The root of what you’re seeing is actually two different but related problems:


1. “number of values in row (20) differ from number of column processors (9)”

This comes straight out of SQLAlchemy’s row‐mapping machinery: your ORM class definition for CognitionEnvironmentVariable declares 9 mapped columns, but the actual query is returning 20 columns. In other words, your Python model and your PostgreSQL table schema are out of sync.

How to verify and fix

  1. Inspect the real table schema:

    \d cognition_environment_variable

    in psql (or

    SELECT column_name, data_type
      FROM information_schema.columns
     WHERE table_name='cognition_environment_variable';

    if you’re not in psql).

  2. Compare to your ORM model:
    In Python, open app/submodules/model/cognition_objects/environment_variable.py and count how many Column(...) definitions the class has, and what their names/types are.

  3. Reconcile:

    • If the database has columns you no longer need, either drop them in a migration or explicitly exclude them in your ORM.
    • If your code added new columns that the DB doesn’t yet know about, create an Alembic (or whatever migration tool you’re using) revision to ALTER TABLE ADD COLUMN ….

After you’ve aligned model ←→ table, that 20-vs‑9 error will go away.
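
The comparison in steps 1-2 can also be done programmatically. A minimal sketch (illustrative table/model names, SQLite standing in for Postgres, with a deliberately drifted table to show the diff):

```python
# Sketch: detect ORM-vs-table column drift of the kind behind the
# 20-vs-9 error. Names are illustrative, not this repo's schema.
from sqlalchemy import create_engine, inspect, text, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class EnvVar(Base):  # stand-in for CognitionEnvironmentVariable
    __tablename__ = "environment_variable"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
# Simulate a drifted DB schema: one extra column the model doesn't map.
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE environment_variable "
        "(id INTEGER PRIMARY KEY, name TEXT, value TEXT)"
    ))

db_cols = {c["name"] for c in inspect(engine).get_columns("environment_variable")}
orm_cols = {c.name for c in EnvVar.__table__.columns}
print("in DB but not in model:", db_cols - orm_cols)  # {'value'}
print("in model but not in DB:", orm_cols - db_cols)  # set()
```

Any non-empty diff on either side is exactly the mismatch the row-mapping error complains about.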


2. “psycopg2.DatabaseError: error with status PGRES_TUPLES_OK and no message from the libpq” in child processes

This one is a classic side‑effect of using a forked worker (via ProcessPoolExecutor) after you’ve already established a connection pool in the parent. Post‑fork you end up with two processes sharing the same sockets/pool internals, and psycopg2 can enter a weird state where the libpq call succeeds but doesn’t return a proper message.

The cure is: make sure each worker process creates its own engine/session, and never inherits an open pool from the parent.

Two straightforward strategies:

  1. Use spawn instead of the default fork
    On Python 3.8+ you can do:

    import multiprocessing as mp
    
    mp.set_start_method("spawn", force=True)

    before you import or create your SQLAlchemy engine. “spawn” will import and initialize everything anew, so no shared sockets.

  2. Re-create your session/engine inside each worker
    Instead of passing a pre-built Session() into your __extract function, have the worker do:

    # Inside __extract(), at the very top:
    from your_app.db import create_engine_and_sessionmaker
    
    engine, SessionLocal = create_engine_and_sessionmaker()
    session = SessionLocal()
    try:
        # … do your queries/commits …
    finally:
        session.close()
        engine.dispose()

    That guarantees no connection is ever shared across process boundaries.

Either of those approaches will eliminate the “PGRES_TUPLES_OK” commit‐without‐message error.
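
As a complement to the snippet above, strategy 2 can also be wired through the executor's initializer, so every worker builds its own engine on startup. A sketch against an in-memory SQLite URL standing in for the real Postgres DSN (names invented, not this repo's code):

```python
# Sketch: give each pool worker its own SQLAlchemy engine via the
# ProcessPoolExecutor initializer, so no connection or pool is ever
# inherited across the process boundary from the parent.
from concurrent.futures import ProcessPoolExecutor
from sqlalchemy import create_engine, text

_engine = None  # one engine per worker process, never shared

def _init_worker(db_url):
    global _engine
    _engine = create_engine(db_url)  # created *after* the worker starts

def add_one(x):
    with _engine.connect() as conn:
        return conn.execute(text("SELECT :x + 1"), {"x": x}).scalar()

def run(db_url="sqlite://"):
    with ProcessPoolExecutor(max_workers=2,
                             initializer=_init_worker,
                             initargs=(db_url,)) as pool:
        return sorted(pool.map(add_one, range(3)))

if __name__ == "__main__":
    print(run())  # [1, 2, 3]
```

The initializer runs once per worker after it has started, which is what makes this safe under both "fork" and "spawn" start methods.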


Putting it all together

  1. Synchronize your model and your DB schema
    ‣ Run an ALTER TABLE … ADD COLUMN (or drop the extra columns) until your model's mapped columns match the table's columns.

  2. Isolate your DB connections per worker
    ‣ Either switch to the “spawn” start method or reinstantiate your Engine and Session in each child process so you don’t inadvertently share a socket/pool.

Once those are both addressed, your sharepoint extraction should proceed cleanly:

  • no more “column processor” mismatch,
  • no more silent commit failures.

Let me know if you hit any snags while reconciling your schema or refactoring your multiprocessing setup!


I'll work on this (jwittmeyer).

  • resolved

@JWittmeyer
Member

JWittmeyer commented Jul 3, 2025

I think we can currently remove language from the calculated attributes, since it's calculated from the summary, which we force to be in the language of the tokenizer.

  • resolved
  • kept because of overhead

@JWittmeyer
Member

JWittmeyer commented Jul 3, 2025

Filter merging with access management doesn't work (e.g. a filter on extension + an ANNOTATOR user).

I'll look into it.

  • resolved

@JWittmeyer JWittmeyer merged commit d010f94 into dev Jul 4, 2025
1 check passed
@JWittmeyer JWittmeyer deleted the cognition-integration-provider branch July 4, 2025 10:14