Collaborator

@liuzongyue6 liuzongyue6 commented May 14, 2025

PostgreSQL Integration for Distributed Two-View Estimation

Description

This PR integrates Dask with a PostgreSQL database to store computation results and support performance monitoring.

Key Features

  • Local/Remote Computing: A local Dask scheduler with remote workers connected via SSH tunnels
  • Database Integration: PostgreSQL (run from Docker) stores two-view estimation results and detailed reports

Files Added

  • gtsfm/common/dask_db_module_base.py - Base class for Dask modules that interact with the database
  • gtsfm/common/postgres_client.py - PostgreSQL client with connection management
  • gtsfm/configs/local_scheduler_postgres_remote_cluster.yaml - Cluster, database, and related port configuration
  • scripts/local_scheduler_db_remote_worker_demo.py - Toy example: distributed cluster test
  • scripts/two_view_estimator_dask_postgres.py - Full two-view estimator test with database
  • gtsfm/utils/ssh_tunneling.py - Manages SSH tunnels and starts the Dask scheduler and workers
  • scripts/README_DASK_LOCAL_REMOTE.md - Instructions for running a local scheduler with remote workers

Files Modified

  • gtsfm/two_view_estimator.py - Added database integration and result storage
  • gtsfm/two_view_estimator/cacher.py - Added default i1 and i2 parameters
  • environment_linux.yml / environment_linux_cpuonly.yml - Added psycopg2 dependency

Setup and Usage

  • PostgreSQL Database: Ensure PostgreSQL is configured to match the *.yaml settings and is running locally before launching GTSFM
  • Remote Machines: Ensure all remote worker machines have the same branch checked out
  • SSH Access: Configure passwordless SSH access to remote machines

Configuration

Edit gtsfm/configs/local_scheduler_postgres_remote_cluster.yaml:

username: your_username
workers:
  - your-remote-server.com
database:
  host: localhost
  port: 5432
  database: postgres
  user: postgres
  password: "your_password"
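
As a minimal sketch of how the `database` section above could be turned into a connection, the snippet below builds a libpq-style connection string and passes it to `psycopg2.connect`. The names `db_config`, `build_dsn`, and `connect` are hypothetical and are not taken from `postgres_client.py`; the dictionary simply mirrors the placeholder values in the YAML.

```python
from typing import Any, Dict

# Mirrors the `database` section of the YAML above; values are placeholders.
db_config: Dict[str, Any] = {
    "host": "localhost",
    "port": 5432,
    "database": "postgres",
    "user": "postgres",
    "password": "your_password",
}


def build_dsn(cfg: Dict[str, Any]) -> str:
    """Build a libpq keyword/value connection string from the config.

    Assumes the values contain no spaces or quotes (no escaping is done).
    """
    # libpq calls the database key `dbname`, while the YAML uses `database`.
    return (
        f"host={cfg['host']} port={cfg['port']} dbname={cfg['database']} "
        f"user={cfg['user']} password={cfg['password']}"
    )


def connect(cfg: Dict[str, Any]):
    """Open a connection; psycopg2 is imported lazily so build_dsn works without it."""
    import psycopg2  # dependency added in the environment files of this PR

    return psycopg2.connect(build_dsn(cfg))
```

Calling `connect(db_config)` requires a running PostgreSQL instance that matches these settings.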

Running Tests

# Basic cluster functionality test
python scripts/local_scheduler_db_remote_worker_demo.py

# Full two-view estimator test
python scripts/two_view_estimator_dask_postgres.py

Architecture

[Local Machine]
├── Dask Scheduler (port 8788, dashboard 8787)
├── PostgreSQL Database (port 5432)
└── SSH Tunnels to Remote Workers
[Remote Workers]
├── Dask Workers (port 9000)
├── SSH Tunnel Connection
└── Database Access via Tunnel
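
The tunnel layout in the diagram could be expressed as a single `ssh` invocation per worker. The sketch below only builds the argument list and does not run it. The function name `build_tunnel_command` is hypothetical, and the direction of each tunnel (reverse tunnels for the scheduler and database, a forward tunnel for the worker port) is an assumption inferred from the diagram, not taken from `ssh_tunneling.py`.

```python
from typing import List


def build_tunnel_command(
    username: str,
    worker_host: str,
    scheduler_port: int = 8788,
    db_port: int = 5432,
    worker_port: int = 9000,
) -> List[str]:
    """Return an ssh argv that sets up the tunnels for one remote worker."""
    return [
        "ssh",
        "-N",  # no remote command, tunnels only
        # Reverse tunnels: the worker reaches the local scheduler and database.
        "-R", f"{scheduler_port}:localhost:{scheduler_port}",
        "-R", f"{db_port}:localhost:{db_port}",
        # Forward tunnel: the local scheduler reaches the worker's port.
        "-L", f"{worker_port}:localhost:{worker_port}",
        f"{username}@{worker_host}",
    ]


cmd = build_tunnel_command("your_username", "your-remote-server.com")
print(" ".join(cmd))
```

With more than one worker, each forward tunnel would need its own local port, since two tunnels cannot both listen on local port 9000.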

Known Issues

  • Tested only on the lab cluster eagle
  • Port conflict resolution always fails
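
For the port conflict issue, one possible approach is to probe for a free local port before starting the scheduler or a tunnel. This is a sketch only, not the logic used in this PR; `is_port_free` and `find_free_port` are hypothetical helpers, and they check the local machine only, not the remote workers.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(start: int, max_tries: int = 50) -> int:
    """Return the first free port in [start, start + max_tries)."""
    for port in range(start, start + max_tries):
        if is_port_free(port):
            return port
    raise RuntimeError(f"No free port found in {start}..{start + max_tries - 1}")
```

The check is inherently racy: another process may take the port between the probe and its actual use, so the caller should still handle a bind failure.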

@dellaert dellaert requested a review from Copilot May 14, 2025 14:00
Copilot

This comment was marked as outdated.

Member

@dellaert dellaert left a comment

Very cool! Added some comments (on top of reasonable co-pilot review).
CI seems to fail (more than usual).

@akshay-krishnan
Collaborator

@liuzongyue6 let me know when all comments are addressed and you need another review.

@akshay-krishnan
Collaborator

before I re-review:

  • add type hints for all methods/functions you introduce.
  • make sure type hints for existing methods are not removed; if you're adding a new input/output to an existing method, please add its type hint as well.
  • update method docstrings accordingly
  • make sure no code is removed / changed without good reason.
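
As an illustration of the requests above, a function that gains optional `i1` and `i2` parameters might carry hints and a docstring like this. The function `cache_key` is hypothetical and is not code from this PR.

```python
from typing import Optional


def cache_key(descriptor: str, i1: Optional[int] = None, i2: Optional[int] = None) -> str:
    """Build a cache key, optionally qualified by the image pair indices.

    Args:
        descriptor: Base identifier for the cached entry.
        i1: Index of the first image, or None if not applicable.
        i2: Index of the second image, or None if not applicable.

    Returns:
        The key, with "_<i1>_<i2>" appended when both indices are given.
    """
    if i1 is None or i2 is None:
        return descriptor
    return f"{descriptor}_{i1}_{i2}"
```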

@liuzongyue6 liuzongyue6 requested a review from Copilot June 10, 2025 12:51
Copilot

This comment was marked as outdated.

Collaborator

@akshay-krishnan akshay-krishnan left a comment

good improvements, some concerns still remain

@liuzongyue6 liuzongyue6 requested a review from Copilot June 13, 2025 03:22
Copilot

This comment was marked as outdated.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR integrates PostgreSQL database support and distributed computing via Dask into the two-view estimation pipeline while introducing SSH tunneling for secure remote worker connections. Key changes include new modules for database connectivity and SSH tunnel management, updates to the two-view estimator for database result storage, and adjustments in configuration and environment files to support PostgreSQL.

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/two_view_estimator_dask_postgres.py | Implements Dask-based distributed two-view estimation with database integration and process management. |
| scripts/local_scheduler_db_remote_worker_demo.py | Provides a demo for setting up a distributed cluster with SSH tunnels and PostgreSQL. |
| gtsfm/two_view_estimator.py | Extends the two-view estimator to support PostgreSQL result storage and schema initialization. |
| gtsfm/common/postgres_client.py | Introduces a PostgreSQL client for managing database connections and queries. |
| gtsfm/common/dask_db_module_base.py | Creates a base class for Dask modules interacting with the PostgreSQL database. |
| Other files (configs, environment, docs) | Update configuration and documentation to support new PostgreSQL and SSH tunneling features. |
Comments suppressed due to low confidence (1)

gtsfm/two_view_estimator.py:456

  • The start_time value is passed directly to the result storage function without computing the elapsed computation time. It is recommended to calculate the duration (e.g. time.time() - start_time) before storing to ensure that the recorded 'computation_time' is meaningful.
self.store_computation_results(
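
A minimal sketch of the suggested fix, assuming the goal is to record a duration rather than a start timestamp. The helper `run_timed` is hypothetical and is not part of `two_view_estimator.py`.

```python
import time
from typing import Any, Callable, Tuple


def run_timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run `fn` and return its result with the elapsed wall-clock seconds."""
    start_time = time.time()
    result = fn(*args, **kwargs)
    computation_time = time.time() - start_time  # duration, not the start timestamp
    return result, computation_time
```

The second return value is what would be stored as 'computation_time'. `time.perf_counter()` is an alternative if a monotonic clock is preferred.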
