A comprehensive tool for DAO governance analysis and management.
Dennison Bertram
Email: [email protected]
MIT License
Copyright (c) 2024 Dennison Bertram
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This repository provides a comprehensive pipeline for analyzing DAO governance data. It integrates with multiple data sources including Discourse forums, Snapshot, Tally, and news APIs to provide a complete view of DAO activities and governance.
- Forum Analysis:
  - Fetch and analyze topics, posts, and user data from configured Discourse forums
  - Vector embeddings for semantic search using OpenAI's `text-embedding-ada-002`
  - LLM-powered content evaluation and quality scoring
- Governance Analysis:
  - Integration with Snapshot proposals
  - Integration with Tally governance data
  - Historical data processing and analysis
- News & Market Data:
  - Automated news collection and analysis for DAOs
  - Market cap tracking and analysis
  - News article evaluation using LLMs
- Search & Analytics:
  - Semantic search across all content types
  - Materialized views for analytics
  - Common topics identification and tracking
  - Real-time content evaluation
- Management Interface:
  - Web-based management dashboard
  - Real-time status monitoring
  - Crawl and job control
  - Log viewing and analysis
- Data Ingestion:
  - Discourse Forums: Uses API keys to fetch `latest.json`, topics, and posts
  - Snapshot & Tally: Uses GraphQL or REST APIs to fetch governance proposals
  - News API: Fetches and analyzes DAO-related news
  - Market Data: Tracks market cap and related metrics
- Database & Storage:
  - PostgreSQL with pgvector: Stores all content with vector embeddings
  - Knex Migrations: Database schema management
  - Materialized Views: Pre-computed analytics and metrics
- AI & Analysis:
  - OpenAI Integration: For embeddings and content evaluation
  - LLM Processing: Quality scoring, summarization, and analysis
  - Vector Search: Semantic similarity search across all content
- API & Interface:
  - REST API: Comprehensive endpoints for all functionality
  - Web Dashboard: Management interface for all features
  - Monitoring: Real-time status and health checks
This repository provides a pipeline for crawling, processing, vectorizing, and analyzing Discourse forum data. It integrates with OpenAI for generating embeddings and evaluating post quality, as well as external data sources like Snapshot and Tally for proposal information. The code supports storing, searching, and analyzing content in a vector database, enabling advanced similarity searches and semantic analysis.
- Forum Crawling: Fetch topics, posts, and user data from configured Discourse forums.
- Vector Embeddings: Use OpenAI embeddings (`text-embedding-ada-002`) to vectorize textual content (topics, posts, proposals).
- Semantic Search: Perform similarity search using PostgreSQL + the pgvector extension.
- LLM Evaluations: Evaluate topics and posts for quality, relevance, and other metrics using GPT-based models.
- External Integrations:
  - Snapshot Proposals: Fetch and evaluate Snapshot proposals.
  - Tally Proposals: Fetch, update, and evaluate Tally proposals.
- Historical Processing: Reprocess older or previously unevaluated content for updated evaluations and embeddings.
- Materialized Views & Analytics: Generate comprehensive materialized views for forum activity, user engagement, topic quality, etc.
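The similarity search itself runs inside PostgreSQL via pgvector, but the scoring idea behind it can be illustrated in plain TypeScript. This is a standalone sketch for intuition only; `cosineSimilarity` is not a function from this repository:

```typescript
// Illustration only: pgvector computes this distance in SQL.
// This function just shows what a similarity score and a
// threshold mean for two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A post whose embedding clears the threshold counts as a match.
const queryEmbedding = [0.2, 0.1, 0.9];
const postEmbedding = [0.25, 0.05, 0.85];
const score = cosineSimilarity(queryEmbedding, postEmbedding);
console.log(score > 0.5); // true for these nearly-parallel vectors
```

In production this comparison happens in a single SQL query over the vector columns, so all posts are scored and filtered database-side rather than in application code.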
- Data Ingestion:
  - Discourse Forums: Uses API keys to fetch `latest.json`, topics, and posts.
  - Snapshot & Tally: Uses GraphQL or REST APIs to fetch governance proposals and their metadata.
- Database & Storage:
  - PostgreSQL with pgvector: Stores topics, posts, evaluations, vectors, proposals, and analytics tables.
  - Knex Migrations: Database schema managed via migration files in `/db/migrations`.
- Vectorization & Analysis:
  - Embeddings: `services/llm/embeddings` generates embeddings and stores them in vector columns.
  - LLM Evaluations: The `services/llm` folder handles OpenAI-based post and topic evaluations, summaries, and scoring.
- Search & API:
  - Hono-based server (`server.ts`): Serves API endpoints for searching content (`/api/searchAll`, `/api/searchByType`) and managing crawls (`/api/crawl/*`), cron jobs, and health checks.
  - Search Service: `services/search/vectorSearchService.ts` provides semantic search capabilities over vector embeddings.
- Historical Processing:
  - Scripts like `processRecentPosts.ts`, `historicalCrawler.ts`, and `historicalPostEvals.ts` reprocess old posts, topics, and proposals to generate updated evaluations and embeddings.
- `app.ts` / `server.ts`: Entry points to the server application and crawler manager.
- `db/`: Database configuration (`knexfile.js`), migrations, and model definitions.
- `services/crawling/`: Logic to crawl Discourse forums and store data.
- `services/llm/`: LLM utilities (OpenAI client, evaluation prompts, embeddings).
- `services/search/`: Vector search logic and related services.
- `config/`: Forum configurations and logging settings.
- `utils/`: Utility functions for date formatting, request retries, token estimation, etc.
- `demo/`: Example scripts (`searchDemo.ts`, `testSearch.ts`, `debug.ts`) for debugging and demonstrating functionality.
- Node.js & Bun: The server and scripts may use Bun as the runtime. Ensure `bun` is installed.
- PostgreSQL with pgvector: The database must have the `pgvector` extension enabled.
- OpenAI API Key: Set the `OPENAI_API_KEY` and `OPENAI_ORG_ID` environment variables.
- Forum API Keys: For each configured forum in `forumConfig.ts`, set the necessary `API_KEY`, `API_USERNAME`, and `DISCOURSE_URL`.
- Snapshot & Tally Credentials: If using the Snapshot or Tally integrations, set the corresponding keys in `.env`.
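Before running anything, it can help to fail fast when required configuration is missing. The sketch below is a hypothetical startup check (the `missingEnvVars` helper is not part of this repository; the variable names are the ones listed above):

```typescript
// Hypothetical startup check: report which required variables are unset.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => !env[name]);
}

const required = [
  "OPENAI_API_KEY",
  "OPENAI_ORG_ID",
  "DISCOURSE_URL",
  "API_KEY",
  "API_USERNAME",
];

// In the app this would be called as missingEnvVars(process.env, required),
// exiting with an error if the returned list is non-empty.
console.log(missingEnvVars({ OPENAI_API_KEY: "sk-..." }, required));
```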
1. Install Dependencies:

   ```bash
   bun install
   ```

2. Configure Environment: Create a `.env` file with at least:

   ```
   OPENAI_API_KEY=your-openai-api-key
   OPENAI_ORG_ID=your-openai-org-id
   DISCOURSE_URL=https://your-forum.discourse.example
   API_KEY=your-discourse-api-key
   API_USERNAME=your-discourse-api-username
   SUPABASE_CONNECTION_STRING=your-postgres-connection-string
   ```

   Also set forum-specific env vars as required in `forumConfig.ts`.

3. Database Migrations: Run migrations to set up the tables:

   ```bash
   bun run knex migrate:latest
   ```

4. Enable pgvector: Ensure the `pgvector` extension is enabled. Example script:

   ```bash
   node enable_vector_supabase.js
   ```
Start the server (Hono + Bun):

```bash
bun run server.ts
```

This starts the server on the configured port (default: 3000). Visit http://localhost:3000/health to check the health status.
- Health Check: `GET /health`
- Search by Type: `POST /api/searchByType`
  - Request Body:

    ```json
    {
      "query": "governance",
      "type": "post",
      "forum": "ARBITRUM",
      "limit": 50,
      "threshold": 0.5
    }
    ```
- Search All Types: `POST /api/searchAll`
  - Request Body:

    ```json
    {
      "query": "grant",
      "forum": "ZKSYNC",
      "limit": 10,
      "threshold": 0.7
    }
    ```
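A client can call these search endpoints with a plain `fetch`. The helper below is a hypothetical convenience wrapper, not part of the repository; only the path and body fields come from the endpoint documentation above:

```typescript
// Hypothetical request builder for the search endpoints documented above.
interface SearchAllBody {
  query: string;
  forum: string;
  limit?: number;
  threshold?: number;
}

function buildSearchAllRequest(body: SearchAllBody) {
  return {
    path: "/api/searchAll",
    method: "POST" as const,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Usage against a locally running server:
// const req = buildSearchAllRequest({ query: "grant", forum: "ZKSYNC", limit: 10, threshold: 0.7 });
// const res = await fetch(`http://localhost:3000${req.path}`, req);
```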
- Common Topics:
  - `POST /api/common-topics/generate` - Generate common topics from recent forum posts
    - Parameters:
      - `forum` (required): The forum name to generate topics for
      - `timeframe` (optional): Time range in PostgreSQL interval format (e.g., '7d', '2 weeks', '1 month'). Defaults to '14d'
  - `GET /api/common-topics` - Retrieve all generated common topics
  - `GET /api/common-topics/:id` - Retrieve a specific common topic by ID
The common topics feature analyzes recent forum posts to identify and summarize frequently discussed themes and topics. This is useful for understanding the current focus of community discussions and trending subjects.
Example response:

```json
{
  "id": "123",
  "topic": "Governance Proposals",
  "summary": "Recent discussions about active governance proposals and voting mechanisms",
  "relevance_score": 0.85,
  "created_at": "2025-02-02T20:06:37.330Z"
}
```
- Crawl Management:
  - `POST /api/crawl/start/:forumName` - Start crawling a specific forum.
  - `POST /api/crawl/stop/:forumName` - Stop an ongoing crawl.
  - `GET /api/crawl/status` - Get overall crawl statuses.
  - `GET /api/crawl/status/:forumName` - Get the status of a specific forum crawl.
- Cron Management:
  - `POST /api/cron/start` - Start scheduled crawls.
  - `POST /api/cron/stop` - Stop scheduled crawls.
  - `GET /api/cron/status` - Check cron job status.
- Reprocessing Old Posts: `processRecentPosts.ts` queries the database for unevaluated posts and uses LLM services to evaluate and store post quality metrics and embeddings.
  - Run:

    ```bash
    bun run processRecentPosts.ts FORUM_NAME [BATCH_SIZE] [MAX_BATCHES]
    ```
- Historical Crawler: `historicalCrawler.ts` fetches older topics and posts from the forum, then vectorizes and evaluates them.
  - Run:

    ```bash
    bun run historicalCrawler.ts FORUM_NAME
    ```
- Evaluating Old Post Batches: `historicalPostEvals.ts` re-evaluates older posts in batches.
  - Run:

    ```bash
    bun run historicalPostEvals.ts [BATCH_SIZE=100] [MAX_BATCHES] [FORUM_NAME]
    ```
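The `BATCH_SIZE` and `MAX_BATCHES` arguments bound how much work a single run performs. The batching idea can be sketched as follows (`splitIntoBatches` is a hypothetical helper for illustration; the actual scripts may structure their loops differently):

```typescript
// Sketch of the batching bounds used by the reprocessing scripts:
// items are processed in chunks of batchSize, and maxBatches caps
// how many chunks one run handles.
function splitIntoBatches<T>(
  items: T[],
  batchSize = 100,
  maxBatches?: number,
): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    if (maxBatches !== undefined && batches.length >= maxBatches) break;
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 250 unevaluated posts with the default BATCH_SIZE of 100:
const postIds = Array.from({ length: 250 }, (_, i) => i + 1);
console.log(splitIntoBatches(postIds).length); // 3 (100 + 100 + 50)
console.log(splitIntoBatches(postIds, 100, 2).length); // 2 (capped by MAX_BATCHES)
```

Capping a run this way keeps a single invocation's OpenAI usage predictable; repeated runs eventually work through the backlog.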
- Cleanup Scripts: `cleanDatabase.ts` truncates all tables, useful for starting fresh.
  - Run:

    ```bash
    bun run cleanDatabase.ts
    ```
- Reset Database: `resetDatabase.ts` drops and recreates the public schema.
  - Use with caution:

    ```bash
    bun run resetDatabase.ts
    ```
- Enable pgvector: `enable_vector_supabase.js` ensures the `vector` extension is available.
  - Run:

    ```bash
    node enable_vector_supabase.js
    ```
Migrations create materialized views for:
- Forum activity trends
- User engagement metrics
- Topic quality analysis
- Community health scores
- Leaderboards
These views can be refreshed via the `refresh_all_views()` function or scheduled with `pg_cron`.
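A refresh can also be triggered from application code. The sketch below assumes only the `refresh_all_views()` function described above; the wrapper and the minimal `db` interface (standing in for a knex instance) are hypothetical:

```typescript
// Hypothetical wrapper: run the refresh function via a knex-style
// raw() call. Only the SQL string reflects this repository.
const REFRESH_SQL = "SELECT refresh_all_views()";

async function refreshAllViews(db: {
  raw: (sql: string) => Promise<unknown>;
}): Promise<void> {
  await db.raw(REFRESH_SQL);
}

// Usage with a real knex instance: await refreshAllViews(knexInstance);
```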
- Linting & Formatting: Uses ESLint and Prettier. Run `bun run lint` or `bun run prettier` to check code style.
- Testing: Add tests in a `bun:test`-compatible format. Some test files are present as examples (`_rssFeed.test.ts`).
- API Keys/Configuration: Check `.env` and `forumConfig.ts` if you are unable to fetch forum data.
- Database Issues: Ensure migrations are up to date and pgvector is enabled.
- OpenAI Errors (Rate Limits, Insufficient Quota): LLM evaluations are wrapped with error handling; if credits are insufficient, evaluations are skipped.
This project is provided as-is. Review and adjust for your own use.
Contributions are welcome! Please open issues or PRs for bug fixes and enhancements.
The application is configured for deployment on Railway.app, which provides:
- Automatic deployments on git push
- PostgreSQL with pgvector support
- Environment variable management
- Health check monitoring
- Automatic restarts on failure
To deploy:
1. Create a Railway account and install the Railway CLI:

   ```bash
   npm i -g @railway/cli
   ```
2. Log in to Railway:

   ```bash
   railway login
   ```
3. Create a new project:

   ```bash
   railway init
   ```
4. Add PostgreSQL:
   - Go to the Railway dashboard
   - Click "New"
   - Select "Database" → "PostgreSQL"
   - Enable the pgvector extension in the PostgreSQL settings
5. Configure environment variables in the Railway dashboard:
   - Copy all variables from your `.env` file
   - Update `DATABASE_URL` to use Railway's PostgreSQL connection string
6. Deploy:

   ```bash
   railway up
   ```
The deployment process is managed by:

- `railway.toml`: Configuration for build and deploy settings
- `Procfile`: Defines process types and commands
- `.dockerignore`: Optimizes Docker builds by excluding unnecessary files
The server includes a `/health` endpoint that Railway uses to monitor the application's status.