[Bug] Redis cache inconsistency prevents plugin trigger records from being created #31130

@qiaofenlin

Description

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & Non-English Users] Please submit in English, otherwise the issue will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

v1.11.4

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Prerequisites

  1. A running Dify environment (API + Redis + PostgreSQL/MySQL)
  2. A Workflow application containing a Trigger Plugin node
  3. The workflow has been saved previously, so corresponding workflow_plugin_trigger records exist in the database

Reproduction Steps

Step 1: Get test data

# Find an existing trigger node_id from workflow config or logs
# Assume node_id = "1767869326455"

# Confirm the record exists in database
psql -c "SELECT * FROM workflow_plugin_trigger WHERE node_id = '1767869326455';"

Step 2: Simulate data inconsistency state

# 1. Check current cache content (if exists)
redis-cli GET "plugin_trigger_nodes:{app_id}:1767869326455"

# 2. Delete DB record but keep the cache (simulating stale cache after transaction rollback)
psql -c "DELETE FROM workflow_plugin_trigger WHERE node_id = '1767869326455';"

# 3. Manually create a stale cache entry (if cache doesn't exist)
redis-cli SET "plugin_trigger_nodes:{app_id}:1767869326455" '{"record_id":"fake-id-123","node_id":"1767869326455","provider_id":"test/provider","event_name":"TEST_EVENT","subscription_id":"old-subscription-id"}' EX 3600

Step 3: Trigger sync operation

# Save the workflow via API or UI
curl -X POST "http://localhost:5001/console/api/apps/{app_id}/workflows/draft" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"graph": ...}'

Step 4: Observe the issue

Check API logs:

INFO - Found trigger node: node_id=1767869326455, subscription_id=xxx
INFO - Total trigger nodes found in workflow graph: 1
INFO - Creating 0 new trigger records    <-- BUG: Found 1 node but creating 0 records
INFO - Sync completed: created=0, updated=0, deleted=0

Step 5: Confirm the bug

# Database still has no record
psql -c "SELECT COUNT(*) FROM workflow_plugin_trigger WHERE node_id = '1767869326455';"
# Returns 0

# But stale cache still exists
redis-cli GET "plugin_trigger_nodes:{app_id}:1767869326455"
# Returns the fake data

✔️ Expected Behavior

  1. The sync process should create the missing database record regardless of cache state.
  2. The database should be the source of truth, not Redis cache.
  3. Stale cache entries should be detected and cleaned before creating new records.
  4. Cache should only be updated AFTER successful database commit.

❌ Actual Behavior

  1. When Redis cache exists but DB record is missing (stale cache), the sync process incorrectly skips record creation.
  2. The code trusts cache existence as proof of DB record existence, which is incorrect.
  3. No stale cache detection or cleanup mechanism exists.
  4. Cache is written BEFORE database commit, causing potential data inconsistency on transaction failure.

Root Cause Analysis

Issue 1: Cache-first decision logic (Lines 208-230)

not_found_in_cache: list[Mapping[str, Any]] = []
for node_info in nodes_in_graph:
    node_id = node_info["node_id"]
    if not redis_client.get(f"{cls.__PLUGIN_TRIGGER_NODE_CACHE_KEY__}:{app.id}:{node_id}"):
        not_found_in_cache.append(node_info)
        continue
    # Cache hit: the node is skipped here and never reaches the processing list

nodes_not_found = [
    node_info for node_info in not_found_in_cache  # only cache-miss nodes are ever checked against the DB
    if node_info["node_id"] not in nodes_id_in_db
]
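
A minimal sketch of the DB-first decision logic the expected behavior describes, reusing the identifiers from the excerpt above (this is an illustration of the proposed fix, not code from the repository): the DB membership check runs for every node, and a cache hit without a matching DB record is treated as stale and deleted.

# Sketch of a DB-first check: the cache never decides whether a record
# exists; it is only cleaned up when it disagrees with the database.
nodes_not_found: list[Mapping[str, Any]] = []
for node_info in nodes_in_graph:
    node_id = node_info["node_id"]
    cache_key = f"{cls.__PLUGIN_TRIGGER_NODE_CACHE_KEY__}:{app.id}:{node_id}"
    if node_id in nodes_id_in_db:
        continue  # the DB record exists; nothing to create
    if redis_client.get(cache_key):
        redis_client.delete(cache_key)  # cache hit without a DB record: stale, remove it
    nodes_not_found.append(node_info)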

Issue 2: Cache written before DB commit (Lines 232-252)

for node_info in nodes_not_found:
    session.flush()
    redis_client.set(...)  # Cache written first
session.commit()           # If this fails, cache is now stale
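
A sketch of the corrected ordering, again reusing the surrounding names; build_trigger_record, cache_key_for, and serialize are hypothetical helpers standing in for whatever the service actually does:

for node_info in nodes_not_found:
    session.add(build_trigger_record(node_info))  # hypothetical: construct the ORM row
session.commit()  # if this raises, no cache entry has been written yet
for node_info in nodes_not_found:
    # write-through only after the transaction is durable
    redis_client.set(cache_key_for(node_info), serialize(node_info), ex=3600)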

Issue 3: Distributed lock not properly acquired (Line 217)

redis_client.lock(f"...lock", timeout=10)  # Only creates lock object, never calls acquire()
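
For contrast, a sketch of how a redis-py lock is actually held: either call acquire() explicitly, or use the lock as a context manager, which acquires on entry and releases on exit (perform_sync is a hypothetical stand-in for the guarded critical section):

lock = redis_client.lock(f"plugin_trigger_nodes:apps:{app.id}:lock", timeout=10)
with lock:  # blocks until the lock is acquired; released automatically on exit
    perform_sync()  # hypothetical: the per-app sync work that must be serialized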

How Stale Cache is Produced

  1. Transaction failure after cache write: The cache entry is written right after flush() but before commit(); if commit() fails, the entry becomes stale.
  2. Distributed lock not effective: The lock object is created but acquire() is never called, allowing race conditions where one thread writes cache while another rolls back the database.

Additional Context

Scenario                 | Current Behavior                      | Expected Behavior
Cache exists, DB missing | Skip creation (trust cache)           | Clear stale cache + create record
DB commit fails          | Cache already written (inconsistent)  | Cache not written
Concurrent sync          | Lock ineffective (race condition)     | Properly serialized

Affected File

  • api/services/trigger/trigger_service.py
  • Method: TriggerService.sync_plugin_trigger_relationships()

Related Keys

  • Cache Key: plugin_trigger_nodes:{app_id}:{node_id}
  • Lock Key: plugin_trigger_nodes:apps:{app_id}:lock
  • DB Table: workflow_plugin_trigger
