name: Moderation Inconsistency Bug
about: Report inconsistent blocking of valid health queries
title: 'Moderation inconsistency: identical health queries blocked/unblocked based on hidden state'
labels: 'bug, safety-moderation'
assignees: ''
---
### Describe the bug
Identical user queries about sexual health education ("what is the best position for sex") were inconsistently moderated:
- **Allowed** in one chat session (received clinical response about anatomy/consent)
- **Blocked** in another session with the exact same wording (triggered a "beyond my scope" reply and auto-deletion)
### Steps to Reproduce
1. Start a new chat session → Ask: "what is the best position for sex" → Response allowed
2. Start another new chat → Paste the same question → Response blocked
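
To make the reproduction repeatable, the two steps can be wrapped in a small harness. This is only a sketch: `ask` is a hypothetical callable (not a real DeepSeek API) that is assumed to open a fresh chat session, send the query, and return an outcome label such as `"allowed"` or `"blocked"`. How that callable talks to the web chat is left open.

```python
from collections import Counter
from typing import Callable


def check_consistency(ask: Callable[[str], str], query: str, sessions: int = 5) -> Counter:
    """Send the same query in several fresh sessions and tally the outcomes.

    `ask` is a hypothetical callable supplied by the tester; it is assumed to
    start a new session on every call and return an outcome label.
    """
    return Counter(ask(query) for _ in range(sessions))


def is_consistent(outcomes: Counter) -> bool:
    # Consistent means every session produced the same outcome label.
    return len(outcomes) <= 1
```

If `is_consistent` returns `False` for an identical query, the tally in `outcomes` shows how often each outcome occurred, which may help quantify the inconsistency in follow-up reports.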
### Expected Behavior
Consistent treatment of identical queries regardless of chat history or hidden state flags.
### Actual Behavior
- Blocking appears **state-dependent**:
  - First query passed moderation
  - Identical second query was blocked (likely due to prior clinical terms in the same or other chats)
- **Email bounce issues** when reporting: `support@`, `feedback@`, and `[email protected]` all returned "recipient not exist" (screenshots attached)
### Evidence
Attached:
1. `1000017643.jpg`: Drafted feedback email
2. `1000017644.jpg`: Bounce for `[email protected]`
3. `1000017645.jpg`: Bounce for `[email protected]`
### Impact
- **Harms credibility**: Users can't trust consistent access to health information
- **Exacerbates stigma**: Arbitrary blocking of clinical content
- **Prevents feedback**: Broken contact paths frustrate bug reporting
### Suggested Fixes
1. Implement **stateless query evaluation** (ignore chat history for moderation)
2. Distinguish **clinical intent** vs. explicit content
3. Fix/verify **feedback channels** (`contact@`, GitHub, forms)
4. Add **transparency**: Show users *why* a query was moderated
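
As a rough illustration of fixes 1 and 4, the sketch below evaluates a query from its text alone and returns a human-readable reason with every decision. All names here (`moderate`, `Decision`, `EXPLICIT_TERMS`) and the placeholder term list are assumptions made for illustration; nothing is known about how the actual moderation layer is implemented, and a real system would likely use a classifier rather than term matching.

```python
from dataclasses import dataclass

# Hypothetical placeholder terms, for illustration only.
EXPLICIT_TERMS = frozenset({"explicit_term_a", "explicit_term_b"})


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # intended to be shown to the user for transparency


def moderate(query: str) -> Decision:
    """Decide from the query text alone.

    The function takes no session or history argument, so identical
    queries always receive identical decisions.
    """
    words = set(query.lower().split())
    hits = sorted(words & EXPLICIT_TERMS)
    if hits:
        return Decision(False, f"matched explicit-content terms: {', '.join(hits)}")
    return Decision(True, "no explicit-content terms matched")
```

Because the decision depends only on `query`, repeating the question in a new chat cannot change the outcome, and the `reason` field gives the user something concrete to cite when reporting a false positive.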
### Environment
- Model: DeepSeek-R1
- Platform: Web chat (tested June 2025)