TACC
diff --git a/‎rt/markdown/LMOD_TESTING.md‎
Lines changed: 446 additions & 0 deletions b/‎rt/markdown/LMOD_TESTING.md‎
Lines changed: 446 additions & 0 deletions
diff --git a/‎rt/markdown/TEST_COVERAGE_ANALYSIS.md‎
Lines changed: 124 additions & 0 deletions b/‎rt/markdown/TEST_COVERAGE_ANALYSIS.md‎
Lines changed: 124 additions & 0 deletions
diff --git a/‎rt/markdown/TEST_IMPROVEMENTS_SUMMARY.md‎
Lines changed: 186 additions & 0 deletions b/‎rt/markdown/TEST_IMPROVEMENTS_SUMMARY.md‎
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,124 @@
+# Markdown Test Coverage Analysis
+
+## Current Coverage
+
+### ✅ Covered
+1. **Plain text** - Basic non-markdown content
+2. **Markdown content** - Full markdown with headers, lists, emphasis, code, links
+3. **Mixed content** - Borderline case with some markdown-like patterns
+4. **Short content** - Content below 30 character threshold
+5. **Color support** - With/without LMOD_COLORIZE
+
+### ❌ Critical Gaps (False Positive Prevention)
+
+Since markdown detection is **enabled by default**, we must prevent false positives at all costs. The following scenarios need testing:
+
+## False Positive Scenarios to Test
+
+### 1. Code/Technical Content (High Risk)
+- **C includes**: `#include <stdio.h>` - hash at start but not header
+- **C preprocessor**: `#define`, `#ifdef`, `#pragma` - hash patterns
+- **Shell scripts**: `#!/bin/bash` - shebang lines
+- **Comments**: `# This is a comment` - hash but not markdown header
+- **Version numbers**: `1.2.3`, `v2.0.1` - numbers with dots
+- **Ranges**: `1.0-2.0`, `version 1.0 * beta` - dashes/asterisks
+- **File paths**: `/usr/bin/program`, `./script.sh` - paths with slashes
+- **Command options**: `--help`, `-v`, `-D` - dashes that aren't lists
+
+### 2. Variable/Environment Content (High Risk)
+- **Environment variables**: `PATH=/usr/bin`, `MODULE_VERSION=1.0` - equals signs
+- **Variable names**: `MODULE_VERSION`, `LUA_PATH` - underscores
+- **Variable references**: `${VAR}`, `$VAR` - dollar signs
+- **Assignment**: `export VAR=value` - equals signs
+
+### 3. URL/Path Content (Medium Risk)
+- **URLs without markdown**: `Visit https://example.com for more info`
+- **Email addresses**: `Contact: [email protected]`
+- **File paths**: `See /usr/local/bin/program`
+- **Relative paths**: `./scripts/install.sh`
+
+### 4. List-like Patterns (Medium Risk)
+- **Version ranges**: `1.0 - 2.0` - dash but not list
+- **Options**: `Options: -v, -h, -D` - dash-separated but not list
+- **Numbered items without markdown**: `Step 1. Do this` (not at start)
+- **Bullet points in prose**: `The - symbol is used...`
+
+### 5. Emphasis-like Patterns (Low-Medium Risk)
+- **Asterisks in text**: `version 1.0 * beta release` - single asterisk
+- **Underscores in names**: `MODULE_NAME`, `file_name.txt`
+- **Multiplication**: `2 * 3 = 6` - asterisk for math
+- **Wildcards**: `*.lua`, `file_*.txt` - asterisks/underscores
+
+### 6. Code-like Patterns (Low-Medium Risk)
+- **Backticks in prose**: `Use the 'module' command` - single quotes
+- **Code mentions**: `See the load() function` - parentheses
+- **File extensions**: `.lua`, `.so`, `.a` - dots
+
+### 7. Structure-like Patterns (Low Risk)
+- **Multi-line content**: Long paragraphs that might trigger structure detection
+- **Empty lines**: Content with blank lines but not markdown structure
+- **Long lines**: Single very long line (>60 chars) but not structured
+
+### 8. Edge Cases
+- **Exactly 30 characters**: At threshold boundary
+- **29 characters**: Just below threshold
+- **31 characters**: Just above threshold
+- **Empty content**: Empty help/whatis
+- **Whitespace only**: Only spaces/tabs/newlines
+- **Special characters**: Unicode, non-ASCII
+- **Multi-line whatis**: Whatis entries with newlines
+
+### 9. Real-world Module Examples
+- **Compiler modules**: Often have version numbers, paths, options
+- **Library modules**: Often have URLs, version info
+- **Tool modules**: Often have command examples, options
+- **Environment modules**: Often have variable assignments
+
+## Test Modules Needed
+
+1. **false_positive_code** - C includes, preprocessor, shebangs
+2. **false_positive_vars** - Environment variables, assignments
+3. **false_positive_urls** - URLs without markdown links
+4. **false_positive_lists** - List-like patterns that aren't lists
+5. **false_positive_emphasis** - Asterisks/underscores that aren't emphasis
+6. **false_positive_structure** - Structured text that isn't markdown
+7. **false_positive_edge** - Edge cases (30 chars, empty, whitespace)
+8. **false_positive_realworld** - Real-world module examples
+
+## Detection Threshold Analysis
+
+Current threshold: **score >= 3**
+
+**Strong indicators (+3 each):**
+- ATX headers (`# Header`)
+- Setext headers (`===`)
+- Code blocks (` ``` `)
+
+**Medium indicators (+2 each):**
+- Links (`[text](url)`)
+- Images (`![alt](url)`)
+- Multiple lists (`- item`)
+
+**Weak indicators (+1 each):**
+- Emphasis (`**bold**`, `*italic*`)
+- Structure (paragraphs, long lines)
+
+**To trigger false positive, need:**
+- 1 strong indicator (score = 3), OR
+- 2 medium indicators (score = 4), OR
+- 1 medium + 1 weak (score = 3), OR
+- 3 weak indicators (score = 3)
+
+**Critical test cases:**
+- Content with 1-2 weak indicators should NOT trigger (score < 3)
+- Content with patterns that look like markdown but aren't should NOT trigger
+
+## Test Strategy
+
+1. **Create comprehensive false positive tests** - Ensure common patterns don't trigger
+2. **Test threshold boundaries** - Verify 30 char limit works
+3. **Test score boundaries** - Verify score < 3 doesn't trigger
+4. **Test real-world examples** - Use actual module content patterns
+5. **Test all markdown features** - Ensure true positives work correctly
+6. **Test image support** - New feature needs coverage
+
@@ -0,0 +1,186 @@
+# Markdown Test Improvements Summary
+
+## Overview
+
+Comprehensive test coverage has been added to prevent false positive markdown detection. Since markdown detection is **enabled by default**, it's critical that common module content patterns don't trigger false positives.
+
+## Test Coverage Expansion
+
+### Before: 4 test modules, 14 test steps
+### After: 12 test modules, 39 test steps
+
+## New Test Modules Added
+
+### 1. False Positive Prevention Tests (CRITICAL)
+
+#### `false_positive_code/1.0.lua`
+**Purpose**: Test code-like patterns that should NOT trigger markdown detection
+**Patterns tested**:
+- C preprocessor: `#include <stdio.h>`, `#define MAX_SIZE 1024`
+- Shell shebangs: `#!/bin/bash`
+- Version numbers: `1.2.3`
+- File paths: `/usr/local/bin/program`, `./scripts/install.sh`
+- Command options: `--help`, `-v`, `-D`
+- Version ranges: `1.0-2.0`
+- Multiplication: `2 * 3 = 6`
+- Wildcards: `*.lua`, `file_*.txt`
+
+**Why critical**: Hash symbols, dashes, and asterisks are common in technical content but shouldn't trigger markdown detection unless they follow proper markdown syntax.
+
+#### `false_positive_vars/1.0.lua`
+**Purpose**: Test variable/environment patterns that should NOT trigger detection
+**Patterns tested**:
+- Environment variables: `PATH=/usr/bin:/usr/local/bin`
+- Variable names: `MODULE_VERSION`, `LUA_PATH`
+- Variable references: `$HOME`, `${PATH}`
+- Assignments: `export VAR=value`, `setenv("KEY", "VALUE")`
+- Configuration: `option=value`, `setting = enabled`
+
+**Why critical**: Equals signs and underscores are common in module files but shouldn't trigger markdown unless they're part of proper markdown syntax.
+
+#### `false_positive_urls/1.0.lua`
+**Purpose**: Test URLs and paths without markdown link syntax
+**Patterns tested**:
+- Plain URLs: `https://example.com/documentation`
+- Email addresses: `[email protected]`
+- File paths: `/usr/local/share/data`
+- Relative paths: `./config/settings.txt`, `../scripts/run.sh`
+
+**Why critical**: URLs are common in module help text but shouldn't trigger markdown detection unless formatted as `[text](url)`.
+
+#### `false_positive_lists/1.0.lua`
+**Purpose**: Test list-like patterns that aren't markdown lists
+**Patterns tested**:
+- Version ranges: `1.0 - 2.0`
+- Command options: `-v, -h, -D`
+- Numbered prose: `Step 1. Do this first`
+- Dashes in text: `The - symbol is used`
+
+**Why critical**: Dashes and numbers with dots are common but shouldn't trigger list detection unless at line start with proper spacing.
+
+#### `false_positive_emphasis/1.0.lua`
+**Purpose**: Test emphasis-like patterns that aren't markdown
+**Patterns tested**:
+- Asterisks in versions: `version 1.0 * beta release`
+- Multiplication: `2 * 3 = 6`
+- Wildcards: `*.lua`, `file_*.txt`
+- Underscores in names: `MODULE_NAME`, `FILE_PATH`
+- Function names: `load_module()`, `get_version()`
+
+**Why critical**: Asterisks and underscores are common in technical text but shouldn't trigger emphasis detection unless they follow proper markdown patterns.
+
+#### `false_positive_structure/1.0.lua`
+**Purpose**: Test structured text without markdown syntax
+**Patterns tested**:
+- Multiple paragraphs
+- Empty lines between paragraphs
+- Long sentences (>60 chars)
+- Well-organized content
+
+**Why critical**: Structure detection could trigger on well-formatted plain text, but should only activate with actual markdown syntax.
+
+### 2. Edge Case Tests
+
+#### `false_positive_edge/1.0.lua` - Exactly 30 characters
+**Purpose**: Test content at the detection threshold boundary
+**Content**: `"Exactly thirty chars!!"` (exactly 30 chars)
+
+#### `false_positive_edge/2.0.lua` - 29 characters
+**Purpose**: Test content just below threshold
+**Content**: `"Twenty-nine characters here!"` (29 chars)
+
+#### `false_positive_edge/3.0.lua` - 31 characters, no markdown
+**Purpose**: Test content just above threshold but without markdown indicators
+**Content**: `"This is exactly thirty-one characters long!"` (31 chars)
+
+**Why critical**: The 30-character threshold is a hard boundary - content below should never be processed, content above needs markdown indicators.
+
+### 3. Feature Tests
+
+#### `markdown_with_images/1.0.lua`
+**Purpose**: Test markdown with image syntax (new feature)
+**Patterns tested**:
+- Image syntax: `![alt text](url)`
+- Multiple images
+- Images with empty alt text
+- Images combined with other markdown
+
+**Why critical**: New image support feature needs comprehensive testing.
+
+## Test Organization
+
+Tests are organized into logical groups:
+
+1. **Basic Functionality** (steps 1-12)
+   - Plain text, markdown, mixed content, short content
+
+2. **False Positive Prevention** (steps 13-33)
+   - Code patterns, variables, URLs, lists, emphasis, structure, edge cases
+
+3. **Feature Tests** (steps 34-36)
+   - Image support
+
+4. **Color Support** (steps 37-39)
+   - Colorized output with various content types
+
+## Detection Threshold Analysis
+
+The detection system uses a scoring mechanism:
+- **Threshold**: score >= 3 triggers markdown processing
+- **Strong indicators** (+3): ATX headers, setext headers, code blocks
+- **Medium indicators** (+2): Links, images, multiple lists
+- **Weak indicators** (+1): Emphasis, structure
+
+**False positive prevention strategy**:
+- Single weak indicator (score = 1) → NOT markdown ✓
+- Two weak indicators (score = 2) → NOT markdown ✓
+- One medium indicator (score = 2) → NOT markdown ✓
+- Need 3+ points to trigger → Prevents most false positives ✓
+
+## Expected Behavior
+
+### Should NOT Trigger Markdown Detection:
+- ✅ Code-like patterns (C includes, shebangs, paths)
+- ✅ Variable patterns (env vars, assignments)
+- ✅ URLs without markdown syntax
+- ✅ List-like patterns in prose
+- ✅ Emphasis-like patterns (asterisks/underscores in names)
+- ✅ Structured text without markdown syntax
+- ✅ Content below 30 characters
+- ✅ Content above 30 chars but score < 3
+
+### SHOULD Trigger Markdown Detection:
+- ✅ Proper markdown with headers, lists, emphasis, code, links
+- ✅ Markdown with images
+- ✅ Content with score >= 3
+
+## Running the Tests
+
+```bash
+cd rt/markdown
+t .
+```
+
+The test suite will:
+1. Verify plain text remains plain text
+2. Verify markdown is processed correctly
+3. **Verify false positive scenarios don't trigger** (CRITICAL)
+4. Verify edge cases work correctly
+5. Verify image support works
+6. Verify color support works
+
+## Next Steps
+
+1. **Run tests** to generate golden files
+2. **Review output** to ensure false positives don't occur
+3. **Update golden files** if output is correct
+4. **Monitor** for any false positives in production
+
+## Critical Success Criteria
+
+✅ **Zero false positives** in false_positive_* test modules
+✅ All markdown content properly detected and processed
+✅ Edge cases handled correctly
+✅ Image support works as expected
+✅ Color support works correctly
+