A comprehensive, rigorous research workflow system for foundational AI models using specialized agents.
This system provides a complete workflow from literature review to publication-ready research, preventing common methodological failures like circular logic, unvalidated measures, and non-reproducible results.
- ✅ 4 Specialized Agents - Clear separation of concerns
- ✅ Prevents Circular Logic - Automated detection and blocking
- ✅ Flexible Validation - Expert annotation or rigorous alternatives
- ✅ Pre-specified Analysis - Prevents p-hacking and HARKing
- ✅ Multi-stage Review - 5 comprehensive review stages
- ✅ Fully Reproducible - Complete documentation and verification
```
┌───────────────────────────────────────────────────────────────────┐
│                      RESEARCH AGENT SYSTEM                        │
│                      4 Specialized Agents                         │
└───────────────────────────────────────────────────────────────────┘

Agent 1: RESEARCH SCOUT
├── Literature review
├── Gap identification
└── Research question formulation
        ↓
Agent 2: METHODOLOGY ARCHITECT
├── Methodology design
├── Validation strategy (expert or alternative)
├── Circular logic detection
└── Statistical analysis plan
        ↓
Agent 3: EXPERIMENT EXECUTOR
├── Ground truth collection
├── Measure validation (blocks until validated)
├── Experiment execution
└── Quality control
        ↓
Agent 4: RESULTS ANALYST & REVIEWER
├── Statistical analysis
├── Honest interpretation
├── Multi-stage review (5 stages)
└── Publication preparation
```
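The hand-off between the four agents can be thought of as a simple linear pipeline. The sketch below illustrates that flow; the agent callables and payload keys are illustrative assumptions, not the project's actual API:

```python
# Minimal sketch of the four-agent hand-off. Each stage receives the
# previous stage's output and enriches it before passing it along.
class Pipeline:
    def __init__(self, *agents):
        self.agents = agents

    def run(self, payload):
        for agent in self.agents:
            payload = agent(payload)
        return payload

# Hypothetical stand-ins for the four agents
scout = lambda interest: {"questions": [f"gaps in {interest}"]}
architect = lambda out: {**out, "methodology": "pre-specified plan"}
executor = lambda out: {**out, "data": "validated measurements"}
analyst = lambda out: {**out, "status": "reviewed"}

result = Pipeline(scout, architect, executor, analyst).run("self-correction in LLMs")
print(result["status"])  # reviewed
```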
Want to use this with Claude right now?
- Download pre-packaged skills: see the claude-skills-zips/ folder
- Upload to Claude: Settings > Capabilities > Upload skill
- Start researching: Claude will automatically use the skills!

Full setup guide: CLAUDE_SKILLS_SETUP.md
Read the documentation in order:
- AGENT_SYSTEM.md - System overview and architecture
- RESEARCH_WORKFLOW.md - Complete workflow with all phases
- VALIDATION_OPTIONS.md - Validation strategies when experts unavailable
Each agent has a detailed skill document:
- Research Scout Agent - Literature review and gap identification
- Methodology Architect Agent - Methodology design and validation planning
- Experiment Executor Agent - Validation execution and experiments
- Results Analyst & Reviewer Agent - Analysis and comprehensive review
Skills are organized by research phase:
- Literature Review Skill - Systematic literature search and synthesis
- Research Methodology Validator - Prevents circular logic, enforces validation
- Validation Without Humans Skill - Alternative validation strategies
- Experiment Design Skill - Rigorous experiment design
- Results Analysis Skill - Statistical analysis with effect sizes
- Research Review Skill - Multi-stage review process
| Phase | Agent | Duration | Output |
|---|---|---|---|
| 1. Literature Review | Research Scout | 2-4 weeks | Research questions |
| 2. Methodology Design | Methodology Architect | 2-3 weeks | Methodology + validation plan |
| 3. Validation & Experiments | Experiment Executor | 5-7 weeks | Validated data |
| 4. Analysis & Review | Results Analyst & Reviewer | 3-4 weeks | Publication-ready research |
Total: 12-18 weeks (3-4.5 months)
The system automatically detects and prevents circular validation:

```python
# Agent 2: Methodology Architect
circular_check = self.check_circular_logic(validation_strategy)
if circular_check['has_circular_logic']:
    raise ValueError("CANNOT PROCEED - Redesign validation")
```

All measures must be validated before use:

```python
# Agent 3: Experiment Executor
measure.validate_against_ground_truth(ground_truth)
if not measure.is_validated:
    raise ValueError("Cannot use unvalidated measure")
```

The statistical analysis plan must be written before data collection:

```python
# Agent 2: Methodology Architect
analysis_plan = self.prespecify_statistical_analysis()
# Prevents p-hacking and HARKing
```

All results must be reported, not just significant ones:

```python
# Agent 4: Results Analyst & Reviewer
if self.is_overclaiming(interpretation):
    raise ValueError("Overclaiming detected - revise interpretation")
```

Expert annotation (gold standard):
- Confidence: HIGH
- Cost: $3k-9k
- Time: 2-3 months
- Requirements: n ≥ 100, ≥ 3 experts, κ ≥ 0.7
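The κ ≥ 0.7 agreement requirement refers to inter-rater reliability, which can be checked with a small Cohen's kappa sketch (two raters over the same items; the sample labels below are made up):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n)
              for l in freq_a.keys() | freq_b.keys())
    return (p_o - p_e) / (1 - p_e)

a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.2f}, meets 0.7 threshold: {kappa >= 0.7}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a stronger requirement than simple percent agreement.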
- Behavioral Ground Truth - Use actual outcomes (MEDIUM confidence)
- Comparative Validation - Compare to established measures (MEDIUM confidence)
- Crowdsourced Validation - Many non-experts with quality control (MEDIUM confidence)
- Hybrid Approach - Small expert + large behavioral (MEDIUM-HIGH confidence) ← Recommended
- Multiple Strategies - Combine 2-3 approaches (MEDIUM-HIGH confidence) ← Best
See VALIDATION_OPTIONS.md for complete details.
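The "combine 2-3 approaches" recommendation can be expressed as a toy confidence-aggregation heuristic. The level ordering comes from the list above; the one-level boost for triangulation is an assumption for illustration only:

```python
# Ordered confidence levels (from the strategy list above)
CONFIDENCE = {"LOW": 1, "MEDIUM": 2, "MEDIUM-HIGH": 3, "HIGH": 4}
LEVELS = {v: k for k, v in CONFIDENCE.items()}

def combined_confidence(strategy_levels):
    """Triangulating independent strategies: one level above the
    strongest single strategy when >= 2 are combined, capped at HIGH."""
    best = max(CONFIDENCE[s] for s in strategy_levels)
    boost = 1 if len(strategy_levels) >= 2 else 0
    return LEVELS[min(best + boost, CONFIDENCE["HIGH"])]

print(combined_confidence(["MEDIUM", "MEDIUM"]))  # MEDIUM-HIGH
print(combined_confidence(["MEDIUM"]))            # MEDIUM
```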
Each phase has mandatory quality gates:

Phase 1 (Literature Review):
- Comprehensive search completed
- Gaps validated (genuine, significant, feasible, novel)
- Research questions formulated

Phase 2 (Methodology Design):
- Constructs clearly defined
- Validation strategy independent (NO circular logic)
- Statistical plan pre-specified

Phase 3 (Validation & Experiments):
- All measures validated (F1 ≥ 0.7 or equivalent)
- Pilot successful
- Data verified

Phase 4 (Analysis & Review):
- Pre-specified plan followed
- Effect sizes with CIs reported
- All 5 review stages passed
- Independent reproduction successful
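The F1 ≥ 0.7 validation gate can be computed directly. Below is a minimal binary-label sketch; the prediction and ground-truth data are made up:

```python
def f1_score(predictions, ground_truth, positive=1):
    """F1 of a measure's predictions against independent ground truth."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

preds = [1, 1, 0, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 0, 1, 1, 0]
f1 = f1_score(preds, truth)
print(f"F1 = {f1:.2f}, gate passed: {f1 >= 0.7}")  # F1 = 0.75, gate passed: True
```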
Agent 4 conducts 5 comprehensive review stages:
- Methodology Review - Constructs, validation, circular logic check
- Implementation Review - Code quality, reproducibility
- Results Review - Statistical validity, effect sizes, honest interpretation
- Contribution Review - Novelty, significance, quality
- Reproducibility Review - Independent verification
All stages must be approved before publication.
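The all-stages-must-pass rule might look like this in code. The stage names come from the list above, while the reviewer-callable interface is an assumption:

```python
REVIEW_STAGES = ["methodology", "implementation", "results",
                 "contribution", "reproducibility"]

def run_review(research, reviewers):
    """Run all five stages; publication is approved only if every
    stage passes. `reviewers` maps stage -> callable returning
    (approved, issues)."""
    issues = {}
    for stage in REVIEW_STAGES:
        approved, stage_issues = reviewers[stage](research)
        if not approved:
            issues[stage] = stage_issues
    return {"approved_for_publication": not issues, "issues": issues}

# Toy reviewers: everything passes except the results stage
reviewers = {s: (lambda r: (True, [])) for s in REVIEW_STAGES}
reviewers["results"] = lambda r: (False, ["missing effect sizes"])
outcome = run_review({"title": "demo"}, reviewers)
print(outcome["approved_for_publication"])  # False
print(outcome["issues"])  # {'results': ['missing effect sizes']}
```

Note that every stage runs even after a failure, so a single review pass surfaces all outstanding issues rather than stopping at the first one.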
```python
# Initialize orchestrator
orchestrator = ResearchOrchestrator()

# Run the complete research workflow
research_interest = "Self-correction mechanisms in large language models"
final_output = orchestrator.run_research_workflow(research_interest)

if final_output['status'] == 'approved_for_publication':
    print("✅ Research ready for publication!")
    print(f"Publication package: {final_output['publication_package']}")
else:
    print("⚠️ Revisions needed")
    for issue in final_output['remaining_issues']:
        print(f"  - {issue}")
```

```
agentic-research-prototyping/
├── README.md (this file)
├── docs/
│   ├── AGENT_SYSTEM.md (system architecture)
│   ├── RESEARCH_WORKFLOW.md (complete workflow)
│   └── VALIDATION_OPTIONS.md (validation strategies)
├── agents/
│   ├── research-scout-agent.md
│   ├── methodology-architect-agent.md
│   ├── experiment-executor-agent.md
│   └── results-analyst-reviewer-agent.md
├── skills/
│   ├── literature-review-skill/
│   ├── research-methodology-validator/
│   ├── validation-without-humans-skill/
│   ├── experiment-design-skill/
│   ├── results-analysis-skill/
│   └── research-review-skill/
└── examples/
    └── (coming soon)
```
| Pitfall | Prevention |
|---|---|
| Circular validation | Methodology Architect detects, requires independent ground truth |
| Unvalidated measures | Experiment Executor blocks usage until validated |
| P-hacking | Pre-specified analysis plan enforced |
| HARKing | Hypotheses documented before data collection |
| Overclaiming | Results Analyst checks interpretation |
| Non-reproducible | Reproducibility review with independent verification |
| Cherry-picking | All results must be reported |
| Missing effect sizes | Analysis skill requires them |
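A detector for the first pitfall (circular validation) could be as simple as comparing the provenance of the measure and its ground truth. The strategy schema below is hypothetical, not the project's actual data model:

```python
def check_circular_logic(validation_strategy):
    """Flag strategies whose ground truth derives from the same
    source as the measure under test (hypothetical schema)."""
    measure_src = validation_strategy["measure"]["source"]
    truth_src = validation_strategy["ground_truth"]["source"]
    return {
        "has_circular_logic": measure_src == truth_src,
        "detail": f"measure={measure_src}, ground_truth={truth_src}",
    }

# A model scored against its own labels is circular:
strategy = {
    "measure": {"source": "model_self_rating"},
    "ground_truth": {"source": "model_self_rating"},
}
print(check_circular_logic(strategy)["has_circular_logic"])  # True
```

Real detection would likely need to trace provenance transitively (a measure validated against a dataset that was itself labeled by the measure), but the principle is the same: ground truth must be independent of the thing being validated.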
This system is designed for rigorous research in foundational AI models. Contributions welcome:
- Additional validation strategies
- Example implementations
- Case studies
- Improvements to detection algorithms
MIT License - See LICENSE file for details
If you use this system in your research, please cite:
```bibtex
@software{agentic_research_prototyping,
  title  = {Agentic Research Prototyping: A Rigorous Workflow for Foundational AI Models},
  author = {Baena, E.},
  year   = {2025},
  url    = {https://github.com/ebaenamar/agentic-research-prototyping}
}
```

For questions or issues, please open a GitHub issue.
Status: Active Development
Last Updated: October 2025