
Agentic Research Prototyping

A comprehensive, rigorous research workflow system for foundational AI models, built around specialized agents.

Overview

This system provides a complete workflow from literature review to publication-ready research, preventing common methodological failures like circular logic, unvalidated measures, and non-reproducible results.

Key Features

  • ✅ 4 Specialized Agents - Clear separation of concerns
  • ✅ Prevents Circular Logic - Automated detection and blocking
  • ✅ Flexible Validation - Expert annotation or rigorous alternatives
  • ✅ Pre-specified Analysis - Prevents p-hacking and HARKing
  • ✅ Multi-stage Review - 5 comprehensive review stages
  • ✅ Fully Reproducible - Complete documentation and verification

System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                    RESEARCH AGENT SYSTEM                         │
│                  4 Specialized Agents                            │
└──────────────────────────────────────────────────────────────────┘

Agent 1: RESEARCH SCOUT 🔍
├─ Literature review
├─ Gap identification
└─ Research question formulation
    ↓
Agent 2: METHODOLOGY ARCHITECT 🏗️
├─ Methodology design
├─ Validation strategy (expert or alternative)
├─ Circular logic detection
└─ Statistical analysis plan
    ↓
Agent 3: EXPERIMENT EXECUTOR ⚙️
├─ Ground truth collection
├─ Measure validation (blocks until validated)
├─ Experiment execution
└─ Quality control
    ↓
Agent 4: RESULTS ANALYST & REVIEWER 📊
├─ Statistical analysis
├─ Honest interpretation
├─ Multi-stage review (5 stages)
└─ Publication preparation

🚀 Quick Start with Claude Desktop/Code

Want to use this with Claude right now?

  1. Download pre-packaged skills: See claude-skills-zips/ folder
  2. Upload to Claude: Settings > Capabilities > Upload skill
  3. Start researching: Claude will automatically use the skills!

📖 Full setup guide: CLAUDE_SKILLS_SETUP.md


Quick Start (Documentation)

1. Understand the System

Read the documentation in order:

  1. AGENT_SYSTEM.md - System overview and architecture
  2. RESEARCH_WORKFLOW.md - Complete workflow with all phases
  3. VALIDATION_OPTIONS.md - Validation strategies when experts unavailable

2. Review Agent Skills

Each agent has a detailed skill document in agents/:

  • research-scout-agent.md
  • methodology-architect-agent.md
  • experiment-executor-agent.md
  • results-analyst-reviewer-agent.md

3. Use the Skills

Skills are organized by research phase under skills/:

  • literature-review-skill
  • research-methodology-validator
  • validation-without-humans-skill
  • experiment-design-skill
  • results-analysis-skill
  • research-review-skill

Timeline

| Phase | Agent | Duration | Output |
|-------|-------|----------|--------|
| 1. Literature Review | Research Scout | 2-4 weeks | Research questions |
| 2. Methodology Design | Methodology Architect | 2-3 weeks | Methodology + validation plan |
| 3. Validation & Experiments | Experiment Executor | 5-7 weeks | Validated data |
| 4. Analysis & Review | Results Analyst & Reviewer | 3-4 weeks | Publication-ready research |

Total: 12-18 weeks (3-4.5 months)

Key Principles

1. No Circular Logic

The system automatically detects and prevents circular validation:

# Agent 2: Methodology Architect
circular_check = self.check_circular_logic(validation_strategy)

if circular_check['has_circular_logic']:
    raise ValueError("CANNOT PROCEED - Redesign validation")

2. Mandatory Validation

All measures must be validated before use:

# Agent 3: Experiment Executor
measure.validate_against_ground_truth(ground_truth)

if not measure.is_validated:
    raise ValueError("Cannot use unvalidated measure")

3. Pre-specified Analysis

Statistical analysis plan must be written before data collection:

# Agent 2: Methodology Architect
analysis_plan = self.prespecify_statistical_analysis()
# Prevents p-hacking and HARKing
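One lightweight way to enforce pre-specification (a sketch; the repo's actual mechanism isn't shown in this excerpt) is to freeze the plan before any data arrives, so later attempts to change the test or loosen the threshold fail loudly:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class AnalysisPlan:
    """Immutable, pre-registered analysis plan (hypothetical structure)."""
    hypothesis: str
    primary_test: str
    alpha: float
    effect_size_metric: str


plan = AnalysisPlan(
    hypothesis="Self-correction improves accuracy on held-out tasks",
    primary_test="paired t-test",
    alpha=0.05,
    effect_size_metric="Cohen's d",
)

try:
    plan.alpha = 0.10  # post-hoc loosening of alpha is blocked
except FrozenInstanceError:
    pass
```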

4. Honest Reporting

All results must be reported, not just significant ones:

# Agent 4: Results Analyst & Reviewer
if self.is_overclaiming(interpretation):
    raise ValueError("Overclaiming detected - revise interpretation")

Validation Strategies

Expert Annotation (Preferred)

  • Confidence: HIGH
  • Cost: $3k-9k
  • Time: 2-3 months
  • Requirements: n≥100, ≥3 experts, κ≥0.7
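The κ≥0.7 requirement refers to inter-rater agreement; with ≥3 experts it is typically checked pairwise or with a multi-rater statistic. As a quick sketch, Cohen's kappa for two raters follows the standard formula:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each label's marginal frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```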

Alternative Validation (When Experts Unavailable)

  1. Behavioral Ground Truth - Use actual outcomes (MEDIUM confidence)
  2. Comparative Validation - Compare to established measures (MEDIUM confidence)
  3. Crowdsourced Validation - Many non-experts with quality control (MEDIUM confidence)
  4. Hybrid Approach - Small expert + large behavioral (MEDIUM-HIGH confidence) ⭐ Recommended
  5. Multiple Strategies - Combine 2-3 approaches (MEDIUM-HIGH confidence) ⭐ Best
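The recommended hybrid approach can be sketched as two independent checks that must both pass. The thresholds and function below are illustrative assumptions, not values from this repo:

```python
def hybrid_validation(expert_f1, behavioral_correlation,
                      expert_min=0.7, behavioral_min=0.5):
    """Sketch of the hybrid strategy: combine a small expert-annotated
    check with a large behavioral one.

    expert_f1:               F1 against a small expert-labeled subset
    behavioral_correlation:  correlation with real-world outcomes
    Both must clear their (illustrative) thresholds for MEDIUM-HIGH confidence.
    """
    if expert_f1 >= expert_min and behavioral_correlation >= behavioral_min:
        return "MEDIUM-HIGH"
    if expert_f1 >= expert_min or behavioral_correlation >= behavioral_min:
        return "MEDIUM"
    return "INSUFFICIENT"
```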

See VALIDATION_OPTIONS.md for complete details.

Quality Gates

Each phase has mandatory quality gates:

Gate 1: After Literature Review

  • Comprehensive search completed
  • Gaps validated (genuine, significant, feasible, novel)
  • Research questions formulated

Gate 2: After Methodology Design

  • Constructs clearly defined
  • Validation strategy independent (NO circular logic)
  • Statistical plan pre-specified

Gate 3: After Validation & Experiments

  • All measures validated (F1≥0.7 or equivalent)
  • Pilot successful
  • Data verified

Gate 4: After Analysis & Review

  • Pre-specified plan followed
  • Effect sizes with CIs reported
  • All 5 review stages passed
  • Independent reproduction successful
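Gates like these can be enforced as simple boolean checklists that block progression. The repo's enforcement code isn't shown here; a minimal sketch (hypothetical structure) is:

```python
def check_gate(name, criteria):
    """Raise if any criterion in the gate is unmet; otherwise pass through."""
    failed = [c for c, ok in criteria.items() if not ok]
    if failed:
        raise RuntimeError(f"{name} blocked: unmet criteria {failed}")
    return True


# Example: Gate 2 criteria from above (values would come from the workflow)
check_gate("Gate 2", {
    "constructs_defined": True,
    "validation_independent": True,
    "analysis_prespecified": True,
})
```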

Multi-Stage Review

Agent 4 conducts 5 comprehensive review stages:

  1. Methodology Review - Constructs, validation, circular logic check
  2. Implementation Review - Code quality, reproducibility
  3. Results Review - Statistical validity, effect sizes, honest interpretation
  4. Contribution Review - Novelty, significance, quality
  5. Reproducibility Review - Independent verification

All stages must be approved before publication.

Example Usage

# Initialize orchestrator
orchestrator = ResearchOrchestrator()

# Run complete research workflow
research_interest = "Self-correction mechanisms in large language models"

final_output = orchestrator.run_research_workflow(research_interest)

if final_output['status'] == 'approved_for_publication':
    print("βœ“ Research ready for publication!")
    print(f"Publication package: {final_output['publication_package']}")
else:
    print("⚠️ Revisions needed")
    for issue in final_output['remaining_issues']:
        print(f"  - {issue}")

Documentation Structure

agentic-research-prototyping/
├── README.md (this file)
├── docs/
│   ├── AGENT_SYSTEM.md (system architecture)
│   ├── RESEARCH_WORKFLOW.md (complete workflow)
│   └── VALIDATION_OPTIONS.md (validation strategies)
├── agents/
│   ├── research-scout-agent.md
│   ├── methodology-architect-agent.md
│   ├── experiment-executor-agent.md
│   └── results-analyst-reviewer-agent.md
├── skills/
│   ├── literature-review-skill/
│   ├── research-methodology-validator/
│   ├── validation-without-humans-skill/
│   ├── experiment-design-skill/
│   ├── results-analysis-skill/
│   └── research-review-skill/
└── examples/
    └── (coming soon)

Common Pitfalls Prevented

| Pitfall | Prevention |
|---------|------------|
| Circular validation | Methodology Architect detects it and requires independent ground truth |
| Unvalidated measures | Experiment Executor blocks usage until validated |
| P-hacking | Pre-specified analysis plan enforced |
| HARKing | Hypotheses documented before data collection |
| Overclaiming | Results Analyst checks interpretation |
| Non-reproducible results | Reproducibility review with independent verification |
| Cherry-picking | All results must be reported |
| Missing effect sizes | Analysis skill requires them |

Contributing

This system is designed for rigorous research in foundational AI models. Contributions welcome:

  • Additional validation strategies
  • Example implementations
  • Case studies
  • Improvements to detection algorithms

License

MIT License - See LICENSE file for details

Citation

If you use this system in your research, please cite:

@software{agentic_research_prototyping,
  title = {Agentic Research Prototyping: A Rigorous Workflow for Foundational AI Models},
  author = {Baena, E.},
  year = {2025},
  url = {https://github.com/ebaenamar/agentic-research-prototyping}
}

Contact

For questions or issues, please open a GitHub issue.


Status: Active Development

Last Updated: October 2025
