Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications
This artifact accompanies the OOPSLA 2025 paper "Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications". It provides a complete implementation of MTP, a novel programming language abstraction that enables type-safe integration of Large Language Models (LLMs) into traditional programming workflows.
The Meaning-Typed Programming (MTP) paradigm is implemented in the open-source Jaseci ecosystem as the MTLLM plugin for the Jac programming language; the 'MTP' implementation referred to in the paper is this MTLLM plugin.
Key Innovation: MTLLM bridges the gap between the structured world of programming languages and the unstructured outputs of LLMs through a type system that captures both structural types and semantic meaning, enabling compile-time guarantees for AI-powered functions.
- Type-Safe LLM Integration: Compile-time type checking for LLM-powered functions with runtime output validation
- Automatic Output Transformation: Runtime system that converts unstructured LLM outputs into typed programming language objects
- Semantic Type System: Type annotations that capture both structural types (`int`, `str`) and semantic meaning for precise LLM guidance (see the sketch after this list)
- Language-Integrated AI: Native `by llm()` syntax in the Jac programming language for seamless AI integration
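To make these features concrete, the following is a minimal, hypothetical sketch (not part of the artifact's benchmarks) that reuses only the `obj`, `by llm()`, and type-annotation constructs demonstrated in the examples later in this README; it assumes the runtime can coerce the LLM output into a user-defined object, as described under the type-coercion feature below.
import from mtllm.llms {OpenAI}
glob llm = OpenAI(model_name="gpt-4o");
# Field names and types carry the meaning the LLM needs; the runtime
# converts the raw LLM output into a typed Country object.
obj Country {
has name: str;
has capital: str;
has population: int;
}
def get_country_info(country_name: str) -> Country by llm();
with entry {
info = get_country_info("France");
print(f"{info.capital}, population {info.population}");
}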
This repository contains:
- Complete MTLLM(MTP) implementation for the Jac programming language (version 0.3.8)
- Comprehensive benchmark suite with 12 tasks comparing MTLLM(MTP) against DSPy and LMQL baselines
- Evaluation scripts for reproducing all experimental results from the paper
- Documentation and examples demonstrating all key features
- Docker environment for reproducible evaluation
The implementation is based on the open-source Jaseci ecosystem and represents the exact version used for paper evaluation.
- Python 3.12+: Required for the Jac language runtime
- OpenAI API Key: Required for evaluation benchmarks using GPT models
- Operating System: Linux or macOS (Windows not currently supported)
- Docker (optional): For containerized evaluation environment
# Clone the repository with submodules
git clone --recurse-submodules https://github.com/Jayanaka-98/mtllm-oopsla2025.git
cd mtllm-oopsla2025
# Install MTLLM(MTP) with all required dependencies
pip install "mtllm[openai,ollama,tools]==0.3.8"
# Install evaluation dependencies
pip install -r eval/requirements.txt
# Set up your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
# Optional: Install Ollama for local model evaluation
curl -fsSL https://ollama.ai/install.sh | sh
For a fully reproducible environment:
# Clone the repository
git clone --recurse-submodules https://github.com/Jayanaka-98/mtllm-oopsla2025.git
cd mtllm-oopsla2025
# Build and start the Docker container
chmod +x setup.sh
./setup.sh
# Inside the container, set your API key
export OPENAI_API_KEY="your-api-key-here"
Test your installation by running a simple MTLLM(MTP) example:
# Create a test file
cat > test.jac << 'EOF'
import from mtllm.llms {OpenAI}
glob llm = OpenAI(model_name="gpt-4o");
def greet(name: str) -> str by llm();
with entry {
print(greet("OOPSLA reviewers"));
}
EOF
# Run the test
jac run test.jac
If successful, you should see a greeting message generated by the LLM.
The following examples demonstrate the three main usage patterns of MTLLM(MTP), corresponding to Figures 8(a), 8(b), and 8(c) in the paper.
MTLLM(MTP) functions allow you to define function signatures with traditional type annotations while delegating implementation to an LLM. The runtime ensures type safety by validating and converting LLM outputs.
Example: Basic function with type enforcement
import from mtllm.llms {OpenAI}
# Initialize the LLM
glob llm = OpenAI(model_name="gpt-4o");
# Define a type-safe LLM function
def calculate_age(cur_year: int, dob: str) -> int by llm();
with entry {
age = calculate_age(cur_year=2025, dob="1998");
print(f"Age: {age}"); # Output is guaranteed to be an integer
}
Run: jac run examples/func.jac
MTLLM(MTP) can generate object fields automatically while maintaining type constraints, enabling AI-driven object initialization with structural guarantees.
Example: Automatic field generation
import from mtllm.llms {OpenAI}
glob llm = OpenAI(model_name="gpt-4o");
obj Person {
has name: str;
has dob: str;
}
with entry {
# LLM fills in missing field based on partial information
einstein = Person(name="Einstein" by llm());
print(f"{einstein.name} was born on {einstein.dob}");
}
Run: jac run examples/object.jac
Methods can leverage LLM capabilities while accessing object state, enabling context-aware AI computations with type safety.
Example: Context-aware method with object state access
import from mtllm.llms {OpenAI}
glob llm = OpenAI(model_name="gpt-4o");
obj Person {
has name: str;
has dob: str;
# Method uses object state (self) for computation
def calculate_age(cur_year: int) -> int by llm(incl_info=(self), temperature=0.7);
}
with entry {
einstein = Person(name="Einstein", dob="March 14, 1879");
print(f"Einstein's age in 2024: {einstein.calculate_age(2024)}");
}
Run: jac run examples/method.jac
- Multiple LLM Support: OpenAI GPT, Anthropic Claude, local models via Ollama
- Type Coercion: Automatic parsing and validation of complex types (lists, objects, enums); see the sketch after this list
- Error Recovery: Robust handling of malformed LLM outputs with retry mechanisms
- Native Agentic Support: MTLLM(MTP) supports the ReAct method for building agentic applications
- Vision Model Support: MTLLM can perform inference with multimodal models that accept images and videos as input
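As an illustration of the type-coercion feature, here is a short, hypothetical sketch (not one of the included benchmarks) assuming the runtime can parse an LLM response into the declared `list[str]` return type:
import from mtllm.llms {OpenAI}
glob llm = OpenAI(model_name="gpt-4o");
# The declared return type list[str] instructs the runtime to parse
# the LLM output into a list of strings rather than raw text.
def extract_keywords(text: str) -> list[str] by llm();
with entry {
keywords = extract_keywords("MTP integrates LLMs into typed programming languages.");
print(keywords);
}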
Complete Documentation: MTLLM User Guide (https://www.jac-lang.org/learn/jac-mtllm/)
This artifact includes a comprehensive evaluation suite that reproduces all experimental results from the paper. The benchmarks compare MTLLM(MTP) against two state-of-the-art frameworks: DSPy and LMQL.
The evaluation covers 13 diverse tasks across different domains:
| Category | Task | Description |
|---|---|---|
| Text Processing | `translation` | Multi-language text translation |
| | `text_to_type` | Converting unstructured text to typed objects |
| | `template` | Generating output according to a predefined template |
| Reasoning | `mcq_reason` | Multiple-choice question reasoning |
| | `math_problem` | Mathematical word problem solving |
| | `odd_word_out` | Pattern recognition and categorization |
| Content Generation | `joke_gen` | Creative content generation |
| | `essay_reviewer` | Academic text analysis |
| | `expert_answer` | Domain-specific question answering |
| Applications | `taskman` | Task management and scheduling |
| | `rpg_level_gen` | Game content generation |
| | `personality_finder` | Personality analysis |
| | `wikipedia` | Information extraction and summarization |
The evaluation measures:
- Accuracy: Task-specific correctness metrics
- Token Usage: Total tokens consumed per task
- Runtime: Execution time per benchmark
- Cost: Estimated API costs (USD)
- Sensitivity: Impact of coding practices on accuracy
The paper makes four key claims that this artifact validates:
MTLLM(MTP) reduces development complexity for model-integrated applications
This claim is evaluated mainly through a case study that compares implementations using lines of code as the metric. The three versions of each benchmark program used in the paper are included in the `benchmarks/` directory. A user study, documented in the paper, also supports this claim.
Evidence: Compare the MTLLM implementations with the DSPy/LMQL baselines in the `benchmarks/` directory. MTLLM consistently requires fewer lines of code and less boilerplate.
MTLLM(MTP) achieves similar or better accuracy than baseline frameworks
To support this claim, we run each benchmark program for 20 trials and report the average success rate. In addition, we conduct a broader evaluation with multiple LLMs on the GSM8k dataset for the math problem benchmark. However, that evaluation requires running Llama models on local hardware, which produces variable results and is subject to our timeout limits. Hence, we only include scripts for running the experiments with OpenAI GPT models.
Evidence: Run the evaluation suite to reproduce accuracy results from Table 2 in the paper.
# (requires OpenAI API key)
cd eval
# Generate accuracy summary statistics
python overall_accuracy.py
# Generate evaluation results for the math problem benchmark on the GSM8k dataset
python GSM8k_accuracy.py
MTLLM(MTP) demonstrates similar or lower token usage, cost, and runtime compared to baselines
Cost is calculated using the OpenAI pricing equation discussed in the paper. To measure token usage, we used instrumented versions of LMQL, DSPy, and MTLLM that record prompts and LLM responses; these instrumented versions are not included in this artifact, so the token usage and cost evaluations cannot be reproduced here. Runtime evaluation scripts are included.
Evidence: Resource usage metrics are captured during evaluation and match paper results.
cd eval
# The following command runs the evaluation suite and measures runtime for both MTLLM and baseline implementations:
python eval.py --config eval.config.json --impl both
MTLLM(MTP) demonstrates resilience to suboptimal coding practices
We evaluate the robustness of MTLLM(MTP) to suboptimal developer coding practices. For this, we introduce seven variations of the level generator benchmark with varying degrees of coding quality.
Evidence: Robustness tests show MTLLM maintains performance across different implementation styles.
cd eval/sensitivity_eval
# Run the following script to generate the results.
python exp.py
Experience MTLLM(MTP) with the included RPG game that uses LLM-powered procedural level generation:
# Install game dependencies
pip install pygame
# Run the interactive RPG demo
cd jaseci/jac/examples/rpg_game/jac_impl/jac_impl_6
jac run main.jac
This demonstrates the real-world application of MTLLM for dynamic content generation in an interactive environment.
mtllm-oopsla2025/
├── README.md               # This file
├── Dockerfile              # Docker environment setup
├── setup.bash              # Automated setup script
├── benchmarks/             # Evaluation benchmarks
│   ├── translation/        # Translation task implementations
│   ├── text_to_type/       # Text-to-type conversion tasks
│   ├── mcq_reason/         # Multiple choice reasoning
│   ├── math_problem/       # Mathematical problem solving
│   ├── joke_gen/           # Content generation tasks
│   ├── essay_reviewer/     # Text analysis tasks
│   ├── expert_answer/      # Domain-specific QA
│   ├── taskman/            # Task management
│   ├── rpg_level_gen/      # Game content generation
│   ├── personality_finder/ # Personality analysis
│   ├── odd_word_out/       # Pattern recognition
│   ├── wikipedia/          # Information extraction
│   └── template/           # Template for new benchmarks
├── eval/                   # Evaluation scripts and results
│   ├── eval.py             # Main evaluation runner
│   ├── overall_accuracy.py # Results aggregation
│   ├── requirements.txt    # Python dependencies
│   └── local_cache/        # Cached compilation artifacts
└── jaseci/                 # Core Jaseci ecosystem
    ├── jac/                # Jac language implementation
    ├── jac-mtllm/          # MTLLM(MTP) plugin source
    ├── jac-cloud/          # Cloud deployment tools
    └── scripts/            # Utility scripts
Each benchmark directory contains three implementations:
- `*_mtllm.jac`: MTP implementation (see the sketch after this list)
- `*_dspy.py`: DSPy baseline implementation
- `*_lmql.py`: LMQL baseline implementation
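For a sense of the MTP style, the following is a minimal, hypothetical sketch in the spirit of the translation task (it is not the actual benchmark code and uses only constructs shown in the examples above); the DSPy and LMQL variants implement the same task with framework-specific prompting and parsing code:
import from mtllm.llms {OpenAI}
glob llm = OpenAI(model_name="gpt-4o");
# The whole task is a single typed signature delegated to the LLM.
def translate(text: str, target_language: str) -> str by llm();
with entry {
print(translate("Good morning", "French"));
}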
Python Version Error
ERROR: Python 3.12+ required
Solution: Upgrade Python or use the Docker environment.
API Key Error
openai.AuthenticationError: Invalid API key
Solution: Verify your OpenAI API key is set correctly:
echo $OPENAI_API_KEY # Should display your key
export OPENAI_API_KEY="your-actual-key-here"
Package Installation Error
ERROR: Could not find a version that satisfies mtllm
Solution: Ensure you're using Python 3.12+ and run:
pip install --upgrade pip
pip install "mtllm[openai,ollama,tools]==0.3.8"
Ollama Connection Error
ConnectionError: Could not connect to Ollama
Solution: Start the Ollama service:
ollama serve
# In another terminal:
ollama pull llama2 # or your preferred model
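Once Ollama is running, a local model can be substituted for the OpenAI wrapper in the examples above. A minimal sketch, assuming the `Ollama` class exposed by `mtllm.llms` (installed via the `ollama` extra) accepts a `model_name` argument like the `OpenAI` wrapper does:
import from mtllm.llms {Ollama}
# Assumes the Ollama server is running locally and the model has been pulled.
glob llm = Ollama(model_name="llama2");
def greet(name: str) -> str by llm();
with entry {
print(greet("OOPSLA reviewers"));
}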
- MTP Documentation: https://www.jac-lang.org/learn/jac-mtllm/
- Jac Language Guide: https://www.jac-lang.org
- Issues: Report bugs or ask questions via GitHub Issues