Replication package for the paper *Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance* (accepted for MODELS 2025).
Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. However, RL methods exhibit performance issues in complex problems. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a sensible trade-off between the certainty and timeliness of human advice, our method takes a firm step towards machine learning-driven human-in-the-loop engineering methods.
- Content description
- Reproduction of analysis
- Reproduction of experimental data
- Experiment setup
- Results
- `01-advice` - Contains all the experimental artifacts and visualizations used in our experiments (map, advice files, and advice visualized on the map).
- `02-data` - Contains experimental data produced in accordance with the Experiment settings.
  - `randomRewardData.csv` - Cumulative rewards of a random-walk agent.
  - `unadvisedRewardData.csv` - Cumulative rewards of an unadvised (but not random) agent.
  - `allRewardData.csv` - Cumulative rewards of an agent advised by information about every state.
  - `holesAndGoalRewardData.csv` - Cumulative rewards of an agent advised by information about terminating states (negative termination and positive termination, i.e., the goal).
  - `human10RewardData.csv` - Cumulative rewards of an agent advised by a single human advisor about 10% of the states.
  - `human5RewardData.csv` - Cumulative rewards of an agent advised by a single human advisor about 5% of the states.
  - `coop10SequentialRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top left, one at the bottom right) who each advise about 10% of the states.
  - `coop10ParallelRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top right, one at the bottom left) who each advise about 10% of the states.
  - `coop5SequentialRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top left, one at the bottom right) who each advise about 5% of the states.
  - `coop5ParallelRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top right, one at the bottom left) who each advise about 5% of the states.
- `03-analysis` - Contains Python analysis scripts to obtain the results in the `04-results` folder.
- `04-results` - Contains the plots and statistical significance values that are used in the publication.
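For a quick look at the contents of `02-data`, the sketch below loads one of the reward files with pandas. This is only an illustration: the exact column layout is whatever the test harness writes, so adjust the column selection as needed.

```python
# Illustrative inspection of one reward file from 02-data.
# Assumes pandas is available (e.g., installed alongside the requirements
# of 03-analysis); the column layout of the CSV is not guaranteed.
import pandas as pd

df = pd.read_csv("02-data/randomRewardData.csv")

print(df.shape)       # number of rows/columns recorded by the test harness
print(df.head())      # first few rows, to see the actual column names
print(df.describe())  # summary statistics of the cumulative rewards
```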
- Install the required Python packages by running `pip install -r .\03-analysis\requirements.txt` from the root folder.
- For the charts, run `python .\03-analysis\plotting.py` from the root folder and follow the instructions. Results will be generated into `04-results` in two formats, in the respective `pdf` and `png` subfolders.
- For the significance tests, run `python .\03-analysis\t_test.py > 04-results/significance/results.txt` from the root folder. Results will be generated into `04-results/significance` in a textual tabular format. (An illustrative cross-check is sketched after the note below.)
NOTE: The above steps have been tested with Python versions 3.8 through 3.13.
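For readers who want to cross-check the significance results independently of `03-analysis/t_test.py`, the sketch below compares two of the reward datasets with Welch's t-test via SciPy. The file paths come from this package, but the column selection and the exact test variant used by the provided script are assumptions; treat the provided script as authoritative.

```python
# Illustrative significance check between an advised and an unadvised agent.
# Assumes the last CSV column holds the reward values; t_test.py is authoritative.
import pandas as pd
from scipy import stats

advised = pd.read_csv("02-data/human10RewardData.csv").iloc[:, -1]
unadvised = pd.read_csv("02-data/unadvisedRewardData.csv").iloc[:, -1]

# Welch's t-test (no equal-variance assumption) on the two reward samples.
t_stat, p_value = stats.ttest_ind(advised, unadvised, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
```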
For the following steps, refer to the tool's official GitHub repository.
- Download Eclipse Modeling Tools, Version: 2025-06 (4.36.0) from the Eclipse Foundation site.
- Install Eclipse Xtend, Version: 2.39.0 either through the Marketplace or from the Xtend site.
- Install Viatra, Version: 2.9.1 either through the Marketplace or from the Viatra site.
- Clone the tool's official GitHub repository.
- Import the contents of the (1) plugins, (2) examples, and (3) tests folders into the running Eclipse instance.
- Generate the RL model and edit code using the genmodel in `/plugins/ca.mcmaster.ssm.rl4mt.metamodel/models`.
  - Open `rl.genmodel`, right-click the root node, and select `Generate Model Code`.
  - Right-click the root node and select `Generate Edit Code`.
- Generate the Lake model and edit code using the genmodel in `/examples/ca.mcmaster.ssm.rl4mt.examples.lake.metamodel/models`.
  - Open `lake.genmodel`, right-click the root node, and select `Generate Model Code`.
  - Right-click the root node and select `Generate Edit Code`.
Data can be obtained by running the experiments encoded in unit tests. The unit tests are parameterized with human advice found in the `01-advice` folder of this replication package.
To locate the unit tests, navigate to https://github.com/ssm-lab/rl4mt/tree/main/tests/ca.mcmaster.ssm.rl4mt.examples.lake.tests/src/ca/mcmaster/ssm/rl4mt/examples/lake/tests in the tool's official GitHub repository.
Repeat these steps for each experiment file.
- Right-click the file name.
- Go to `Run As` and click `Run Configurations...`.
- Select `JUnit Plug-in Test` and create a new configuration. Optionally, name this configuration after the experiment file.
- Under the `Test` tab, select `Run a single test` and, under `Test class`, select the experiment file.
- Click on the `Arguments` tab.
  - Program arguments: `-os ${target.os} -ws ${target.ws} -arch ${target.arch} -nl ${target.nl} -consoleLog -nosplash`
  - VM arguments: `-Xms512m -Xmx4096m`
- Click on the `Main` tab.
- Under `Program to Run`, select `Run an application` and select `[No Application] - Headless Mode`. (NOTE: Headless mode is preferred.)
NOTE: The following steps take a long time to run (about half an hour each, depending on the hardware).
- Run `LakeTestRandom.xtend`.
- Rename `rewardData.csv` to `randomRewardData.csv`.
- Run `LakeTestUnadvised.xtend`.
- Rename `rewardData.csv` to `unadvisedRewardData.csv`.
In this experiment, a single oracle advisor gives advice about every tile.
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `ALL`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.ALL)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `allRewardData.csv`.
In this experiment, a single oracle advisor gives advice about hole tiles and the goal tile (about 20% of the problem space).
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `HOLES_AND_GOAL`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.HOLES_AND_GOAL)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `holesAndGoalRewardData.csv`.
In this experiment, a single human advisor gives advice about 10% of the problem space.
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `HUMAN10`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN10)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `human10RewardData.csv`.
In this experiment, a single human advisor gives advice about 5% of the problem space.
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `HUMAN5`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN5)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `human5RewardData.csv`.
NOTE: The following data is only briefly mentioned in the paper, but not presented in detail due to the page limit.
In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `SEQUENTIAL_10`: `runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_10)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop10SequentialRewardData.csv`.
In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the bottom-left and the top-right corner and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `PARALLEL_10`: `runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_10)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop10ParallelRewardData.csv`.
In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `SEQUENTIAL_5`: `runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_5)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop5SequentialRewardData.csv`.
In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the bottom-left and the top-right corner and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `PARALLEL_5`: `runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_5)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop5ParallelRewardData.csv`.
The map used in the experiments is visualized in the `01-advice` folder of this replication package. The experiment settings are summarized below.
| Parameter | Value |
|---|---|
| RL method | Discrete policy gradient |
| Learning rate (α) | 0.9 |
| Discount factor (γ) | 1.0 |
| Number of episodes | 10000 |
| SL fusion operator | BCF |
| State-action space | 12x12x4 |
| Evaluated agent | {Random, Unadvised, Advised} |
| Source of advice | {Oracle, Single human, Cooperating humans} |
| Advice quota – Oracle | {100% ("All"), 20% ("Holes&Goal")} |
| Advice quota – Single human | {10%, 5%} |
| Advice quota – Cooperating humans | {10% each, 5% each} |
| Uncertainty – Oracle and Single human | {0.2k ∣ k ∈ {0, 1, 2, 3, 4}}, i.e., {0.0, 0.2, 0.4, 0.6, 0.8} |
| Uncertainty – Cooperating humans | 2D Manhattan distance |
| Cooperative advice type | {Sequential cooperation, parallel cooperation} |
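To make the settings above concrete, the sketch below outlines a tabular discrete policy-gradient (REINFORCE-style) loop over a 12x12 grid with 4 actions, using the learning rate, discount factor, and episode count from the table. It is not the framework's implementation: in the actual tool, actions are user-defined model transformations and the agent fuses uncertain advice with the BCF operator from subjective logic, both of which are omitted here. The placeholder environment and all names are illustrative only.

```python
import numpy as np

# Settings taken from the parameter table above.
N_STATES = 12 * 12      # 12x12 state space
N_ACTIONS = 4           # 4 actions per state
ALPHA = 0.9             # learning rate
GAMMA = 1.0             # discount factor
N_EPISODES = 10_000     # number of episodes

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_ACTIONS))  # tabular policy parameters


def softmax_policy(state):
    """Action probabilities of the softmax policy in a given state."""
    prefs = theta[state] - theta[state].max()
    exp = np.exp(prefs)
    return exp / exp.sum()


def step(state, action):
    """Placeholder transition; replace with the lake dynamics and rewards."""
    next_state = int(rng.integers(N_STATES))
    reward = 0.0
    done = rng.random() < 0.05
    return next_state, reward, done


for episode in range(N_EPISODES):
    state, trajectory, done = 0, [], False
    while not done:
        probs = softmax_policy(state)
        action = int(rng.choice(N_ACTIONS, p=probs))
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state

    # REINFORCE-style update: increase the log-probability of each taken
    # action in proportion to the discounted return that followed it.
    G = 0.0
    for s, a, r in reversed(trajectory):
        G = r + GAMMA * G
        probs = softmax_policy(s)
        grad_log = -probs          # d/d(theta[s]) of log pi(a|s) ...
        grad_log[a] += 1.0         # ... is the one-hot of a minus pi(.|s)
        theta[s] += ALPHA * G * grad_log
```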
Cumulative rewards for the Oracle and Single human advisors, by uncertainty level (u) and advice quota:

| u | Oracle (100%) | Oracle (20%) | Single human (10%) | Single human (5%) |
|---|---|---|---|---|
| 0.0 | 9900.100 | 9914.900 | 9768.000 | 8051.300 |
| 0.2 | 9685.900 | 8948.933 | 8538.266 | 5287.833 |
| 0.4 | 7974.066 | 5216.433 | 6121.033 | 2134.966 |
| 0.6 | 5094.333 | 2177.633 | 3488.700 | 2246.733 |
| 0.8 | 1502.500 | 523.633 | 1126.300 | 1108.666 |
Cumulative rewards for the cooperating human advisors, by cooperation type and per-advisor advice quota:

| Cooperation | 10% | 5% |
|---|---|---|
| Sequential | 8078.366 | 5037.066 |
| Parallel | 5429.466 | 4130.666 |












