Replication package for the paper *Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance* (accepted for MODELS 2025).
Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. However, RL methods exhibit performance issues in complex problems. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a sensible trade-off between the certainty and timeliness of human advice, our method takes a firm step towards machine learning-driven human-in-the-loop engineering methods.
- Content description
- Reproduction of analysis
- Reproduction of experimental data
- Experiment setup
- Results
- `01-advice` - Contains all the experimental artifacts and visualizations used in our experiments (map, advice files, and advice visualized on the map).
- `02-data` - Contains experimental data produced in accordance with the Experiment settings.
  - `randomRewardData.csv` - Cumulative rewards of a random-walk agent.
  - `unadvisedRewardData.csv` - Cumulative rewards of an unadvised (but not random) agent.
  - `allRewardData.csv` - Cumulative rewards of an agent advised by information about every state.
  - `holesAndGoalRewardData.csv` - Cumulative rewards of an agent advised by information about terminating states (negative termination and positive termination, i.e., the goal).
  - `human10RewardData.csv` - Cumulative rewards of an agent advised by a single human advisor about 10% of the states.
  - `human5RewardData.csv` - Cumulative rewards of an agent advised by a single human advisor about 5% of the states.
  - `coop10SequentialRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top left, one at the bottom right) who each advise about 10% of the states.
  - `coop10ParallelRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top right, one at the bottom left) who each advise about 10% of the states.
  - `coop5SequentialRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top left, one at the bottom right) who each advise about 5% of the states.
  - `coop5ParallelRewardData.csv` - Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top right, one at the bottom left) who each advise about 5% of the states.
- `03-analysis` - Contains Python analysis scripts to obtain the results in the `04-results` folder.
- `04-results` - Contains the plots and statistical significance values that are used in the publication.
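For a quick look at the contents of `02-data`, the sketch below loads one of the reward files with pandas. This is only an illustration: the exact column layout is whatever the test harness writes, so adjust the column selection as needed.

```python
# Illustrative inspection of one reward file from 02-data.
# Assumes pandas is available (e.g., installed alongside the requirements
# of 03-analysis); the column layout of the CSV is not guaranteed.
import pandas as pd

df = pd.read_csv("02-data/randomRewardData.csv")

print(df.shape)       # number of rows/columns recorded by the test harness
print(df.head())      # first few rows, to see the actual column names
print(df.describe())  # summary statistics of the cumulative rewards
```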
- Install the required Python packages by running `pip install -r .\03-analysis\requirements.txt` from the root folder.
- For the charts, run `python .\03-analysis\plotting.py` from the root folder and follow the instructions. Results will be generated into `04-results` in two formats, in the respective `pdf` and `png` subfolders.
- For the significance tests, run `python .\03-analysis\t_test.py > 04-results/significance/results.txt` from the root folder. Results will be generated into `04-results/significance` in a textual tabular format. (An illustrative cross-check is sketched after the note below.)
NOTE: The above steps have been tested with Python versions 3.8 through 3.13.
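For readers who want to cross-check the significance results independently of `03-analysis/t_test.py`, the sketch below compares two of the reward datasets with Welch's t-test via SciPy. The file paths come from this package, but the column selection and the exact test variant used by the provided script are assumptions; treat the provided script as authoritative.

```python
# Illustrative significance check between an advised and an unadvised agent.
# Assumes the last CSV column holds the reward values; t_test.py is authoritative.
import pandas as pd
from scipy import stats

advised = pd.read_csv("02-data/human10RewardData.csv").iloc[:, -1]
unadvised = pd.read_csv("02-data/unadvisedRewardData.csv").iloc[:, -1]

# Welch's t-test (no equal-variance assumption) on the two reward samples.
t_stat, p_value = stats.ttest_ind(advised, unadvised, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
```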
For the following steps, refer to the tool's official GitHub repository.
- Download Eclipse Modeling Tools, Version: 2025-06 (4.36.0) from the Eclipse Foundation site.
- Install Eclipse Xtend, Version: 2.39.0 either through the Marketplace or from the Xtend site.
- Install Viatra, Version: 2.9.1 either through the Marketplace or from the Viatra site.
- Clone the tool's official GitHub repository.
- Import the contents of the (1) plugins, (2) examples, and (3) tests folders into the running Eclipse instance.
- Generate the RL model and edit code using the genmodel in `/plugins/ca.mcmaster.ssm.rl4mt.metamodel/models`.
  - Open `rl.genmodel`, right-click the root node, and select `Generate Model Code`.
  - Right-click the root node and select `Generate Edit Code`.
- Generate the Lake model and edit code using the genmodel in `/examples/ca.mcmaster.ssm.rl4mt.examples.lake.metamodel/models`.
  - Open `lake.genmodel`, right-click the root node, and select `Generate Model Code`.
  - Right-click the root node and select `Generate Edit Code`.
Data can be obtained by running the experiments encoded in unit tests. The unit tests are parameterized with human advice found in the `01-advice` folder of this replication package.
To locate the unit tests, navigate to https://github.com/ssm-lab/rl4mt/tree/main/tests/ca.mcmaster.ssm.rl4mt.examples.lake.tests/src/ca/mcmaster/ssm/rl4mt/examples/lake/tests in the tool's official GitHub repository.
Repeat these steps for each experiment file.
- Right-click the file name.
- Go to `Run As` and click `Run Configurations...`.
- Select `JUnit Plug-in Test` and create a new configuration. Optionally, name this configuration after the experiment file.
- Under the `Test` tab, select `Run a single test` and, under `Test class`, select the experiment file.
- Click on the `Arguments` tab.
  - Program arguments: `-os ${target.os} -ws ${target.ws} -arch ${target.arch} -nl ${target.nl} -consoleLog -nosplash`
  - VM arguments: `-Xms512m -Xmx4096m`
- Click on the `Main` tab.
- Under `Program to Run`, select `Run an application` and select `[No Application] - Headless Mode`. (NOTE: Headless mode is preferred.)
NOTE: The following steps take a long time to run (about half an hour each, depending on the hardware).
- Run `LakeTestRandom.xtend`.
- Rename `rewardData.csv` to `randomRewardData.csv`.
- Run `LakeTestUnadvised.xtend`.
- Rename `rewardData.csv` to `unadvisedRewardData.csv`.
In this experiment, a single oracle advisor gives advice about every tile.
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `ALL`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.ALL)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `allRewardData.csv`.
In this experiment, a single oracle advisor gives advice about hole tiles and the goal tile (about 20% of the problem space).
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `HOLES_AND_GOAL`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.HOLES_AND_GOAL)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `holesAndGoalRewardData.csv`.
In this experiment, a single human advisor gives advice about 10% of the problem space.
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `HUMAN10`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN10)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `human10RewardData.csv`.
In this experiment, a single human advisor gives advice about 5% of the problem space.
- In `LakeTestSingleAdvisor.xtend`, on line 233, change the `SingleExperimentMode` to `HUMAN5`: `runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN5)`
- Save and run `LakeTestSingleAdvisor.xtend`.
- Rename `rewardData.csv` to `human5RewardData.csv`.
NOTE: The following data is only briefly mentioned in the paper, but not presented in detail due to the page limit.
In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `SEQUENTIAL_10`: `runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_10)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop10SequentialRewardData.csv`.
In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the bottom-left and the top-right corner and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `PARALLEL_10`: `runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_10)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop10ParallelRewardData.csv`.
In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `SEQUENTIAL_5`: `runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_5)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop5SequentialRewardData.csv`.
In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the bottom-left and the top-right corner and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.
- In `LakeTestCoop.xtend`, on line 333, change the `CoopExperimentMode` to `PARALLEL_5`: `runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_5)`
- Save and run `LakeTestCoop.xtend`.
- Rename `rewardData.csv` to `coop5ParallelRewardData.csv`.
The map used in the experiments is visualized in the `01-advice` folder of this replication package. The experiment settings are summarized below.
| Parameter | Value |
|---|---|
| RL method | Discrete policy gradient |
| Learning rate (α) | 0.9 |
| Discount factor (γ) | 1.0 |
| Number of episodes | 10000 |
| SL fusion operator | BCF |
| State-action space | 12x12x4 |
| Evaluated agent | {Random, Unadvised, Advised} |
| Source of advice | {Oracle, Single human, Cooperating humans} |
| Advice quota – Oracle | {100% ("All"), 20% ("Holes&Goal")} |
| Advice quota – Single human | {10%, 5%} |
| Advice quota – Cooperating humans | {10% each, 5% each} |
| Uncertainty – Oracle and Single human | {0.2k ∣ k ∈ {0, 1, 2, 3, 4}}, i.e., {0.0, 0.2, 0.4, 0.6, 0.8} |
| Uncertainty – Cooperating humans | 2D Manhattan distance |
| Cooperative advice type | {Sequential cooperation, parallel cooperation} |
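To make the settings above concrete, the sketch below outlines a tabular discrete policy-gradient (REINFORCE-style) loop over a 12x12 grid with 4 actions, using the learning rate, discount factor, and episode count from the table. It is not the framework's implementation: in the actual tool, actions are user-defined model transformations and the agent fuses uncertain advice with the BCF operator from subjective logic, both of which are omitted here. The placeholder environment and all names are illustrative only.

```python
import numpy as np

# Settings taken from the parameter table above.
N_STATES = 12 * 12      # 12x12 state space
N_ACTIONS = 4           # 4 actions per state
ALPHA = 0.9             # learning rate
GAMMA = 1.0             # discount factor
N_EPISODES = 10_000     # number of episodes

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_ACTIONS))  # tabular policy parameters


def softmax_policy(state):
    """Action probabilities of the softmax policy in a given state."""
    prefs = theta[state] - theta[state].max()
    exp = np.exp(prefs)
    return exp / exp.sum()


def step(state, action):
    """Placeholder transition; replace with the lake dynamics and rewards."""
    next_state = int(rng.integers(N_STATES))
    reward = 0.0
    done = rng.random() < 0.05
    return next_state, reward, done


for episode in range(N_EPISODES):
    state, trajectory, done = 0, [], False
    while not done:
        probs = softmax_policy(state)
        action = int(rng.choice(N_ACTIONS, p=probs))
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state

    # REINFORCE-style update: increase the log-probability of each taken
    # action in proportion to the discounted return that followed it.
    G = 0.0
    for s, a, r in reversed(trajectory):
        G = r + GAMMA * G
        probs = softmax_policy(s)
        grad_log = -probs          # d/d(theta[s]) of log pi(a|s) ...
        grad_log[a] += 1.0         # ... is the one-hot of a minus pi(.|s)
        theta[s] += ALPHA * G * grad_log
```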
Cumulative rewards for the Oracle and Single human advisors, by uncertainty level (u) and advice quota:

| u | Oracle (100%) | Oracle (20%) | Single human (10%) | Single human (5%) |
|---|---|---|---|---|
| 0.0 | 9900.100 | 9914.900 | 9768.000 | 8051.300 |
| 0.2 | 9685.900 | 8948.933 | 8538.266 | 5287.833 |
| 0.4 | 7974.066 | 5216.433 | 6121.033 | 2134.966 |
| 0.6 | 5094.333 | 2177.633 | 3488.700 | 2246.733 |
| 0.8 | 1502.500 | 523.633 | 1126.300 | 1108.666 |
Cumulative rewards for the cooperating human advisors, by cooperation type and per-advisor advice quota:

| Cooperation | 10% | 5% |
|---|---|---|
| Sequential | 8078.366 | 5037.066 |
| Parallel | 5429.466 | 4130.666 |












