Commit a09b798

Added calculation explanation
1 parent 57e2d24 commit a09b798

1 file changed: +5 -0 lines changed

mean_rewards_data.csv

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,7 @@
+It considers only the final 10\% of episodes from each training run, as this segment reflects the agent’s mean performance after reaching a stable, fully trained state. This approach ensures that the reported results represent the agent’s proficiency at the culmination of the learning process, minimizing the influence of early-stage variability.
+
+
+
 algorithm,environment,seed,run_dir,mean_reward
 c51,Acrobot-v1,1,Acrobot-v1__c51__1__1756494412,-497.052736
 c51,Acrobot-v1,2,Acrobot-v1__c51__2__1756494414,-497.89741
@@ -440,3 +444,4 @@ c51_expected_sarsa,CartPole-v1,8,CartPole-v1__c51_expected_sarsa__8__1756494412,
 c51_expected_sarsa,CartPole-v1,9,CartPole-v1__c51_expected_sarsa__9__1756494413,201.245974
 c51_expected_sarsa,CartPole-v1,10,CartPole-v1__c51_expected_sarsa__10__1756494411,225.154123
 
+
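
The explanation added at the top of mean_rewards_data.csv says each mean_reward value is computed from only the final 10% of episodes in a training run. A minimal Python sketch of that calculation follows. The function name `final_segment_mean`, the `percent` parameter, and the choice to round the segment length down (with a minimum of one episode) are assumptions made for illustration; the commit does not show the script that produced the CSV.

```python
def final_segment_mean(episode_rewards, percent=10):
    """Mean reward over the last `percent` percent of episodes.

    Hypothetical helper: the rounding rule (floor, at least one episode)
    is an assumption, not taken from the commit.
    """
    n = len(episode_rewards)
    if n == 0:
        raise ValueError("episode_rewards must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point surprises in the segment size.
    n_tail = max(1, (n * percent) // 100)
    tail = episode_rewards[-n_tail:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # 100 episodes with rewards 0..99: the final 10% are episodes 90..99.
    rewards = list(range(100))
    print(final_segment_mean(rewards))
```

Under these assumptions, a run of 100 episodes contributes only its last 10 episodes to the reported value, so the noisy early-training rewards do not affect it.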
