Added calculation explanation

Rijul-Tandon · web-flow · commit a09b7983771e · 2025-10-13T11:59:42.000+05:30
diff --git a/mean_rewards_data.csv b/mean_rewards_data.csv
@@ -1,3 +1,7 @@
+ The mean_reward column considers only the final 10% of episodes from each training run, since this segment reflects the agent’s mean performance after it has reached a stable, fully trained state. The reported values therefore represent the agent’s proficiency at the end of the learning process and minimize the influence of early-stage variability.
+
+
+
 algorithm,environment,seed,run_dir,mean_reward
 c51,Acrobot-v1,1,Acrobot-v1__c51__1__1756494412,-497.052736
 c51,Acrobot-v1,2,Acrobot-v1__c51__2__1756494414,-497.89741
@@ -440,3 +444,4 @@ c51_expected_sarsa,CartPole-v1,8,CartPole-v1__c51_expected_sarsa__8__1756494412,
 c51_expected_sarsa,CartPole-v1,9,CartPole-v1__c51_expected_sarsa__9__1756494413,201.245974
 c51_expected_sarsa,CartPole-v1,10,CartPole-v1__c51_expected_sarsa__10__1756494411,225.154123
 
+
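
The calculation described in the added note can be sketched as follows. This is a minimal illustration, not the script that produced the CSV: the function name `mean_final_percent`, the assumption that per-episode returns are available as a plain sequence, and the rule of rounding the window size down (keeping at least one episode) are all hypothetical.

```python
from typing import Sequence


def mean_final_percent(episode_returns: Sequence[float], percent: int = 10) -> float:
    """Mean return over the last `percent` percent of episodes (hypothetical helper)."""
    if not episode_returns:
        raise ValueError("episode_returns must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    # Assumption: window size is rounded down, with a minimum of one episode.
    # Integer arithmetic avoids floating-point surprises such as 30 * 0.1.
    n_tail = max(1, len(episode_returns) * percent // 100)
    tail = episode_returns[-n_tail:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # 20 episodes: the final 10% is the last 2 episodes.
    returns = [0.0] * 18 + [10.0, 20.0]
    print(mean_final_percent(returns))
```

With 20 episodes, only the last two contribute to the mean, so low early-training returns do not pull the reported value down.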