Publishing the model with exploration scores for CB use-cases

Pavlos Athanasios Apostolopoulos · facebook-github-bot · commit 5c99dd8d5d18 · 2024-12-23T13:20:48.000-08:00
Summary:
Adds the exploration module to the published output for CB (contextual bandit) models.

Also removed is_contextual_bandit from the transformer's input arguments, as this can be retrieved from the type of agent.

Renamed max_number_actions to number_of_actions for clarity.

In offline_learning, learning_logger is now called only when i % 1000 == 0 instead of on every batch.

Differential Revision: D67604246

fbshipit-source-id: 2bf26ef6ad9e6fd2c9c4b5da128e2c085b0aab48
diff --git a/pearl/utils/functional_utils/train_and_eval/offline_learning_and_evaluation.py b/pearl/utils/functional_utils/train_and_eval/offline_learning_and_evaluation.py
@@ -189,7 +189,8 @@ def offline_learning(
         batch = data_buffer.sample(offline_agent.policy_learner.batch_size)
         assert isinstance(batch, TransitionBatch)
         loss = offline_agent.learn_batch(batch=batch)
-        learning_logger(loss, i, batch, TRAINING_TAG)
+        if i % 1000 == 0:
+            learning_logger(loss, i, batch, TRAINING_TAG)
 
 
 def offline_evaluation(