
Commit f1c2b78

Author: yvchen (committed)
Add all user simulator code
1 parent 4c5b157 commit f1c2b78


52 files changed: +114492 −0 lines changed

README.md

+82
@@ -1,2 +1,84 @@
# UserSimulator

User Simulation for Task-Completion Dialogues

These instructions describe how to run the simulation and the agents (rule, command-line, RL).


1. Some Datasets:

under this folder: ./src/deep_dialog/data (a loading sketch follows this list)

[movie_kb]
movie_kb.1k.p: 94% success rate (for user_goals_first_turn_template_subsets.v1.p)
movie_kb.v2.p: 36% success rate (for user_goals_first_turn_template_subsets.v1.p)

[user goal files]
first turn: user_goals_first_turn_template.v2.p
user_goals_first_turn_template.part.movie.v1.p: a subset of user goals. [Please use this one; the upper-bound success rate on movie_kb.1k.json is 0.9765.]

[NLG rule template]
dia_act_nl_pairs.v6.json: predefined NLG rule templates for both the user simulator and the agent.

[Dialog Act Intent]
dia_acts.txt

[Dialog Act Slot]
slot_set.txt

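If you want to peek at the pickled data files before running anything, here is a minimal sketch. It assumes the .p files are ordinary Python 2-era pickles, so `encoding='latin1'` is passed when loading them under Python 3; the exact structure of each object depends on how the file was built, so inspect it interactively.

```python
# Minimal sketch: load and inspect the pickled data files listed above.
# Assumption: Python 2-era pickles, hence encoding='latin1' under Python 3.
import pickle

def peek(path):
    with open(path, 'rb') as f:
        obj = pickle.load(f, encoding='latin1')
    size = len(obj) if hasattr(obj, '__len__') else 'n/a'
    print('%s -> %s (len: %s)' % (path, type(obj).__name__, size))
    return obj

movie_kb = peek('./src/deep_dialog/data/movie_kb.1k.p')
user_goals = peek('./src/deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p')
```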
2. Some Parameters:

--agt: the agent id
--usr: the user (simulator) id
--max_turn: maximum number of turns
--episodes: how many dialogues you want to run
--slot_err_prob: slot-level error probability
--slot_err_mode: which kind of slot error mode
--intent_err_prob: intent-level error probability

--movie_kb_path: the movie KB path for the agent side
--goal_file_path: the user goal file path for the user simulator side

--dqn_hidden_size: hidden size for the RL (DQN) agent
--batch_size: batch size for DQN training
--simulation_epoch_size: how many dialogues are simulated in one epoch

--warm_start: use the rule policy to fill the experience replay buffer at the beginning (a conceptual sketch follows this list)
--warm_start_epochs: how many dialogues to run during warm start

--run_mode: 0 for display mode (NL); 1 for debug mode (dia_act); 2 for debug mode (dia_act and NL); >=3 for no display (i.e. training)
--auto_suggest: 0 for no auto_suggest; 1 for auto_suggest
--act_level: 0 if the user simulator works at the dia_act level; 1 if it works at the NL level
--cmd_input_mode: 0 for NL input; 1 for Dia_Act input (AgentCmd only)

--write_model_dir: the directory to write the models to
--trained_model_path: the trained RL agent model; load the trained model for prediction

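For readers unfamiliar with warm starting: the idea behind --warm_start / --warm_start_epochs is to roll out a hand-written rule policy first, so the DQN's experience replay buffer already holds transitions before the first gradient update. The sketch below is purely conceptual; the names (ReplayBuffer, warm_start, env.reset, env.step, rule_policy) are hypothetical and do not mirror this repository's API.

```python
# Conceptual sketch of warm start (hypothetical names, not this repo's API):
# a rule policy generates (s, a, r, s_next, done) tuples so the DQN replay
# buffer is non-empty before Q-learning updates begin.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.pool = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.pool.append((s, a, r, s_next, done))

    def sample(self, batch_size=16):
        return random.sample(self.pool, min(batch_size, len(self.pool)))

def warm_start(buffer, rule_policy, env, epochs=120):
    """Fill the buffer by rolling out the rule policy for `epochs` dialogues."""
    for _ in range(epochs):
        state, done = env.reset(), False
        while not done:
            action = rule_policy(state)
            next_state, reward, done = env.step(action)
            buffer.add(state, action, reward, next_state, done)
            state = next_state
```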
3. Commands to run the different agents and user simulators

Rule Agent:
python run.py --agt 5 --usr 1 --max_turn 40 --movie_kb_path .\deep_dialog\data\movie_kb.1k.p --goal_file_path .\deep_dialog\data\user_goals_first_turn_template.part.movie.v1.p --intent_err_prob 0.00 --slot_err_prob 0.00 --episodes 500 --act_level 0


Cmd Agent:
NL Input: python run.py --agt 0 --usr 1 --max_turn 40 --movie_kb_path .\deep_dialog\data\movie_kb.1k.p --goal_file_path .\deep_dialog\data\user_goals_first_turn_template.part.movie.v1.p --intent_err_prob 0.00 --slot_err_prob 0.00 --episodes 500 --act_level 0 --run_mode 0 --cmd_input_mode 0
Dia_Act Input: python run.py --agt 0 --usr 1 --max_turn 40 --movie_kb_path .\deep_dialog\data\movie_kb.1k.p --goal_file_path .\deep_dialog\data\user_goals_first_turn_template.part.movie.v1.p --intent_err_prob 0.00 --slot_err_prob 0.00 --episodes 500 --act_level 0 --run_mode 0 --cmd_input_mode 1


Train RL Agent:
[End2End without NLU and NLG, with simulated noise in NLU]
RL: python run.py --agt 9 --usr 1 --max_turn 40 --movie_kb_path .\deep_dialog\data\movie_kb.1k.p --dqn_hidden_size 80 --experience_replay_pool_size 1000 --episodes 500 --simulation_epoch_size 100 --write_model_dir .\deep_dialog\checkpoints\rl_agent\ --run_mode 3 --act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 --goal_file_path .\deep_dialog\data\user_goals_first_turn_template.part.movie.v1.p --warm_start 1 --warm_start_epochs 120

[End2End with NLU and NLG]
RL: python run.py --agt 9 --usr 1 --max_turn 40 --movie_kb_path .\deep_dialog\data\movie_kb.1k.p --dqn_hidden_size 80 --experience_replay_pool_size 1000 --episodes 500 --simulation_epoch_size 100 --write_model_dir .\deep_dialog\checkpoints\rl_agent\ --run_mode 3 --act_level 1 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 --goal_file_path .\deep_dialog\data\user_goals_first_turn_template.part.movie.v1.p --warm_start 1 --warm_start_epochs 120


Test RL Agent with N dialogues:
RL: python run.py --agt 9 --usr 1 --max_turn 40 --movie_kb_path .\deep_dialog\data\movie_kb.1k.p --dqn_hidden_size 80 --experience_replay_pool_size 1000 --episodes 300 --simulation_epoch_size 100 --write_model_dir .\deep_dialog\checkpoints\rl_agent\ --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 --goal_file_path .\deep_dialog\data\user_goals_first_turn_template.part.movie.v1.p --trained_model_path .\deep_dialog\checkpoints\rl_agent\noe2e\agt_9_400_420_0.90000.p --run_mode 3

4. Learning Curves:

1) python draw_learning_curve.py --result_file ./deep_dialog/checkpoints/rl_agent/noe2e/agt_9_performance_records.json

2) Or pull the numbers out and draw the curves in Excel (a plotting sketch follows).
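If you prefer to pull the numbers out yourself rather than use draw_learning_curve.py or Excel, a rough sketch is below. The key name 'success_rate' and the epoch-to-value layout are assumptions about agt_9_performance_records.json; open the file and adjust the keys if they differ.

```python
# Rough sketch: plot success rate per simulation epoch from the records file.
# Assumption: the JSON holds something like {"success_rate": {"0": 0.12, ...}}.
import json
import matplotlib.pyplot as plt

path = './deep_dialog/checkpoints/rl_agent/noe2e/agt_9_performance_records.json'
with open(path) as f:
    records = json.load(f)

success = records['success_rate']   # assumed key name
epochs = sorted(success, key=int)   # epoch indices stored as strings
plt.plot([int(e) for e in epochs], [success[e] for e in epochs])
plt.xlabel('Simulation epoch')
plt.ylabel('Success rate')
plt.title('RL agent learning curve')
plt.show()
```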

imgs/noe2e_learning_curve.png

54.4 KB

instructions

+81
(identical in content to the README.md above)

src/deep_dialog/__init__.py

+1
@@ -0,0 +1 @@
#

src/deep_dialog/agents/__init__.py

+3
@@ -0,0 +1,3 @@
from .agent_cmd import *
from .agent_baselines import *
from .agent_dqn import *

src/deep_dialog/agents/agent.py

+92
@@ -0,0 +1,92 @@
"""
Created on May 17, 2016

@author: xiul, t-zalipt
"""

from deep_dialog import dialog_config


class Agent:
    """ Prototype for all agent classes, defining the interface they must uphold """

    def __init__(self, movie_dict=None, act_set=None, slot_set=None, params=None):
        """ Constructor for the Agent class

        Arguments:
        movie_dict -- This is here now but doesn't belong - the agent doesn't know about movies
        act_set -- The set of acts. #### Shouldn't this be more abstract? Don't we want our agent to be more broadly usable?
        slot_set -- The set of available slots
        """
        self.movie_dict = movie_dict
        self.act_set = act_set
        self.slot_set = slot_set
        self.act_cardinality = len(act_set.keys())
        self.slot_cardinality = len(slot_set.keys())

        self.epsilon = params['epsilon']
        self.agent_run_mode = params['agent_run_mode']
        self.agent_act_level = params['agent_act_level']

    def initialize_episode(self):
        """ Initialize a new episode. This function is called every time a new episode is run. """
        self.current_action = {}  # TODO Changed this variable's name to current_action
        self.current_action['diaact'] = None  # TODO Does it make sense to call it a state if it has an act? Which act? The most recent?
        self.current_action['inform_slots'] = {}
        self.current_action['request_slots'] = {}
        self.current_action['turn'] = 0

    def state_to_action(self, state, available_actions):
        """ Take the current state and return an action according to the current exploration/exploitation policy

        We define the agents flexibly so that they can either operate on act_slot representations or act_slot_value representations.
        We also define the responses flexibly, returning a dictionary with keys [act_slot_response, act_slot_value_response]. This way the command-line agent can continue to operate with values.

        Arguments:
        state -- A tuple of (history, kb_results) where history is a sequence of previous actions and kb_results contains information on the number of results matching the current constraints.
        user_action -- A legacy representation used to run the command-line agent. We should remove this ASAP, but not just yet.
        available_actions -- A list of the allowable actions in the current state

        Returns:
        act_slot_action -- An action consisting of one act and >= 0 slots, as well as which slots are informed vs. requested.
        act_slot_value_action -- An action consisting of acts, slots, and values in the legacy format. This can be used in the future for training agents that take values into account and interact directly with the database.
        """
        act_slot_response = None
        act_slot_value_response = None
        return {"act_slot_response": act_slot_response, "act_slot_value_response": act_slot_value_response}

    def register_experience_replay_tuple(self, s_t, a_t, reward, s_tplus1, episode_over):
        """ Register feedback from the environment, to be stored as future training data

        Arguments:
        s_t -- The state in which the last action was taken
        a_t -- The previous agent action
        reward -- The reward received immediately following the action
        s_tplus1 -- The state following the latest action
        episode_over -- A boolean value representing whether this is the final action

        Returns:
        None
        """
        pass

    def set_nlg_model(self, nlg_model):
        self.nlg_model = nlg_model

    def set_nlu_model(self, nlu_model):
        self.nlu_model = nlu_model

    def add_nl_to_action(self, agent_action):
        """ Add NL to Agent Dia_Act """

        if agent_action['act_slot_response']:
            agent_action['act_slot_response']['nl'] = ""
            user_nlg_sentence = self.nlg_model.convert_diaact_to_nl(agent_action['act_slot_response'], 'agt')  # NLG
            agent_action['act_slot_response']['nl'] = user_nlg_sentence
        elif agent_action['act_slot_value_response']:
            agent_action['act_slot_value_response']['nl'] = ""
            user_nlg_sentence = self.nlg_model.convert_diaact_to_nl(agent_action['act_slot_value_response'], 'agt')  # NLG
            # Fix: write the generated NL back to the value-level response;
            # act_slot_response is None in this branch.
            agent_action['act_slot_value_response']['nl'] = user_nlg_sentence

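Agent above is only an interface: concrete agents override state_to_action (and, for learning agents, register_experience_replay_tuple). As a purely illustrative example, not part of this commit, a trivial subclass that always requests the moviename slot might look like the sketch below; the 'request'/'UNK' act convention and the import path are assumptions.

```python
# Illustrative only: a minimal concrete agent built on the Agent interface.
# The 'request'/'UNK' dialogue-act convention is assumed from the README's
# dia_acts.txt / slot_set.txt description, not verified against the code.
from deep_dialog.agents.agent import Agent

class AlwaysRequestAgent(Agent):
    def state_to_action(self, state, available_actions):
        act_slot_response = {
            'diaact': 'request',
            'inform_slots': {},
            'request_slots': {'moviename': 'UNK'},
            'turn': self.current_action['turn'],
        }
        self.current_action = act_slot_response
        return {'act_slot_response': act_slot_response,
                'act_slot_value_response': None}
```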