Question about sampletimes = 3 #15

luoyueyi · 2018-09-16T01:55:45Z

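In rlmodel, sampletimes = 3 is defined, but the network structure does not change. Does the loop just recompute prob three times? It seems list_state and list_action should hold the same values in all three sampling passes.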

for j in range(sampletimes):
    # reset environment
    state = env.reset(batch_en1, batch_en2, batch_sentence_ebd, batch_reward)
    list_action = []
    list_state = []
    old_prob = []

    # get action
    # start = time.time()
    for i in range(batch_len):
        state_in = np.append(state[0], state[1])
        feed_dict = {}
        feed_dict[myAgent.entity1] = [state[2]]
        feed_dict[myAgent.entity2] = [state[3]]
        feed_dict[myAgent.state_in] = [state_in]
        prob = sess2.run(myAgent.prob, feed_dict=feed_dict)

        old_prob.append(prob[0])
        action = get_action(prob)
        # record the action and state as training data for the CNN model
        list_action.append(action)
        list_state.append(state)
        state = env.step(action)

xuyanfu · 2018-09-16T02:44:18Z

Inside the get_action function, the action is sampled randomly using np.random.rand(), so the actions differ across the three passes.
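To illustrate why repeated passes can diverge even though the network is unchanged, here is a minimal sketch. The body of get_action is not shown in this thread, so `get_action_sketch` and `sample_trajectory` are hypothetical stand-ins. They assume that prob holds the probability of choosing action 1 and that the action is drawn by comparing it with a uniform random number.

```python
import numpy as np


def get_action_sketch(prob, rng=None):
    """Hypothetical stand-in for get_action (the real one is not shown here).

    Assumes the first element of `prob` is the probability of action 1 and
    samples a binary action by comparing it with a uniform draw in [0, 1).
    """
    p_select = float(np.ravel(prob)[0])
    draw = np.random.rand() if rng is None else rng.random()
    return 1 if draw < p_select else 0


def sample_trajectory(probs, rng=None):
    """Sample one action per step from a fixed list of probabilities."""
    return [get_action_sketch([p], rng) for p in probs]


if __name__ == "__main__":
    sampletimes = 3
    probs = [0.5] * 10  # identical policy outputs in every pass
    for j in range(sampletimes):
        # Same probabilities each time, but the sampled actions can differ.
        print(j, sample_trajectory(probs))
```

Under these assumptions the policy output is deterministic while the sampled action is not. Because env.step(action) depends on the sampled action, later states and probabilities in a pass can also differ from those in the other passes.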
