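In rlmodel, sampletimes = 3 is defined. The network structure does not change, so is this just recomputing prob repeatedly? list_state and list_action should hold the same values in all three sampling passes.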

    for j in range(sampletimes):
        # reset environment
        state = env.reset(batch_en1, batch_en2, batch_sentence_ebd, batch_reward)
        list_action = []
        list_state = []
        old_prob = []

        # get action
        # start = time.time()
        for i in range(batch_len):
            state_in = np.append(state[0], state[1])
            feed_dict = {}
            feed_dict[myAgent.entity1] = [state[2]]
            feed_dict[myAgent.entity2] = [state[3]]
            feed_dict[myAgent.state_in] = [state_in]
            prob = sess2.run(myAgent.prob, feed_dict=feed_dict)
            old_prob.append(prob[0])
            action = get_action(prob)
            # add produce data for training cnn model
            list_action.append(action)
            list_state.append(state)
            state = env.step(action)

Inside the get_action function, the action is sampled randomly via np.random.rand(), so the actions produced in the three passes are different.
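To illustrate the point, here is a minimal sketch of how a sampling function of this kind can return different actions for the same probability. It is not the repository's actual `get_action`: the names `get_action_sketch` and `sample_episode` are hypothetical, `prob` is assumed to be a scalar probability of choosing action 1, and Python's standard `random.random()` stands in for `np.random.rand()` so the example has no dependencies.

```python
import random


def get_action_sketch(prob, rng=random):
    """Hypothetical stand-in for get_action.

    Assumes `prob` is the probability of choosing action 1.
    Draws a uniform number in [0, 1) and compares it with `prob`,
    so repeated calls with the same `prob` can return different actions.
    """
    return 1 if rng.random() < prob else 0


def sample_episode(probs, rng=random):
    """Sample one action per step from a list of per-step probabilities."""
    return [get_action_sketch(p, rng) for p in probs]


if __name__ == "__main__":
    rng = random.Random(0)   # seeded only to make the demo repeatable
    probs = [0.5] * 10       # identical network output at every step
    for j in range(3):       # mirrors sampletimes = 3
        print(j, sample_episode(probs, rng))
```

Because the loop in the question passes each sampled action to `env.step(action)`, the states collected after the first step may also differ between passes, depending on how `env.step` uses the action.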