doyle1996 / Deep-Learning-Interview-Book Public

forked from amusi/Deep-Learning-Interview-Book

[TOC]

Reinforcement Learning

What kinds of problems does reinforcement learning solve?

TODO

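List the similarities and differences between reinforcement learning and supervised learning. Supervised learning trains a model from labeled samples; what does reinforcement learning rely on?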

TODO

What is the loss function in reinforcement learning?

TODO

Write the Bellman equation

TODO
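
As a starting point for this answer, the Bellman equations can be written in standard textbook notation. The notation is an assumption here and does not come from the original notes: $\pi$ is the policy, $P(s' \mid s, a)$ the transition probability, $R(s, a, s')$ the reward, and $\gamma \in [0, 1)$ the discount factor. The first two lines are the expectation equations for a fixed policy, and the third is the optimality equation.

$$
\begin{aligned}
V^{\pi}(s) &= \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V^{\pi}(s') \bigr] \\
Q^{\pi}(s, a) &= \sum_{s'} P(s' \mid s, a)\,\Bigl[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a') \Bigr] \\
V^{*}(s) &= \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V^{*}(s') \bigr]
\end{aligned}
$$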

References

Bellman equation

Why are the optimal value function and the optimal policy equivalent?

TODO

What methods are there for solving a Markov decision process?

TODO

Briefly describe the Monte Carlo algorithm for estimating the value function.

TODO
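
One possible way to approach this question is first-visit Monte Carlo estimation, sketched below. The function name `first_visit_mc`, the episode format (a list of `(state, reward)` pairs), and the toy episodes are illustrative assumptions, not part of the original notes.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average first-visit return over sampled episodes.

    Each episode is a list of (state, reward) pairs, where `reward` is the
    reward received after leaving that state (an assumed, hypothetical format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_index = {}
        for t, (state, _) in enumerate(episode):
            first_index.setdefault(state, t)
        # Walk backwards, accumulating the discounted return G_t.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_index[state] == t:  # only the first visit counts
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Two toy episodes over states "A" and "B".
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("A", 0.0), ("B", 3.0)],
]
values = first_visit_mc(episodes, gamma=1.0)
```

The estimate uses only complete episodes and no bootstrapping, which is the main contrast with temporal-difference methods.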

Briefly describe the temporal-difference algorithm

TODO
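
A minimal sketch of the simplest temporal-difference method, TD(0), may help frame an answer. The function name `td0_update`, the dictionary value table, and the default step size are assumptions made for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """Apply one TD(0) update to the dict `values` in place; return the TD error."""
    # Bootstrapped target: reward plus the discounted estimate of the next state.
    next_value = 0.0 if done else values.get(next_state, 0.0)
    td_error = reward + gamma * next_value - values.get(state, 0.0)
    # Move the current estimate a fraction alpha toward the target.
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=0.5, next_state="B", alpha=0.5, gamma=0.9)
```

Unlike Monte Carlo estimation, the update is applied after every single transition and does not wait for the episode to end.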

Introduce Q-Learning

TODO
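
The core of tabular Q-learning can be sketched as follows. The helper names `q_learning_update` and `epsilon_greedy`, the `(state, action)` dictionary keys, and the toy states are hypothetical choices for this example.

```python
import random
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Behaviour policy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


actions = ["left", "right"]
q = defaultdict(float)          # unseen (state, action) pairs default to 0.0
q[("s1", "right")] = 2.0
q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                  actions=actions, alpha=0.5, gamma=0.5)
```

The target takes the maximum over next actions regardless of which action the behaviour policy actually takes next, which is what makes Q-learning off-policy.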

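References

Q-Learning
The Q-learning algorithm
[Reinforcement Learning] A detailed explanation of the Q-Learning algorithm
Understanding reinforcement learning in depth through Q-learning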

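The DQN algorithm

Basic principles

References

[Reinforcement Learning] A detailed explanation of the Deep Q Network (DQN) algorithm
Reinforcement learning: a detailed explanation of how the DQN algorithm works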

What are the two key tricks of DQN?

TODO
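
The two tricks usually cited for DQN are experience replay and a separate, periodically synchronized target network. The sketch below illustrates both with plain Python containers; the class `ReplayBuffer`, the function `sync_target`, and the use of dictionaries to stand in for network weights are assumptions for illustration only.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly to break temporal correlation."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # the oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params, target_params):
    """Copy the online weights into the target network (done every fixed number of steps)."""
    target_params.clear()
    target_params.update(online_params)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, False)
batch = buffer.sample(2)

online = {"w": 0.5}
target = {"w": 0.0}
sync_target(online, target)
```

Between synchronizations the target network stays frozen, so the regression target does not shift with every gradient step.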

What variants of DQN exist? In which directions can DQN be improved?

TODO
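
As one concrete example of a variant, the difference between the vanilla DQN target and the Double DQN target can be sketched as below. The function names and the list-of-floats representation of next-state Q-values are hypothetical.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


next_q_online = [1.0, 3.0]   # online network prefers action 1
next_q_target = [2.0, 0.5]   # target network prefers action 0
y_dqn = dqn_target(1.0, next_q_target, gamma=0.5)
y_ddqn = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5)
```

Decoupling selection from evaluation is intended to reduce the overestimation that comes from taking a maximum over noisy estimates.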

Which DQN variant introduces a state reward?

TODO
Double-DQN
Prioritized experience replay
Dueling-DQN

What is the difference between Dueling DQN and DQN?

TODO
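
The structural difference lies in the output head: Dueling DQN estimates a state value and per-action advantages separately, then combines them. A minimal sketch of the commonly used mean-subtracted aggregation follows; the function name `dueling_q_values` and the scalar inputs are assumptions standing in for the two network streams.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
```

Subtracting the mean advantage keeps the decomposition identifiable, since otherwise a constant could be shifted freely between the value and advantage streams.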

Introduce the PPO algorithm used by OpenAI

TODO
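
A central piece of PPO is its clipped surrogate objective. The per-sample version can be sketched as below, assuming `ratio` is the probability ratio between the new and old policy for the taken action and `advantage` is an advantage estimate; the function name and the default `clip_eps=0.2` are illustrative choices.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the gain from raising the ratio is capped at 1 + eps.
capped_gain = ppo_clip_objective(ratio=1.5, advantage=2.0)
# Negative advantage: the pessimistic (lower) value is kept.
pessimistic = ppo_clip_objective(ratio=0.5, advantage=-1.0)
```

In training, this quantity is averaged over a batch and maximized, which removes the incentive to move the policy far from the one that collected the data.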

Introduce the TRPO algorithm

TODO

Why can TRPO guarantee that the return of the new policy is monotonically non-decreasing?

TODO

Introduce the DDPG algorithm

Draw the DDPG framework

Why does the second D in DDPG need to be deterministic?

TODO

Introduce the A3C algorithm

TODO

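References

Understanding the deep reinforcement learning algorithm A3C (Actor-Critic Algorithm) in one article
Deep reinforcement learning: A3C

The meaning of the advantage function in A3C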

TODO
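
The advantage measures how much better an action turned out than the critic's baseline estimate for the state. One common estimator is the n-step advantage sketched below; the function name `n_step_advantage` and its argument names are assumptions for illustration.

```python
def n_step_advantage(rewards, value_start, value_end, gamma=0.99):
    """n-step estimate: sum_k gamma^k * r_k + gamma^n * V(s_end) - V(s_start)."""
    g = value_end  # bootstrap from the critic's value of the last state
    for r in reversed(rewards):
        g = r + gamma * g
    return g - value_start


advantage = n_step_advantage([1.0, 1.0], value_start=1.0, value_end=2.0, gamma=0.5)
```

A positive result means the sampled actions did better than the critic expected, so their probability is pushed up; a negative result pushes it down.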

How can reinforcement learning be used in recommender systems?

TODO

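References

What are the prospects and difficulties of using reinforcement learning for recommender systems research?
How can deep reinforcement learning be combined with recommender systems?
ICML 2019 | Reinforcement learning for recommender systems: Ant Financial proposes a generative adversarial user model
Latest! Must-read 2019 papers on deep recommender systems and CTR prediction from five top conferences

Introduce the Sarsa algorithm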

TODO

References

AI study notes: the Sarsa algorithm

Differences between Sarsa and Q-Learning

TODO
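
The difference shows up most clearly in the update targets, sketched below. The function names, the `(state, action)` dictionary keys, and the toy values are hypothetical.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy: bootstrap from the action the behaviour policy actually takes next."""
    return reward + gamma * q[(next_state, next_action)]


def q_learning_target(q, reward, next_state, actions, gamma=0.9):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


q = {("s1", "left"): 1.0, ("s1", "right"): 4.0}
# Suppose exploration picks "left" in s1 even though "right" looks better.
y_sarsa = sarsa_target(q, reward=0.0, next_state="s1", next_action="left", gamma=0.5)
y_q = q_learning_target(q, reward=0.0, next_state="s1", actions=["left", "right"], gamma=0.5)
```

When the next action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps.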

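References

Reinforcement Learning (5): the Sarsa and Q-Learning algorithms
The difference between the Q-learning and Sarsa algorithms in reinforcement learning
Bourne's reinforcement learning notes 2: fully understanding Q-learning and Sarsa

Reinforcement learning has value-based and policy-based methods. What are the advantages and disadvantages of each, and what are their respective application scenarios?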

TODO

What is the learning objective of value-based methods?

TODO

Differences between DQN, DDQN, AC, and DDPG in reinforcement learning

TODO

References

Real interview questions on reinforcement learning
Reinforcement learning interview experiences
