Chinese-English Glossary
强化学习:reinforcement learning
智能体:agent
动作:action
决策:decision
奖励:reward
监督学习:supervised learning
损失函数:loss function
反向传播:backpropagation
独立同分布:independent and identically distributed
延迟奖励:delayed reward
试错探索:trial-and-error exploration
探索:exploration
利用:exploitation
上限:upper bound
预演:rollout
轨迹:trajectory
回合:episode
序列决策:sequential decision making
完全可观测:fully observable
马尔可夫决策过程:Markov decision process, MDP
部分可观测:partially observable
部分可观测的马尔可夫决策过程:partially observable Markov decision process, POMDP
随机性策略:stochastic policy
确定性策略:deterministic policy