Chinese-English Glossary

强化学习：reinforcement learning

智能体：agent

动作：action

决策：decision

奖励：reward

监督学习：supervised learning

损失函数：loss function

反向传播：backpropagation

独立同分布：independent and identically distributed

延迟奖励：delayed reward

试错探索：trial-and-error exploration

探索：exploration

利用：exploitation

上限：upper bound

预演：rollout

轨迹：trajectory

回合：episode

序列决策：sequential decision making

完全可观测：fully observed

马尔可夫决策过程：Markov decision process, MDP

部分可观测：partially observed

部分可观测的马尔可夫决策过程：partially observable Markov decision process, POMDP

随机性策略：stochastic policy

确定性策略：deterministic policy
