
Chinese-English glossary

强化学习:reinforcement learning

智能体:agent

动作:action

决策:decision

奖励:reward

监督学习:supervised learning

损失函数:loss function

反向传播:backpropagation

独立同分布:independent and identically distributed

延迟奖励:delayed reward

试错探索:trial-and-error exploration

探索:exploration

利用:exploitation

上限:upper bound

预演:rollout

轨迹:trajectory

回合:episode

序列决策:sequential decision making

完全可观测:fully observable

马尔可夫决策过程:Markov decision process, MDP

部分可观测:partially observable

部分可观测的马尔可夫决策过程:partially observable Markov decision process, POMDP

随机性策略:stochastic policy

确定性策略:deterministic policy
