2020/09/29 10:11

Author: Chuchu

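IJCAI 2020 | Seven Must-Read Recent Papers on Deep Reinforcement Learning

Introduction:

The International Joint Conference on Artificial Intelligence (IJCAI) is one of the foremost academic conferences in artificial intelligence. It was originally held in odd-numbered years and has been held annually since 2016. Because of the pandemic, IJCAI 2020 will be held January 5-10, 2021.

According to the AMiner IJCAI 2020 word cloud, Xiaomai noticed that representation learning, graph neural networks, deep reinforcement learning, and deep neural networks are among this year's hottest topics and have drawn wide attention. Today Xiaomai shares seven must-read IJCAI 2020 papers on deep reinforcement learning.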

1. Paper title: Efficient Deep Reinforcement Learning via Adaptive Policy Transfer

Paper link:

https://www.aminer.cn/pub/5ef96b048806af6ef2772111/efficient-deep-reinforcement-learning-via-adaptive-policy-transfer?conf=ijcai2020

Authors: Tianpei Yang, Jianye Hao, Zhaopeng Meng, Zongzhang Zhang, Yujing Hu, Yingfeng Chen, Changjie Fan, Weixun Wang, Wulong Liu, Zhaodong Wang, Jiajie Peng

Summary:

· The authors propose a Policy Transfer Framework (PTF) which can efficiently select the optimal source policy and exploit the useful information to facilitate the target task learning.

· PTF efficiently avoids negative transfer by terminating the exploitation of the current source policy and adaptively selecting another one.

· PTF can be combined with existing DRL methods.

· Experimental results show that PTF efficiently accelerates the learning process of existing state-of-the-art DRL methods and outperforms previous policy reuse approaches.

2. Paper title: KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Paper link:

https://www.aminer.cn/pub/5e4d083f3a55ac8cfd770c23/kogun-accelerating-deep-reinforcement-learning-via-integrating-human-suboptimal-knowledge?conf=ijcai2020

Authors: Zhang Peng, Jianye Hao, Wang Weixun, Tang Hongyao, Ma Yi, Duan Yihai, Zheng Yan

Summary:

· The authors propose a novel policy network framework called KoGuN to leverage human knowledge to accelerate the learning process of RL agents.

· The authors first evaluate the algorithm on four tasks in Section 4.1: CartPole [Barto and Sutton, 1982], LunarLander and LunarLanderContinuous in OpenAI Gym [Brockman et al., 2016], and FlappyBird in PLE [Tasfi, 2016].

· The authors show the effectiveness and robustness of KoGuN in sparse reward setting in Section 4.2.

· For PPO without KoGuN, the authors use a neural network with two fully connected hidden layers as the policy approximator.

· For KoGuN with a normal network (KoGuN-concat) as the refine module, the authors use a neural network with two fully connected hidden layers for the refine module.

· For KoGuN with hypernetworks (KoGuN-hyper), the authors use hypernetworks to generate a refine module with one hidden layer.

· All hidden layers described above have 32 units. w1 is set to 0.7 at the beginning and decays to 0.1 by the end of the training phase.
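The last point gives only the endpoints of w1 (0.7 and 0.1); the shape of the decay is not stated here. The following is a minimal sketch that assumes a linear schedule over training steps. The function name `linear_decay` and its step-based parameters are illustrative assumptions, not taken from the paper.

```python
def linear_decay(step: int, total_steps: int,
                 start: float = 0.7, end: float = 0.1) -> float:
    """Interpolate linearly from `start` to `end` over `total_steps`.

    Assumption: the decay is linear; the summary only states the
    starting value (0.7) and the final value (0.1).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the progress so steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


# Hypothetical usage: query the weight at a few points of a 1000-step run.
w1_schedule = [linear_decay(s, 1000) for s in (0, 500, 1000)]
```

An exponential or stepwise schedule would fit the same two endpoints equally well; the linear form is chosen here only because it is the simplest to read.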

3. Paper title: Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning

Paper link:

https://www.aminer.cn/pub/5ef96b048806af6ef277219d/generating-behavior-diverse-game-ais-with-evolutionary-multi-objective-deep-reinforcement-learning?conf=ijcai2020

Authors: Ruimin Shen, Yan Zheng, Jianye Hao, Zhaopeng Meng, Yingfeng Chen, Changjie Fan, Yang Liu

Summary:

· This paper proposes EMOGI, aiming to efficiently generate behavior-diverse Game AIs by leveraging EA, PMOO and DRL.

· Empirical results show the effectiveness of EMOGI in creating diverse and complex behaviors.

· For deploying AIs in commercial games, the robustness of the generated AIs is worth investigating as future work [Sun et al., 2020].

4. Paper title: Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

Paper link:

https://www.aminer.cn/pub/5eda19d991e01187f5d6db49/solving-hard-ai-planning-instances-using-curriculum-driven-deep-reinforcement-learning?conf=ijcai2020

Authors: Feng Dieqiao, Gomes Carla P., Selman Bart

Summary:

· The authors presented a framework based on deep RL for solving hard combinatorial planning problems in the domain of Sokoban.

· The authors showed the effectiveness of the learning based planning strategy by solving hard Sokoban instances that are out of reach of previous search-based solution techniques, including methods specialized for Sokoban.

· Since Sokoban is one of the hardest challenge domains for current AI planners, this work shows the potential of curriculum-based deep RL for solving hard AI planning tasks.

5. Paper title: I4R: Promoting Deep Reinforcement Learning by the Indicator for Expressive Representations

Paper link:

https://www.aminer.cn/pub/5ef96b048806af6ef2772128/i-r-promoting-deep-reinforcement-learning-by-the-indicator-for-expressive-representations?conf=ijcai2020

Authors: Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang

Summary:

· The authors mainly study the relationship between representations and performance of the DRL agents.

· The authors define the NSSV indicator, i.e., the smallest number of significant singular values, as a measure of learned representations. They verify a positive correlation between NSSV and rewards, and further propose a novel method called I4R, which improves DRL algorithms by adding a corresponding regularization term that enhances NSSV.

· The authors present I4R through exploratory experiments in three parts: observations, the proposed indicator NSSV, and the novel algorithm I4R.
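The summary above does not give the exact criterion for a "significant" singular value. The sketch below assumes one plausible reading: count the smallest number of leading singular values whose sum reaches a chosen fraction of the total. The function name, the `energy` parameter, and its default of 0.99 are assumptions for illustration and may differ from the paper's definition.

```python
import numpy as np


def count_significant_singular_values(features: np.ndarray,
                                      energy: float = 0.99) -> int:
    """Smallest k such that the top-k singular values of `features`
    sum to at least `energy` of the total (an assumed criterion)."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    total = singular_values.sum()
    if total == 0:
        return 0
    cumulative = np.cumsum(singular_values) / total
    # First index whose cumulative share reaches `energy`, as a count.
    k = int(np.searchsorted(cumulative, energy)) + 1
    # Guard against rounding pushing k past the number of singular values.
    return min(k, singular_values.size)


# A rank-1 matrix needs one singular value; the identity needs all of them.
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])
nssv_rank_one = count_significant_singular_values(rank_one)
nssv_identity = count_significant_singular_values(np.eye(3))
```

Under this reading, a representation matrix whose rows collapse onto a few directions scores low, while one that spreads across many directions scores high, which matches the intuition of "expressive" representations in the summary.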

6. Paper title: Rebalancing Expanding EV Sharing Systems with Deep Reinforcement Learning

Paper link:

https://www.aminer.cn/pub/5ef96b048806af6ef2772092/rebalancing-expanding-ev-sharing-systems-with-deep-reinforcement-learning?conf=ijcai2020

Authors: Man Luo, Wenzhe Zhang, Tianyou Song, Kun Li, Hongming Zhu, Bowen Du, Hongkai Wen

Summary:

· The authors study incentive-based rebalancing for continuously expanding EV sharing systems.

· The authors design a simulator of the operation of EV sharing systems, calibrated with one year of real data from an actual EV sharing system.

· Extensive experiments show that the proposed approach significantly outperforms the baselines and state-of-the-art methods in both satisfied demand rate and net revenue, and is robust to different levels of system expansion dynamics.

· The authors show that the proposed approach performs consistently across different charging times and EV ranges.

7. Paper title: Independent Skill Transfer for Deep Reinforcement Learning

Paper link:

https://www.aminer.cn/pub/5ef96b048806af6ef2772129/independent-skill-transfer-for-deep-reinforcement-learning?conf=ijcai2020

Authors: Qiangxing Tian, Guanchu Wang, Jinxin Liu, Donglin Wang, Yachen Kang

Summary:

· Deep reinforcement learning (DRL) has wide applications in various challenging fields, such as real-world visual navigation [Zhu et al., 2017], playing games [Silver et al., 2016], and robotic control [Schulman et al., 2015].

· In this work, the authors propose to learn independent skills for efficient skill transfer, where strongly correlated learned primitive skills are decomposed into independent skills.

· Taking the eigenvalues in Figure 1 as an example: for the case of 6 primitive skills, |Z| = 3 is reasonable, since more than 98% of the components of the primitive actions can be represented by three independent components.

· Effective observation collection and independent skills guarantee the success of low-dimensional skill transfer.
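The choice of |Z| in the third point can be read as picking the smallest number of components whose eigenvalues cover a target share of the total. The sketch below assumes that reading. The eigenvalues are made-up numbers chosen for illustration, not the values from the paper's Figure 1, and the function name is hypothetical.

```python
from typing import Sequence


def smallest_dim_for_ratio(eigenvalues: Sequence[float],
                           ratio: float = 0.98) -> int:
    """Smallest number of leading eigenvalues whose sum reaches
    `ratio` of the total (assumed selection rule for |Z|)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= ratio:
            return k
    return len(ordered)


# Illustrative (not from the paper): six primitive skills, three dominant directions.
example_eigenvalues = [5.0, 3.0, 1.9, 0.05, 0.03, 0.02]
latent_dim = smallest_dim_for_ratio(example_eigenvalues)
```

With these illustrative numbers the first two eigenvalues cover 80% of the total and the first three cover 99%, so three components are the smallest set that passes the 98% threshold.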

For more IJCAI 2020 papers, see:

https://www.aminer.cn/conf/ijcai2020/papers

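AMiner Academic Headlines

The AMiner platform was developed by the Department of Computer Science and Technology at Tsinghua University, with fully independent Chinese intellectual property rights. Launched in 2006, the system has attracted visits from more than 8 million unique IPs across 220 countries and regions, with 2.3 million data downloads and 10 million visits per year. It has become an important data and experimentation platform for research on academic search and social network mining.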

https://www.aminer.cn/

Theory, Deep Reinforcement Learning

Related data

Deep Reinforcement Learning (Technique)

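Reinforcement learning is learning by an agent through interaction with its environment. Each time an RL agent takes an action, it receives a numerical reward indicating how good that action was. By interacting with the environment and balancing the use of past experience (exploitation) against trying the unknown (exploration), the agent learns by trial and error which action to take next, without a human explicitly telling it. The agent's goal is to learn a sequence of actions that maximizes the accumulated reward. Real-world reinforcement learning problems generally involve huge state spaces and action spaces, so traditional reinforcement learning methods suffer from the curse of dimensionality. With neural networks from deep learning, an RL agent can extract and learn features directly from raw input data (such as game frames) and then apply traditional reinforcement learning algorithms (such as TD learning, SARSA, and Q-learning) to those features to learn a control policy (such as a game-playing policy), without hand-crafted or heuristically designed features. This combination of reinforcement learning with deep learning is called deep reinforcement learning.

Source: Scholarpedia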
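The trial-and-error loop in the definition above can be made concrete with tabular Q-learning, one of the traditional algorithms it names. The following is a minimal sketch on a made-up five-state corridor; the environment, the hyperparameters, and all names are assumptions chosen for illustration, and no neural network is involved.

```python
import random

N_STATES = 5       # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor: moving left at state 0 stays in place."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    # Break ties at random so an all-zero table does not always pick "left".
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
          max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

Deep reinforcement learning replaces the table `q` with a neural network, so the same update can be applied when the states are raw inputs such as game frames and a table would be far too large.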

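Artificial Intelligence (Technique)

In academic research, artificial intelligence usually refers to intelligent agents that perceive their environment and take actions to achieve the best possible outcome.

Source: Russell, S., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach.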

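Representation Learning (Technique)

In machine learning, representation learning (or feature learning) is a set of techniques that transform raw data into a form that machine learning can exploit effectively. Before feature learning algorithms appeared, machine learning researchers had to build features from raw data using domain knowledge, through techniques such as manual feature engineering, and only then deploy the relevant machine learning algorithms. Although manual feature engineering works well for applied machine learning, it is also difficult, expensive, time-consuming, and dependent on strong expertise. Feature learning addresses this: it lets a machine both learn the features of the data and use those features to accomplish a specific task.

Source: Wikipedia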

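Graph Neural Networks (Technique)

A graph network is a general deep learning architecture that can run on social networks or other graph-based data; it is a generalized neural network based on graph structure. Graph networks typically treat the underlying graph as the computation graph and learn neural network primitives that generate an embedding vector for each node by passing, transforming, and aggregating node feature information across the whole graph. The resulting node embeddings can be fed to any differentiable prediction layer and used for node classification or for predicting links between nodes, and the complete model can be trained end to end.

Source: Synced (机器之心)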
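The "pass, transform, and aggregate" step described above can be sketched as a single layer in NumPy. This is a minimal illustration that assumes mean aggregation over each node's neighbours plus itself, followed by a linear map and a ReLU; the function name and the toy graph are assumptions, and practical graph network layers differ in their normalization and aggregation choices.

```python
import numpy as np


def message_passing_layer(adjacency: np.ndarray,
                          features: np.ndarray,
                          weight: np.ndarray) -> np.ndarray:
    """One round of mean-aggregation message passing.

    adjacency: (n, n) 0/1 matrix of an undirected graph
    features:  (n, d_in) node feature matrix
    weight:    (d_in, d_out) transformation matrix (learned in practice)
    """
    n = adjacency.shape[0]
    # Add self-loops so each node keeps its own features.
    with_self = adjacency + np.eye(n)
    degree = with_self.sum(axis=1, keepdims=True)
    # Aggregate: average the features of each node's neighbourhood.
    aggregated = (with_self @ features) / degree
    # Transform, then apply a ReLU nonlinearity.
    return np.maximum(aggregated @ weight, 0.0)


# Toy path graph 0 - 1 - 2 with 2-dimensional node features.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
embeddings = message_passing_layer(adjacency, features, np.eye(2))
```

Stacking several such layers lets information travel further across the graph, and the final `embeddings` rows play the role of the node embedding vectors that feed a prediction layer.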

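Deep Neural Networks (Technique)

A deep neural network (DNN) is a deep learning architecture: a neural network with at least one hidden layer (in common usage, several). Like shallow neural networks, deep neural networks can model complex nonlinear systems, but the additional layers give the model higher levels of abstraction and therefore greater capacity.

Source: Synced (机器之心), Techopedia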

OpenAI Gym (Technique)

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.

Source: OpenAI Gym