
Robotics and Autonomous Systems

Volume 112, February 2019, Pages 72-83

Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation

https://doi.org/10.1016/j.robot.2018.11.004
Open access under a Creative Commons license

Highlights

  • Learning of cloth manipulation by a dual-arm robot with Deep Reinforcement Learning.

  • Combine smooth policy updates with automatic feature extraction in deep neural networks.

  • Propose a new Deep Reinforcement Learning method based on Dynamic Policy Programming.

  • Achieve better sample efficiency than comparable methods through smooth policy updates.

Abstract

Deep Reinforcement Learning (DRL), which can learn complex policies from high-dimensional observations such as images, has been successfully applied to various tasks. It may therefore be suitable for robots learning daily activities like washing and folding clothes, cooking, and cleaning, since such tasks are difficult for non-DRL methods, which often require either (1) direct access to state variables or (2) well-designed hand-engineered features extracted from sensory inputs. However, applying DRL to real robots remains very challenging because conventional DRL algorithms require a huge number of training samples, which are arduous to collect on real robots. To alleviate this dilemma, in this paper we propose two sample-efficient DRL algorithms: Deep P-Network (DPN) and Dueling Deep P-Network (DDPN). The core idea is to combine the nature of smooth policy updates with the automatic feature extraction capability of deep neural networks to enhance sample efficiency and learning stability. The proposed methods were first investigated on a simulated robot-arm reaching task in comparison with previous DRL methods, and then applied to two real robotic cloth manipulation tasks with a limited number of samples: (1) flipping a handkerchief and (2) folding a t-shirt. All the results suggest that our methods outperform the previous DRL methods.
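
The smooth policy update inherited from Dynamic Policy Programming (DPP) can be made concrete with a small sketch. The following tabular illustration, written here in Python/NumPy, assumes the standard DPP recursion Psi'(s,a) = Psi(s,a) - M_eta Psi(s) + r + gamma * M_eta Psi(s'), where M_eta is the Boltzmann-weighted average of the action preferences Psi; the function names, hyperparameter values, and the toy two-state chain are illustrative and not taken from the paper, whose DPN/DDPN variants replace the table with a deep neural network.

    import numpy as np

    def boltzmann_average(psi_s, eta):
        # M_eta Psi(s): softmax(eta * Psi(s, .))-weighted average of the action
        # preferences at one state. The inverse temperature eta controls how far
        # the implied Boltzmann policy can move in a single update.
        w = np.exp(eta * (psi_s - psi_s.max()))  # shift for numerical stability
        w /= w.sum()
        return np.dot(w, psi_s)

    def dpp_update(psi, transitions, eta=1.0, gamma=0.95):
        # One sweep of the tabular DPP recursion over observed transitions.
        # psi: (n_states, n_actions) array of action preferences.
        # transitions: list of (s, a, r, s_next) tuples.
        new_psi = psi.copy()
        for s, a, r, s_next in transitions:
            new_psi[s, a] = (psi[s, a]
                             - boltzmann_average(psi[s], eta)       # baseline at s
                             + r
                             + gamma * boltzmann_average(psi[s_next], eta))
        return new_psi

    # Toy two-state chain: action 1 in state 0 reaches state 1 and pays reward 1.
    psi = np.zeros((2, 2))
    transitions = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 0.0, 0), (1, 1, 0.0, 1)]
    for _ in range(100):
        psi = dpp_update(psi, transitions)

    # The policy is Boltzmann in the preferences (eta = 1 here).
    policy = np.exp(psi - psi.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)

Because each sweep changes the preferences, and hence the Boltzmann policy, only by a bounded amount, successive policies stay close to one another; this is the smooth-update property that DPN and DDPN carry over to the deep setting.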

Keywords

Deep reinforcement learning
Robotic cloth manipulation
Dynamic policy programming

Yoshihisa Tsurumine received his B.E. from the Advanced Course of Production Systems Engineering, National Institute of Technology, Ube College, Yamaguchi, Japan, in 2016 and his M.E. in information science from the Nara Institute of Science and Technology, Nara, Japan, in 2018. His research interests include robot control using machine learning.

Yunduan Cui was born in China in 1990. He is currently a research assistant professor at the Nara Institute of Science and Technology, Japan. He received his Ph.D. in information science from the Nara Institute of Science and Technology, Japan, in September 2017, his M.E. in computer science from Doshisha University, Japan, in September 2014, and his B.E. in electronic engineering from Xidian University, China, in 2012. His research interests include machine learning and control theory for robotics, especially reinforcement learning in robot control.

Eiji Uchibe received his B.S. in 1994, M.S. in 1996, and Ph.D. in 1999 from Osaka University. He worked as a research associate of the Japan Society for the Promotion of Science in the Research for the Future Program titled Cooperative Distributed Vision for Dynamic Three-Dimensional Scene Understanding. He then joined ATR as a researcher in 2001. Since 2004, he has been the group leader of the Adaptive Systems Group at the Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University. He joined the Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, in 2015 as a principal researcher. His research interests include learning robots, reinforcement learning, evolutionary computation, and computational neuroscience, and their applications.

Takamitsu Matsubara received his B.E. in electrical and electronic systems engineering from Osaka Prefecture University, Osaka, Japan, in 2003, his M.E. in information science from the Nara Institute of Science and Technology, Nara, Japan, in 2005, and his Ph.D. in information science from the Nara Institute of Science and Technology in 2007. From 2005 to 2007, he was a research fellow (DC1) of the Japan Society for the Promotion of Science. From 2013 to 2014, he was a visiting researcher at the Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, The Netherlands. He is currently an associate professor at the Nara Institute of Science and Technology and a visiting researcher at the ATR Computational Neuroscience Laboratories, Kyoto, Japan. His research interests include machine learning and control theory for robotics.