GRID and The University of Electro-Communications presented multiple collaborative research results at JSAI2018, including “Smart Grid Optimization by Deep Reinforcement Learning over Discrete and Continuous Action Space”, “Hybrid Policy Gradient for Deep Reinforcement Learning”, “Using feature graphs to develop generalized CNN deep learning methods”, among others.
■Smart Grid Optimization by Deep Reinforcement Learning over Discrete and Continuous Action Space
Tomah Sogabe*, Dinesh Bahadur Malla, Shota Takayama, Shinji Yokogawa, Katsuyoshi Sakamoto, Koichi Yamaguchi, Thakur Praveen Singh, Masaru Sogabe
Energy optimization in the smart grid has gradually shifted toward agent-based machine learning methods, represented by state-of-the-art deep learning and deep reinforcement learning. In particular, deep-neural-network-based reinforcement learning methods are emerging and gaining popularity for smart grid applications. In this work, we applied two deep reinforcement learning algorithms designed for discrete and continuous action spaces, respectively. These algorithms were embedded in a rigorous physical model built with Simscape Power Systems™ (MATLAB/Simulink™ environment) for smart grid optimization. The results showed that the agent successfully captured the energy demand and supply features in the training data and learned to choose actions that maximize its reward.
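The abstract does not give implementation details, so the following is only an illustrative sketch of the distinction it draws between discrete and continuous action spaces; the exploration parameters, action ranges, and the grid-control interpretation in the comments are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def discrete_action(q_values, epsilon=0.1):
    """Epsilon-greedy selection over a discrete action set (DQN-style)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def continuous_action(policy_output, noise_scale=0.1, low=-1.0, high=1.0):
    """Deterministic policy output perturbed by Gaussian exploration noise
    (DDPG-style), clipped to the valid control range."""
    noisy = policy_output + noise_scale * rng.standard_normal(policy_output.shape)
    return np.clip(noisy, low, high)

# Example: a grid controller choosing among 3 discrete switching modes,
# or setting a continuous power set-point in [-1, 1].
q = np.array([0.2, 1.5, -0.3])
a_d = discrete_action(q, epsilon=0.0)      # greedy choice over Q-values
a_c = continuous_action(np.array([0.8]))   # noisy set-point near 0.8
```

A discrete-action agent picks one of finitely many control modes, while the continuous-action agent outputs a real-valued set-point, which is why two different algorithms are needed to cover both cases.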
■Hybrid Policy Gradient for Deep Reinforcement Learning
Thakur Praveen Singh1, Masaru Sogabe1, Katsuyoshi Sakamoto2, Koichi Yamaguchi2, Dinesh Bahadur Malla2, Shinji Yokogawa2, Tomah Sogabe*1,2
1 GRID Inc. 2 i-PERC, The University of Electro-Communications
We propose an alternative way of updating the actor in the Deep Deterministic Policy Gradient (DDPG) algorithm. In our proposed Hybrid-DDPG (H-DDPG for short), the policy parameters are moved based on the TD error of the critic. In one of five trial runs on the Pendulum swing-up environment (PendulumSwingup-v1), the reward obtained at the early stage of training with H-DDPG was higher than with DDPG. This 1) yields a higher reward than DDPG and 2) pushes the policy parameters in a direction such that actions with higher reward become more likely than others. This implies that if the policy encounters good rewards during early exploration it may converge quickly, and conversely otherwise.
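The abstract states the core idea (moving the policy parameters based on the critic's TD error) without the exact update rule, so the sketch below uses a minimal linear actor/critic as a stand-in for deep networks; the learning rate, gradient values, and the simple multiplicative weighting are assumptions for illustration only.

```python
import numpy as np

def td_error(r, gamma, q_next, q_now):
    """One-step temporal-difference error: delta = r + gamma*Q(s',a') - Q(s,a)."""
    return r + gamma * q_next - q_now

def hybrid_actor_step(theta, grad_pi, delta, lr=1e-3):
    """Move the policy parameters along the policy-gradient direction,
    weighted by the critic's TD error, so that actions followed by a
    positive surprise are reinforced more strongly."""
    return theta + lr * delta * grad_pi

theta = np.zeros(4)                          # assumed policy parameters
grad_pi = np.array([0.5, -0.2, 0.1, 0.0])    # assumed policy-gradient direction
delta = td_error(r=1.0, gamma=0.99, q_next=0.5, q_now=0.2)  # positive surprise
theta_new = hybrid_actor_step(theta, grad_pi, delta, lr=0.1)
```

With a positive TD error the step amplifies the gradient direction; a negative TD error would reverse it, which is one way the update can favor actions that led to higher-than-expected reward.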
■Using feature graphs to develop generalized CNN deep learning methods
高橋彗, 沼尻匠, 曽我部完, 坂本克好, 山口浩一, 横川慎二, 曽我部東馬*
We propose a method for applying Convolutional Neural Networks (CNNs) to non-image data. CNNs have been successful in many fields such as image processing and speech recognition. On the other hand, it has been difficult to adapt CNNs to non-image data such as CSV files. In data with a low-dimensional grid structure, such as images, the ordering of elements carries meaning, and a CNN recognizes that ordering as a feature of the image when processing it. Consequently, CNNs could not perform feature recognition on non-image data whose element order can be rearranged. We focused on a method that makes CNNs applicable by giving meaning to the ordering of non-image data, and demonstrated it with our proposed improvements.
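The abstract does not specify how the feature graph assigns an ordering, so the following is only an illustrative stand-in for the general idea: reordering tabular columns so that correlated features become neighbors, which gives a subsequent 1D convolution a meaningful local neighborhood. The greedy correlation ordering here is an assumption, not the paper's method.

```python
import numpy as np

def order_features_by_correlation(X):
    """Greedily order columns so highly correlated features end up adjacent.
    (Illustrative stand-in; the paper's feature-graph construction may differ.)"""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature correlations
    n = corr.shape[1]
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: corr[last, j])  # most correlated next
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy table: columns 0 and 2 are nearly identical, column 1 is independent.
rng = np.random.default_rng(1)
a = rng.standard_normal(100)
X = np.column_stack([a,
                     rng.standard_normal(100),
                     a + 0.01 * rng.standard_normal(100)])
print(order_features_by_correlation(X))  # columns 0 and 2 become neighbors
```

After such a reordering, a 1D convolution sliding over the columns sees related features together, which is the property images get for free from their grid structure.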
■Quantum Autoencoder Using Quantum Annealing
黄川田優太, 坂本克好, 山口浩一, Thakur Praveen Singh, 曽我部完, 横川慎二, 曽我部東馬*
A classical autoencoder is used for feature extraction from image data, capturing its characteristics in what is called prior learning. Here we propose a quantum autoencoder using quantum annealing. Quantum annealing is an optimization technique that applies a transverse magnetic field to a spin system expressed as an Ising model. To solve an optimization problem by quantum annealing, it is necessary to convert the problem into a combinatorial problem over binary values (±1) representing the spin direction (up spin: +1, down spin: −1). In this paper, we describe a feature extraction learning method for images using the quantum autoencoder. We demonstrated successful feature extraction, in terms of the coupling coefficients J_ij, from the original image data. Finally, we applied the extracted and learned J_ij to the original data corrupted with unknown noise and successfully removed the noise through a transfer learning process.
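The abstract describes mapping the problem onto an Ising model with ±1 spins and couplings J_ij. As an illustration only, the sketch below minimizes an Ising energy with classical simulated annealing as a stand-in for quantum annealing (no quantum hardware involved); the ferromagnetic couplings, schedule, and step count are assumptions.

```python
import numpy as np

def ising_energy(s, J):
    """Ising energy E(s) = -1/2 * s^T J s for spins s in {+1, -1}^n
    (J symmetric with zero diagonal)."""
    return -0.5 * s @ J @ s

def anneal(J, steps=2000, T0=2.0, seed=0):
    """Classical simulated annealing as a stand-in for quantum annealing:
    flip single spins, accepting moves by the Metropolis rule while the
    temperature is lowered toward zero."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = rng.choice([-1, 1], size=n)
    for t in range(steps):
        T = T0 * (1 - t / steps) + 1e-3
        i = rng.integers(n)
        dE = 2 * s[i] * (J[i] @ s - J[i, i] * s[i])  # energy change of flipping spin i
        if dE < 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
    return s

# Couplings that make all spins want to align (ferromagnetic J_ij > 0),
# so the ground state is all +1 or all -1.
n = 8
J = np.ones((n, n)) - np.eye(n)
s = anneal(J)
```

In the paper's setting the couplings J_ij are learned from images rather than fixed by hand, and the low-energy spin configuration plays the role of the reconstructed (denoised) data.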