CHEN Chih-Chieh, 斯波 廣大, 曽我部 完, 坂本 克好, 曽我部 東馬
※ This paper was selected as one of the top 34 papers in the international session of JSAI 2020 and was further selected for inclusion in "Advances in Artificial Intelligence" published by Springer Nature.
Solving large-scale systems of linear equations is an important part of many artificial intelligence applications, especially dynamic programming, which is heavily used in the reinforcement learning field. The arrival of noisy intermediate-scale quantum (NISQ) computers provides new opportunities to solve linear systems at larger scales. A hybrid quantum-classical linear solver using the Ulam-von Neumann (UvN) method was demonstrated previously. In this work, we apply the hybrid quantum-classical UvN linear solver to dynamic programming, where the state value function or state-action value function V (or Q) = (I − γP)⁻¹R is to be solved, with γ the discount rate, P the state transition matrix, and R the reward. Systematic circuit extensions beyond unistochastic matrices are developed based on the idea of linear combinations of unitary operators and quantum random walks. Numerical examples for some benchmark tasks are demonstrated.
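The classical core of the Ulam-von Neumann idea can be sketched in a few lines: V = (I − γP)⁻¹R is estimated by averaging discounted rewards over random walks driven by P. The 2-state chain below is a hypothetical toy instance, not one of the paper's benchmarks, and the quantum circuit side is omitted entirely.

```python
import random

# Classical UvN sketch: estimate V = (I - gamma*P)^{-1} R by sampling
# random walks through P (toy 2-state chain; values are illustrative).
GAMMA = 0.9
P = [[0.5, 0.5], [0.2, 0.8]]   # state transition matrix
R = [1.0, 0.0]                 # reward vector

def uvn_value(s0, n_walks=5000, max_len=100, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        s, w, acc = s0, 1.0, 0.0
        for _ in range(max_len):
            acc += w * R[s]    # accumulate discounted reward along the walk
            w *= GAMMA
            s = rng.choices((0, 1), weights=P[s])[0]
        total += acc
    return total / n_walks
```

A direct solve of (I − 0.9P)V = R for this chain gives V ≈ [3.84, 2.47], which the Monte Carlo estimate approaches as the number of walks grows.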
木村 友彰, Dinesh MALLA, 曽我部 完, 坂本 克好, 山口 浩一, 曽我部 東馬
Time series anomaly detection methods are applied in various fields. These methods typically assume a distribution for the data, and users must set a threshold to detect anomalies. In contrast, in reinforcement learning an agent can learn desirable actions through interaction with an environment, without needing a model of that environment. By applying reinforcement learning to anomaly detection, it is possible to detect anomalies through trial and error, without distributional assumptions. In this paper, to deal with time series, we performed anomaly detection using a Partially Observable MDP (POMDP). Furthermore, we compared accuracy while varying the number of LSTM steps.
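The partial-observability setup can be illustrated independently of the learning algorithm (our own minimal sketch, not code from the paper): the agent never sees the whole series, only a fixed-length window, whose length corresponds to the LSTM step count being compared.

```python
def windows(series, steps):
    """Split a time series into overlapping observation windows of
    length `steps` (the LSTM step count); each window is what the
    partially observing agent sees at one decision point."""
    return [series[i:i + steps] for i in range(len(series) - steps + 1)]

obs = windows([0.1, 0.2, 0.15, 3.0, 0.2], steps=3)
# The anomalous spike 3.0 is invisible in obs[0] and only enters later windows
```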
斯波 廣大, Chih-Chieh CHEN, 曽我部 完, 坂本 克好, 山口 浩一, 曽我部 東馬
Currently, variational quantum algorithms such as VQE and QAOA, which run on gate-type quantum computers, are attracting attention as a means of solving real-world combinatorial optimization problems. The variational-quantum-algorithm approach to combinatorial optimization is more extensible than quantum annealing and may be able to cope with a wider variety of optimization problems. However, there are still few use cases of QAOA on problems close to real-world settings. In this paper, we used QAOA, the quantum adiabatic algorithm (QAA), Gurobi, VQE, and QBsolv to solve several examples and compared the results. In addition, we adjusted the hyperparameters and the optimizer and investigated how QAOA's ability to find an exact solution changed. As a result, we found that QAOA was unstable in computing exact solutions and required substantial improvement of the algorithm itself, rather than adjustment of hyperparameters and optimization methods.
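Comparisons like the one above need an exact baseline to judge whether a heuristic found the true optimum. For small instances that baseline is obtainable by brute force; the MaxCut instance below is our own toy example, not necessarily one of the paper's problems.

```python
from itertools import product

def maxcut_exact(n, edges):
    """Brute-force exact MaxCut: enumerate every +/-1 spin assignment
    and count the edges whose endpoints fall on opposite sides."""
    best = 0
    for spins in product((-1, 1), repeat=n):
        cut = sum(1 for u, v in edges if spins[u] != spins[v])
        best = max(best, cut)
    return best

# Triangle graph: any bipartition cuts at most 2 of the 3 edges.
print(maxcut_exact(3, [(0, 1), (1, 2), (0, 2)]))  # -> 2
```

An approximate solver (QAOA, QBsolv, etc.) is then scored by how often it reaches this exact value.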
MALLA Dinesh, 木村 友彰, 曽我部 完, 坂本 克好, 山口 浩一, 曽我部 東馬
Anomaly detection is widely applied in a variety of domains, including smart home systems, network traffic monitoring, IoT applications, and sensor networks. In a safety-critical environment, it is crucial to have an automatic detection system that screens the streaming data gathered by monitoring sensors and reports abnormal observations in real time. Oftentimes, the stakes are much higher when these potential anomalies are intentional or goal-oriented. Anomaly detection is a great challenge, especially without labels, when the confidence level of the decision must be maximized while the stopping time is minimized. We propose an end-to-end framework for sequential anomaly detection using inverse reinforcement learning (IRL), whose objective is to determine the decision-making agent's underlying reward function that drives its behavior. The proposed method takes the sequence of states of a target source and other meta-information as input. The agent's normal behavior is then captured by the reward function, which is inferred via a Bayesian approach to IRL, with a neural network representing the reward function. We first validated the Bayesian IRL reward using a Gym classic-control environment, and we also present results on the Numenta Anomaly Benchmark (NAB), a benchmark containing real-world data streams with labeled anomalies.
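The Bayesian inference step can be sketched on a deliberately tiny problem (our own illustrative model, not the paper's neural-network reward): a two-action bandit, expert demonstrations assumed Boltzmann-rational, a standard normal prior on the reward vector, and random-walk Metropolis sampling of the posterior.

```python
import math
import random

# Toy Bayesian IRL: infer which action the expert prefers from demos.
demos = [0] * 80 + [1] * 20              # expert mostly picks action 0

def log_post(r):
    """Log posterior: Boltzmann likelihood of the demos + N(0,1) prior."""
    z = math.log(math.exp(r[0]) + math.exp(r[1]))   # softmax normalizer
    loglike = sum(r[a] - z for a in demos)
    logprior = -0.5 * (r[0] ** 2 + r[1] ** 2)
    return loglike + logprior

def posterior_mean(n=4000, step=0.3, seed=1):
    """Random-walk Metropolis over the 2-dim reward vector."""
    rng = random.Random(seed)
    r, lp = [0.0, 0.0], log_post([0.0, 0.0])
    sums = [0.0, 0.0]
    for _ in range(n):
        prop = [r[0] + rng.gauss(0, step), r[1] + rng.gauss(0, step)]
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:   # accept/reject
            r, lp = prop, lp_prop
        sums[0] += r[0]
        sums[1] += r[1]
    return [s / n for s in sums]

r_mean = posterior_mean()
# The posterior ranks the frequently demonstrated action higher: r_mean[0] > r_mean[1]
```

The same likelihood-plus-prior structure carries over when the reward is a neural network over state sequences; only the parameter space being sampled grows.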
SUNG Jaebok, 高橋 慧, MALLA Dinesh, 坂本 克好, 山口 浩一, 曽我部 東馬
Recently, many manufacturing systems have adopted AGVs (Automated Guided Vehicles) to respond to diversifying needs in the manufacturing industry. However, it is difficult to solve the optimization problem of an AGV transport system using mathematical optimization. In this study, we use Deep Q Network (DQN), a deep reinforcement learning method, to optimize an AGV transport system. After training the neural network that decides the behavior of the AGVs through simulation of a practical model, we evaluate it by comparing it with the result of a rule-based simulation and by applying the trained neural network to a test model.
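The decision loop the DQN learns can be shown at miniature scale with a tabular stand-in (a hypothetical one-dimensional track of 5 cells, tabular Q-learning in place of the neural network; none of this is the paper's actual plant model): the AGV starts at cell 0 and is rewarded for reaching the pickup station at cell 4.

```python
import random

N, GOAL = 5, 4
ACTIONS = (-1, 1)                     # move left / move right

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: Q[s][i])
            s2 = min(max(s + ACTIONS[a], 0), N - 1)   # walls at both ends
            r = 1.0 if s2 == GOAL else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = train()
# After training, the greedy policy steps right toward the station everywhere
```

A DQN replaces the table Q with a network so the same update generalizes across large layouts and multiple vehicles.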
SOGABE Reed, 木村 友彰, MALLA Dinesh, 曽我部 完, 坂本 克好, 曽我部 東馬
※ This paper was selected as one of the top 21 papers in the international session of JSAI 2020 and was further selected for inclusion in "Advances in Artificial Intelligence" published by Springer Nature.
In traditional reinforcement learning, exploration has always relied on the use of a single clearly identified reward. However, when this is applied to robotics or real-world tasks, it can prove challenging, as exploration is done in environments where the reward is sparse. The two main contributing factors are the low probability of the agent encountering the reward under random exploration, and the complicated nature of the surroundings in real-world applications. With machine learning already imposing a strong presence in the field of engineering, we turn to robotics to explore how demonstrations can be used to tackle this exploration hurdle. Although demonstration data, or "teacher data", is commonly placed together with the original data in the action-state value function, this can be a problem, as it may result in over-learning or under-learning, where the demonstrator's data plays a much larger or smaller role than intended. Thus, to alleviate this issue, inverse reinforcement learning (IRL) is used to calculate the reward, a hidden feature, from the demonstrator's behavior trajectory (s, a), which is recorded in a VR-embedded Gazebo environment. Meanwhile, to further tackle the learning issue, a bootstrapping Bayesian inverse reinforcement learning scheme is proposed to obtain the distribution of the reward, instead of the single maximum-likelihood reward.
曽我部 東馬, 斯波 廣大, 坂本 克好, 山口 浩一, Dinesh MALLA
The Variational Quantum Eigensolver (VQE) solves large-scale eigenvalue problems by alternating calculations between quantum and classical computers. The Nelder-Mead method is mainly used to optimize the eigenvalues; however, it has the disadvantage that a globally optimal solution cannot always be found. In this paper, we used two alternative optimization methods, Particle Swarm Optimization (PSO) and Quantum-Behaved Particle Swarm Optimization (QPSO). As a result, we found that PSO and QPSO outperform the Nelder-Mead method for eigenvalue optimization in the VQE algorithm. Furthermore, we found that the relative error when using QPSO is the smallest among the three optimization methods. Meanwhile, we are also incorporating quantum-bit particle swarm optimization (QBPSO) into the eigenvalue optimization of the VQE algorithm, aiming to realize an eigenvalue search algorithm using quantum circuits alone.
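The classical-optimizer role is easy to demonstrate on a stand-in landscape (our own toy, not the paper's Hamiltonians): for the single-qubit ansatz Ry(θ)|0⟩, the expectation ⟨ψ(θ)|Z|ψ(θ)⟩ equals cos θ, so a minimal PSO should drive the "energy" to −1.

```python
import math
import random

def energy(theta):
    """Stand-in VQE objective: <psi(theta)|Z|psi(theta)> = cos(theta)
    for the ansatz Ry(theta)|0>."""
    return math.cos(theta)

def pso(n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    x = [rng.uniform(0, 2 * math.pi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                       # each particle's best position
    gbest = min(x, key=energy)         # swarm-wide best position
    for _ in range(n_iters):
        for i in range(n_particles):
            v[i] = (w * v[i]
                    + c1 * rng.random() * (pbest[i] - x[i])
                    + c2 * rng.random() * (gbest - x[i]))
            x[i] += v[i]
            if energy(x[i]) < energy(pbest[i]):
                pbest[i] = x[i]
            if energy(x[i]) < energy(gbest):
                gbest = x[i]
    return gbest

theta = pso()
# The swarm settles near theta = pi, where the energy reaches its minimum -1
```

In an actual VQE run, `energy` would instead evaluate the Hamiltonian expectation on a quantum device, and PSO/QPSO would propose the next circuit parameters.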
斯波 廣大, 袴田 紘斗, 坂本 克好, 山口 浩一, 曽我部 東馬
The gate-type quantum computer is highly versatile and can be expected to be put into practical use in the near future. However, its qubits are very vulnerable to external interference, and it is difficult to maintain the quantum state for a long time. Therefore, in currently developed quantum computers the number of qubits is limited, making it difficult to handle large-scale, high-dimensional data. In this paper, as a solution to this problem, we propose a computation method that applies a convolution filter, one of the methods used in machine learning, to quantum computation. Furthermore, applying this method to a quantum autoencoder, we found it effective: a convolution filter composed of several qubits, applied to data consisting of several hundred qubits or more, achieved an autoencoding accuracy of 98%.
Malla Dinesh, Tomoyuki Hioki, Takahashi Kei, Sogabe Masaru, Sakamoto Katsuyoshi, Yamaguchi Koichi, Sogabe Tomah
Recently, deep neural network-based reinforcement learning (DRL) methods, which have demonstrated unprecedented success in game playing and robotic control, are gradually gaining attention for solving combinatorial optimization problems. However, effective operation of a smart grid system is subject to various constraints, such as the power demand-supply relation, lower and upper bounds on battery electricity, market price, etc. Because of these constraints, a plain DRL algorithm does not efficiently obtain an optimized result. In this paper we address this issue by developing an attention-masking extended deep Q network (AME-DQN) reinforcement learning algorithm. Special focus was laid on the prediction ability of the trained AME-DQN model under various weather conditions and demand profiles. These results were further compared with MILP results, and finally we demonstrate that AME-DQN is able to predict optimized actions that satisfy all the constraints, while MILP failed to meet the conditions in most cases.
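The masking mechanism at the core of such constraint handling can be shown without any network (illustrative numbers only): infeasible actions have their Q-values replaced by −∞ before the greedy argmax, so a constraint-violating action, e.g. discharging a battery already at its lower bound, can never be selected.

```python
NEG_INF = float("-inf")

def masked_argmax(q_values, feasible):
    """Greedy action selection restricted to feasible actions:
    infeasible entries are masked to -inf before the argmax."""
    masked = [q if ok else NEG_INF for q, ok in zip(q_values, feasible)]
    return max(range(len(masked)), key=lambda i: masked[i])

q = [0.2, 0.9, 0.1, 0.4]               # raw network outputs
feasible = [True, False, True, True]    # action 1 violates a constraint
print(masked_argmax(q, feasible))       # -> 3: best among feasible actions
```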
日置 智之, Sogabe Tomah, Malla Dinesh, Takahashi Kei, Sogabe Masaru, Sakamoto Katsuyoshi, Yamaguchi Kouichi
A multi-carrier energy hub provides more flexibility for energy management systems. On the other hand, due to the mutual impact of different energy carriers, an energy hub's energy management becomes more challenging. Mathematical optimization tools are used for energy management, but real-time operation challenges optimal management. Moreover, energy demand and supply are highly variable, so the optimization objectives may change or be more than one. For real-time management in a changing environment with multiple objectives, AI is proposed. In this work, the operation of a multi-carrier energy hub has been optimized by executing a multi-agent AI algorithm based on the deep deterministic policy gradient (DDPG) algorithm. Multi-agent simulation results show that the AI agents can manage the balance between demand and supply, and the proper charging and discharging of the storage agent, to optimize the energy hub cost. We also describe a price-determination method using AI, which is useful for demand and supply management in a market.
渡部 雅也, 楊 坤, Dinesh Malla, 坂本 克好, 山口 浩一, 曽我部 東馬
Deep learning and reinforcement learning have developed rapidly in recent years. Much research applying deep reinforcement learning to fields such as games and robot control has generated great success. In this paper, we examine the possibility of adopting AlphaZero, a reinforcement learning algorithm that demonstrates an unprecedented level of versatility as a game AI, for optimal control problems, and we gain insight into its ability to control actions in noisy environments that are difficult to handle with conventional control mechanisms.
高橋 慧, 坂本 克好, 山口 浩一, 沼尻 匠, 曽我部 完, 曽我部 東馬
In this paper, we study data clustering in a high-dimensional space based on density spheres for traffic data sets with many samples and features, and predict traffic congestion by creating a distance matrix from the features with a Density Sphere Graph-CNN. Density spheres represent the density that serves as a reference for clustering data in a high-dimensional space, and they make it possible to investigate the relationships among data by considering both data correlation and distance. A mechanism to realize highly accurate congestion prediction is studied based on the results of predicting the degree of congestion, by combining a traffic simulation model that reproduces congestion and comparing the prediction accuracy while varying the volume of the density spheres.
木村 友彰, 渡部 雅也, 坂本 克好, 山口 浩一, Malla Dinesh, 曽我部 東馬
In multi-goal reinforcement learning, Universal Value Function Approximators (UVFA), which take not only a state but also a goal as input, are used. We designed a task of bringing the end effector of a 7-DOF robot arm to goals using UVFA-based multi-goal reinforcement learning, and we performed the equivalent task while changing the number of goals. We confirmed a superb prediction ability by mapping the goal-reachability degree using UVFA.
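The UVFA input structure can be sketched in tabular miniature (our own toy, not the robot-arm setup): the approximator's input is the (state, goal) pair, and on a 1-D track with reward only at the goal, the optimal value V(s, g) = γ^|s−g| doubles as the goal-reachability score that can be mapped over all pairs.

```python
GAMMA = 0.9

def value(state, goal):
    """Goal-conditioned value on a 1-D track with reward only at the goal:
    the discounted return of walking straight there."""
    return GAMMA ** abs(state - goal)

def action(state, goal):
    """Greedy goal-conditioned policy: step toward the goal."""
    return 0 if state == goal else (1 if goal > state else -1)

# Goal-reachability map over all (state, goal) pairs on a 4-position track
reach = [[round(value(s, g), 2) for g in range(4)] for s in range(4)]
```

A UVFA replaces this closed form with a network trained on (state, goal) inputs so the same map generalizes to unseen goals.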
曽我部 東馬, Dinesh Malla, 高山 将太, 坂本 克好, 山口 浩一, Singh Thakur, 曽我部 完
Energy optimization in the smart grid has gradually shifted to agent-based machine learning methods represented by state-of-the-art deep learning and deep reinforcement learning. In particular, deep neural network-based reinforcement learning methods are emerging and gaining popularity for smart grid applications. In this work, we applied two deep reinforcement learning algorithms designed for discrete and continuous action spaces, respectively. These algorithms were embedded in a rigorous physical model using Simscape Power Systems™ (Matlab/Simulink™ environment) for smart grid optimization. The results showed that the agent successfully captured the energy demand and supply features in the training data and learned to choose behaviors that maximize its reward.
Praveen singh THAKUR, Masaru SOGABE, Katsuyoshi SAKAMOTO, Koichi YAMAGUCHI, Dinesh Bahadur MALLA, Shinji YOKOGAWA, Tomah SOGABE
We propose an alternative way of updating the actor in the Deep Deterministic Policy Gradient (DDPG) algorithm. In our proposed Hybrid-DDPG (H-DDPG for short), the policy parameters are moved based on the TD-error of the critic. In one of 5 trial runs on the Pendulum Swingup-v1 environment, the reward obtained at the early stage of training with H-DDPG was higher than with DDPG. This 1) yields a higher reward than DDPG and 2) pushes the policy parameters to move in a direction such that actions with higher reward become more likely than others. This implies that if the policy encounters good rewards during early exploration, it may converge quickly, and vice versa.
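The contrast between the two updates can be rendered abstractly (our own one-dimensional toy, not the paper's exact equations, with scalars standing in for the actor and critic networks): the vanilla DDPG actor follows the critic's action-gradient, while the H-DDPG variant scales the same step by the critic's TD error δ = r + γQ′ − Q.

```python
def ddpg_actor_step(theta, dq_da, da_dtheta, lr=0.01):
    """Vanilla DDPG: move along the critic's action-gradient."""
    return theta + lr * dq_da * da_dtheta

def hddpg_actor_step(theta, dq_da, da_dtheta, td_error, lr=0.01):
    """H-DDPG sketch: the same direction, scaled by the TD error, so
    transitions the critic finds surprisingly good move the policy more."""
    return theta + lr * td_error * dq_da * da_dtheta
```

With a TD error of 2, the H-DDPG step is twice the vanilla step in this toy, matching the intuition that early lucky rewards accelerate convergence.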
高橋 彗, 沼尻 匠, 曽我部 完, 坂本 克好, 山口 浩一, 横川 慎二, 曽我部 東馬
We propose a method of applying Convolutional Neural Networks (CNN) to non-image data. CNN has been successful in many fields such as image processing and speech recognition. On the other hand, it has been difficult to adapt CNN to non-image data such as a CSV file. In low-dimensional grid-structured data such as images, the order of the data carries meaning, and a CNN recognizes that order as a feature of the image and processes it accordingly. Therefore, CNN could not perform feature recognition on non-image data whose column order is arbitrary. We focused on a method of making CNN applicable by giving meaning to the ordering of non-image data, and demonstrated it with additional improvements.
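One simple way to "give meaning to the ordering" (our own illustrative take, not necessarily the paper's exact algorithm) is to reorder the columns of tabular data so that correlated features become neighbors, giving a subsequent 1-D convolution a locally meaningful neighborhood structure.

```python
def corr(xs, ys):
    """Pearson correlation between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def reorder_columns(cols):
    """Greedily chain columns: each next column is the one most
    correlated (in absolute value) with the previous one."""
    order = [0]
    rest = set(range(1, len(cols)))
    while rest:
        prev = order[-1]
        nxt = max(rest, key=lambda j: abs(corr(cols[prev], cols[j])))
        order.append(nxt)
        rest.remove(nxt)
    return order

# Three features: columns 0 and 2 are proportional, column 1 is noise-like
cols = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 6, 8]]
print(reorder_columns(cols))  # -> [0, 2, 1]: the correlated pair ends up adjacent
```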
黄川田 優太, 坂本 克好, 山口 浩一, Thakur Praveen Singh, 曽我部 完, 横川 慎二, 曽我部 東馬
A classical autoencoder is used for feature extraction from image data, i.e., prior learning of its characteristics. Here we propose a quantum autoencoder using quantum annealing. Quantum annealing is an optimization technique that applies a transverse magnetic field to a spin system expressed as an Ising model. To solve an optimization problem with quantum annealing, it is necessary to convert the problem into a combinatorial problem over two values (±1) corresponding to the spin directions (up spin: +1, down spin: −1). In this paper, we explain a feature-extraction learning method for images using the quantum autoencoder. We show successful feature extraction in terms of the coupling coefficients J_ij learned from the original image data. Finally, we applied the extracted and learned J_ij to the original data with unknown added noise and successfully removed the noise through a transfer learning process.
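The ±1 encoding can be made concrete with a small classical stand-in (illustrative values, solved here by brute force instead of an annealer): Hebbian-style couplings J_ij store a 3-spin pattern, and minimizing the Ising energy E(s) = −Σ J_ij s_i s_j recovers that pattern, up to a global spin flip, which is the denoising mechanism in miniature.

```python
from itertools import product

# Store a tiny 3-spin pattern in the couplings J_ij = s_i * s_j.
pattern = (1, -1, 1)
J = {(i, j): pattern[i] * pattern[j]
     for i in range(3) for j in range(i + 1, 3)}

def energy(s):
    """Ising energy E(s) = -sum_{i<j} J_ij s_i s_j."""
    return -sum(J[i, j] * s[i] * s[j] for (i, j) in J)

# Brute-force "annealing": the ground state reconstructs the pattern
# (or its global flip, which carries the same information).
ground = min(product((-1, 1), repeat=3), key=energy)
```

A quantum annealer performs this minimization over hundreds of spins, where enumeration is no longer possible.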