TITLE:
Quantum Multiple Q-Learning
AUTHORS:
Michael Ganger, Wei Hu
KEYWORDS:
Quantum Computing, Reinforcement Learning, Q-Learning
JOURNAL NAME:
International Journal of Intelligence Science, Vol. 9 No. 1, January 16, 2019
ABSTRACT: In this paper, a collection of value-based quantum reinforcement learning algorithms is introduced and their parameters are explored. These algorithms use Grover's algorithm to update the policy, which is stored as a superposition of qubits associated with each possible action. The algorithms may be grouped into two classes: one that uses value functions V(s) and a new class that uses action value functions Q(s,a). The new Q(s,a)-based quantum algorithms are found to converge faster than the V(s)-based algorithms, and in general the quantum algorithms converge in fewer iterations than their classical counterparts, netting larger returns during training. This is due to the fact that the Q(s,a) algorithms are more precise than those based on V(s), meaning that updates are incorporated into the value function more efficiently. This effect is further enhanced by the observation that the Q(s,a)-based algorithms may be trained with higher learning rates. These algorithms are then extended by adding multiple value functions, which are observed to allow larger learning rates and to improve convergence in environments with stochastic rewards; the latter property is further improved by the probabilistic nature of the quantum algorithms. Finally, the quantum algorithms are found to use less CPU time overall than their classical counterparts, meaning that their benefits may be realized even without a full quantum computer.
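The Grover-based policy update mentioned in the abstract can be illustrated with a small classical simulation. The sketch below is not the authors' implementation; it only shows the core mechanism under simple assumptions: the policy starts as a uniform superposition over actions, and each Grover iteration (oracle phase flip plus inversion about the mean) amplifies the amplitude of the action currently favored by the value function. All names (`grover_amplify`, `good_action`) are illustrative.

```python
import math

def grover_amplify(n_actions, good_action, n_iterations):
    """Classically simulate Grover iterations that boost one action's amplitude."""
    # Start in a uniform superposition over all actions.
    amp = [1.0 / math.sqrt(n_actions)] * n_actions
    for _ in range(n_iterations):
        # Oracle: flip the sign of the amplitude of the marked ("good") action.
        amp[good_action] = -amp[good_action]
        # Diffusion operator: inversion of every amplitude about the mean.
        mean = sum(amp) / n_actions
        amp = [2.0 * mean - a for a in amp]
    # On measurement, each action is observed with probability |amplitude|^2.
    return [a * a for a in amp]

# With 4 actions, a single Grover iteration makes the marked action certain.
probs = grover_amplify(n_actions=4, good_action=2, n_iterations=1)
```

In a learning loop, the number of Grover iterations applied per step would be tied to the reward or value estimate, so better actions accumulate more measurement probability over training.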