Intelligent Automation and Soft Computing, 17(1) 71-94, 2011 Peer-reviewed, Lead author
This paper presents a variational Bayes (VB) method for normalized Gaussian networks, a mixture model of local experts. Based on the Bayesian framework, we introduce a meta-learning mechanism to optimize the prior distribution and the model structure. In order to search for the optimal model structure efficiently, we also develop a hierarchical model selection method. The performance of our method is evaluated on function approximation problems and a system identification problem for a nonlinear dynamical system. Experimental results show that our Bayesian framework reduces generalization error and achieves better function approximation than existing methods in the finite mixtures-of-experts family when the number of training data is fairly small.
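A minimal sketch of the model class may help: a normalized Gaussian network softly partitions the input space with normalized Gaussian units, each gating a local linear expert. The sketch below shows only the forward pass, with illustrative isotropic units and a single output; the paper's contribution, VB training with a meta-learned prior and hierarchical model selection, is not shown.

```python
import numpy as np

def ngnet_predict(x, centers, widths, weights, biases):
    """Forward pass of a normalized Gaussian network (NGnet).

    Unit i covers part of the input space with a Gaussian G_i(x); the
    normalized activations N_i(x) = G_i(x) / sum_j G_j(x) gate local
    linear experts W_i x + b_i, whose weighted sum is the output.
    """
    d2 = ((x - centers) ** 2).sum(axis=1) / (2.0 * widths ** 2)
    g = np.exp(-d2)                          # unnormalized Gaussian activations
    resp = g / g.sum()                       # normalized Gaussians N_i(x)
    local = weights @ x + biases             # outputs of the local linear experts
    return resp @ local, resp

# Toy usage: three units tiling a 1-D input, random linear experts.
rng = np.random.default_rng(0)
centers = np.array([[-1.0], [0.0], [1.0]])
widths = np.full(3, 0.5)
weights = rng.normal(size=(3, 1))
biases = rng.normal(size=3)
y, resp = ngnet_predict(np.array([0.3]), centers, widths, weights, biases)
print(y, resp)
```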
Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution, which corresponds to the sensitivity of the distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate gamma for the value functions close to 1, these algorithms do not permit gamma to be set exactly at 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD), a useful form of the derivative of the stationary state distribution, through a backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with LSD is also proposed, in which the average reward gradient can be estimated by setting gamma = 0, so that it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that they can improve the performance of existing PG methods.
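The identity underlying this framework can be checked exactly on a tiny MDP. In the sketch below (the two-state MDP and all values are illustrative), the average-reward gradient is decomposed into a policy-score term plus the LSD term; the derivative of log d(s) is obtained by finite differences here rather than by the paper's backward-chain TD estimator.

```python
import numpy as np

# Toy 2-state, 2-action average-reward MDP. We check the identity
#   grad eta = sum_{s,a} d(s) pi(a|s) r(s,a) [grad log d(s) + grad log pi(a|s)]
# where d is the stationary state distribution; grad log d is the LSD term
# that conventional PGRL drops.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])            # P[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])             # r(s, a)

def policy(theta):                                  # tabular soft-max policy
    e = np.exp(theta)
    return e / e.sum(axis=1, keepdims=True)

def stationary(theta):                              # stationary distribution of the chain
    T = np.einsum('sa,ast->st', policy(theta), P)
    w, v = np.linalg.eig(T.T)
    d = np.real(v[:, np.argmax(np.real(w))])
    return d / d.sum()

def avg_reward(theta):
    return (stationary(theta)[:, None] * policy(theta) * R).sum()

theta, eps = np.zeros((2, 2)), 1e-6
pi, d = policy(theta), stationary(theta)
grad_pg, grad_fd = np.zeros_like(theta), np.zeros_like(theta)
for i in range(2):
    for j in range(2):
        dth = np.zeros_like(theta); dth[i, j] = eps
        dlogd = (np.log(stationary(theta + dth)) - np.log(d)) / eps
        dlogpi = (np.log(policy(theta + dth)) - np.log(pi)) / eps
        grad_pg[i, j] = (d[:, None] * pi * R * (dlogd[:, None] + dlogpi)).sum()
        grad_fd[i, j] = (avg_reward(theta + dth) - avg_reward(theta)) / eps
print(np.allclose(grad_pg, grad_fd, atol=1e-4))    # True: the LSD identity holds
```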
Corticostriatal synapse plasticity of medium spiny neurons is regulated by glutamate input from the cortex and dopamine input from the substantia nigra. While cortical stimulation alone results in long-term depression (LTD), the combination with dopamine switches LTD to long-term potentiation (LTP), which is known as dopamine-dependent plasticity. LTP is also induced by cortical stimulation in magnesium-free solution, which leads to massive calcium influx through NMDA-type receptors and is regarded as calcium-dependent plasticity. Signaling cascades in the corticostriatal spines are currently under investigation. However, because of the existence of multiple excitatory and inhibitory pathways with loops, the mechanisms regulating the two types of plasticity remain poorly understood. A signaling pathway model of spines that express D1-type dopamine receptors was constructed to analyze the dynamic mechanisms of dopamine- and calcium-dependent plasticity. The model incorporated all major signaling molecules, including dopamine- and cyclic AMP-regulated phosphoprotein with a molecular weight of 32 kDa (DARPP-32), as well as AMPA receptor trafficking in the post-synaptic membrane. Simulations with dopamine and calcium inputs reproduced dopamine- and calcium-dependent plasticity. Further in silico experiments revealed that the positive feedback loop consisting of protein kinase A (PKA), protein phosphatase 2A (PP2A), and the phosphorylation site at threonine 75 of DARPP-32 (Thr75) served as the major switch for inducing LTD and LTP. Calcium input modulated this loop through the PP2B (phosphatase 2B)-CK1 (casein kinase 1)-Cdk5 (cyclin-dependent kinase 5)-Thr75 pathway and PP2A, whereas dopamine input activated the loop via PKA activation by cyclic AMP (cAMP). The positive feedback loop displayed robust bistable responses following changes in the reaction parameters. Increased basal dopamine levels disrupted this dopamine-dependent plasticity. The present model elucidates the mechanisms involved in the bidirectional regulation of corticostriatal synapses and will allow further exploration into causes of and therapies for dysfunctions such as drug addiction.
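A heavily simplified caricature (not the paper's full kinetic model) can illustrate how a double-negative loop of this kind yields a bistable switch: Thr75-phosphorylated DARPP-32 inhibits PKA, while PKA, via PP2A, removes the Thr75 phosphorylation. All parameter values below are illustrative choices, not the paper's rate constants.

```python
import numpy as np

# x = active PKA fraction, y = Thr75-phosphorylated DARPP-32 fraction.
# PKA is driven by cAMP (s) and inhibited by Thr75-P; Thr75 is
# phosphorylated (Cdk5 arm) and dephosphorylated by PP2A, which here is
# lumped into a PKA-proportional term.
def simulate(x0, y0, s=1.0, K=0.3, n=4, b=0.2, c=2.0, dt=0.01, T=100.0):
    x, y = x0, y0
    for _ in range(int(T / dt)):
        dx = s / (1.0 + (y / K) ** n) - x        # PKA: cAMP drive, Thr75-P inhibition
        dy = b * (1.0 - y) - c * x * y           # Thr75-P: phosphorylation vs PKA/PP2A removal
        x, y = x + dt * dx, y + dt * dy
    return x, y

# Same parameters, different initial conditions -> two distinct stable
# states (high-PKA vs low-PKA), i.e. a bistable switch.
print(simulate(0.9, 0.1))    # settles near (0.99, 0.09): high PKA
print(simulate(0.05, 0.7))   # settles near (0.01, 0.89): low PKA
```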
Alan Fermin, Takehiko Yoshida, Makoto Ito, Junichiro Yoshimoto, Kenji Doya
JOURNAL OF MOTOR BEHAVIOR, 42(6) 371-379, 2010 Peer-reviewed
In this article, the authors examine whether and how humans use model-free, reflexive strategies and model-based, deliberative strategies in motor sequence learning. They asked subjects to perform the grid-sailing task, which required moving a cursor to different goal positions in a 5 x 5 grid using different key-mapping (KM) rules between 3 finger keys and 3 cursor movement directions. The task was performed under 3 conditions: Condition 1, a new KM; Condition 2, a new goal position with a learned KM; and Condition 3, a learned goal position with a learned KM; each with or without a prestart delay time. The performance improvement with the prestart delay was significantly larger under Condition 2. This result provides evidence that humans implement a model-based strategy for sequential action selection and learning by using a previously learned internal model of state transitions by actions.
The striatum is the input nucleus of the basal ganglia and is thought to be involved in reinforcement learning. The striatum receives glutamate input from the cortex, which carries sensory information, and dopamine input from the substantia nigra, which carries reward information. Dopamine-dependent plasticity of cortico-striatal synapses is supposed to play a critical role in reinforcement learning. Recently, a number of labs have reported contradictory results regarding its dependence on the timing of cortical inputs and spike output. To clarify the mechanisms behind spike timing-dependent plasticity of striatal synapses, we investigated the spike timing-dependence of intracellular calcium concentration by constructing a striatal neuron model with realistic morphology. Our simulation predicted that the calcium transient would be maximal when cortical spike input and dopamine input precede the postsynaptic spike. The gain of the calcium transient is enhanced during the "up-state" of striatal cells and depends critically on NMDA receptor currents.
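The voltage dependence at the core of this prediction is well characterized: the Mg2+ block of NMDA receptors (in the standard Jahr-Stevens form) relieves with depolarization, so the same cortical input admits far more calcium during the depolarized up-state. The snippet below evaluates only this unblocking factor; the paper's multi-compartment morphology model is not reproduced.

```python
import numpy as np

def mg_block(v_mv, mg_mm=1.0):
    """Fraction of NMDA channels unblocked at membrane potential v (mV),
    Jahr & Stevens (1990) form with extracellular [Mg2+] in mM."""
    return 1.0 / (1.0 + (mg_mm / 3.57) * np.exp(-0.062 * v_mv))

# Down-state vs progressively depolarized up-state potentials: the
# unblocked fraction (and hence calcium influx per input) rises ~10x.
for v in (-80.0, -60.0, -40.0):
    print(f"V = {v:4.0f} mV: unblocked fraction = {mg_block(v):.3f}")
```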
Free-energy-based reinforcement learning is a new approach to handling high-dimensional states and actions. We investigate its properties using a new experimental platform called the digit floor task. In this task, the high-dimensional pixel data of hand-written digits were directly used as sensory inputs to the reinforcement learning agent. The simulation results showed the robustness of the free-energy-based reinforcement learning method against noise applied in both the training and testing phases. In addition, reward-dependent sensory representations were found in the distributed activation patterns of hidden units. The representations coded in a distributed fashion persisted even when the number of hidden nodes was varied.
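In free-energy-based RL in the style of Sallans and Hinton, the action value Q(s, a) is approximated by the negative free energy of a restricted Boltzmann machine whose visible layer is the concatenated state-action vector. The sketch below assumes that formulation; the layer sizes and the binary pixel encoding of the digit floor task are illustrative guesses, not the paper's settings.

```python
import numpy as np

def negative_free_energy(state, action, W, b_hidden, b_visible):
    """Q(s, a) estimate as the negative free energy of a binary RBM
    whose visible layer is the concatenated (state, action) vector."""
    v = np.concatenate([state, action])            # visible layer (s, a)
    pre = b_hidden + W.T @ v                       # hidden pre-activations
    # F(v) = -b_v . v - sum_j softplus(pre_j);  Q(s, a) = -F(v)
    return b_visible @ v + np.sum(np.logaddexp(0.0, pre))

rng = np.random.default_rng(1)
n_state, n_action, n_hidden = 784, 4, 32           # e.g. 28x28 digit pixels
W = rng.normal(0.0, 0.01, size=(n_state + n_action, n_hidden))
b_h = np.zeros(n_hidden)
b_v = np.zeros(n_state + n_action)
s = (rng.random(n_state) > 0.5).astype(float)      # noisy binary pixel input
a = np.eye(n_action)[2]                            # one-hot action
print(negative_free_energy(s, a, W, b_h, b_v))
# The hidden activations sigmoid(pre) form the distributed representation
# in which the reward-dependent codes were observed.
```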
We investigate the properties of free-energy-based reinforcement learning using a new experimental platform called the digit floor task. The simulation results showed the robustness of the reinforcement learning method against noise applied in both the training and testing phases. In addition, reward-dependent and reward-invariant representations were found in the distributed activation patterns of hidden units. The representations coded in a distributed fashion persisted even when the number of hidden nodes was varied.
NEURAL INFORMATION PROCESSING, PART I, 4984 614-624, 2008 Peer-reviewed
We present a Bayesian method for the system identification of molecular cascades in biological systems. The contribution of this study is to provide a theoretical framework for unifying three issues: 1) estimating the most likely parameters; 2) evaluating and visualizing the confidence of the estimated parameters; and 3) selecting the most likely structure of the molecular cascades from two or more alternatives. The usefulness of our method is demonstrated in several benchmark tests.
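A toy one-parameter example (not the paper's cascade models) can make the three issues concrete: a grid posterior over a decay rate yields the MAP estimate, the posterior spread as a confidence measure, and a marginal-likelihood comparison between two candidate structures.

```python
import numpy as np

# Toy "cascade": first-order decay x(t) = exp(-k t) observed with Gaussian
# noise. A grid posterior over k (flat prior) gives 1) the MAP estimate,
# 2) its confidence as the posterior spread, and 3) an evidence comparison
# against an alternative (linear-decay) structure.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 20)
k_true, sigma = 0.8, 0.05
y = np.exp(-k_true * t) + rng.normal(0.0, sigma, t.size)

def log_lik(k, model):
    pred = np.exp(-k * t) if model == "first_order" else np.clip(1.0 - k * t, 0.0, None)
    return -0.5 * np.sum((y - pred) ** 2) / sigma**2

ks = np.linspace(0.01, 3.0, 600)
for model in ("first_order", "linear_decay"):
    ll = np.array([log_lik(k, model) for k in ks])
    post = np.exp(ll - ll.max()); post /= post.sum()
    k_map = ks[post.argmax()]                                      # 1) MAP parameter
    k_std = np.sqrt(np.sum(post * (ks - np.sum(post * ks)) ** 2))  # 2) confidence
    # 3) log evidence on the grid, up to the common flat-prior constant
    log_ev = ll.max() + np.log(np.exp(ll - ll.max()).sum() * (ks[1] - ks[0]))
    print(f"{model}: k = {k_map:.2f} +/- {k_std:.3f}, log evidence = {log_ev:.1f}")
# The first-order structure should win the evidence comparison.
```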
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS, 5212 82-97, 2008 Peer-reviewed
The parameter space of a statistical learning machine has a Riemannian metric structure in terms of its objective function. Amari [1] proposed the concept of the "natural gradient," which takes the Riemannian metric of the parameter space into account. Kakade [2] applied it to policy gradient reinforcement learning, yielding the natural policy gradient (NPG). Although NPGs evidently depend on the underlying Riemannian metric, little attention has been paid to alternative choices of the metric in previous studies. In this paper, we propose a Riemannian metric for the joint distribution of the state-action pair, which is directly linked with the average reward, and derive a new NPG named the "Natural State-action Gradient" (NSG). We then prove that the NSG can be computed by fitting a certain linear model to the immediate reward function. In numerical experiments, we verify that NSG learning can handle MDPs with a large number of states, for which the performance of existing (N)PG methods degrades.
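The effect of the metric choice can be seen directly on a small MDP. The sketch below (MDP and parameterization are illustrative; derivatives of log d are taken by finite differences rather than learned) computes Kakade's NPG, whose metric is the Fisher information of the policy averaged over states, alongside the NSG, whose metric is the Fisher information of the joint distribution d(s)pi(a|s). By the normal equations, the NSG line is exactly the weighted least-squares fit of the immediate reward onto the joint score features.

```python
import numpy as np

# Toy 2-state, 2-action MDP with one sigmoid policy parameter per state,
# so both Fisher metrics are non-singular.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])            # P[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])             # r(s, a)

def policy(theta):
    p0 = 1.0 / (1.0 + np.exp(-theta))              # P(a=0 | s)
    return np.stack([p0, 1.0 - p0], axis=1)        # pi[s, a]

def stationary(theta):
    T = np.einsum('sa,ast->st', policy(theta), P)
    w, v = np.linalg.eig(T.T)
    d = np.real(v[:, np.argmax(np.real(w))])
    return d / d.sum()

theta, eps = np.array([0.3, -0.4]), 1e-6
pi, d = policy(theta), stationary(theta)
p_joint = (d[:, None] * pi).ravel()                # stationary p(s, a)

def score(f):                                      # finite-difference score features
    base = np.log(f(theta)).ravel()
    cols = []
    for k in range(theta.size):
        th = theta.copy(); th[k] += eps
        cols.append((np.log(f(th)).ravel() - base) / eps)
    return np.stack(cols, axis=1)

psi_pi = score(policy)                             # grad log pi(a|s)
psi_sa = score(lambda th: stationary(th)[:, None] * policy(th))  # grad log d*pi

grad_eta = psi_sa.T @ (p_joint * R.ravel())        # average-reward gradient
F_npg = psi_pi.T @ (p_joint[:, None] * psi_pi)     # Fisher metric of the policy
F_nsg = psi_sa.T @ (p_joint[:, None] * psi_sa)     # Fisher metric of p(s, a)
print("NPG:", np.linalg.solve(F_npg, grad_eta))
print("NSG:", np.linalg.solve(F_nsg, grad_eta))    # = LS fit of r onto psi_sa
```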
An important characteristic of on-line learning is its potential to adapt to changing environments by properly adjusting meta-parameters that control the balance between the plasticity and stability of the learning model. In our previous study, we proposed a learning scheme to address changing environments in the framework of on-line variational Bayes (VB), an effective on-line learning scheme based on Bayesian inference. The motivation of that work was, however, its implications for animal learning, and the formulation of the learning model was heuristic and not theoretically justified. In this article, we propose a new approach that balances the plasticity and stability of on-line VB learning in a more theoretically justifiable manner by employing the principle of hierarchical Bayesian inference. We present a new interpretation of on-line VB as a special case of incremental Bayes, which allows the hierarchical Bayesian setting to balance plasticity and stability while yielding a simpler learning rule than standard on-line VB. This dynamic on-line VB scheme is applied to probabilistic PCA as an example of a probabilistic model involving latent variables. In computer simulations using artificial data sets, the new on-line VB learning shows robust performance in regulating the balance between plasticity and stability, thus adapting to changing environments.
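A minimal sketch, assuming a far simpler model than the paper's PPCA (a Gaussian mean with known variance), shows the mechanism: discounting the posterior precision before each conjugate update is the incremental-Bayes operation that restores plasticity after an environmental change, while the discount factor lam is the meta-parameter balancing plasticity and stability.

```python
import numpy as np

# Track the mean of a Gaussian with known variance through an abrupt
# environmental change. lam = 1 recovers standard Bayesian updating
# (maximal stability); lam < 1 discounts old evidence (plasticity).
rng = np.random.default_rng(0)
sigma2 = 1.0                                       # known observation variance
data = np.concatenate([rng.normal(0.0, 1.0, 500),  # environment A
                       rng.normal(5.0, 1.0, 500)]) # abrupt change to B

def online_bayes(lam):
    mu, prec = 0.0, 1e-3                           # broad initial prior
    est = []
    for x in data:
        prec *= lam                                # flatten the old posterior
        post_prec = prec + 1.0 / sigma2            # conjugate Gaussian update
        mu = (prec * mu + x / sigma2) / post_prec
        prec = post_prec
        est.append(mu)
    return np.array(est)

for lam in (1.0, 0.98):
    est = online_bayes(lam)
    print(f"lam={lam}: estimate 100 steps after the change = {est[599]:.2f}")
# With lam=1 the estimate is still dragged toward the average of both
# environments; with lam=0.98 it has moved most of the way to the new mean.
```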
YUKINAWA Naoto, YOSHIMOTO Junichiro, OBA Shigeyuki, ISHII Shin
IPSJ Transactions on Mathematical Modeling and Its Applications, 46(10) 57-65, Jun 15, 2005 Peer-reviewed
Several methods based on state space models have been proposed for analyzing the dynamics of gene expression. Existing analysis methods can detect spurious internal variables that appear to have no dynamics in the state space, because those methods do not assume dynamics with both system noise and observation noise. In this study, we propose a linear dynamical system model in which the state variables and observation variables are generated by Gaussian white noise processes, and provide a variational Bayes inference for the model. We first show the effectiveness of our method when applied to a synthesized noisy...
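For concreteness, the assumed generative model can be written down and sampled; dimensions and parameter values below are illustrative, and the paper's contribution, the variational Bayes inference of the model, is not shown.

```python
import numpy as np

# Linear dynamical system with both noise sources: hidden states evolve
# with Gaussian system noise and are observed through a loading matrix
# with Gaussian observation noise. Modeling both is what lets inference
# discard internal variables that carry no real dynamics.
rng = np.random.default_rng(0)
n_state, n_genes, T = 2, 10, 50
A = np.array([[0.95, 0.10], [-0.10, 0.95]])        # state-transition matrix
C = rng.normal(size=(n_genes, n_state))            # observation (loading) matrix
q_var, r_var = 0.05, 0.10                          # system / observation noise variances

x = rng.normal(size=n_state)
X, Y = [], []
for _ in range(T):
    x = A @ x + rng.normal(0.0, np.sqrt(q_var), n_state)        # system noise
    Y.append(C @ x + rng.normal(0.0, np.sqrt(r_var), n_genes))  # observation noise
    X.append(x)
X, Y = np.array(X), np.array(Y)                    # Y: synthetic expression time series
print(X.shape, Y.shape)
```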
The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese edition) A, 88(5) 646-657, May 1, 2005 Peer-reviewed
A brain needs to detect environmental changes and to quickly learn the internal representations necessary in a new environment. This paper presents a theoretical model of cortical representation learning that can adapt to dynamic environments, incorporating results from previous studies on the functional role of acetylcholine (ACh). We adopt probabilistic principal component analysis (PPCA) as a functional model of cortical representation learning, and present an on-line learning method for PPCA based on Bayesian inference, including a heuristic criterion for model selection. Our approach is examined in two types of simulations with synthesized and realistic data sets, in which our model is able to re-learn new representation bases after the environment changes. Our model implies the possibility that higher-level recognition regulates cortical ACh release at the lower level, and that the ACh level alters the learning dynamics of a local circuit so as to continuously acquire appropriate representations in a dynamic environment.
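The underlying representation model can be stated compactly: in its batch maximum-likelihood form (Tipping and Bishop), PPCA has a closed-form solution built from the top eigenvectors of the sample covariance. The sketch below shows that batch form only, on synthetic data, as a reference point for what the paper learns on-line.

```python
import numpy as np

# Batch maximum-likelihood PPCA: W is built from the top-q eigenvectors
# of the sample covariance, and sigma2 is the mean discarded variance.
def ppca_ml(Y, q):
    mu = Y.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Y, rowvar=False))
    vals, vecs = vals[::-1], vecs[:, ::-1]          # descending eigenvalues
    sigma2 = vals[q:].mean()                        # isotropic noise estimate
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return W, mu, sigma2

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))                       # latent causes
A = rng.normal(size=(2, 8))                         # mixing into 8 observed dims
Y = Z @ A + rng.normal(0.0, 0.3, size=(200, 8))
W, mu, sigma2 = ppca_ml(Y, q=2)
print(round(sigma2, 3))                             # close to 0.3**2 = 0.09
```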
Transactions of the Institute of Systems, Control and Information Engineers, 16(5) 209-217, May 15, 2003 Peer-reviewed, Lead author
In this paper, we propose a new reinforcement learning (RL) method for dynamical systems with continuous state and action spaces. Our RL method has an actor-critic architecture: the critic approximates the Q-function, and the actor approximates a stochastic soft-max policy defined by the Q-function. An on-line EM algorithm is used to train both the critic and the actor. Computer simulations of two control problems show that our method is able to acquire good control after a few learning trials.
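The actor's target in this scheme is a soft-max policy defined by the critic's Q-function. The discrete-action snippet below illustrates only that policy-from-Q relationship; the paper approximates it over continuous actions with function approximators trained by on-line EM.

```python
import numpy as np

def softmax_policy(q_values, temperature):
    """Stochastic soft-max (Boltzmann) policy defined by a Q-function."""
    z = (q_values - q_values.max()) / temperature   # stabilized soft-max
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 1.5, 0.5])                       # critic's Q(s, a) at some state
for T in (1.0, 0.1):
    print(f"T={T}: pi = {softmax_policy(q, T).round(3)}")
# High temperature -> near-uniform exploration; low temperature ->
# near-greedy exploitation of the critic's current Q estimate.
```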
In reinforcement learning (RL), the trade-off between exploitation and exploration has long been an important issue. This paper presents a new method for controlling the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness of action selection, is controlled based on variation of action results and perception of environmental change. When applied to maze tasks, our method successfully obtains good control by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in a real brain, and that this balance may correspond to the level of the animal's selective attention. Based on this scenario, we also discuss a possible implementation in the brain.
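Two ingredients of the scheme can be sketched with illustrative parameters: Dirichlet-style transition counts with a forgetting factor, so the estimated transition probability tracks environmental change, and an inverse-temperature balance parameter that sets the randomness of soft-max action selection. How the paper couples the two (raising beta when outcomes are consistent, lowering it when change is perceived) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.98                                          # forgetting factor
counts = np.ones(2)                                 # Dirichlet pseudo-counts over s'
p_true = 0.9                                        # P(s'=0 | s, a), flips mid-run
for t in range(2000):
    if t == 1000:
        p_true = 0.1                                # environmental change
    s_next = 0 if rng.random() < p_true else 1
    counts = lam * counts + np.eye(2)[s_next]       # Bayes update with forgetting
p_hat = counts / counts.sum()
print(f"estimated P(s'=0) after change: {p_hat[0]:.2f}")  # tracks ~0.1, not 0.9

def boltzmann(q, beta):                             # beta: the balance parameter
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

print(boltzmann(np.array([1.0, 0.8]), 0.5))         # low beta: exploratory
print(boltzmann(np.array([1.0, 0.8]), 20.0))        # high beta: greedy
```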
Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, 2000 Peer-reviewed, Lead author
Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area), Japan Society for the Promotion of Science, Apr 2010 - Mar 2015
ISHII Shin, OBA Shigeyuki, MAEDA Shinichi, YOSHIMOTO Junichiro, SAKUMURA Yuichi