Research Achievements

Junichiro Yoshimoto (吉本 潤一郎)

Basic Information

Affiliation
Professor, Department of Medicine, School of Medicine, Fujita Health University
Visiting Researcher, Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International (ATR)
Visiting Professor, Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology
Degree
Doctor of Engineering (September 2002, Nara Institute of Science and Technology)

J-GLOBAL ID
200901094345074980
researchmap Member ID
1000301373

Papers

 79
  • Takashi Nakano, Junichiro Yoshimoto, Kenji Doya
    Frontiers in Computational Neuroscience 7 119 September 13, 2013  Peer-reviewed
    The dopamine-dependent plasticity of the cortico-striatal synapses is considered to be the cellular mechanism crucial for reinforcement learning. The dopaminergic inputs and the calcium responses affect the synaptic plasticity by way of the signaling cascades within the synaptic spines. The calcium concentration within synaptic spines, however, is dependent on multiple factors including the calcium influx through ionotropic glutamate receptors, the intracellular calcium release by activation of metabotropic glutamate receptors, and the opening of calcium channels by EPSPs and back-propagating action potentials. Furthermore, dopamine is known to modulate the efficacies of NMDA receptors, some of the calcium channels, and sodium and potassium channels that affect the back propagation of action potentials. Here we construct an electric compartment model of the striatal medium spiny neuron with a realistic morphology and predict the calcium responses in the synaptic spines with variable timings of the glutamatergic and dopaminergic inputs and the postsynaptic action potentials. The model was validated by reproducing the responses to current inputs and could predict the electric and calcium responses to glutamatergic inputs and back-propagating action potentials in the proximal and distal synaptic spines during up- and down-states. We investigated the calcium responses by systematically varying the timings of the glutamatergic and dopaminergic inputs relative to the action potential and found that the calcium response and the subsequent synaptic potentiation are maximal when the dopamine input precedes the glutamate input and the action potential. The prediction is not consistent with the hypothesis that the dopamine input provides the reward prediction error for reinforcement learning. The finding suggests that there are unknown learning mechanisms at the network level or an unknown cellular mechanism for calcium dynamics and signaling cascades.
  • Junichiro Yoshimoto, Masa-Aki Sato, Shin Ishii
    Intelligent Automation and Soft Computing 17(1) 71-94 2011  Peer-reviewed  Lead author
    This paper presents a variational Bayes (VB) method for the normalized Gaussian network, which is a mixture model of local experts (a minimal illustrative sketch of this model appears after this list). Based on the Bayesian framework, we introduce a meta-learning mechanism to optimize the prior distribution and the model structure. In order to search for the optimal model structure efficiently, we also develop a hierarchical model selection method. The performance of our method is evaluated using function approximation problems and a system identification problem for a nonlinear dynamical system. Experimental results show that our Bayesian framework reduces the generalization error and achieves better function approximation than existing methods within the finite mixtures-of-experts family when the number of training data is fairly small.
  • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Jan Peters, Kenji Doya
    NEURAL COMPUTATION 22(2) 342-376 February 2010  Peer-reviewed
    Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution that corresponds to the sensitivity of its distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate γ for the value functions close to 1, these algorithms do not permit γ to be set exactly at γ = 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD) as a useful form of the derivative of the stationary state distribution through backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting γ = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that these can improve the performances of existing PG methods.
  • Takashi Nakano, Tomokazu Doi, Junichiro Yoshimoto, Kenji Doya
    PLOS COMPUTATIONAL BIOLOGY 6(2) e1000670 February 2010  Peer-reviewed
    Corticostriatal synapse plasticity of medium spiny neurons is regulated by glutamate input from the cortex and dopamine input from the substantia nigra. While cortical stimulation alone results in long-term depression (LTD), the combination with dopamine switches LTD to long-term potentiation (LTP), which is known as dopamine-dependent plasticity. LTP is also induced by cortical stimulation in magnesium-free solution, which leads to massive calcium influx through NMDA-type receptors and is regarded as calcium-dependent plasticity. Signaling cascades in the corticostriatal spines are currently under investigation. However, because of the existence of multiple excitatory and inhibitory pathways with loops, the mechanisms regulating the two types of plasticity remain poorly understood. A signaling pathway model of spines that express D1-type dopamine receptors was constructed to analyze the dynamic mechanisms of dopamine- and calcium-dependent plasticity. The model incorporated all major signaling molecules, including dopamine- and cyclic AMP-regulated phosphoprotein with a molecular weight of 32 kDa (DARPP-32), as well as AMPA receptor trafficking in the post-synaptic membrane. Simulations with dopamine and calcium inputs reproduced dopamine- and calcium-dependent plasticity. Further in silico experiments revealed that the positive feedback loop formed by protein kinase A (PKA), protein phosphatase 2A (PP2A), and the phosphorylation site at threonine 75 of DARPP-32 (Thr75) served as the major switch for inducing LTD and LTP. Calcium input modulated this loop through the PP2B (phosphatase 2B)-CK1 (casein kinase 1)-Cdk5 (cyclin-dependent kinase 5)-Thr75 pathway and PP2A, whereas calcium and dopamine inputs activated the loop via PKA activation by cyclic AMP (cAMP). The positive feedback loop displayed robust bi-stable responses following changes in the reaction parameters. Increased basal dopamine levels disrupted this dopamine-dependent plasticity. The present model elucidated the mechanisms involved in bidirectional regulation of corticostriatal synapses and will allow for further exploration into causes and therapies for dysfunctions such as drug addiction.
  • Alan Fermin, Takehiko Yoshida, Makoto Ito, Junichiro Yoshimoto, Kenji Doya
    JOURNAL OF MOTOR BEHAVIOR 42(6) 371-379 2010  Peer-reviewed
    In this article, the authors examine whether and how humans use model-free, reflexive strategies and model-based, deliberative strategies in motor sequence learning. They asked subjects to perform the grid-sailing task, which required moving a cursor to different goal positions in a 5 x 5 grid using different key-mapping (KM) rules between 3 finger keys and 3 cursor movement directions. The task was performed under 3 conditions: Condition 1, new KM; Condition 2, new goal position with learned KM; and Condition 3, learned goal position with learned KM; with or without prestart delay time. The performance improvement with prestart delay was significantly larger under Condition 2. This result provides evidence that humans implement a model-based strategy for sequential action selection and learning by using a previously learned internal model of state transition by actions.
  • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya
    Advances in Neural Information Processing Systems 22 December 2009  Peer-reviewed
  • Takashi Nakano, Junichiro Yoshimoto, Jeff Wickens, Kenji Doya
    ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT I 5768(PART 1) 249-+ 2009  Peer-reviewed
    The striatum is the input nucleus of the basal ganglia and is thought to be involved in reinforcement learning. The striatum receives glutamate input from the cortex, which carries sensory information, and dopamine input from the substantia nigra, which carries reward information. Dopamine-dependent plasticity of cortico-striatal synapses is supposed to play a critical role in reinforcement learning. Recently, a number of labs reported contradictory results of its dependence on the timing of cortical inputs and spike output. To clarify the mechanisms behind spike timing-dependent plasticity of striatal synapses, we investigated spike timing-dependence of intracellular calcium concentration by constructing a striatal neuron model with realistic morphology. Our simulation predicted that the calcium transient will be maximal when cortical spike input and dopamine input precede the postsynaptic spike. The gain of the calcium transient is enhanced during the "up-state" of striatal cells and depends critically on NMDA receptor currents.
  • Makoto Otsuka, Junichiro Yoshimoto, Kenji Doya
    NEURAL NETWORK WORLD 19(5) 597-610 2009  Peer-reviewed
    Free-energy-based reinforcement learning is a new approach to handling high-dimensional states and actions (a minimal sketch of its free-energy Q-value computation appears after this list). We investigate its properties using a new experimental platform called the digit floor task. In this task, the high-dimensional pixel data of hand-written digits were directly used as sensory inputs to the reinforcement learning agent. The simulation results showed the robustness of the free-energy-based reinforcement learning method against noise applied in both the training and testing phases. In addition, reward-dependent sensory representations were found in the distributed activation patterns of hidden units. The representations coded in a distributed fashion persisted even when the number of hidden nodes was varied.
  • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya
    電子情報通信学会論文誌 D, 情報・システム (IEICE Transactions on Information and Systems, Japanese Edition) 91(6) 1515-1527 June 2008  
    In general, the parameter space of a statistical model or learning machine behaves as a Riemannian space rather than a Euclidean one with respect to changes in the model output, and the direction of steepest ascent does not necessarily coincide with the conventional gradient given by the partial derivatives of the output. Amari addressed this issue with the natural gradient method, and Kakade applied the natural gradient to policy gradient reinforcement learning, an optimization method for Markov decision processes. Because the natural gradient direction is determined by the Riemannian metric that defines the Riemannian structure, the choice of metric is an important problem. The Riemannian metric matrix used by Kakade, however, accounts only for changes in the action distribution caused by perturbations of the policy parameters, and ignores changes in the state distribution, which is likewise affected by the policy. In this paper, we propose a new Riemannian metric that also takes the state distribution into account and derive a new natural policy gradient based on this metric. Furthermore, we prove that this natural policy gradient coincides with the parameters learned when the immediate reward is approximated by a linear function approximator whose basis functions are specified by the policy parameters (see the least-squares sketch after this list). Numerical experiments on Markov decision problems with various numbers of states show that the proposed method works more effectively than the conventional method, particularly when the number of states is large.
  • Makoto Otsuka, Junichiro Yoshimoto, Kenji Doya
    ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I 5163 377-386 2008  Peer-reviewed
    We investigate the properties of free-energy-based reinforcement learning using a new experimental platform called the digit floor task. The simulation results showed the robustness of the reinforcement learning method against noise applied in both the training and testing phases. In addition, reward-dependent and reward-invariant representations were found in the distributed activation patterns of hidden units. The representations coded in a distributed fashion persisted even when the number of hidden nodes was varied.
  • Junichiro Yoshimoto, Kenji Doya
    NEURAL INFORMATION PROCESSING, PART I 4984 614-624 2008  Peer-reviewed
    We present a Bayesian method for the system identification of molecular cascades in biological systems. The contribution of this study is to provide a theoretical framework for unifying three issues: 1) estimating the most likely parameters; 2) evaluating and visualizing the confidence of the estimated parameters; and 3) selecting the most likely structure of the molecular cascades from two or more alternatives. The usefulness of our method is demonstrated in several benchmark tests.
  • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS 5212 82-97 2008  Peer-reviewed
    The parameter space of a statistical learning machine has a Riemannian metric structure in terms of its objective function. Amari [1] proposed the concept of "natural gradient" that takes the Riemannian metric of the parameter space into account. Kakade [2] applied it to policy gradient reinforcement learning, called a natural policy gradient (NPG). Although NPGs evidently depend on the underlying Riemannian metrics, careful attention was not paid to the alternative choice of the metric in previous studies. In this paper, we propose a Riemannian metric for the joint distribution of the state-action, which is directly linked with the average reward, and derive a new NPG named "Natural State-action Gradient" (NSG). Then, we prove that NSG can be computed by fitting a certain linear model into the immediate reward function. In numerical experiments, we verify that the NSG learning can handle MDPs with a large number of states, for which the performances of the existing (N)PG methods degrade.
  • Junichiro Hirayama, Junichiro Yoshimoto, Shin Ishii
    NEUROCOMPUTING 69(16-18) 1954-1961 October 2006  Peer-reviewed
    An important characteristic of on-line learning is its potential to adapt to changing environments by properly adjusting meta-parameters that control the balance between the plasticity and stability of the learning model. In our previous study, we proposed a learning scheme to address changing environments in the framework of on-line variational Bayes (VB), which is an effective on-line learning scheme based on Bayesian inference. The motivation of that work was, however, its implications for animal learning, and the formulation of the learning model was heuristic and not theoretically justified. In this article, we propose a new approach that balances the plasticity and stability of on-line VB learning in a more theoretically justifiable manner by employing the principle of hierarchical Bayesian inference. We present a new interpretation of on-line VB as a special case of incremental Bayes that allows the hierarchical Bayesian setting to balance plasticity and stability while yielding a simpler learning rule than standard on-line VB. This dynamic on-line VB scheme is applied to probabilistic PCA as an example of probabilistic models involving latent variables. In computer simulations using artificial data sets, the new on-line VB learning shows robust performance in regulating the balance between plasticity and stability, thus adapting to changing environments.
  • Tokita, Y, Nakamura, Y, Yoshimoto, J, Ishii, S
    Proceedings of the 11th International Symposium on Artificial Life and Robotics (GS22-3) 2006  Peer-reviewed
  • Naoto Yukinawa, Junichiro Yoshimoto, Shigeyuki Oba, Shin Ishii
    情報処理学会論文誌 数理モデル化と応用 (IPSJ Transactions on Mathematical Modeling and Its Applications) 46(10) 57-65 June 15, 2005  Peer-reviewed
    Analysis methods based on state-space models have been proposed for studying gene expression dynamics. Conventional methods assume no dynamics for the state variables and ignore both system noise and observation noise, so noise components contained in the state space may be mistakenly detected as state variables. In this study, we consider a linear dynamical system model in which the noise processes are assumed to be white Gaussian, and perform estimation and model selection with the variational Bayes method. When applied to a public data set on the budding yeast cell cycle, the method selected a simpler and more plausible model than the one chosen by the conventional method, and the resulting model parameters agreed well with biological knowledge. Application to artificial data also demonstrated the effectiveness of the method for noisy time-series data.
  • Masaya Nishimura, Junichiro Yoshimoto, Yoichi Tokita, Yutaka Nakamura, Shin Ishii
    電子情報通信学会論文誌 A, 基礎・境界 (IEICE Transactions on Fundamentals, Japanese Edition) 88(5) 646-657 May 1, 2005  Peer-reviewed
    The acrobot is an underactuated manipulator consisting of two links and two joints, and its control design is known to be a difficult nonlinear problem. In this study, we propose an adaptive control design method for a real acrobot that integrates knowledge from control theory and machine learning. In the proposed method, several controllers that solve subproblems are first designed based on system equations approximated by system identification, and the partial state space in which each controller should be applied is then determined adaptively by reinforcement learning. An implementation of the proposed method succeeded in the swing-up and stabilizing control of a real acrobot.
  • Junichiro Yoshimoto, Masaya Nishimura, Yoichi Tokita, Shin Ishii
    Artificial Life and Robotics 9(2) 67-71 May 2005  Peer-reviewed  Lead author
    Reinforcement learning (RL) has been applied to constructing controllers for nonlinear systems in recent years. Since RL methods do not require an exact dynamics model of the controlled object, they have a higher flexibility and potential for adaptation to uncertain or nonstationary environments than methods based on traditional control theory. If the target system has a continuous state space whose dynamic characteristics are nonlinear, however, RL methods often suffer from unstable learning processes. For this reason, it is difficult to apply RL methods to control tasks in the real world. In order to overcome this disadvantage of RL methods, we propose an RL scheme combining multiple controllers, each of which is constructed based on traditional control theory. We then apply it to a swinging-up and stabilizing task of an acrobot with a limited torque, which is a typical but difficult task in the field of nonlinear control theory. Our simulation result showed that our method was able to realize stable learning and to achieve fairly good control.
  • Magono, M, Yoshimoto, J, Ishii, S, Doya, K
    Proceedings of the 2005 International Symposium on Nonlinear Theory and its Applications 401-404 2005  Peer-reviewed
  • Junichiro Hirayama, Junichiro Yoshimoto, Shin Ishii
    Neural Networks 17(10) 1391-1400 December 2004  Peer-reviewed
    A brain needs to detect an environmental change and to quickly learn the internal representations necessary in a new environment. This paper presents a theoretical model of cortical representation learning that can adapt to dynamic environments, incorporating the results of previous studies on the functional role of acetylcholine (ACh). We adopt probabilistic principal component analysis (PPCA) as a functional model of cortical representation learning, and present an on-line learning method for PPCA based on Bayesian inference, including a heuristic criterion for model selection. Our approach is examined in two types of simulations with synthesized and realistic data sets, in which our model is able to re-learn new representation bases after the environment changes. Our model implies the possibility that higher-level recognition regulates cortical ACh release at the lower level, and that the ACh level alters the learning dynamics of a local circuit in order to continuously acquire appropriate representations in a dynamic environment.
  • Yukinawa, N, Yoshimoto, J, Oba, S, Ishii, S
    Proceedings of the 2004 International Symposium on Nonlinear Theory and its Applications 577-580 2004  Peer-reviewed
  • Junichiro Yoshimoto, Shin Ishii
    2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) 3 1817-1822 2004  Peer-reviewed
  • Kanemoto, K, Yoshimoto, J, Ishii, S
    Proceedings of the 9th International Symposium on Artificial Life and Robotics 1 329-332 2004  Peer-reviewed
  • Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato
    システム制御情報学会論文誌 (Transactions of the Institute of Systems, Control and Information Engineers) 16(5) 209-217 May 15, 2003  Peer-reviewed  Lead author
    In this paper, we propose a new reinforcement learning (RL) method for dynamical systems that have continuous state and action spaces. Our RL method has an actor-critic architecture: the critic tries to approximate the Q-function, and the actor tries to approximate a stochastic soft-max policy dependent on the Q-function. An on-line EM algorithm is used to train the critic and the actor. We apply this method to two control problems, and computer simulations show that our method is able to acquire good control after a few learning trials.
  • Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato
    Artificial Neural Networks and Neural Information Processing — ICANN/ICONIP 2003 123-131 2003  Peer-reviewed  Lead author
  • Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato
    Artificial Neural Networks — ICANN 2002 661-666 August 2002  Peer-reviewed  Lead author
  • S Ishii, W Yoshida, J Yoshimoto
    NEURAL NETWORKS 15(4-6) 665-687 June 2002  Peer-reviewed
    In reinforcement learning (RL), the duality between exploitation and exploration has long been an important issue. This paper presents a new method that controls the balance between exploitation and exploration (a schematic sketch of such balance control appears after this list). Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness in action selection, is controlled based on the variation of action results and the perception of environmental change. When applied to maze tasks, our method successfully obtains good controls by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in a real brain and that the balance may correspond to the level of the animal's selective attention. According to this scenario, we also discuss a possible implementation in the brain.
  • Yoshimoto, J., Ishii, S., Sato, M.-A.
    Systems and Computers in Japan 32(5) 2001  
  • Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato
    電子情報通信学会論文誌 D-2, 情報・システム 2-パターン処理 (IEICE Transactions on Information and Systems, Part D-II, Japanese Edition) 83(3) 1024-1033 March 25, 2000  Peer-reviewed  Lead author
    The acrobot is a robot consisting of two links and two joints, with an actuator only at the second joint. Because the acrobot has nonlinear dynamics and both its state and control spaces are continuous, acquiring its control by reinforcement learning is a difficult problem. In this paper, we apply reinforcement learning to the control of balancing the acrobot. Our reinforcement learning method is trained with an actor-critic architecture: the actor outputs a control signal for the current state, and the critic predicts the cumulative reward obtained over the future (expected reward). Both the actor and the critic are approximated by normalized Gaussian networks and trained with an on-line EM algorithm. We also introduce a new technique to accelerate the learning of the critic. Computer simulation results show that the proposed method can acquire good control within a small number of trials.
  • Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato
    Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium 2000  Peer-reviewed  Lead author
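
The following minimal Python sketches illustrate some of the algorithmic ideas that recur in the papers above. They are hedged approximations under stated assumptions, not the authors' implementations; all shapes, parameter names, and toy data are hypothetical. The first sketch shows a normalized Gaussian network, the mixture of local linear experts used as a function approximator in the variational-Bayes and actor-critic papers; only the forward pass is shown, not the on-line EM or VB training described there.

```python
import numpy as np

class NGnet:
    """Normalized Gaussian network: a mixture of local linear experts.

    Each unit i has a Gaussian kernel and a local linear regressor W_i x + b_i;
    the prediction is the kernel-normalized (responsibility-weighted) sum of
    the local linear models. Illustrative sketch only; the papers train this
    model with on-line EM or variational Bayes, which is not reproduced here.
    """

    def __init__(self, centers, widths, weights, biases):
        self.centers = centers   # (M, D) kernel centers
        self.widths = widths     # (M,)   isotropic kernel widths (std. dev.)
        self.weights = weights   # (M, K, D) local linear weights
        self.biases = biases     # (M, K) local biases

    def predict(self, x):
        # Unnormalized Gaussian activations of the M units
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        act = np.exp(-0.5 * d2 / self.widths ** 2)
        post = act / (act.sum() + 1e-12)          # normalized activations
        local = self.weights @ x + self.biases    # (M, K) local linear outputs
        return post @ local                       # responsibility-weighted mixture


# Toy usage: two units approximating a 1-D function with 1-D output
rng = np.random.default_rng(0)
net = NGnet(centers=np.array([[-1.0], [1.0]]),
            widths=np.array([1.0, 1.0]),
            weights=rng.normal(size=(2, 1, 1)),
            biases=np.zeros((2, 1)))
print(net.predict(np.array([0.3])))
```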
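
The next sketch concerns the free-energy-based reinforcement learning studied in the digit floor task papers (following the approach of Sallans and Hinton): the action value Q(s, a) is approximated by the negative free energy of a restricted Boltzmann machine whose visible layer is the concatenated state-action vector. The layer sizes and zero biases below are made-up toy values, and the learning rule (e.g., SARSA-style updates on the RBM parameters) is omitted.

```python
import numpy as np

def negative_free_energy(state, action, W, b_hid, b_vis):
    """Q-value approximation in free-energy-based RL: -F(s, a) for an RBM with
    binary hidden units and visible vector v = (state, action). Sketch only."""
    v = np.concatenate([state, action])           # visible vector
    pre = b_hid + W @ v                           # hidden pre-activations
    softplus = np.logaddexp(0.0, pre)             # log(1 + exp(pre)), stable
    return b_vis @ v + softplus.sum()             # negative free energy


# Toy usage: 4-dim state (e.g. pixel features), one-hot action over 3 choices
rng = np.random.default_rng(1)
s = rng.random(4)
a = np.array([0.0, 1.0, 0.0])
W = rng.normal(scale=0.1, size=(8, 7))            # 8 hypothetical hidden units
q = negative_free_energy(s, a, W, b_hid=np.zeros(8), b_vis=np.zeros(7))
print(q)
```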
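
The natural-policy-gradient papers above show that a natural gradient can be obtained as the weights of a linear model fitted to a reward-related signal using score-function basis features. The sketch below illustrates that least-squares recipe in a deliberately simplified form: the toy example uses a single-state (bandit) softmax policy and regresses the immediate reward on the policy score, whereas the NSG estimator of the papers uses the score of the joint state-action distribution, which is not reproduced here.

```python
import numpy as np

def natural_gradient_by_least_squares(features, targets, reg=1e-6):
    """Generic recipe: regress a target signal (e.g. immediate rewards) onto
    score-function features psi_t = d/dtheta log pi(a_t | s_t; theta) and use
    the fitted weights as a natural-gradient estimate. Hedged sketch, not the
    exact NSG estimator of the papers."""
    F = np.asarray(features, dtype=float)        # (T, P) score features
    r = np.asarray(targets, dtype=float)         # (T,) target signal
    A = F.T @ F + reg * np.eye(F.shape[1])       # regularized normal equations
    return np.linalg.solve(A, F.T @ r)           # least-squares weights


# Toy usage: logistic (softmax) policy over 2 actions with a scalar parameter
rng = np.random.default_rng(2)
theta = 0.2
psi, rew = [], []
for _ in range(500):
    p1 = 1.0 / (1.0 + np.exp(-theta))            # P(action 1)
    a = rng.random() < p1
    psi.append([(1 - p1) if a else -p1])         # d/dtheta log pi(a)
    rew.append(1.0 if a else 0.0)                # action 1 pays reward 1
print(natural_gradient_by_least_squares(psi, rew))
```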
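
Finally, a schematic sketch of the exploitation-exploration balance control discussed in the 2002 Neural Networks paper: Boltzmann action selection whose inverse temperature is lowered (more exploration) when action outcomes are volatile or an environmental change is perceived, and raised (more exploitation) as outcomes stabilize. The update rule, gains, and bounds below are invented placeholders, not the paper's actual equations.

```python
import numpy as np

def boltzmann_action(q_values, beta, rng):
    """Softmax (Boltzmann) action selection; beta is the inverse temperature
    that sets the exploitation-exploration balance."""
    prefs = beta * np.asarray(q_values)
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    return rng.choice(len(q_values), p=probs)

def update_beta(beta, reward_errors, change_detected,
                beta_min=0.1, beta_max=10.0, gain=0.5):
    """Hedged sketch: reset to an exploratory (low-beta) mode when a change is
    detected; otherwise nudge beta up when recent outcomes are consistent and
    down when they are highly variable."""
    if change_detected:
        return beta_min
    volatility = np.std(reward_errors)           # variation of recent outcomes
    return float(np.clip(beta + gain * (1.0 - volatility), beta_min, beta_max))


rng = np.random.default_rng(3)
beta = 1.0
print(boltzmann_action([0.0, 0.5, 0.2], beta, rng))
print(update_beta(beta, reward_errors=[0.1, -0.05, 0.02], change_detected=False))
```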

MISC

 69

Books and Other Publications

 4

Presentations

 7

Teaching Experience (Courses)

 8

Research Projects (Joint Research and Competitive Funding)

 5

Social Contribution Activities

 5