Curriculum Vitae

Hiroaki Kawashima

  (川嶋 宏彰)

Profile Information

Affiliation
Professor, Graduate School of Information Science, University of Hyogo
Degree
Doctor of Informatics (Kyoto University)

J-GLOBAL ID
200901098553710896
researchmap Member ID
5000031823

Papers

 75
  • Hiroaki Kawashima, Yu Horii, Takashi Matsuyama
    11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Vols. 1-2, 442-445, 2010  Peer-reviewed
    A variety of methods for audio-visual integration, which integrate audio and visual information at the level of features, states, or classifier outputs, have been proposed for robust speech recognition. However, these methods do not always fully utilize auditory information when the signal-to-noise ratio becomes low. In this paper, we propose a novel approach to estimating speech signals in noisy environments. The key idea behind this approach is to exploit clean speech candidates generated by using the timing structures between mouth movements and sound signals. We first extract a pair of feature sequences from the media signals and segment each sequence into temporal intervals. Then, we construct a cross-media timing-structure model of human speech by learning the temporal relations of overlapping intervals. Based on the learned model, we generate clean speech candidates from the observed mouth movements.
  • Ryo Yonetani, Hiroaki Kawashima, Takatsugu Hirayama, Takashi Matsuyama
    Proceedings - International Conference on Pattern Recognition, 101-104, 2010  Peer-reviewed
    We propose a novel method to estimate the object that a user is focusing on by using the synchronization between the movements of objects and the user's eyes as a cue. We first design an event as a characteristic motion pattern, and then embed it within the movement of each object. Since the user's ocular reactions to these events are easily detected using a passive camera-based eye tracker, we can estimate the object that the user is focusing on as the one whose movement is most synchronized with the user's eye reaction. Experimental results obtained by applying this system to dynamic content (consisting of scrolling images) demonstrate the effectiveness of the proposed method over existing methods. © 2010 IEEE.
  • Jean-Baptiste Dodane, Takatsugu Hirayama, Hiroaki Kawashima, Takashi Matsuyama
    Proceedings - 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, 201-208, 2009  Peer-reviewed
    Human-machine interaction still lacks smoothness and naturalness despite the widespread use of intelligent systems and emotive agents. To improve the interaction, this work proposes an approach to estimating a user's interest based on the relationships between the dynamics of the user's eye movements, more precisely the endogenous control mode of saccades, and the machine's proactive visual content presentation. Using a specially designed presentation phase that elicits endogenous saccades from the user, we analyzed the delays between the saccades and the presentation events. As a result, we confirmed that the delay during which the user's gaze remains on the previously presented content regardless of the next event, called resistance, is a good indicator for interest estimation (70% success across 20 experiments). It showed higher accuracy than conventional interest estimation based on gaze duration. © 2009 IEEE.
  • Akihiro Kobayashi, Jyunji Satake, Takatsugu Hirayama, Hiroaki Kawashima, Takashi Matsuyama
    IEEE International Conference on Automatic Face and Gesture Recognition (FG), Sep, 2008  Peer-reviewed
  • 川嶋宏彰, 三井健, 松山隆司
    11th Meeting on Image Recognition and Understanding (MIRU), 339-346, Jul, 2008  Peer-reviewed
  • 堀井悠, 川嶋宏彰, 松山隆司
    11th Meeting on Image Recognition and Understanding (MIRU), 193-200, Jul, 2008  Peer-reviewed
  • Yu Horii, Hiroaki Kawashima, Takashi Matsuyama
    IEEE CVPR Workshop on Interaction Dynamics on Human Communicative Behavior Analysis, Jun, 2008  Peer-reviewed
  • Hiroaki Kawashima, Takeshi Nishikawa, Takashi Matsuyama
    Conference on Human Factors in Computing Systems - Proceedings, 3585-3590, 2008  Peer-reviewed
    Turn-taking in a smooth conversation is supported by participants' anticipation of the timing of floor handovers. However, it becomes difficult to maintain natural turn-taking in video conferencing with transmission delays because the utterances and movements of each participant are presented to the others with a time lag, which often leads to collisions of utterances. To facilitate smooth communication over a video-conferencing system, we propose a novel method, "Visual Filler," that fills the temporal gaps in turn-taking caused by such delays. Visual Filler overlays an artificial visual stimulus, which has a function similar to that of filler sounds, on a screen showing participant images. We evaluated the effectiveness of Visual Filler for reducing the unnaturalness of turn-taking in a simulated dyadic dialog with a delay.
  • 川嶋宏彰, 松山隆司
    IPSJ Journal (Transactions of the Information Processing Society of Japan), 48(12) 3680-3691, Dec, 2007  Peer-reviewed
  • 川嶋宏彰, 西川猛司, 松山隆司
    IPSJ Journal (Transactions of the Information Processing Society of Japan), 48(12) 3715-3728, Dec, 2007  Peer-reviewed
  • Hiroaki Kawashima, Takashi Matsuyama
    International Conference on Image Analysis and Processing (ICIAP), 789-794, Sep, 2007  Peer-reviewed
  • 西川猛司, 川嶋宏彰, 松山隆司
    Information Technology Letters, 311-314, Sep, 2007  Peer-reviewed
  • 川嶋宏彰, スコギンズ・リーバイ, 松山隆司
    Journal of the Human Interface Society, 9(3) 379-390, Aug, 2007  Peer-reviewed
  • 平山高嗣, 川嶋宏彰, 西山正紘, 松山隆司
    Journal of the Human Interface Society, 9(2) 201-211, May, 2007  Peer-reviewed
  • Hiroaki Kawashima, Kimitaka Tsutsumi, Takashi Matsuyama
    Articulated Motion and Deformable Objects, Proceedings, 4069, 453-463, 2006  Peer-reviewed
    Modeling and describing temporal structure in multimedia signals, which are captured simultaneously by multiple sensors, is important for realizing human-machine interaction and motion generation. This paper proposes a method for modeling the temporal structure of multimedia signals based on temporal intervals of primitive signal patterns. Using the temporal differences between the beginning points and between the ending points of the intervals, we can explicitly express timing structure, that is, synchronization and mutual dependency among media signals. We applied the model to video signal generation from an audio signal to verify its effectiveness.
  • Hiroaki Kawashima, Takashi Matsuyama
    IEICE Transactions on Fundamentals, E88-A(11) 3022-3035, Nov, 2005  Peer-reviewed
    This paper addresses the parameter estimation problem of an interval-based hybrid dynamical system (interval system). The interval system has a two-layer architecture that comprises a finite state automaton and multiple linear dynamical systems. The automaton controls the activation timing of the dynamical systems based on a stochastic transition model between intervals. Thus, the interval system can generate and analyze complex multivariate sequences that consist of temporal regimes of dynamic primitives. Although the interval system is a powerful model for representing human behaviors such as gestures and facial expressions, the learning process has a paradoxical nature: temporal segmentation of primitives and identification of the constituent dynamical systems need to be solved simultaneously. To overcome this problem, we propose a multiphase parameter estimation method that consists of a bottom-up clustering phase for the linear dynamical systems and a refinement phase for all the system parameters. Experimental results show that the method can uncover the hidden dynamical systems behind the training data and refine the system parameters successfully.
  • 川嶋宏彰, 西山正紘, 松山隆司
    Information Technology Letters, 153-156, Sep, 2005  Peer-reviewed
  • Hiroaki Kawashima, Takashi Matsuyama
    3rd International Conference on Advances in Pattern Recognition (S. Singh et al. Eds.: ICAPR 2005 Springer LNCS 3686), 229-238, Aug, 2005  Peer-reviewed
  • スコギンズ・リーバイ, 川嶋宏彰, 松山隆司
    Interaction 2005, (D-404) 1-2, Mar, 2005  Peer-reviewed
  • M Nishiyama, H Kawashima, T Hirayama, T Matsuyama
    Analysis and Modelling of Faces and Gestures, Proceedings, 3723, 140-154, 2005  Peer-reviewed
    This paper presents a method for interpreting facial expressions based on temporal structures among partial movements in facial image sequences. To extract the structures, we propose a novel facial expression representation, which we call a facial score, similar to a musical score. The facial score enables us to describe facial expressions as spatio-temporal combinations of temporal intervals; each interval represents a simple motion pattern with the beginning and ending times of the motion. Thus, we can classify fine-grained expressions from multivariate distributions of temporal differences between the intervals in the score. In this paper, we provide a method to obtain the score automatically from input images using bottom-up clustering of dynamics. We evaluate the efficiency of facial scores by comparing the temporal structure of intentional smiles with that of spontaneous smiles.
  • 川嶋宏彰, 堤公孝, 松山隆司
    7th Workshop on Information-Based Induction Sciences (IBIS), 86-93, Nov, 2004  Peer-reviewed
  • 川嶋宏彰, 堤公孝, 松山隆司
    Information Technology Letters, 175-178, Sep, 2004  Peer-reviewed
  • Hiroaki Kawashima, Takashi Matsuyama
    Systems and Computers in Japan, 34(14) 1-12, Dec, 2003  Peer-reviewed
    This paper proposes a system architecture for event recognition that dynamically integrates information from multiple sources (e.g., multimodal data from visual and auditory sensors). The proposed system consists of multiple event classifiers called Continuous State Machines (CSMs). Each CSM has a state transition rule in a continuous state space and classifies time-varying patterns from a different single source. Since the rule is defined as an extension of Kalman filters (i.e., the next state is deduced from a trade-off between the input data and the model's prediction), CSMs support dynamic time warping and robustness against noise. We then introduce an interaction method among CSMs to classify events from multiple sources. A continuous state space (i.e., vector space) allows us to design the interaction as minimization of an energy function. This interaction enables the system to dynamically suppress unreliable classifiers and improves system reliability and the accuracy of classifying events in dynamically changing situations (e.g., when an object is temporarily occluded from one of multiple cameras in a gesture recognition task). Experimental results on gesture recognition with two cameras show the effectiveness of the proposed system.
  • 川嶋宏彰, 松山隆司
    IEICE Transactions (Japanese Edition), J85-D-II(12) 1801-1812, Dec, 2002  Peer-reviewed
  • H Kawashima, T Matsuyama
    16th International Conference on Pattern Recognition (ICPR), Vol. II, Proceedings, 2, 785-789, 2002  Peer-reviewed
    This paper proposes a system architecture for event recognition that integrates information from multiple sources (e.g., gesture and speech recognition from distributed sensors in the real world). The proposed system consists of multiple recognizers named Continuous State Machines (CSMs). Each CSM has a state transition rule in a continuous state space and classifies time-varying patterns from a single source. Since the rule is defined as a simplification of the Kalman filter (i.e., the next state is deduced from a trade-off between the input data and the model's prediction), CSMs support dynamic time warping and robustness against noise. We then introduce an interaction method among CSMs to classify events from multiple sources. A continuous state space (i.e., vector space) allows us to design the interaction as recursive minimization of an energy function. This interaction enables the system to dynamically shift its focus over the multiple sources and improves the reliability and accuracy of classifying events in dynamically changing situations (e.g., when an object is temporarily occluded from one of multiple cameras in a gesture recognition task). Experimental results on gesture recognition with two cameras show the effectiveness of the proposed system.

Misc.

 70

Books and Other Publications

 6
  • 浮田 浩行, 濱上 知樹, 藤吉 弘亘, 大町 真一郎, 戸田 智基, 岩崎 敦, 小林 泰介, 鈴木 亮太, 木村 雄喜, 橋本 大樹, 玉垣 勇樹, 水谷 麻紀子, 永田 毅, 木村 光成, 李 晃伸, 川嶋 宏彰 (Role: Joint author, Chapter 11, 11 pages)
    Corona Publishing, Jan, 2023 (ISBN: 9784339033854)
  • Katsushi Ikeuchi (Editor) (Role: Contributor, Active Appearance Models)
    Springer, Oct 14, 2021 (ISBN: 3030634159)
  • 笹島 宗彦 (Ed.) (Role: Contributor, pp. 36-50 (Section 3.2, Correlation))
    Asakura Publishing, Apr 5, 2021 (ISBN: 4254129114)
  • P. Benner, R. Findeisen, D. Flockerzi, U. Reichl, K. Sundmacher (Role: Contributor, Chap.3, Magnus Egerstedt, Jean-Pierre de la Croix, Hiroaki Kawashima, and Peter Kingston, "Interacting with Networks of Mobile Agents")
    Birkhäuser/Springer, 2014
  • 乾敏郎, 川口潤, 吉川左紀子 (Role: Contributor, Part I, Chapter 10, "Timing")
    Minerva Shobo, 2010

Presentations

 31

Major Teaching Experience

 19

Research Projects

 17

Industrial Property Rights

 2

Academic Activities

 7

Social Activities

 12