Research Achievements

中野 有紀子

ナカノ ユキコ  (Yukiko Nakano)

Basic Information

Affiliation
Professor, Department of Science and Technology, Faculty of Science and Technology, Seikei University
Degree
Ph.D. in Information Science and Technology (The University of Tokyo)

J-GLOBAL ID
201101020839458565
researchmap Member ID
B000004842

External Links

Received a master's degree from the Graduate School of Education, The University of Tokyo, in 1990 and joined Nippon Telegraph and Telephone Corporation (NTT) the same year. Completed the master's program in Media Arts & Sciences at MIT in 2002. From that year, served as a research fellow at the JST Research Institute of Science and Technology for Society (RISTEX), then as a specially appointed associate professor at the Graduate School of Engineering, Tokyo University of Agriculture and Technology, and as an associate professor in the Department of Computer and Information Science, Faculty of Science and Technology, Seikei University, where she is now a professor. Her research focuses on conversational agents capable of verbal and nonverbal communication with people, aiming to realize intelligent and natural user interfaces. She holds a Ph.D. in Information Science and Technology and is a member of the ACM, the Japanese Society for Artificial Intelligence (JSAI), the Institute of Electronics, Information and Communication Engineers (IEICE), and the Information Processing Society of Japan (IPSJ).

Career

 2

Papers

 51
  • Candy Olivia Mawalim, Shogo Okada, Yukiko I. Nakano, Masashi Unoki
    Journal on Multimodal User Interfaces 17(2) 47-63 2023
  • Atsushi Ito, Yukiko I. Nakano, Fumio Nihei, Tatsuya Sakato, Ryo Ishii, Atsushi Fukayama, Takao Nakamura
    J. Inf. Process. 31 34-44 2023
  • Atsushi Ito, Yukiko I. Nakano, Fumio Nihei, Tatsuya Sakato, Ryo Ishii, Atsushi Fukayama, Takao Nakamura
    IUI 2022: 27th International Conference on Intelligent User Interfaces 85-88 2022
  • Fumio Nihei, Ryo Ishii, Yukiko I. Nakano, Kyosuke Nishida, Ryo Masumura, Atsushi Fukayama, Takao Nakamura
    INTERSPEECH 1086-1090 2022
  • 久芳 和己, 中野 有紀子, 岡田 将吾
    人工知能学会全国大会論文集 JSAI2022 3C4GS604-3C4GS604 2022
    Nonverbal information such as gestures and facial expressions plays an important role in dyadic interaction. The primary goal of this study is to analyze differences in nonverbal behavior in dyadic interactions between speakers of different languages. Using the NoXi data corpus, a multimodal conversation dataset collected in three countries with different cultural backgrounds, we analyze and report, based on ANOVA, the nonverbal behaviors that differ among dyad groups from the three countries.
  • Yukiko I. Nakano, Eri Hirose, Tatsuya Sakato, Shogo Okada, Jean-Claude Martin
    ICMI 5-14 2022
  • 鈴木 凱, 岡田 将吾, 黄 宏軒, 中野 有紀子
    人工知能学会全国大会論文集 JSAI2022 3H3OS12a01-3H3OS12a01 2022
    This paper proposes a method for improving the accuracy of models that estimate discussion quality from multimodal features. We use MATRICS, a group-meeting corpus containing prosodic, facial-expression, linguistic, and speaking-turn features of participants observed in 56 group meetings. To address a problem identified in previous work, namely that not all frames and not all modality features in the time-series data are necessarily useful for estimating a given label, we propose N-teaching, which extends Co-teaching, a weakly supervised learning method effective for noisy labels, to be more robust against noise. We also analyze the samples excluded from training as noise and compare the results with previous work. The proposed method achieves the best accuracy to date, an MAE of 0.309, for the Originality (novelty) measure of discussion content.
  • Candy Olivia Mawalim, Shogo Okada, Yukiko I. Nakano
    ACM Transactions on Multimedia Computing, Communications, and Applications 17(4) 1-27 November 30, 2021
    Case studies of group discussions are considered an effective way to assess communication skills (CS). This method can help researchers evaluate participants’ engagement with each other in a specific realistic context. In this article, multimodal analysis was performed to estimate CS indices using a three-task-type group discussion dataset, the MATRICS corpus. The current research investigated the effectiveness of engaging both static and time-series modeling, especially in task-independent settings. This investigation aimed to understand three main points: first, the effectiveness of time-series modeling compared to nonsequential modeling; second, multimodal analysis in a task-independent setting; and third, important differences to consider when dealing with task-dependent and task-independent settings, specifically in terms of modalities and prediction models. Several modalities were extracted (e.g., acoustics, speaking turns, linguistic-related movement, dialog tags, head motions, and face feature sets) for inferring the CS indices as a regression task. Three predictive models, including support vector regression (SVR), long short-term memory (LSTM), and an enhanced time-series model (an LSTM model with a combination of static and time-series features), were taken into account in this study. Our evaluation was conducted by using the R2 score in a cross-validation scheme. The experimental results suggested that time-series modeling can improve the performance of multimodal analysis significantly in the task-dependent setting (with the best R2 = 0.797 for the total CS index), with word2vec being the most prominent feature. Unfortunately, highly context-related features did not fit well with the task-independent setting. Thus, we propose an enhanced LSTM model for dealing with task-independent settings, and we successfully obtained better performance with the enhanced model than with the conventional SVR and LSTM models (the best R2 = 0.602 for the total CS index). In other words, our study shows that a particular time-series modeling can outperform traditional nonsequential modeling for automatically estimating the CS indices of a participant in a group discussion with regard to task dependency.
  • Kazufumi Tsukada, Yutaka Takase, Yukiko I. Nakano
    ACM/IEEE International Conference on Human-Robot Interaction 93-94 March 2, 2015  Peer-reviewed
    In aging societies, supporting elderly people is a critical issue, and companion agents that can function as a conversational partner are expected to provide social support to isolated older adults. Aiming at improving companionship dialogues with these agents, this study proposes a topic selection mechanism using blog articles written by the elderly. By categorizing the nouns extracted from blogs using Wikipedia, we defined 219 topic categories consisting of about 3,000 topic words that the elderly discuss in their daily life. The topic selection mechanism is implemented into a companion agent and used to generate the agent's utterances.
  • Takashi Yoshino, Yutaka Takase, Yukiko I. Nakano
    ACM/IEEE International Conference on Human-Robot Interaction 127-128 March 2, 2015  Peer-reviewed
    A robot's gaze behaviors are indispensable in allowing the robot to participate in multiparty conversations. To build a robot that can convey appropriate attentional behavior in multiparty human-robot conversations, this study proposes robot head gaze models in terms of participation roles and dominance in a conversation. By implementing such models, we developed a robot that can determine appropriate gaze behaviors according to its conversational roles and dominance.
  • Naoko Saito, Shogo Okada, Katsumi Nitta, Yukiko I. Nakano, Yuki Hayashi
    AAAI Spring Symposium - Technical Report SS-15-07 100-103 2015
    Toward constructing a multimodal conversational agent system that can be used to interview elderly patients with dementia, we propose a turn-taking mechanism based on recognition of the subject's attitude as to whether the subject has (or relinquishes) the right to speak. A key strategy in the recognition task is to extract features from pausing behavior in the subject's spontaneous speech and to fuse multimodal signals (gaze, head motion, and speech). In this paper, we focus on evaluation of the recognition module used in guiding turn taking. To evaluate it, we collected a multimodal data corpus from 42 dyadic conversations between subjects with dementia and the virtual agent we developed as a prototype, and manually annotated the subjects' multimodal data. In experiments, we validate recognition models trained on the multimodal dataset using machine learning methods. Experimental results show that pause features are effective in improving the attitude recognition accuracy, which is improved up to 88%.
  • 林 佑樹, 二瓶 芙巳雄, 中野 有紀子, 黄 宏軒, 岡田 将吾
    56(4) 1217-1227 2015  Peer-reviewed
  • Reo Suzuki, Yutaka Takase, Yukiko I. Nakano
    Proceedings of the Eighth International Conference on Advances in Computer-Human Interactions (ACHI 2015) 92-95 2015  Peer-reviewed
  • Sakiko Nihonyanagi, Yuki Hayashi, Yukiko I. Nakano
    GazeIn 2014 - Proceedings of the 7th ACM Workshop on Eye Gaze in Intelligent Human Machine Interaction: Eye-Gaze and Multimodality, Co-located with ICMI 2014 33-37 November 16, 2014
    In collaborative learning, participants work on the learning task together. In this environment, linguistic information conveyed via speech as well as nonverbal information such as gaze and writing actions are important elements. Integrating the information from these behaviors is expected to contribute to assessing the learning activity and the characteristics of each participant in a more objective manner. With the objective of characterizing participants in the collaborative learning activity, this study analyzed verbal and nonverbal behaviors and found that the gaze behaviors of individual participants, as well as those between participants, provide useful information for distinguishing the leader of the group, one who follows the leader, and one who attends to participants who do not appear to understand.
  • Fumio Nihei, Yukiko I. Nakano, Yuki Hayashi, Hung-Hsuan Huang, Shogo Okada
    ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction 136-143 November 12, 2014  Peer-reviewed
    Group discussions are used widely when generating new ideas and forming decisions as a group. It is therefore assumed that exerting social influence on other members by facilitating the discussion is an important part of discussion skill. This study focuses on influential statements that affect discussion flow and are highly related to facilitation, and aims to establish a model that predicts influential statements in group discussions. First, we collected a multimodal corpus using different group discussion tasks, in-basket and case-study. Based on schemes for analyzing arguments, each utterance was annotated as being influential or not. Then, we created classification models for predicting influential utterances using prosodic features as well as attention and head motion information from the speaker and other members of the group. In our model evaluation, we discovered that the assessment of each participant's discussion facilitation skills by experienced observers correlated highly with the number of influential utterances by that participant. This suggests that the proposed model can predict influential statements with considerable accuracy, and that the prediction results can be a good predictor of facilitators in group discussions.
  • Hung-Hsuan Huang, Roman Bednarik, Kristiina Jokinen, Yukiko I. Nakano
    Proceedings of the 16th International Conference on Multimodal Interaction (ICMI) 527-528 2014
  • 中野有紀子, 馬場直哉, 黄宏軒, 林佑樹
    人工知能学会論文誌 29(1) 69-79 2014  Peer-reviewed
  • 林佑樹, 小川裕史, 中野有紀子
    情報処理学会論文誌 55(1) 189-198 2014  Peer-reviewed
  • Vrzakova, H, Bednarik, R, Nihei, F, Nakano, Y
    Proceedings of the 8th Nordic Conference on Human-Computer Interaction 915-918 2014  Peer-reviewed
  • Misato Yatsushiro, Naoya Ikeda, Yuki Hayashi, Yukiko I. Nakano
    GazeIn 2013 - Proceedings of the 2013 ACM Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction, co-located with ICMI 2013 13-18 December 13, 2013
    With a goal of contributing to multiparty conversation management, this paper proposes a mechanism for estimating conversational dominance in group interaction. Based on our corpus analysis, we have already established a regression model for dominance estimation using speech and gaze information. In this study, we implement the model as a dominance estimation mechanism, and propose an idea of utilizing the mechanism in moderating multiparty conversations between a conversational robot and three human users. The system decides whom the system should talk to based on the dominance level of each user.
  • 石井 亮, 小澤 史朗, 川村 春美, 小島 明, 中野 有紀子
    電子情報通信学会論文誌 D J96-D(1) 110-119 2013  Peer-reviewed
  • 馬場直哉, 黄 宏軒, 中野有紀子
    人工知能学会論文誌 28(2) 149-159 2013  Peer-reviewed
  • Yukiko I. Nakano, Naoya Baba, Hung-Hsuan Huang, Yuki Hayashi
    ICMI'13: Proceedings of the 2013 ACM International Conference on Multimodal Interaction 35-42 2013  Peer-reviewed
    In conversational agents with multiparty communication functionality, a system needs to be able to identify the addressee for the current floor and respond to the user when the utterance is addressed to the agent. This study proposes some addressee identification models based on speech and gaze information, and tests whether the models can be applied to different proxemics. We build an addressee identification mechanism by implementing the models and incorporate it into a fully autonomous multiparty conversational agent. The system identifies the addressee from online multimodal data and uses this information in language understanding and dialogue management. Finally, an evaluation experiment shows that the proposed addressee identification mechanism works well in a real-time system, with an F-measure for addressee estimation of 0.8 for agent-addressed utterances. We also found that our system more successfully avoided disturbing the conversation by mistakenly taking a turn when the agent is not addressed.
  • Hung-Hsuan Huang, Hiroki Matsushita, Kyoji Kawagoe, Yoichi Sakai, Yuuko Nonaka, Yukiko Nakano, Kiyoshi Yasuda
    Proceedings of the 11th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2012 295-299 2012  Peer-reviewed
    With the increasing average life expectancy of the world population, there are more and more dementia patients, and the need for assistive technology is growing. According to the literature, the progression of cognitive impairment can be slowed if the patient is constantly in a calm mood, and an effective way to achieve this is to keep the patient in social relationships with others. With the goal of developing a conversational humanoid that can serve as a companion for dementia patients, we propose an autonomous virtual agent that can generate backchannel feedback, such as head nods and verbal acknowledgments, on the basis of acoustic information in the user's speech. The system also provides speech recognition and language understanding functionalities. To complement the companionship of the agent and its ability to assist the user's memory, we are developing a memory vest equipped with portable devices, including an Android smartphone, two IC audio recorders, and a digital video recorder, to log the daily life of the patient. The gathered activity history database can then be used to enrich the dialogue ability of the agent and to help the user recall his or her own memories. © 2012 IEEE.
  • 塚本 剛生, 中野 有紀子
    日本バーチャルリアリティ学会論文誌 17(2) 79-89 2012  Peer-reviewed
    This paper proposes a direction giving avatar system in Metaverse, which automatically generates direction giving gestures based on linguistic information obtained from the user's chat text input and spatial information in Metaverse. First, we conduct an experiment to collect direction giving conversation corpus. Then, using the collected corpus, we analyze the relationship between the proxemics of conversation participants and the position of their direction giving gestures. Next, we analyze the relationship between linguistic features in direction giver's utterances and the shape of their spatial gestures. We define five categories of gesture concepts and four gesture shape parameters, and analyze the relationship between the gesture concepts and a set of gesture parameters. Based on these results, we propose an automatic gesture decision mechanism and implement a direction giving avatar system in Metaverse.
  • Yukiko I. Nakano, Yuki Fukuhara
    ICMI '12: Proceedings of the ACM International Conference on Multimodal Interaction 77-84 2012  Peer-reviewed
    It is important for conversational agents that manage multiparty conversations to recognize the group dynamics existing among the users. This paper proposes a method for estimating the conversational dominance of participants in group interactions. First, we conducted a Wizard-of-Oz experiment to collect conversational speech, and motion data. Then, we analyzed various paralinguistic speech and gaze behaviors to elucidate the factors that predict conversational dominance. Finally, by exploiting the speech and gaze data as estimation parameters, we created a regression model to estimate conversational dominance, and the multiple correlation coefficient of this model was 0.85.
  • 石井亮, 大古亮太, 中野有紀子, 西田豊明
    情報処理学会論文誌 52(12) 3625-3636 2011  Peer-reviewed
  • 中野有紀子
    2011 International Conference on Intelligent User Interfaces (IUI2011), Workshop on Eye Gaze in Intelligent Human Machine Interaction 2011  Peer-reviewed
  • 中野有紀子
    Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS2011) 441-448 2011  Peer-reviewed
  • 中野有紀子
    The 11th International Conference on Intelligent Virtual Agents (IVA2011) 262-268 2011  Peer-reviewed
  • 中野有紀子
    The 11th International Conference on Intelligent Virtual Agents (IVA2011) 255-261 2011  Peer-reviewed
  • 中野有紀子
    Proceedings of the 11th International Conference on Intelligent Virtual Agents (IVA 2011) 1-13 2011  Peer-reviewed
  • 中野有紀子
    The 13th ACM International Conference on Multimodal Interaction (ICMI2011) 401-408 2011  Peer-reviewed
  • 黄 宏軒, 武田 信也, 小野 正貴, 中野 有紀子
    人工知能学会全国大会論文集 JSAI2010 1C13-1C13 2010
    This study develops an information-providing agent that supports decision-making conversations between two customers in a travel agency scenario. The system estimates whether the conversation is in a consultation, small-talk, question, or understanding state from the users' speech states and changes in face direction, and its conversation control mechanism decides how the agent should join the conversation according to these four states. We verify the effectiveness of the indices and the estimation accuracy through a participant evaluation experiment and report the results.
  • Yukiko I. Nakano, Toyoaki Nishida
    Proceedings of the Symposium on Conversational Informatics for Supporting Social Intelligence and Interaction: Situational and Environmental Information Enforcing Involvement in Conversation 128-135 2005
    In face-to-face communication, conversation is affected by what exists and takes place within the environment. With the goal of improving the communicative capability of humanoid systems, this paper proposes conversational agents that are aware of a perceived world and use the perceptual information to enforce involvement in conversation. First, we review previous studies on nonverbal engagement behaviors in face-to-face and human-artifact interaction. Based on the discussion, we implement some engagement functions in a conversational agent embodied in a story-based communication environment where multimodal recognition and generation techniques support user-agent communication.
  • Yoshiyasu Ogasawara, Masashi Okamoto, Yukiko I. Nakano, Yong Xu, Toyoaki Nishida
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3683 289-295 2005
    One of the keys to making a robot a robust and interactive communicator is enabling it to recognize the user's attention behavior and to transition among communication situations, so that human and robot can be involved in a shared activity. Four communication situations are defined to describe typical situations of interactive human-robot communication. The robot can not only open and close communication, but also adapt to the user's behavior passively or actively. We propose a two-layered architecture for the robot system. An implementation of a listener robot partly proved the effectiveness of the proposed approach. © Springer-Verlag Berlin Heidelberg 2005.
  • Kazunori Okamoto, Yukiko I. Nakano, Masashi Okamoto, Hung-Hsuan Huang, Toyoaki Nishida
    Knowledge-Based Intelligent Information and Engineering Systems 848-854 2005
  • Yukiko I. Nakano, Toshiyasu Murayama, Toyoaki Nishida
    IEICE Transactions on Information and Systems E87-D(6) 1338-1346 2004
    In story-based communication, where a message is conveyed in story form, it is important to embody the story with expressive materials. However, it is quite difficult for users to create rich multimedia content using multimedia editing tools. This paper proposes a web-based multimedia environment, SPOC (Stream-oriented Public Opinion Channel), aimed at helping non-skillful people convert their stories into TV-like programs very easily. The system can produce digital camera work for graphics and video clips as well as automatically generate an agent animation according to a narration text. Findings in evaluation experiments showed that SPOC is easy to use and easy to learn for novice users. Given a short instruction, the subjects not only mastered the operation of the software but also succeeded in creating highly original programs. In the subjective evaluation, the subjects answered that they enjoyed using the software without feeling difficulty. These results suggest that the system reduces the user's cost in making a program and encourages communication in a network community.
  • Takaaki Hasegawa, Yukiko I. Nakano, Tsuneaki Kato
    Proceedings of the International Conference on Autonomous Agents 75-82 1997
    In human dialogue, the participants not only try to accomplish their own goal but also collaborate with each other. To put it concretely, the participants generate utterances that are appropriate to the degree of the partners' understanding. To allow a computer to communicate with us as naturally as we usually communicate with other people, we focus on this feature. In short, we think that a computer should be able to change its dialogue dynamically and autonomously according to the human's understanding. In this paper, we propose a dialogue model with this feature. We think that a dialogue agent should have two characteristic aspects namely, the reactive aspect that tries to maintain the dialogue and the deliberative aspect that tries to accomplish a task. Our model yields natural interaction through the interplay between these two aspects.
  • Yukiko I. Nakano, Masashi Okamoto, Daisuke Kawahara, Qing Li, Toyoaki Nishida
    Peer-reviewed
    This paper proposes a method for assigning gestures to text based on lexical and syntactic information. First, our empirical study identified lexical and syntactic information strongly correlated with gesture occurrence and suggested that syntactic structure is more useful for judging gesture occurrence than local syntactic cues. Based on the empirical results, we have implemented a system that converts text into an animated agent that gestures and speaks synchronously.
  • Justine Cassell, Tom Stocky, Tim Bickmore, Yang Gao, Yukiko Nakano, Kimiko Ryokai, Catherine Vaucelle, Hannes Vilhjálmsson
    Peer-reviewed
    In this paper, we describe an embodied conversational kiosk that builds on research in embodied conversational agents (ECAs) and on information displays in mixed reality and kiosk format in order to display spatial intelligence. ECAs leverage people's abilities to coordinate information displayed in multiple modalities, particularly information conveyed in speech and gesture. Mixed reality depends on users' interactions with everyday objects that are enhanced with computational overlays. We describe an implementation, MACK (Media Lab Autonomous Conversational Kiosk), an ECA who can answer questions about and give directions to the MIT Media Lab's various research groups, projects, and people. MACK uses a combination of speech, gesture, and indications on a normal paper map that users place on a table between themselves and MACK. Research issues involve users' differential attention to hand gestures, speech, and the map, and flexible architectures for embodied conversational agents that allow these modalities to be fused in input and generation.
  • Hung-hsuan Huang, Tsuyoshi Masuda, Ra Cerekovic, Kateryna Tarasenko, Igor S. P, Yukiko Nakano, Toyoaki Nishida
    Peer-reviewed
    Embodied Conversational Agents (ECAs) are computer-generated life-like characters that interact with human users in face-to-face conversations. To achieve natural multi-modal conversations, ECA systems are very sophisticated and require many building assemblies, and thus are difficult for individual research groups to develop. This paper proposes a generic architecture, the Universal ECA Framework, which is currently under development and includes a blackboard-based platform and a high-level protocol to integrate general-purpose ECA components and ease ECA system prototyping.
  • Justine Cassell, Yukiko I. Nakano, Timothy W. Bickmore, Candace L. Sidner, Charles Rich
    Peer-reviewed
    This paper addresses the problem of designing embodied conversational agents that exhibit appropriate posture shifts during dialogues with human users. Previous research has noted the importance of hand gestures, eye gaze and head nods in conversations between embodied agents and humans. However, this research has neglected the role of other body movements, in particular postural shifts. We present an analysis of human monologues and dialogues that suggests that postural shifts can be predicted as a function of discourse state in monologues, and discourse state and conversation state in dialogues. On the basis of these findings, we have implemented an embodied conversational agent that uses a dialogue manager called Collagen in such a way as to generate postural shifts.
  • Yukiko Nakano, Gabe Reinstein, Tom Stocky, Justine Cassell
    Peer-reviewed
    We investigate the verbal and nonverbal means for grounding, and propose a design for embodied conversational agents that relies on both kinds of signals to establish common ground in human-computer interaction. We analyzed eye gaze, head nods and attentional focus in the context of a direction-giving task. The distribution of nonverbal behaviors differed depending on the type of dialogue move being grounded, and the overall pattern reflected a monitoring of lack of negative feedback. Based on these results, we present an ECA that uses verbal and nonverbal grounding acts to update dialogue state.
  • Yukiko I. Nakano, Kenji Imamura, Kenji Hnamura, Hisashi Ohara
    Peer-reviewed
    While recent advancements in virtual reality technology have created a rich communication interface linking humans and computers, there has been little work on building dialogue systems for 3D virtual worlds. This paper proposes a method for altering the instruction dialogue to match the user's view in a virtual environment. We illustrate the method with the system MID-3D, which interactively instructs the user on dismantling some parts of a car. First, in order to change the content of the instruction dialogue to match the user's view, we extend the refinement-driven planning algorithm by using the user's view as a plan constraint. Second, to manage the dialogue smoothly, the system keeps track of the user's viewpoint as part of the dialogue state and uses this information for coping with interruptive subdialogues. These mechanisms enable MID-3D to set instruction dialogues in an incremental way; it takes account of the user's view even when it changes frequently.
  • Yukiko I. Nakano, Tsuneaki Kato
    Peer-reviewed
    The purpose of this paper is to identify effective factors for selecting discourse organization cue phrases in instruction dialogue that signal changes in discourse structure such as topic shifts and attentional state changes. Using a machine learning technique, a variety of features concerning discourse structure, task structure, and dialogue context are examined in terms of their effectiveness, and the best set of learning features is identified. Our results reveal that, in addition to discourse structure, already identified in previous studies, task structure and dialogue context play an important role. Moreover, an evaluation using a large dialogue corpus shows the utility of applying machine learning techniques to cue phrase selection.
  • Yukiko I. Nakano, Yoshiko Arimoto, Kazuyoshi Murata, Yasuhiro Asa, Mika Enomoto, Hirohiko Sagawa
    Peer-reviewed
    The aim of this paper is to develop animated agents that can control multimodal instruction dialogues by monitoring the user's behaviors. First, this paper reports on our Wizard-of-Oz experiments, and then, using the collected corpus, proposes a probabilistic model of fine-grained timing dependencies among multimodal communication behaviors: speech, gestures, and mouse manipulations. A preliminary evaluation revealed that our model can predict an instructor's grounding judgment and a listener's successful mouse manipulation quite accurately, suggesting that the model is useful in estimating the user's understanding and can be applied to determining the agent's next action.
  • Justine Cassell, Yukiko I. Nakano, Timothy W. Bickmore, Candace L. Sidner, Charles Rich
    Peer-reviewed
    This paper addresses the issue of designing embodied conversational agents that exhibit appropriate posture shifts during dialogues with human users. Previous research has noted the importance of hand gestures, eye gaze and head nods in conversations between embodied agents and humans. We present an analysis of human monologues and dialogues that suggests that postural shifts can be predicted as a function of discourse state in monologues, and discourse and conversation state in dialogues. On the basis of these findings, we have implemented an embodied conversational agent that uses Collagen in such a way as to generate postural shifts.
  • Matthias Rehm, Yukiko Nakano, Elisabeth André, Toyoaki Nishida
    Peer-reviewed
    We present our concept of integrating culture as a computational parameter for modeling multimodal interactions with virtual agents. As culture is a social rather than a psychological notion, its influence is evident in interactions where cultural patterns of behavior and interpretation mismatch. Nevertheless, when culture is taken seriously, its influence penetrates most layers of agent behavior planning and generation. In this article we concentrate on a first-meeting scenario, present our model of an interactive agent system, and identify where cultural parameters play a role. To assess the viability of our approach, we outline an evaluation study that is currently being set up.
  • Afia Akhter Lipi, Yukiko Nakano, Matthias Rehm
    Peer-reviewed
    The goal of this paper is to integrate culture as a computational term in embodied conversational agents by employing an empirical data-driven approach as well as a theoretical model-driven approach. We propose a parameter-based model that predicts nonverbal expressions appropriate for specific cultures. First, we introduce the Hofstede theory to describe socio-cultural characteristics of each country. Then, based on previous studies of cultural differences in nonverbal behaviors, we propose expressive parameters to characterize nonverbal behaviors. Finally, by integrating socio-cultural characteristics and nonverbal expressive characteristics, we establish a Bayesian network model that predicts posture expressiveness from a country name, and vice versa.
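Several of the papers above (for example, the ICMI 2012 entry with Fukuhara and the GazeIn 2013 entry) describe regressing annotated conversational dominance onto speech and gaze statistics. The sketch below is a minimal, hypothetical illustration of that general recipe, not the published models: the feature names, toy values, and the 1-7 annotation scale are assumptions made for the example.

```python
# Minimal sketch (not the published model): linear regression of annotated
# conversational dominance scores on per-participant speech and gaze
# statistics, in the spirit of the dominance-estimation papers listed above.
# Feature names, toy values, and the 1-7 rating scale are illustrative
# assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-participant features aggregated over one group session:
# [speaking_time_ratio, turn_count, mean_utterance_sec,
#  gaze_received_ratio, gaze_given_ratio]
X = np.array([
    [0.42, 35, 4.1, 0.51, 0.30],
    [0.23, 20, 3.2, 0.25, 0.41],
    [0.19, 18, 2.8, 0.14, 0.45],
    [0.16, 12, 2.5, 0.10, 0.52],
])
# Hypothetical dominance ratings by human annotators (1-7 scale).
y = np.array([6.2, 4.1, 3.0, 2.4])

model = LinearRegression().fit(X, y)
print("Training R^2:", model.score(X, y))   # fit quality on the toy data
print("Feature weights:", model.coef_)      # which cues drive the estimate

# Predict a dominance score for a new, unseen participant.
print("Predicted dominance:", model.predict([[0.30, 25, 3.5, 0.35, 0.38]]))
```

With real data one would report cross-validated correlation rather than the training fit; the corpus analyses in the listed papers are what determine which speech and gaze cues actually enter the model.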

MISC

 42
  • 伊藤 温志, 坂戸 達陽, 中野 有紀子, 二瓶 芙巳雄, 石井 亮, 深山 篤, 中村 高雄
    人工知能学会全国大会論文集 JSAI2022 3H3OS12a02-3H3OS12a02 2022
    Persuasiveness is an important skill in communicating with others. This study aims to estimate the persuasiveness of participants in group discussions. First, the degree of persuasiveness of each of the four participants in a group discussion was rated through manual annotation. Next, using a GRU-based neural network, we built multimodal and multiparty models that estimate each participant's persuasiveness from audio, linguistic, and visual (head-pose) features. Experiments showed that the multimodal and multiparty models outperform unimodal and single-person models. The best-performing multimodal, multiparty model achieves 80% accuracy in binary classification of high versus low persuasiveness and predicts the most persuasive participant in a group with 77% accuracy.
  • 二瓶 芙巳雄, 中野 有紀子, 高瀬 裕
    電子情報通信学会技術研究報告 (IEICE Technical Report) 117(177) 55-59 August 20, 2017
  • 木村清也, ZHANG Qi, HUANG Hung-Hsuan, 岡田将吾, 林佑樹, 高瀬裕, 中野有紀子, 大田直樹, 桑原和宏
    人工知能学会全国大会論文集 (CD-ROM) 31st 2017
  • Yukiko I. Nakano, Roman Bednarik, Hung-Hsuan Huang, Kristiina Jokinen
    ACM Transactions on Interactive Intelligent Systems 6(1) April 21, 2016
    Eye gaze has been used broadly in interactive intelligent systems. The research area has grown in recent years to cover emerging topics that go beyond the traditional focus on interaction between a single user and an interactive system. This special issue presents five articles that explore new directions of gaze-based interactive intelligent systems, ranging from communication robots in dyadic and multiparty conversations to a driving simulator that uses eye gaze evidence to critique learners' behavior.
  • 二瓶芙巳雄, 中野有紀子, 林佑樹, HUANG Hung-Hsuan, 岡田将吾
    情報処理学会全国大会講演論文集 77th(1) 2015
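The 2022 JSAI entry above describes a GRU-based multimodal, multiparty model for persuasiveness estimation. The following is a minimal, hypothetical sketch of a GRU sequence classifier of that general kind, not the reported model; the feature dimensionality, sequence length, and hyperparameters are assumptions for illustration.

```python
# Minimal sketch (not the reported model): a GRU that maps a multimodal
# feature sequence (audio + linguistic + head-pose features concatenated per
# frame) to a binary high/low persuasiveness label. All dimensions and
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class PersuasivenessGRU(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)  # high vs. low persuasiveness

    def forward(self, x):
        # x: (batch, time, feat_dim) multimodal feature sequence
        _, h_n = self.gru(x)              # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])   # logits: (batch, 2)

# Toy usage with random tensors standing in for real extracted features.
model = PersuasivenessGRU()
features = torch.randn(8, 200, 64)        # 8 participants, 200 frames, 64-dim features
logits = model(features)                  # (8, 2)
labels = torch.randint(0, 2, (8,))        # hypothetical high/low annotations
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()                           # ready for an optimizer step in training
```

A multiparty variant, as described in the entry, would additionally feed the other participants' feature sequences into the model; this sketch shows only the single-person backbone.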

Research Projects (Joint Research and Competitive Funding)

 11