Researcher Achievements

中野 有紀子

Yukiko Nakano

Basic Information

Affiliation
Professor, Department of Science and Technology, Faculty of Science and Technology, Seikei University
Degree
Ph.D. in Information Science and Technology (The University of Tokyo)

J-GLOBAL ID
201101020839458565
researchmap Member ID
B000004842


She completed the master's program at the Graduate School of Education, The University of Tokyo, in 1990 and joined Nippon Telegraph and Telephone Corporation (NTT) in the same year. In 2002 she completed the master's program in Media Arts & Sciences at MIT. She then served as an expert researcher at the JST Research Institute of Science and Technology for Society (RISTEX), a specially appointed associate professor at the Graduate School of Engineering, Tokyo University of Agriculture and Technology, and an associate professor in the Department of Computer and Information Science, Faculty of Science and Technology, Seikei University, and is currently a professor in that department. Her research focuses on conversational agents capable of verbal and nonverbal communication with people, aiming at intelligent and natural user interfaces. She holds a Ph.D. in Information Science and Technology and is a member of ACM, the Japanese Society for Artificial Intelligence (JSAI), IEICE, and IPSJ.

Career

 2

Papers

 51
  • Candy Olivia Mawalim, Shogo Okada, Yukiko I. Nakano, Masashi Unoki
    Journal on Multimodal User Interfaces 17(2) 47-63 2023
  • Atsushi Ito, Yukiko I. Nakano, Fumio Nihei, Tatsuya Sakato, Ryo Ishii, Atsushi Fukayama, Takao Nakamura
    J. Inf. Process. 31 34-44 2023
  • Atsushi Ito, Yukiko I. Nakano, Fumio Nihei, Tatsuya Sakato, Ryo Ishii, Atsushi Fukayama, Takao Nakamura
    IUI 2022: 27th International Conference on Intelligent User Interfaces 85-88 2022
  • Fumio Nihei, Ryo Ishii, Yukiko I. Nakano, Kyosuke Nishida, Ryo Masumura, Atsushi Fukayama, Takao Nakamura
    INTERSPEECH 1086-1090 2022
  • 久芳 和己, 中野 有紀子, 岡田 将吾
    Proceedings of the Annual Conference of JSAI JSAI2022 3C4GS604-3C4GS604 2022
    Various types of nonverbal information, such as gestures and facial expressions, play an important role in dyadic dialogue. The primary goal of this study is to analyze differences in nonverbal information in dyadic interactions between speakers of different languages. Using the NoXi corpus, a multimodal conversation dataset collected in three countries with different cultural backgrounds, we analyze and report, by means of ANOVA, the nonverbal cues that differ across the three countries' dialogue-pair groups (an illustrative one-way ANOVA sketch appears after this publication list).
  • Yukiko I. Nakano, Eri Hirose, Tatsuya Sakato, Shogo Okada, Jean-Claude Martin
    ICMI 5-14 2022
  • 鈴木 凱, 岡田 将吾, 黄 宏軒, 中野 有紀子
    Proceedings of the Annual Conference of JSAI JSAI2022 3H3OS12a01-3H3OS12a01 2022
    This paper proposes a method for improving the accuracy of models that estimate discussion quality from multimodal features. We use the MATRICS group-meeting corpus, which contains prosodic, facial-expression, linguistic, and speaking-turn features of participants observed in a total of 56 group meetings. To address a problem noted in previous work, namely that not every frame and not every modality in the time-series data is necessarily useful for estimating a given label, we propose the N-teaching model, which extends Co-teaching, a weakly supervised learning method effective for noisy labels, to be more robust to noise (a sketch of the underlying Co-teaching idea is given after this publication list). We also analyze the samples excluded from training as noise and compare the results with previous work. The proposed method achieved the best reported accuracy, an MAE of 0.309, on the Originality (novelty) index of discussion content.
  • Candy Olivia Mawalim, Shogo Okada, Yukiko I. Nakano
    ACM Transactions on Multimedia Computing, Communications, and Applications 17(4) 1-27 Nov 30, 2021
    Case studies of group discussions are considered an effective way to assess communication skills (CS). This method can help researchers evaluate participants' engagement with each other in a specific realistic context. In this article, multimodal analysis was performed to estimate CS indices using a three-task-type group discussion dataset, the MATRICS corpus. The current research investigated the effectiveness of employing both static and time-series modeling, especially in task-independent settings. This investigation aimed to understand three main points: first, the effectiveness of time-series modeling compared to nonsequential modeling; second, multimodal analysis in a task-independent setting; and third, important differences to consider when dealing with task-dependent and task-independent settings, specifically in terms of modalities and prediction models. Several modalities were extracted (e.g., acoustics, speaking turns, linguistic-related movement, dialog tags, head motions, and face feature sets) for inferring the CS indices as a regression task. Three predictive models, including support vector regression (SVR), long short-term memory (LSTM), and an enhanced time-series model (an LSTM model with a combination of static and time-series features), were taken into account in this study. Our evaluation was conducted by using the R2 score in a cross-validation scheme. The experimental results suggested that time-series modeling can improve the performance of multimodal analysis significantly in the task-dependent setting (with the best R2 = 0.797 for the total CS index), with word2vec being the most prominent feature. Unfortunately, highly context-related features did not fit well with the task-independent setting. Thus, we propose an enhanced LSTM model for dealing with task-independent settings, and we successfully obtained better performance with the enhanced model than with the conventional SVR and LSTM models (the best R2 = 0.602 for the total CS index). In other words, our study shows that a particular time-series modeling can outperform traditional nonsequential modeling for automatically estimating the CS indices of a participant in a group discussion with regard to task dependency. (An illustrative sketch of an LSTM regressor that fuses static and time-series features appears after this publication list.)
  • Kazufumi Tsukada, Yutaka Takase, Yukiko I. Nakano
    ACM/IEEE International Conference on Human-Robot Interaction 93-94 Mar 2, 2015  Peer-reviewed
  • Takashi Yoshino, Yutaka Takase, Yukiko I. Nakano
    ACM/IEEE International Conference on Human-Robot Interaction 127-128 Mar 2, 2015  Peer-reviewed
  • Naoko Saito, Shogo Okada, Katsumi Nitta, Yukiko I. Nakano, Yuki Hayashi
    AAAI Spring Symposium - Technical Report SS-15-07 100-103 2015
  • 林 佑樹, 二瓶 芙巳雄, 中野 有紀子, 黄 宏軒, 岡田 将吾
    56(4) 1217-1227 2015  Peer-reviewed
  • Reo Suzuki, Yutaka Takase, Yukiko I. Nakano
    Proceedings of the Eighth International Conference on Advances in Computer-Human Interactions (ACHI 2015) 92-95 2015  Peer-reviewed
  • Sakiko Nihonyanagi, Yuki Hayashi, Yukiko I. Nakano
    GazeIn 2014 - Proceedings of the 7th ACM Workshop on Eye Gaze in Intelligent Human Machine Interaction: Eye-Gaze and Multimodality, Co-located with ICMI 2014 33-37 Nov 16, 2014
  • Fumio Nihei, Yukiko I. Nakano, Yuki Hayashi, Hung-Hsuan Huang, Shogo Okada
    ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction 136-143 Nov 12, 2014  Peer-reviewed
  • Hung-Hsuan Huang, Roman Bednarik, Kristiina Jokinen, Yukiko I. Nakano
    Proceedings of the 16th International Conference on Multimodal Interaction (ICMI) 527-528 2014
  • 中野有紀子, 馬場直哉, 黄宏軒, 林佑樹
    Transactions of the Japanese Society for Artificial Intelligence 29(1) 69-79 2014  Peer-reviewed
  • 林佑樹, 小川裕史, 中野有紀子
    IPSJ Journal 55(1) 189-198 2014  Peer-reviewed
  • Vrzakova, H., Bednarik, R., Nihei, F., Nakano, Y.
    Proceedings of the 8th Nordic Conference on Human-Computer Interaction 915-918 2014  Peer-reviewed
  • Misato Yatsushiro, Naoya Ikeda, Yuki Hayashi, Yukiko I. Nakano
    GazeIn 2013 - Proceedings of the 2013 ACM Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction, co-located with ICMI 2013 13-18 Dec 13, 2013
  • 石井 亮, 小澤 史朗, 川村 春美, 小島 明, 中野 有紀子
    IEICE Transactions on Information and Systems (Japanese Edition) J96-D(1) 110-119 2013  Peer-reviewed
  • 馬場直哉, 黄 宏軒, 中野有紀子
    Transactions of the Japanese Society for Artificial Intelligence 28(2) 149-159 2013  Peer-reviewed
  • Yukiko I. Nakano, Naoya Baba, Hung-Hsuan Huang, Yuki Hayashi
    ICMI '13: Proceedings of the 2013 ACM International Conference on Multimodal Interaction 35-42 2013  Peer-reviewed
  • Hung-Hsuan Huang, Hiroki Matsushita, Kyoji Kawagoe, Yoichi Sakai, Yuuko Nonaka, Yukiko Nakano, Kiyoshi Yasuda
    Proceedings of the 11th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2012 295-299 2012  Peer-reviewed
  • 塚本 剛生, 中野 有紀子
    Transactions of the Virtual Reality Society of Japan 17(2) 79-89 2012  Peer-reviewed
    This paper proposes a direction giving avatar system in Metaverse, which automatically generates direction giving gestures based on linguistic information obtained from the user's chat text input and spatial information in Metaverse. First, we conduct an experiment to collect direction giving conversation corpus. Then, using the collected corpus, we analyze the relationship between the proxemics of conversation participants and the position of their direction giving gestures. Next, we analyze the relationship between linguistic features in direction giver's utterances and the shape of their spatial gestures. We define five categories of gesture concepts and four gesture shape parameters, and analyze the relationship between the gesture concepts and a set of gesture parameters. Based on these results, we propose an automatic gesture decision mechanism and implement a direction giving avatar system in Metaverse.
  • Yukiko I. Nakano, Yuki Fukuhara
    ICMI '12: Proceedings of the ACM International Conference on Multimodal Interaction 77-84 2012  Peer-reviewed
  • 石井亮, 大古亮太, 中野有紀子, 西田豊明
    IPSJ Journal 52(12) 3625-3636 2011  Peer-reviewed
  • 中野有紀子
    2011 International Conference on Intelligent User Interfaces (IUI2011), Workshop on Eye Gaze in Intelligent Human Machine Interaction 2011  Peer-reviewed
  • 中野有紀子
    Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS2011) 441-448 2011  Peer-reviewed
  • 中野有紀子
    The 11th International Conference on Intelligent Virtual Agents (IVA2011) 262-268 2011  Peer-reviewed
  • 中野有紀子
    The 11th International Conference on Intelligent Virtual Agents (IVA2011) 255-261 2011  Peer-reviewed
  • 中野有紀子
    Proceedings of the 11th International Conference on Intelligent Virtual Agents (IVA 2011) 1-13 2011  Peer-reviewed
  • 中野有紀子
    The 13th ACM International Conference on Multimodal Interaction (ICMI2011) 401-408 2011  Peer-reviewed
  • 黄 宏軒, 武田 信也, 小野 正貴, 中野 有紀子
    Proceedings of the Annual Conference of JSAI JSAI2010 1C13-1C13 2010
    This study develops an information-providing agent that supports decision-making conversations between two customers, using a travel agency scenario. From the users' speech states and changes in head orientation, the system estimates whether the conversation is in one of four states: consultation, small talk, question, or understanding, and its conversation-control mechanism decides how the agent should join the conversation according to these four states. We verify the validity of the indicators and the estimation accuracy through a participant evaluation experiment and report the results.
  • Yukiko I. Nakano, Toyoaki Nishida
    Proceedings of the Symposium on Conversational Informatics for Supporting Social Intelligence and Interaction: Situational and Environmental Information Enforcing Involvement in Conversation 128-135 2005
  • Yoshiyasu Ogasawara, Masashi Okamoto, Yukiko I. Nakano, Yong Xu, Toyoaki Nishida
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3683 289-295 2005
  • Kazunori Okamoto, Yukiko I. Nakano, Masashi Okamoto, Hung-Hsuan Huang, Toyoaki Nishida
    Knowledge-Based Intelligent Information and Engineering Systems 848-854 2005
  • Yukiko I. Nakano, Toshiyasu Murayama, Toyoaki Nishida
    IEICE Transactions on Information and Systems E87-D(6) 1338-1346 2004
  • Takaaki Hasegawa, Yukiko I. Nakano, Tsuneaki Kato
    Proceedings of the International Conference on Autonomous Agents 75-82 1997
  • Yukiko I. Nakano, Masashi Okamoto, Daisuke Kawahara, Qing Li, Toyoaki Nishida
    Peer-reviewed
    This paper proposes a method for assigning gestures to text based on lexical and syntactic information. First, our empirical study identified lexical and syntactic information strongly correlated with gesture occurrence and suggested that syntactic structure is more useful for judging gesture occurrence than local syntactic cues. Based on the empirical results, we have implemented a system that converts text into an animated agent that gestures and speaks synchronously.
  • Justine Cassell, Tom Stocky, Tim Bickmore, Yang Gao, Yukiko Nakano, Kimiko Ryokai, Catherine Vaucelle, Hannes Vilhjálmsson
    Peer-reviewed
    In this paper, we describe an embodied conversational kiosk that builds on research in embodied conversational agents (ECAs) and on information displays in mixed reality and kiosk format in order to display spatial intelligence. ECAs leverage people's abilities to coordinate information displayed in multiple modalities, particularly information conveyed in speech and gesture. Mixed reality depends on users' interactions with everyday objects that are enhanced with computational overlays. We describe an implementation, MACK (Media lab Autonomous Conversational Kiosk), an ECA who can answer questions about and give directions to the MIT Media Lab's various research groups, projects and people. MACK uses a combination of speech, gesture, and indications on a normal paper map that users place on a table between themselves and MACK. Research issues involve users' differential attention to hand gestures, speech and the map, and flexible architectures for embodied conversational agents that allow these modalities to be fused in input and generation.
  • Hung-Hsuan Huang, Tsuyoshi Masuda, Aleksandra Cerekovic, Kateryna Tarasenko, Igor S. Pandzic, Yukiko Nakano, Toyoaki Nishida
    Peer-reviewed
    Embodied Conversational Agents (ECAs) are computer-generated life-like characters that interact with human users in face-to-face conversations. To achieve natural multi-modal conversations, ECA systems are very sophisticated, require many building assemblies, and are therefore difficult for individual research groups to develop. This paper proposes a generic architecture, the Universal ECA Framework, which is currently under development and includes a blackboard-based platform and a high-level protocol to integrate general-purpose ECA components and ease ECA system prototyping.
  • Justine Cassell, Yukiko I. Nakano, Timothy W. Bickmore, Candace L. Sidner, Charles Rich
    Peer-reviewed
    This paper addresses the problem of designing embodied conversational agents that exhibit appropriate posture shifts during dialogues with human users. Previous research has noted the importance of hand gestures, eye gaze and head nods in conversations between embodied agents and humans. However, this research has neglected the role of other body movements, in particular postural shifts. We present an analysis of human monologues and dialogues that suggests that postural shifts can be predicted as a function of discourse state in monologues, and discourse state and conversation state in dialogues. On the basis of these findings, we have implemented an embodied conversational agent that uses a dialogue manager called Collagen in such a way as to generate postural shifts.
  • Yukiko Nakano, Gabe Reinstein, Tom Stocky, Justine Cassell
    Peer-reviewed
    We investigate the verbal and nonverbal means for grounding, and propose a design for embodied conversational agents that relies on both kinds of signals to establish common ground in human-computer interaction. We analyzed eye gaze, head nods and attentional focus in the context of a direction-giving task. The distribution of nonverbal behaviors differed depending on the type of dialogue move being grounded, and the overall pattern reflected a monitoring of lack of negative feedback. Based on these results, we present an ECA that uses verbal and nonverbal grounding acts to update dialogue state.
  • Yukiko I. Nakano, Kenji Imamura, Kenji Hnamura, Hisashi Ohara
    Peer-reviewed
    While recent advancements in virtual reality technology have created a rich communication interface linking humans and computers, there has been little work on building dialogue systems for 3D virtual worlds. This paper proposes a method for altering the instruction dialogue to match the user's view in a virtual environment. We illustrate the method with the system MID-3D, which interactively instructs the user on dismantling some parts of a car. First, in order to change the content of the instruction dialogue to match the user's view, we extend the refinement-driven planning algorithm by using the user's view as a plan constraint. Second, to manage the dialogue smoothly, the system keeps track of the user's viewpoint as part of the dialogue state and uses this information for coping with interruptive subdialogues. These mechanisms enable MID-3D to set instruction dialogues in an incremental way; it takes account of the user's view even when it changes frequently.
  • Yukiko I. Nakano, Tsuneaki Kato
    Peer-reviewed
    The purpose of this paper is to identify effective factors for selecting discourse organization cue phrases in instruction dialogue that signal changes in discourse structure such as topic shifts and attentional state changes. By using a machine learning technique, a variety of features concerning discourse structure, task structure, and dialogue context are examined in terms of their effectiveness, and the best set of learning features is identified. Our results reveal that, in addition to the discourse structure already identified in previous studies, task structure and dialogue context play an important role. Moreover, an evaluation using a large dialogue corpus shows the utility of applying machine learning techniques to cue phrase selection.
  • Yukiko I. Nakano, Yoshiko Arimoto, Kazuyoshi Murata, Yasuhiro Asa, Mika Enomoto, Hirohiko Sagawa
    Peer-reviewed
    The aim of this paper is to develop animated agents that can control multimodal instruction dialogues by monitoring the user's behaviors. First, this paper reports on our Wizard-of-Oz experiments, and then, using the collected corpus, proposes a probabilistic model of fine-grained timing dependencies among multimodal communication behaviors: speech, gestures, and mouse manipulations. A preliminary evaluation revealed that our model can predict an instructor's grounding judgment and a listener's successful mouse manipulation quite accurately, suggesting that the model is useful in estimating the user's understanding and can be applied to determining the agent's next action.
  • Justine Cassell, Yukiko I. Nakano, Timothy W. Bickmore, Candace L. Sidner, Charles Rich
    Peer-reviewed
    This paper addresses the issue of designing embodied conversational agents that exhibit appropriate posture shifts during dialogues with human users. Previous research has noted the importance of hand gestures, eye gaze and head nods in conversations between embodied agents and humans. We present an analysis of human monologues and dialogues that suggests that postural shifts can be predicted as a function of discourse state in monologues, and discourse and conversation state in dialogues. On the basis of these findings, we have implemented an embodied conversational agent that uses Collagen in such a way as to generate postural shifts.
  • Matthias Rehm, Yukiko Nakano, Elisabeth André, Toyoaki Nishida
    Peer-reviewed
    We present our concept of integrating culture as a computational parameter for modeling multimodal interactions with virtual agents. As culture is a social rather than a psychological notion, its influence is evident in interactions where cultural patterns of behavior and interpretation mismatch. Nevertheless, when culture is taken seriously, its influence penetrates most layers of agent behavior planning and generation. In this article we concentrate on a first-meeting scenario, present our model of an interactive agent system, and identify where cultural parameters play a role. To assess the viability of our approach, we outline an evaluation study that is currently being set up.
  • Afia Akhter Lipi, Yukiko Nakano, Matthias Rehm
    Peer-reviewed
    The goal of this paper is to integrate culture as a computational term in embodied conversational agents by employing an empirical data-driven approach as well as a theoretical model-driven approach. We propose a parameter-based model that predicts nonverbal expressions appropriate for specific cultures. First, we introduce Hofstede's theory to describe the socio-cultural characteristics of each country. Then, based on previous studies of cultural differences in nonverbal behaviors, we propose expressive parameters to characterize nonverbal behaviors. Finally, by integrating socio-cultural characteristics and nonverbal expressive characteristics, we establish a Bayesian network model that predicts posture expressiveness from a country name, and vice versa (a toy illustration of this two-way inference follows this publication list).
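
A note on the JSAI 2022 NoXi analysis above: its core statistical step is a one-way ANOVA over per-dyad nonverbal feature values grouped by country. The sketch below is purely illustrative; the group labels and numbers are invented, and SciPy's f_oneway stands in for whatever statistics tooling the authors actually used.

```python
# Illustrative one-way ANOVA over a single nonverbal feature, with
# invented per-dyad values for three country groups (not NoXi data).
from scipy.stats import f_oneway

group_a = [0.21, 0.18, 0.25, 0.19, 0.23]   # e.g., head-nod rate per dyad (toy values)
group_b = [0.14, 0.12, 0.17, 0.15, 0.13]
group_c = [0.22, 0.27, 0.24, 0.20, 0.26]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p => groups differ on this feature
```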
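
The N-teaching model in the JSAI 2022 discussion-quality paper above extends Co-teaching, in which two networks each pick their small-loss (presumably clean) samples and train the other network on them. The PyTorch sketch below shows only this generic Co-teaching update, with placeholder models and a regression loss; it is not the authors' N-teaching implementation.

```python
# Minimal sketch of one Co-teaching update step (the baseline that
# N-teaching extends). Models, optimizers, and tensors are placeholders.
import torch
import torch.nn as nn

def coteaching_step(model_a, model_b, opt_a, opt_b, x, y, keep_ratio=0.8):
    """Each network keeps its small-loss samples; its peer trains only on those."""
    loss_fn = nn.MSELoss(reduction="none")           # per-sample regression loss

    with torch.no_grad():                            # ranking pass only
        loss_a = loss_fn(model_a(x).squeeze(-1), y)  # per-sample losses, network A
        loss_b = loss_fn(model_b(x).squeeze(-1), y)  # per-sample losses, network B
        k = max(1, int(keep_ratio * x.size(0)))      # number of samples to trust
        idx_a = torch.argsort(loss_a)[:k]            # A's small-loss samples
        idx_b = torch.argsort(loss_b)[:k]            # B's small-loss samples

    # Cross-update: A learns from the samples B trusts, and vice versa.
    opt_a.zero_grad()
    loss_fn(model_a(x[idx_b]).squeeze(-1), y[idx_b]).mean().backward()
    opt_a.step()

    opt_b.zero_grad()
    loss_fn(model_b(x[idx_a]).squeeze(-1), y[idx_a]).mean().backward()
    opt_b.step()
```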
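
The enhanced time-series model in the ACM TOMM 2021 article above combines static and time-series features in an LSTM regressor. The following is a minimal sketch of that kind of fusion; the layer sizes, feature dimensions, and class name are assumptions, not the published architecture.

```python
# Sketch of an LSTM regressor that fuses time-series and static features.
import torch
import torch.nn as nn

class StaticPlusSequenceRegressor(nn.Module):
    def __init__(self, seq_dim, static_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(seq_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + static_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),                  # one communication-skill index
        )

    def forward(self, seq_feats, static_feats):
        # seq_feats: (batch, time, seq_dim); static_feats: (batch, static_dim)
        _, (h_n, _) = self.lstm(seq_feats)
        fused = torch.cat([h_n[-1], static_feats], dim=1)
        return self.head(fused).squeeze(-1)    # predicted index per sample

# Example with random tensors: batch of 8, 100 time steps,
# 40 sequential features, 10 static features.
model = StaticPlusSequenceRegressor(seq_dim=40, static_dim=10)
pred = model(torch.randn(8, 100, 40), torch.randn(8, 10))   # shape: (8,)
```

The point being illustrated is simply that the last LSTM hidden state summarizes the sequential modalities, and the static (nonsequential) features are concatenated to it before the regression head.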
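
Finally, the culture-adaptation paper above describes a Bayesian network that predicts posture expressiveness from a country name and vice versa. The toy example below shows the two directions of inference with a single binary "expressive posture" variable and invented probabilities; it is not the published Hofstede-based model.

```python
# Toy two-way inference between country and posture expressiveness.
import numpy as np

countries = ["A", "B", "C"]                      # placeholder country labels
prior = np.array([1/3, 1/3, 1/3])                # P(country), assumed uniform
p_high = np.array([0.7, 0.4, 0.2])               # P(expressive posture | country), invented

# Forward direction: expressiveness predicted from a country.
print(dict(zip(countries, p_high)))

# Inverse direction (Bayes' rule): country inferred from observing an expressive posture.
posterior = prior * p_high
posterior /= posterior.sum()
print(dict(zip(countries, np.round(posterior, 3))))
```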

MISC

 42

Research Projects (Joint Research and Competitive Funding)

 12