Takayuki Yumoto (湯本高行, ユモトタカユキ)

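Basic Information

Affiliation: Associate Professor, School of Social Information Science, University of Hyogo

Degree: Ph.D. in Informatics (Kyoto University)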

J-GLOBAL ID: 200901000308952299
researchmap Member ID: 5000091303

External link: https://sites.google.com/view/yumotolab/

Research Keywords

Research Areas

Information and communications / Web informatics and service informatics / Web mining

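Career

April 2020 - Present

Associate Professor, School of Social Information Science, University of Hyogo
April 2007 - March 2020

Assistant Professor, Graduate School of Engineering, University of Hyogo

Education

- 2007

Department of Social Informatics, Graduate School of Informatics, Kyoto University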

Awards

March 2023

Young Researcher Achievement Award (若手功績賞), Database Society of Japan

Takayuki Yumoto

Papers

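Estimating Injury Severity from Accident Descriptions (事故説明文からの傷病の程度の推定)

川原敬史, 橋口友哉, Takayuki Yumoto, Hiroaki Ohshima

IEICE Transactions on Information and Systems (Japanese Edition) J105-D(5) 322-336 May 1, 2022, peer-reviewed

This study proposes a method that takes a text describing the outline of an accident as input and estimates the severity of the injury or illness suffered by the person involved. The input is assumed to be a document of a few sentences. The proposed method estimates the severity level for an input by solving a classification problem with machine learning. The data are accident records published in the Accident Information Databank System (事故情報データバンクシステム), and the input is the text in the "accident summary" field. The method represents the input text as a distributed representation using the general-purpose language model BERT, with a pre-trained model trained on Japanese Wikipedia. To improve accuracy on the severity estimation task, four techniques are introduced: (1) class weights, (2) ordinal classification, (3) multi-task learning, and (4) a model additionally trained by token label estimation. We examined how using or not using these techniques affects accuracy, Macro F1, RMSE, and the confusion matrix. Introducing (1) class weights and (2) ordinal classification improved both Macro F1 and RMSE, and introducing (3) multi-task learning improved accuracy.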
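Of the four techniques listed in this abstract, class weighting is the simplest to illustrate. The sketch below is a minimal, hypothetical example and not the authors' implementation: it assumes inverse-frequency weights of the form n_samples / (n_classes * class_count), and the label names are invented for illustration.

```python
from collections import Counter


def class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this formula is a common choice for class weighting,
    not necessarily the one used in the paper.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical severity labels; the rarer class receives the larger weight.
example_labels = ["mild", "mild", "mild", "severe"]
weights = class_weights(example_labels)
```

Weights of this kind are typically passed to the loss function so that errors on rare severity levels count more during training.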
Measuring Term Relevancy Based on Actual and Predicted Co-occurrence.

Yuya Koyama, Takayuki Yumoto, Teijiro Isokawa, Naotake Kamiura

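Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication, IMCOM 2019, Phuket, Thailand, January 4-6, 2019 996-1005 2019, peer-reviewed
Ranking Information Fragments Using a Bipartite Graph Based on Entailment Relations (含意関係に基づく二部グラフを用いた情報の断片のランキング)

Sho Iizuka, Takayuki Yumoto, Manabu Nii, Naotake Kamiura

IEICE Transactions on Information and Systems (Japanese Edition) (Web) J101-D(4) 681-689 (WEB ONLY) April 1, 2018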
Finding Rare Information from the Web Using Social Bookmarks and Word Co-occurrence

International Journal of Biomedical Soft Computing and Human Sciences 22(1) 9-18 2017, peer-reviewed
UHYG at the NTCIR-12 MobileClick Task: Link-based Ranking on iUnit-Page Bipartite Graph.

Sho Iizuka, Takayuki Yumoto, Manabu Nii, Naotake Kamiura

Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016 2016, peer-reviewed
Rarity-Oriented Information Retrieval: Social Bookmarking vs. Word Co-occurrence

Takayuki Yumoto, Takahiro Yamanaka, Manabu Nii, Naotake Kamiura

DIGITAL LIBRARIES: KNOWLEDGE, INFORMATION, AND DATA IN AN OPEN ACCESS SOCIETY 10075 85-91 2016, peer-reviewed

We propose rarity-oriented retrieval methods for serendipity. We define rare information as information that is both relevant and atypical, and take two approaches. The first approach uses social bookmark data and introduces tag estimation into our previous work. The second approach is based on word co-occurrence in a dataset. In both approaches, we use conditional probabilities to express relevancy and atypicality. In experiments, we compared our methods with a relevance-oriented method, a diversity-oriented method, and another rarity-oriented method. Our methods using word co-occurrence obtained better nDCG scores than the other methods.
Estimating sentiment of tweets by learning social bookmark data

Yasuyuki Okamura, Takayuki Yumoto, Manabu Nii, Naotake Kamiura

Proceedings of the 14th International Conference WWW/Internet 2015 55-62 January 2015, peer-reviewed

People are posting huge amounts of varied information on the Web as the popularity of social media continues to increase. The sentiment of a tweet posted on Twitter can reveal valuable information on the reputation of various targets both on the Web and in the real world. We propose a method to classify tweet sentiments by machine learning. In most cases, machine learning requires a significant amount of manually labeled data. Our method is different in that we use social bookmark data as training data for classifying tweets with URLs. In social bookmarks, comments are written using casual expressions, similar to tweets. Since tags in social bookmarks partly represent sentiment, they can be used as supervisory signals for learning. The proposed method moves beyond the basic "positive"/"negative" classification to classify impressions as "useful", "funny", "negative", and "other".
University of Hyogo at NTCIR-11 TaskMine by Dependency Parsing.

Takayuki Yumoto

Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-11, National Center of Sciences, Tokyo, Japan, December 9-12, 2014 2014, peer-reviewed
Finding Rare Web Pages by Relevancy and Atypicality in a Category

Takayuki Yumoto, Ryohei Tada, Manabu Nii, Kunihiro Sato

2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013) 284-288 2013, peer-reviewed

In this paper, we propose the rarity of a Web page in a user-given category as a measure for finding useful information that few people know. A rare Web page is a page that belongs to a given category and is atypical in that category. We define the probability that a page is a rare Web page in the given category as its rarity score. The rarity score is the product of a relevancy score and an atypicality score. The relevancy is the probability that a Web page belongs to the category given by the user. The atypicality is the conditional probability that a page is atypical in the category given that it belongs to the category. Both probabilities are calculated using tags from social bookmark services and words in Web pages. We evaluated the proposed relevancy score by classifying whether Web pages belong to a certain category. We also evaluated the proposed rarity as a metric for ranking Web pages and compared it with rankings by relevancy and by atypicality. We confirmed the usefulness of the rarity score for finding relevant and atypical pages.
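The rarity score described in this abstract is a product of two probabilities, which can be sketched in a few lines. This is an illustrative sketch only: the function names, the example URLs, and the probability values are hypothetical, and the estimation of the two probabilities from social bookmark tags and page words is not shown.

```python
def rarity_score(relevancy, atypicality):
    """rarity = P(page in category) * P(page atypical | page in category)."""
    for name, p in (("relevancy", relevancy), ("atypicality", atypicality)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return relevancy * atypicality


def rank_by_rarity(pages):
    """Sort (url, relevancy, atypicality) tuples by descending rarity score."""
    return sorted(pages, key=lambda page: rarity_score(page[1], page[2]), reverse=True)


# Hypothetical pages with made-up probabilities.
pages = [
    ("a.example", 0.9, 0.1),  # relevant but typical
    ("b.example", 0.6, 0.5),  # fairly relevant and fairly atypical
    ("c.example", 0.2, 0.9),  # atypical but barely relevant
]
ranked = rank_by_rarity(pages)
```

Because the score is a product, a page ranks highly only when it is both relevant and atypical; a page that is strong on one factor alone is pushed down.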
Discovering Atypical Property Values for Object Searches on the Web

Tatsuya Fujisaka, Takayuki Yumoto, Kazutoshi Sumiya

WEB INFORMATION SYSTEMS AND MINING, PT II 6988 103-+ 2011, peer-reviewed

Conventional search engines are able to extract commonplace information by incorporating users' requests into their queries. Users perform niche requests when they want to obtain atypical objects or unique information. In these instances, it is difficult for users to expand their queries to match their niche requests. In this paper, we introduce a query suggestion method for finding objects that have atypical characteristics. Our method focuses on the property values of an object, and elicits atypical property values by using the relation between an object's name and a typical property value.
Discovering inconsistency in multimedia news based on a material-opinion model

Ling Xu, Takayuki Yumoto, Shinya Aoki, Qiang Ma, Masatoshi Yoshikawa

Proceedings of the Annual Hawaii International Conference on System Sciences 1-10 2011, peer-reviewed

The advantages of multimedia make video news believable and impressive to viewers, while personal opinions and ideological perspectives hidden in the content still influence them. To reduce the risk of viewers being misled, we propose a method, based on a Material-Opinion model, for detecting inconsistent news items that report the same event while the viewer is watching one of them. In the Material-Opinion model, the main participants filmed as materials are presented to the viewer through the video stream, which is used to support the arguments put forward. Based on this model, given a series of multimedia news items reporting the same event, we explore inconsistency between any two of them by computing their dissimilarities in materials and in opinions. Material-dissimilarity is based on the appearance of the main participants in the video. Opinion-dissimilarity is calculated as the difference between two vectors consisting of the argument points extracted from the closed captions. If one of the dissimilarities is high and the other is low, we consider the two items inconsistent. We also show some experimental results to validate the proposed methods.
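The decision rule in this abstract (one dissimilarity high, the other low) can be sketched as follows. This is a hedged illustration, not the paper's implementation: the Euclidean norm used for the vector difference, the threshold value, and the function names are all assumptions introduced here.

```python
import math


def opinion_dissimilarity(points_a, points_b):
    """Distance between two argument-point vectors.

    Assumption: Euclidean distance stands in for the 'vector difference';
    the abstract does not specify the norm.
    """
    return math.dist(points_a, points_b)


def is_inconsistent(material_dis, opinion_dis, threshold=0.5):
    """Flag a pair of news items when exactly one dissimilarity is high.

    Assumption: a single hypothetical threshold separates 'high' from 'low'.
    """
    material_high = material_dis >= threshold
    opinion_high = opinion_dis >= threshold
    return material_high != opinion_high
```

For example, two items that show the same participants (low material-dissimilarity) but argue different points (high opinion-dissimilarity) would be flagged, while items that differ or agree on both counts would not.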
Database Systems for Advanced Applications, 15th International Conference, DASFAA 2010, International Workshops: GDM, BenchmarX, MCIS, SNSMW, DIEW, UDM, Tsukuba, Japan, April 1-4, 2010, Revised Selected Papers

DASFAA Workshops 2010 6193 2010, peer-reviewed
Object name search based on user's description in extraction and validation approach

Yuko Koba, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi

2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings 350-354 2010, peer-reviewed

People often want to find the name of an object that they can describe but cannot name. It is, however, difficult to find such object names using conventional Web search engines. We therefore propose a new method for finding an object name from the descriptions given by a user. This method consists of two phases, the extraction phase and the validation phase. In the extraction phase, candidate words are extracted by conducting a Web search using a combination of the queries generated from the user's descriptions. In the validation phase, each candidate word is validated through a Web search using the candidate word. We rank the candidate words based on the user's description. We evaluated our algorithm by performing several tasks to find the object names from questions on Q&A sites. We also compared it with methods using queries consisting of all the words in the description and queries consisting of user-selected and user-generated words. The precision of our algorithm was higher than that of the other methods.
Evaluating credibility of web information

Katsumi Tanaka, Hiroaki Ohshima, Adam Jatowt, Satoshi Nakamura, Yusuke Yamamoto, Kazutoshi Sumiya, Ryong Lee, Daisuke Kitayama, Takayuki Yumoto, Yukiko Kawai, Jianwei Zhang, Shinsuke Nakajima, Yoichi Inagaki

Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication ICUIMC 10 147-156 2010, peer-reviewed

We describe a new concept and method for evaluating Web information credibility. The quality control of information (text, image, video etc.) on the Web is generally insufficient due to low publishing barriers. As a result, there is a large amount of mistaken and unreliable information on the Web that can have detrimental effects on users. This calls for technology that facilitates the judging of the credibility (expertise and trustworthiness) of Web content and the accuracy of the information that users encounter on the Web. Such technology should be able to handle a wide range of tasks: extracting several credibility-related features from the target Web content, extracting reputation-related information for the target Web content, such as hyperlinks and social bookmarks, and evaluating its distribution, and evaluating features of the target content authors. We propose and describe methodologies for analyzing the credibility of Web information: (1) content analysis, (2) social support analysis and (3) author analysis. We give an overview of our recent research activities on Web information credibility evaluation based on these methodologies.
Measuring Peculiarity of Text Using Relation between Words on the Web

Takeru Nakabayashi, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi, Kazutoshi Sumiya

ROLE OF DIGITAL LIBRARIES IN A TIME OF GLOBAL CHANGE 6102 112-+ 2010, peer-reviewed

We define the peculiarity of text as a metric of information credibility. Higher peculiarity means lower credibility. We extract the theme word and the characteristic words from text and check whether there is a subject-description relation between them. The peculiarity is defined using the ratio of the subject-description relation between a theme word and characteristic words. We evaluate the extent to which peculiarity can be used to judge credibility by classifying texts from Wikipedia and Uncyclopedia according to their peculiarity.
Measuring Attention Intensity to Web Pages Based on Specificity of Social Tags

Takayuki Yumoto, Kazutoshi Sumiya

DATABASE SYSTEMS FOR ADVANCED APPLICATIONS 6193 264-+ 2010, peer-reviewed

Social bookmarks are used to find Web pages that draw much attention. However, the tendency of pages to collect bookmarks differs by topic. Therefore, the number of bookmarks indicates the attention a page receives, but it cannot be used as the metric of attention intensity itself. We define the relative quantity of social bookmarks (RQS) for measuring the attention intensity to a Web page. The RQS is calculated using the number of social bookmarks of related pages. Related pages are found using similarity based on the specificity of social tags. We define two types of specificity: local specificity, which is the specificity for a user, and global specificity, which is the specificity common in a social bookmark service.
How-to information search by lightweight analysis of Web pages

Ryouji Nonaka, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi

ACM International Conference Proceeding Series 350-354 2009, peer-reviewed

We propose a method for searching for comprehensible how-to information on the Web. In our how-to information search, we use lightweight analysis of Web pages to extract how-to information from Web pages obtained by conventional Web search engines and rank them according to their easily-viewable-degree. In the extraction process, we focus on expressions in Web page text blocks that describe procedures. In the ranking process, we focus on images, the effect of letter string and the length of the how-to information.
Searching for comparison points between two objects from the Web

Shinya Aoki, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi

ACM International Conference Proceeding Series 344-349 2009, peer-reviewed

Search engines are now often used to compare two objects. However, users tend to browse only the pages ranked highly by search engines, which may give biased information. We propose a method for searching Web pages where two objects are compared using a search engine, extracting comparison points from those Web pages, and showing these points to users. Comparison points are keywords for comparing objects. The proposed method can be used to extract points for efficient comparison by using comparison expressions such as "Liquid Crystal TVs are better ..." and "... than Plasma TVs."
Converting Topics of User Query Sequences for Cooperative Web Search

Takayuki Yumoto, Yuta Mori, Kazutoshi Sumiya

SEVENTH INTERNATIONAL CONFERENCE ON CREATING, CONNECTING AND COLLABORATING THROUGH COMPUTING, PROCEEDINGS 121-+ 2009, peer-reviewed

In this paper, we propose a method of converting a given sequence of search queries about a certain topic into a sequence of search queries about a given different topic. We define the concept of a search skeleton for topic conversion. A search skeleton represents relationships between keywords in a query. A given sequence of search queries is converted into a sequence of search skeletons, which are in turn converted into a sequence of search queries about the target topic. We evaluated our method of search query conversion and found that the precision for deciding types of subtopic keywords in search queries was 84.4%, the precision for finding relational keywords was 35.7%, and the precision for converting dynamic subtopic keywords was 40.0%.
Extracting and Clustering Related Keywords based on History of Query Frequency

Toru Onoda, Takayuki Yumoto, Kazutoshi Sumiya

PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION 162-+ 2008, peer-reviewed

Query-recommendation systems based on inputted queries have become widespread. These services are effective if users cannot input relevant queries. However, the conventional systems do not take into consideration the relevance between recommended queries. This paper proposes a method of obtaining related queries and clustering them by using the history of query frequencies in query logs. We define similarity between queries based on the history of query frequency and use it for clustering queries. We selected various queries, extracted related queries, and then clustered them. We found that our method was useful for clustering queries that were used around the same period.
Comparative quality evaluation of TV contents based on Web analysis

Yutaka Kabutoya, Takayuki Yumoto, Satoshi Oyama, Keishi Tajima, Katsumi Tanaka

2007 IEEE INTERNATIONAL WORKSHOP ON DATABASES FOR NEXT GENERATION RESEARCHERS 43-+ 2007, peer-reviewed

It is difficult to watch TV contents in an active manner such that the user can interactively select TV contents, because TV is originally a broadcast medium. It is also difficult for users to judge whether the information in TV contents is valid, because conventional TV contents are not directly linked with related information or evidence. One way to cope with these problems is to provide complementary or comparative information on TV contents obtained from other media such as the Web. In our research, using the topic structure proposed by Ma et al., we evaluated the quality of TV contents and visualized the quality. In this paper we defined "contents coverage," "generality," and "social acceptance" as aspects of TV contents' quality, and examined to what extent Web pages contain information complementary to TV contents. We also implemented a new system to complement TV contents with Web pages, called "TV contents spectrum analyzer," which visualizes the degrees of generality and social acceptance of TV contents using the WWW.
Quality estimation of local contents based on pagerank values of web pages

Yutaka Kabutoya, Takayuki Yumoto, Satoshi Oyama, Keishi Tajima, Katsumi Tanaka

ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops 134 2006, peer-reviewed

Recently, searching local contents rather than Web contents, e.g., with Google Desktop Search, has become more frequent. Google succeeded in Web search because of its PageRank algorithm for ranking search results. PageRank estimates the quality of Web pages based on their popularity, which in turn is estimated by the number and the quality of pages referring to them through hyperlinks. This algorithm, however, is not applicable when we search local contents without link structure, such as text data. In this research, we propose a method to estimate the quality of local contents without link structure by using the PageRank values of Web contents similar to them. Based on this estimation, we can rank desktop search results. Furthermore, this method enables us to search contents across different resources such as Web contents and local contents. In this paper, we applied this method to Web contents, calculated the scores that estimate their quality, and compared them with their page quality scores by PageRank.
Page sets as web search answers

Takayuki Yumoto, Katsumi Tanaka

DIGITAL LIBRARIES: ACHIEVEMENTS, CHALLENGES AND OPPORTUNITIES, PROCEEDINGS 4312 244-+ 2006, peer-reviewed

Conventional Web search engines rank their searched results page by page. That is, conventionally, the information unit for both searching and ranking is a single Web page. There are, however, cases where a set of searched pages shows a better similarity (relevance) to a given (keyword) query than each individually searched page. This is because the information a user wishes to have is sometimes distributed on multiple Web pages. In such cases, the information unit used for ranking should be a set of pages rather than a single page. In this paper, we propose the notion of a "page set ranking", which is to rank each pertinent set of searched Web pages. We describe our new algorithm of the page set ranking to efficiently construct and rank page sets. We present some experimental results and the effectiveness of our approach.
Finding pertinent page-pairs from web search results

T Yumoto, K Tanaka

DIGITAL LIBRARIES: IMPLEMENTING STRATEGIES AND SHARING EXPERIENCES, PROCEEDINGS 3815 301-310 2005, peer-reviewed

Conventional Web search engines evaluate each single page as a ranking unit. When the information a user wishes to have is distributed on multiple Web pages, it is difficult to find pertinent search results with these conventional engines. Furthermore, search result lists are hard to check and they do not tell us anything about the relationships between the searched Web pages. We often have to collect Web pages that reflect different viewpoints. Here, a collection of pages may be more pertinent as a search result item than a single Web page. In this paper, we propose the idea of realizing "multiple viewpoint retrieval" in Web searches. Multiple viewpoint retrieval means searching Web pages that have been described from different viewpoints for a specific topic, gathering multiple collections of Web pages, ranking each collection as a search result and returning them as results. In this paper, we consider the case of page-pairs. We describe a feature-vector based approach to finding pertinent page-pairs. We also analyze the characteristics of page-pairs.
Reconsidering XQuery as a Content Integration Language (コンテンツ統合言語としてのXQueryの再考)

Takayuki Yumoto, Katsumi Tanaka

DBSJ Letters 3(2) 17-20 September 2004, peer-reviewed
A dynamic content integration language for video data and web content

T Yumoto, Q Ma, K Sumiya, K Tanaka

FOURTH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS 83-92 2003, peer-reviewed

Dynamic content integration of multiple information sources is one way of providing richer content that will satisfy the diverse demands of users. In this paper, we propose an XML-based language to compose synchronized content from web and video content. The notable features of this language are as follows: (1) dynamic unit identification of content that is composed into synchronized content and (2) dynamic retrieval of content through pre-defined retrieval criteria. This dynamic identification and retrieval of composable units are based on the author's intentions. Content authors can specify the units of their content that are to be integrated into new content by describing the conditions concerning this content and the conditions concerning the surrounding content. Although the proposed language looks like SMIL (Synchronized Multimedia Integration Language), it differs in its dynamic identification and retrieval capabilities. Indeed, the proposed language works just like the meta-mechanism for conventional SMIL. That is, the script written by the proposed language can generate SMIL data as its output.

MISC

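Analysis of the Role of the Attention Mechanism in LLMs (LLMにおけるAttention機構の役割の分析)

大塚空来, Takayuki Yumoto

16th Forum on Data Engineering and Information Management, March 2024, last author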
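Topic-Based Company Search Using NMF on a Word Co-occurrence Matrix (語の共起行列に対するNMFを用いたトピック別企業検索)

辻田隆善, Takayuki Yumoto

IEICE Technical Report 123(192) 48-53, September 2023, last author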
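Topic Estimation of Comments by Few-Shot Learning Using Cluster Representative Points (クラスタの代表点を用いたFew-Shot学習によるコメントのトピック推定)

藤原祐也, Takayuki Yumoto

IEICE Technical Report 123(192) 42-47, September 2023, last author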
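Discovering Combinations of Related Symptoms Using a Co-occurrence Graph Built from QA Data (QAデータから構築した共起グラフを用いた関連する症状の組み合わせの発見)

本白水健輔, Takayuki Yumoto

15th Forum on Data Engineering and Information Management, March 2023, last author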
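Identifying Follow-up Reports among News Articles Using Pseudo Training Data (疑似訓練データを用いたニュース記事間の続報判定)

松本直彰, Takayuki Yumoto, Takehiro Yamamoto, Hiroaki Ohshima

14th Forum on Data Engineering and Information Management, March 2022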