Curriculum Vitae

Takayuki Yumoto

  (湯本 高行)

Profile Information

Affiliation
University of Hyogo
Degree
Ph.D. in Informatics (Kyoto University)

J-GLOBAL ID
200901000308952299
researchmap Member ID
5000091303

Education

 1

Awards

 1

Papers

 26
  • Takafumi Kawahara, Tomoya Hashiguchi, Takayuki Yumoto, Hiroaki Ohshima
    J105-D(5) 322-336, May 1, 2022  Peer-reviewed
    In this research, we propose a method for estimating the degree of injury from text documents that describe accidents. An input document is assumed to consist of a few sentences. The proposed method estimates the degree of injury by solving a classification problem with machine learning techniques. The data used in this research is the accident data published in the Accident Information Data Bank System, with the text in the “Summary of the accident” field used as input. The input text is represented as a distributed representation using the generic language model BERT, for which we use a model pre-trained on the Japanese Wikipedia. To improve performance on the injury-degree estimation task, we introduce four ideas: (1) class weights, (2) ordinal classification, (3) multitask learning, and (4) fine-tuning with token-label estimation. We examined the effects of using and not using these ideas on accuracy, macro F1, RMSE, and confusion matrices. The results showed that macro F1 and RMSE improve when (1) class weights and (2) ordinal classification are introduced, and that accuracy improves when (3) multitask learning is introduced.
  • Yuya Koyama, Takayuki Yumoto, Teijiro Isokawa, Naotake Kamiura
    Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication, IMCOM 2019, Phuket, Thailand, January 4-6, 2019, 996-1005, 2019  Peer-reviewed
  • Sho Iizuka, Takayuki Yumoto, Manabu Nii, Naotake Kamiura
    IEICE Transactions on Information and Systems (Japanese Edition), J101-D(4) 681-689 (web only), Apr 1, 2018
  • Takayuki Yumoto, Takahiro Yamanaka, Manabu Nii, Naotake Kamiura
    International Journal of Biomedical Soft Computing and Human Sciences, 22(1) 9-18, 2017  Peer-reviewed
  • Sho Iizuka, Takayuki Yumoto, Manabu Nii, Naotake Kamiura
    Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016, 2016  Peer-reviewed
  • Takayuki Yumoto, Takahiro Yamanaka, Manabu Nii, Naotake Kamiura
    DIGITAL LIBRARIES: KNOWLEDGE, INFORMATION, AND DATA IN AN OPEN ACCESS SOCIETY, 10075 85-91, 2016  Peer-reviewed
    We propose rarity-oriented retrieval methods for serendipity, defining rare information as information that is both relevant and atypical. We take two approaches. In the first, we use social bookmark data, extending our previous work with tag estimation. The second is based on word co-occurrence in a dataset. In both approaches, we use conditional probabilities to express relevancy and atypicality. In experiments, we compared our methods with a relevance-oriented method, a diversity-oriented method, and another rarity-oriented method. Our methods using word co-occurrence obtained better nDCG scores than the other methods.
  • Yasuyuki Okamura, Takayuki Yumoto, Manabu Nii, Naotake Kamiura
    Proceedings of the 14th International Conference WWW/Internet 2015, 55-62, Jan, 2015  Peer-reviewed
    People are posting huge amounts of varied information on the Web as the popularity of social media continues to increase. The sentiment of a tweet posted on Twitter can reveal valuable information about the reputation of various targets, both on the Web and in the real world. We propose a method to classify tweet sentiments by machine learning. In most cases, machine learning requires a significant amount of manually labeled data. Our method differs in that we use social bookmark data as training data for classifying tweets with URLs. In social bookmarks, comments are written using casual expressions similar to tweets. Since tags in social bookmarks partly represent sentiment, they can be used as supervisory signals for learning. The proposed method moves beyond basic "positive"/"negative" classification to classify impressions as "useful", "funny", "negative", and "other".
  • Takayuki Yumoto
    Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-11, National Center of Sciences, Tokyo, Japan, December 9-12, 2014, 2014  Peer-reviewed
  • Takayuki Yumoto, Ryohei Tada, Manabu Nii, Kunihiro Sato
    2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013), 284-288, 2013  Peer-reviewed
    In this paper, we propose the rarity of a Web page in a user-given category as a way to find useful information that few people know. A rare Web page is a page that belongs to the given category and is atypical within it. We define the probability that a page is a rare Web page in the given category as its rarity score. The rarity score is the product of a relevancy score and an atypicality score. Relevancy is the probability that a Web page belongs to the category given by the user. Atypicality is the conditional probability that a page is atypical in the category given that it belongs to the category. Both probabilities are calculated using tags from social bookmark services and words in Web pages. We evaluated the proposed relevancy score by classifying whether Web pages belong to a certain category. We also evaluated the proposed rarity as a metric for ranking Web pages and compared the rankings by relevancy and atypicality. We confirmed the usefulness of the rarity score for finding relevant and atypical pages.
  • Tatsuya Fujisaka, Takayuki Yumoto, Kazutoshi Sumiya
    WEB INFORMATION SYSTEMS AND MINING, PT II, 6988 103-+, 2011  Peer-reviewed
    Conventional search engines can extract commonplace information by incorporating users' requests into queries. Users make niche requests when they want to obtain atypical objects or unique information, and in these instances it is difficult for them to expand their queries to match those requests. In this paper, we introduce a query-suggestion method for finding objects that have atypical characteristics. Our method focuses on the property values of an object and elicits atypical property values by using the relation between an object's name and a typical property value.
  • Ling Xu, Takayuki Yumoto, Shinya Aoki, Qiang Ma, Masatoshi Yoshikawa
    Proceedings of the Annual Hawaii International Conference on System Sciences, 1-10, 2011  Peer-reviewed
    The advantages of multimedia make video news believable and impressive to viewers, even while the personal opinions and ideological perspectives hidden in the content still influence them. To reduce the risk of being misled, we propose a method, based on a Material-Opinion model, for detecting inconsistent news items that report the same event while the viewer is watching one of them. In the Material-Opinion model, the main participants filmed as materials are presented to the viewer through the video stream, which is used to support the arguments put forward. Based on this model, given a series of multimedia news items reporting the same event, we detect inconsistency between any two of them by computing their dissimilarities in materials and in opinions. Material dissimilarity is based on the appearance of the main participants in the video. Opinion dissimilarity is calculated as the vector difference between two vectors consisting of the argument points extracted from the closed captions. If one of the dissimilarities is high and the other is low, we consider the two items inconsistent. We also show experimental results to validate the proposed methods.
  • Yuko Koba, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi
    2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings, 350-354, 2010  Peer-reviewed
    People often want to know the names of objects that they can describe but cannot name, and it is difficult to find such object names using conventional Web search engines. We therefore propose a new method for finding an object name from descriptions given by a user. The method consists of two phases: extraction and validation. In the extraction phase, candidate words are extracted by conducting Web searches using combinations of queries generated from the user's descriptions. In the validation phase, each candidate word is validated through a Web search using that word, and the candidates are ranked based on the user's description. We evaluated our algorithm on several tasks of finding object names from questions on Q&A sites, and compared it with methods using queries consisting of all the words in the description and queries consisting of user-selected and user-generated words. The precision of our algorithm was higher than that of the other methods.
  • Katsumi Tanaka, Hiroaki Ohshima, Adam Jatowt, Satoshi Nakamura, Yusuke Yamamoto, Kazutoshi Sumiya, Ryong Lee, Daisuke Kitayama, Takayuki Yumoto, Yukiko Kawai, Jianwei Zhang, Shinsuke Nakajima, Yoichi Inagaki
    Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication ICUIMC 10, 147-156, 2010  Peer-reviewed
    We describe a new concept and method for evaluating Web information credibility. The quality control of information (text, images, video, etc.) on the Web is generally insufficient due to low publishing barriers. As a result, there is a large amount of mistaken and unreliable information on the Web that can have detrimental effects on users. This calls for technology that facilitates judging the credibility (expertise and trustworthiness) of Web content and the accuracy of the information users encounter on the Web. Such technology should handle a wide range of tasks: extracting credibility-related features from the target Web content, extracting reputation-related information for it (such as hyperlinks and social bookmarks) and evaluating its distribution, and evaluating features of the content's authors. We propose and describe methodologies for analyzing the credibility of Web information: (1) content analysis, (2) social support analysis, and (3) author analysis. We give an overview of our recent research activities on Web information credibility evaluation based on these methodologies.
  • Takeru Nakabayashi, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi, Kazutoshi Sumiya
    ROLE OF DIGITAL LIBRARIES IN A TIME OF GLOBAL CHANGE, 6102 112-+, 2010  Peer-reviewed
    We define the peculiarity of text as a metric of information credibility; higher peculiarity means lower credibility. We extract the theme word and the characteristic words from a text and check whether a subject-description relation holds between them. Peculiarity is defined using the ratio of subject-description relations between the theme word and the characteristic words. We evaluate the extent to which peculiarity can be used to judge credibility by classifying text from Wikipedia and Uncyclopedia in terms of peculiarity.
  • Takayuki Yumoto, Kazutoshi Sumiya
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 6193 264-+, 2010  Peer-reviewed
    Social bookmarks are used to find Web pages that draw much attention. However, the tendency of pages to collect bookmarks differs by topic. Therefore, the number of bookmarks indicates that a page attracts attention, but it cannot be used directly as a metric of attention intensity. We define the relative quantity of social bookmarks (RQS) for measuring the attention intensity of a Web page. The RQS is calculated using the number of social bookmarks of related pages, which are found using a similarity measure based on the specificity of social tags. We define two types of specificity: local specificity, which is specificity for a user, and global specificity, which is specificity common across a social bookmark service.
  • Ryouji Nonaka, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi
    ACM International Conference Proceeding Series, 350-354, 2009  Peer-reviewed
    We propose a method for searching for comprehensible how-to information on the Web. In our how-to information search, we use lightweight analysis of Web pages to extract how-to information from pages obtained by conventional Web search engines and rank it according to how easily viewable it is. In the extraction process, we focus on expressions in Web page text blocks that describe procedures. In the ranking process, we focus on images, the effects of letter strings, and the length of the how-to information.
  • Shinya Aoki, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi
    ACM International Conference Proceeding Series, 344-349, 2009  Peer-reviewed
    Search engines now make it easy to compare two objects. However, users tend to browse only the pages ranked highly by search engines, which may give biased information. We propose a method for searching Web pages in which two objects are compared, extracting comparison points from those pages, and showing these points to users. Comparison points are keywords for comparing objects. The proposed method extracts points for efficient comparison by using comparison expressions such as "Liquid Crystal TVs are better ..." and "... than Plasma TVs."
  • Takayuki Yumoto, Yuta Mori, Kazutoshi Sumiya
    SEVENTH INTERNATIONAL CONFERENCE ON CREATING, CONNECTING AND COLLABORATING THROUGH COMPUTING, PROCEEDINGS, 121-+, 2009  Peer-reviewed
    In this paper, we propose a method of converting a given sequence of search queries about a certain topic into a sequence of search queries about a different topic. We define the concept of a search skeleton for topic conversion. A search skeleton represents the relationships between keywords in a query. A given sequence of search queries is converted into a sequence of search skeletons, which are in turn converted into a sequence of search queries about the target topic. We evaluated our method of search query conversion and found that the precision for deciding the types of subtopic keywords in search queries was 84.4%, the precision for finding relational keywords was 35.7%, and the precision for converting dynamic subtopic keywords was 40.0%.
  • Toru Onoda, Takayuki Yumoto, Kazutoshi Sumiya
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION, 162-+, 2008  Peer-reviewed
    Query-recommendation systems based on input queries have become widespread. These services are effective when users cannot formulate relevant queries themselves. However, conventional systems do not take into consideration the relevance between recommended queries. This paper proposes a method of obtaining related queries and clustering them by using the history of query frequencies in query logs. We define a similarity between queries based on the history of query frequency and use it for clustering. We selected various queries, extracted related queries, and clustered them, and found that our method was useful for clustering queries that were used around the same time.
  • Yutaka Kabutoya, Takayuki Yumoto, Satoshi Oyama, Keishi Tajima, Katsumi Tanaka
    2007 IEEE INTERNATIONAL WORKSHOP ON DATABASES FOR NEXT GENERATION RESEARCHERS, 43-+, 2007  Peer-reviewed
    It is difficult to watch TV content in an active manner in which the user interactively selects content, because TV is originally a broadcast medium. It is also difficult for users to judge whether the information in TV content is valid, because conventional TV content is not directly linked with related or supporting information. One way to cope with these problems is to provide complementary or comparative information about TV content obtained from other media, such as the Web. In our research, using the topic structure proposed by Ma et al., we evaluated the quality of TV content and visualized it. In this paper, we define "content coverage," "generality," and "social acceptance" as aspects of TV content quality, and examine to what extent Web pages contain information that complements TV content. We also implemented a new system to complement TV content with Web pages, called the "TV contents spectrum analyzer," which visualizes the degrees of generality and social acceptance of TV content using the Web.
  • Yutaka Kabutoya, Takayuki Yumoto, Satoshi Oyama, Keishi Tajima, Katsumi Tanaka
    ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops, 134, 2006  Peer-reviewed
    Searching local contents rather than Web contents, e.g., with Google Desktop Search, is becoming more common. Google succeeded in Web search because of its PageRank algorithm for ranking search results. PageRank estimates the quality of Web pages based on their popularity, which in turn is estimated from the number and quality of pages referring to them through hyperlinks. This algorithm, however, is not applicable when searching local contents without link structure, such as text data. In this research, we propose a method to estimate the quality of local contents without link structure by using the PageRank values of similar Web contents. Based on this estimation, we can rank desktop search results. Furthermore, this method enables searching across different resources, such as Web contents and local contents. In this paper, we applied the method to Web contents, calculated scores that estimate their quality, and compared them with the pages' quality scores given by PageRank.
  • Takayuki Yumoto, Katsumi Tanaka
    DIGITAL LIBRARIES: ACHIEVEMENTS, CHALLENGES AND OPPORTUNITIES, PROCEEDINGS, 4312 244-+, 2006  Peer-reviewed
    Conventional Web search engines rank their results page by page; that is, the information unit for both searching and ranking is a single Web page. There are, however, cases where a set of retrieved pages shows a better similarity (relevance) to a given keyword query than each individually retrieved page, because the information a user wishes to have is sometimes distributed over multiple Web pages. In such cases, the information unit used for ranking should be a set of pages rather than a single page. In this paper, we propose the notion of "page set ranking", which ranks each pertinent set of retrieved Web pages. We describe our new page set ranking algorithm, which efficiently constructs and ranks page sets, and present experimental results showing the effectiveness of our approach.
  • Takayuki Yumoto, Katsumi Tanaka
    DIGITAL LIBRARIES: IMPLEMENTING STRATEGIES AND SHARING EXPERIENCES, PROCEEDINGS, 3815 301-310, 2005  Peer-reviewed
    Conventional Web search engines evaluate each single page as a ranking unit. When the information a user wishes to have is distributed over multiple Web pages, it is difficult to find pertinent search results with these conventional engines. Furthermore, search result lists are hard to check and tell us nothing about the relationships between the retrieved Web pages. We often have to collect Web pages that reflect different viewpoints, and a collection of pages may be more pertinent as a search result item than a single Web page. In this paper, we propose the notion of "multiple viewpoint retrieval" in Web search: searching for Web pages that describe a specific topic from different viewpoints, gathering multiple collections of such pages, ranking each collection as a search result, and returning them as results. We consider the case of page pairs, describe a feature-vector-based approach to finding pertinent page pairs, and analyze the characteristics of page pairs.
  • Takayuki Yumoto, Katsumi Tanaka
    DBSJ Letters (Nihon Database Gakkai Letters), 3(2) 17-20, Sep, 2004  Peer-reviewed
  • Takayuki Yumoto, Qiang Ma, Kazutoshi Sumiya, Katsumi Tanaka
    FOURTH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS, 83-92, 2003  Peer-reviewed
    Dynamic content integration from multiple information sources is one way of providing richer content that satisfies the diverse demands of users. In this paper, we propose an XML-based language to compose synchronized content from Web and video content. The notable features of this language are: (1) dynamic identification of the content units to be composed into synchronized content, and (2) dynamic retrieval of content through pre-defined retrieval criteria. This dynamic identification and retrieval of composable units is based on the author's intentions: content authors can specify the units of their content to be integrated into new content by describing conditions on that content and on the surrounding content. Although the proposed language resembles SMIL (Synchronized Multimedia Integration Language), it differs in its dynamic identification and retrieval capabilities; indeed, it works as a meta-mechanism for conventional SMIL, in that a script written in the proposed language can generate SMIL data as its output.
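Several of the papers above (the IIAI-AAI 2013 and ICADL 2016 work) define a rarity score as the product of a relevancy probability and an atypicality probability. A minimal sketch of that scoring idea, with hypothetical probability values standing in for the estimates the papers derive from social bookmark tags and page words:

```python
def rarity_score(p_relevant: float, p_atypical_given_relevant: float) -> float:
    """Rarity = P(page belongs to category) * P(page atypical | belongs).

    In the papers, both probabilities are estimated from social bookmark
    tags and words in Web pages; here they are plain inputs.
    """
    return p_relevant * p_atypical_given_relevant

# Hypothetical example: rank three candidate pages by rarity.
pages = {
    "page_a": (0.9, 0.1),  # clearly on-topic, but typical
    "page_b": (0.8, 0.6),  # on-topic and fairly atypical
    "page_c": (0.2, 0.9),  # atypical, but barely on-topic
}
ranked = sorted(pages, key=lambda p: rarity_score(*pages[p]), reverse=True)
print(ranked)  # ['page_b', 'page_c', 'page_a']
```

The product form means a page must be both relevant and atypical to score highly; excelling at only one factor (like page_a or page_c) is not enough.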
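The 2022 injury-degree paper above introduces class weights to cope with label imbalance in the accident data. A framework-free sketch of one common inverse-frequency weighting scheme (the exact formula in the paper may differ; this "balanced" heuristic is an assumption for illustration):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: rare classes get larger weights, so a
    weighted loss penalizes their misclassification more heavily."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical injury-degree labels (0 = none ... 3 = severe), heavily skewed.
labels = [0] * 70 + [1] * 20 + [2] * 8 + [3] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.357..., 1: 1.25, 2: 3.125, 3: 12.5}
```

Multiplying each training example's loss by the weight of its class counteracts the tendency of a classifier to ignore the rare severe-injury classes, which is consistent with the paper's reported macro F1 improvement.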

Misc.

 87

Books and Other Publications

 1
  • Munehiko Sasajima, Hiroaki Ohshima, Takehiro Yamamoto, Takayuki Yumoto (Role: Contributor)
    Asakura Publishing, Sep, 2023 (ISBN: 9784254129151)

Research Projects

 6