Research Achievements

大島 裕明

Hiroaki Ohshima

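Basic Information

Affiliation
Associate Professor, Graduate School of Information Science, University of Hyogo
Degree
Ph.D. in Informatics (Kyoto University)

Researcher Number
90452317
J-GLOBAL ID
201401077923568388
researchmap Member ID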
7000008756

Papers

Total: 135
  • Takehiro Yamamoto, Yiqun Liu, Min Zhang, Zhicheng Dou, Ke Zhou, Ilya Markov, Makoto P. Kato, Hiroaki Ohshima, Sumio Fujita
    Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12) June 2016
  • Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
    IPSJ Transactions on Databases 9(2) 1-11 June 2016  Peer-reviewed
    Although long queries are still a small part of the queries submitted to Web search engines, their usage is gradually increasing. However, retrieval effectiveness decreases as query length increases, and long queries are very likely to return few Web pages. We target sentential queries, a type of long query, and propose a method called sentential query paraphrasing to improve their retrieval performance, especially recall. We are motivated by the assumption that a sentence is an indivisible whole: removing terms or phrases from a sentence leads to missing information or query drift. In this paper, we paraphrase sentential queries to avoid missing information and thereby preserve the completeness of the information. Take the sentential query "apples pop a powerful pectin punch," for example. Its meaning changes if one or more terms are removed, and conventional search engines return few Web pages for it. In contrast, querying with its paraphrases, such as "apples contain a lot of pectin" or "apples are rich in pectin," can retrieve more Web pages. The experimental results show that our method can acquire more paraphrases from the noisy Web. In addition, with the help of paraphrases, more Web pages can be retrieved, especially for sentential queries that return no answers in their original expression.
    This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol. 24 (2016) No. 4 (online).
  • Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
    DIGITAL LIBRARIES: KNOWLEDGE, INFORMATION, AND DATA IN AN OPEN ACCESS SOCIETY 10075 110-123 2016  Peer-reviewed
    Traditional search technologies are based on similarity relationships: given a document, they return documents with similar content. However, similarity-based search does not always yield good results; for example, similar documents bring little additional information, which makes it difficult to increase information gain. In this paper, we propose a method to find documents that are similar to, but different from, a user-given document by distinguishing coordinate relationships from similarity relationships between documents. Put simply, a similar but different document is one that has the same topic as the given document but describes different events or concepts. For example, given as input a news article reporting the Oregon school shooting, articles reporting other school shootings, such as the Virginia Tech shooting, are detected and returned to the user. Experiments conducted on the New York Times Annotated Corpus verify the effectiveness of our method and illustrate the importance of incorporating coordinate relationships to find similar but different documents.
  • Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
    Journal of Information Processing 24(4) 721-731 2016  Peer-reviewed
    Although long queries are still a small part of the queries submitted to Web search engines, their usage is gradually increasing. However, retrieval effectiveness decreases as query length increases, and long queries are very likely to return few Web pages. We target sentential queries, a type of long query, and propose a method called sentential query paraphrasing to improve their retrieval performance, especially recall. We are motivated by the assumption that a sentence is an indivisible whole: removing terms or phrases from a sentence leads to missing information or query drift. In this paper, we paraphrase sentential queries to avoid missing information and thereby preserve the completeness of the information. Take the sentential query “apples pop a powerful pectin punch,” for example. Its meaning changes if one or more terms are removed, and conventional search engines return few Web pages for it. In contrast, querying with its paraphrases, such as “apples contain a lot of pectin” or “apples are rich in pectin,” can retrieve more Web pages. The experimental results show that our method can acquire more paraphrases from the noisy Web. In addition, with the help of paraphrases, more Web pages can be retrieved, especially for sentential queries that return no answers in their original expression.
  • Yusuke Takeda, Hiroaki Ohshima, Katsumi Tanaka
    17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings (68) 1-10 December 11, 2015  Peer-reviewed
    A recent study on information refinding reported that 44% of Web page visits and 33% of Web queries involved revisiting previously browsed pages. We propose methods for finding previously browsed pages that are coordinate pages of the currently browsed page. Intuitively, two pages are coordinate pages if both belong to the same class. To find the coordinate pages of a given page, we use the user's browsing and search behavior, such as her query log and tab usage, as well as link navigation. Our page revisiting methods were implemented within a Web browser, so that users can find previously browsed pages while browsing and searching. In our experiments, our methods outperformed conventional baseline methods for page revisiting.
  • Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
    17th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2015 - Proceedings (67) 1-10 December 11, 2015  Peer-reviewed
    The effectiveness of retrieval decreases as query length increases. We target sentential queries and propose a method for improving their retrieval performance, called query rewriting. Briefly, given a sentential query, our method acquires paraphrases from the noisy Web and uses them to avoid returning no answers. In particular, since a relation can be represented either intensionally (referred to as paraphrase templates) or extensionally (referred to as coordinate tuples), the mutual reinforcement between the two is taken into account. The experimental results show that for declarative sentences, the average precision of our method is 68.1%, compared to 44.2% for the baseline. In addition, the relative recall of our method is 95.9%, nearly three times that of the baseline. For questions, the average precision of our method is 46.9%, compared to 39.9% for the baseline. We also show the effectiveness of query rewriting in two applications.
  • Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2015 9052 135-151 2015  Peer-reviewed
    We propose a method to acquire paraphrases of a given sentence from the Web. For example, consider the input sentence "Lemon is a high vitamin c fruit". Its paraphrases are expressions or sentences that convey the same meaning but differ syntactically, such as "Lemons are rich in vitamin c" or "Lemons contain a lot of vitamin c". We aim to find sentence-level paraphrases in the noisy Web instead of in domain-specific corpora. By observing the search results of paraphrases, users can estimate the likelihood that the sentence is a fact. We evaluate the proposed method on five distinct semantic relations. Experiments show that our average precision is 60.5%, compared to 44.15% for the TE/ASE method. In addition, our method acquires three more paraphrases per input than the TE/ASE method.
  • Christian Nitschke, Yuki Minami, Masayuki Hiromoto, Hiroaki Ohshima, Takashi Sato
    International Conference on Control, Automation and Systems 678-685 December 16, 2014  Peer-reviewed
    Unmanned aerial vehicles (UAVs) have many applications and are quickly gaining popularity with the availability of low-cost micro aerial vehicles (MAVs). Robotics is a popular interdisciplinary education target, as it involves understanding of and collaboration among several disciplines. Thus, UAVs can serve as an ideal study platform. However, as robotics requires technical background, skills and initial effort, it is commonly applied in long-term courses. In this paper, we successfully exploit the opposite case of robotics in short-term education for students without background, in the form of a one-day contest on automatic visual UAV navigation. We provide an extensive survey and show that existing materials and tools do not fit the task and are lacking in technical aspects. We introduce a novel open-source programming library that comprises programs to guide learning by experience and allow rapid development. It contributes to marker-based tracking with a novel nested-marker design and accurate calibration parameters estimated from 14 Parrot AR.Drone 2.0 front cameras. We discuss the contest results in detail; they represent an extensive user study regarding robotics in education and the effectiveness of the library. The achievement of a steep learning curve for a complex subject has important implications for interdisciplinary design education, as it allows a deep understanding of potentials and limitations to facilitate decision-making, unconventional problem solutions and novel applications.
  • Takehiro Yamamoto, Makoto P. Kato, Hiroaki Ohshima, Katsumi Tanaka
    Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-11) June 2014
  • Makoto P. Kato, Takehiro Yamamoto, Hiroaki Ohshima, Katsumi Tanaka
    WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web 313-314 April 7, 2014  Peer-reviewed
    This study investigated query formulations by users with Cognitive Search Intents (CSIs), which are needs for particular cognitive characteristics of the documents to be retrieved, e.g., comprehensibility, subjectivity, and concreteness. We proposed an example-based method of specifying search intents in order to observe unbiased query formulations. Our user study revealed that about half of our subjects did not input any keywords representing CSIs, even though they were conscious of the given CSIs.
  • 佃 洸摂, 大島 裕明, 山本 光穂, 岩崎 弘利, 田中 克己
    IPSJ Transactions on Databases 7(1) 1-17 March 2014  Peer-reviewed
  • Makoto P. Kato, Takehiro Yamamoto, Hiroaki Ohshima, Katsumi Tanaka
    SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval 577-586 2014  Peer-reviewed
    This study investigated query formulations by users with Cognitive Search Intents (CSIs), which are users' needs for the cognitive characteristics of documents to be retrieved, e.g., comprehensibility, subjectivity, and concreteness. Our four main contributions are as follows: (i) we proposed an example-based method of specifying search intents to observe users' query formulations without biasing them with a verbalized task description; (ii) we conducted a questionnaire-based user study and found that about half of our subjects did not input any keywords representing CSIs, even though they were conscious of CSIs; (iii) our user study also revealed that over 50% of subjects occasionally searched with CSIs, while our evaluations demonstrated that the performance of a current Web search engine was much lower when we considered not only users' topical search intents but also CSIs; and (iv) we demonstrated that machine-learning-based query expansion could improve performance for some types of CSIs. Our findings suggest that users over-adapt to current Web search engines, and they open opportunities to estimate CSIs from non-verbal user input. Copyright 2014 ACM.
  • Kosetsu Tsukuda, Hiroaki Ohshima, Katsumi Tanaka
    2014 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 2 15-21 2014  Peer-reviewed
    In this paper, we propose methods for ranking the coordinate terms and hypernyms of a given query according to their appropriateness. Although previous studies have proposed methods for discovering coordinate terms or hypernyms of a query, they focused only on discovering such terms and evaluated the discovered terms with a binary judgment: appropriate or inappropriate. Unlike these studies, we rank the coordinate terms and hypernyms of a query and evaluate them by considering their degree of appropriateness. In the proposed method, a bipartite graph is created from the hypernyms of a query and the hyponyms of each hypernym using a hypernym-hyponym dictionary. We then apply a HITS-based algorithm to the bipartite graph and rank coordinate terms and hypernyms by their appropriateness. Experimental results on 50 queries demonstrate that our method ranks appropriate coordinate terms and hypernyms higher than other comparable methods.
  • Christian Nitschke, Yuki Minami, Masayuki Hiromoto, Hiroaki Ohshima, Takashi Sato
    2014 14TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2014) 678-685 2014  Peer-reviewed
    Unmanned aerial vehicles (UAVs) have many applications and are quickly gaining popularity with the availability of low-cost micro aerial vehicles (MAVs). Robotics is a popular interdisciplinary education target, as it involves understanding of and collaboration among several disciplines. Thus, UAVs can serve as an ideal study platform. However, as robotics requires technical background, skills and initial effort, it is commonly applied in long-term courses. In this paper, we successfully exploit the opposite case of robotics in short-term education for students without background, in the form of a one-day contest on automatic visual UAV navigation. We provide an extensive survey and show that existing materials and tools do not fit the task and are lacking in technical aspects. We introduce a novel open-source programming library that comprises programs to guide learning by experience and allow rapid development. It contributes to marker-based tracking with a novel nested-marker design and accurate calibration parameters estimated from 14 Parrot AR.Drone 2.0 front cameras. We discuss the contest results in detail; they represent an extensive user study regarding robotics in education and the effectiveness of the library. The achievement of a steep learning curve for a complex subject has important implications for interdisciplinary design education, as it allows a deep understanding of potentials and limitations to facilitate decision-making, unconventional problem solutions and novel applications.
  • Yoshinori Kitaguchi, Hiroaki Ohshima, Katsumi Tanaka
    EMERGENCE OF DIGITAL LIBRARIES - RESEARCH AND PRACTICES 8839 417-422 2014  Peer-reviewed
    We propose a method to formulate queries for searching the web for concrete, practical, or detailed "actions". Sometimes a user can only express a web search query as an abstract action. For example, a beginner golfer may use "to improve golf" as a web search query. The search results for this query are unlikely to contain many pages about concrete actions for improving golf skills. To obtain more concrete information, more concrete actions must be used as web queries. The proposed method generates word tuples, such as (shanking, stop) and (distance, adjust), each consisting of a noun and a verb. The proposed algorithm repeatedly searches for nouns from verbs and for verbs from nouns in a bootstrapping manner, and verifies the usefulness of the tuples. To reduce search costs, the method also excludes useless tuples, i.e., tuples that cannot be used to obtain new useful tuples.
  • 佃 洸摂, 大島 裕明, 加藤 誠, 田中 克己
    IPSJ Transactions on Databases 6(5) 49-61 December 2013  Peer-reviewed
  • 佃 洸摂, 大島 裕明, 山本 光穂, 岩崎 弘利, 田中 克己
    DBSJ Journal 11(3) 21-26 February 2013  Peer-reviewed
  • Kosetsu Tsukuda, Hiroaki Ohshima, Mitsuo Yamamoto, Hirotoshi Iwasaki, Katsumi Tanaka
    Proceedings of the ACM Symposium on Applied Computing 878-885 2013  Peer-reviewed
    Although many studies have addressed the problem of finding Web pages with relevant and popular information for a query, very few have focused on discovering unexpected information. This paper provides and evaluates methods for discovering unexpected information for a keyword query. For example, if the user inputs "Michael Jackson," our system first discovers the unexpected related term "karate" and then returns the unexpected information "Michael Jackson is good at karate." Discovering unexpected information is useful in many situations. For example, when a user is browsing a news article on the Web, unexpected information about a person associated with the article can pique the user's interest. If a user is sightseeing or driving, unexpected additional information about a building or the region is also useful. Our approach collects terms related to a keyword query and evaluates the degree of unexpectedness of each related term on the basis of (i) the relationships among the coordinate terms of both the keyword query and the related terms, and (ii) the popularity of each related term. Experimental results show that considering these two factors is effective for discovering unexpected information. Copyright 2013 ACM.
  • 加藤 誠, 大島 裕明, 田中 克己
    DBSJ Journal 11(1) 49-54 June 2012  Peer-reviewed
  • Yoshinori Hara, Yutaka Yamauchi, Yoshinori Yamakawa, Junya Fujisawa, Hiroaki Ohshima, Katsumi Tanaka
    Annual SRII Global Conference, SRII 906-913 2012  Peer-reviewed
    In high-quality Japanese services, providers are often said to sense what their customers want from subtle cues and to deliver a customized service without explicitly advertising the effort. To understand this subtle service, often called "Omonpakari," we studied a high-end sushi restaurant using a multidisciplinary approach: neuroscience to analyze the cognitive characteristics, ethnomethodology to analyze the interactive structure, and computer science to analyze the social evaluations. The neuroscience-based study showed that the service brain model could explain the cognition of "Omonpakari" service regardless of customers' gender, knowledge and social context. The ethnomethodological analysis revealed that customers performed a role, complying with cultural norms and behaving like a culturally appropriate customer even if they might not be one. The analysis using computer science techniques showed that expertise was the key factor in evaluating the services. These findings suggest an alternative model of service in which there is a productive tension, or dialectic, between the provider and the customer. © 2012 IEEE.
  • Yu Kawano, Hiroaki Ohshima, Katsumi Tanaka
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7238(1) 382-396 2012  Peer-reviewed
    We propose a method to generate facets dynamically to enhance navigation among the objects returned by a web-based search query. Facets denote axes for classifying a currently viewed object and related objects, and they are used as navigation signs to indicate their positions. Facets are generated by detecting hypernyms and coordinate terms of expressions that characterize objects. To make them effective for browsing search results, the generated facets are ranked. We implemented a prototype system that shows the images returned by an image search classified by multiple facets. In an experiment assessing the facets, the average precision of correct facets over all queries obtained with our system was up to 82.7% for the top three and up to 77.6% for the top five ranked facets. © 2012 Springer-Verlag.
  • Yuki Sugiyama, Makoto P. Kato, Hiroaki Ohshima, Katsumi Tanaka
    Proceedings - IEEE International Conference on Multimedia and Expo 272-277 2012  Peer-reviewed
    We propose a relative relevance feedback method for image retrieval systems. Relevance feedback is an effective way to modify a user's query by having the user select relevant and irrelevant items in the search results. However, users cannot always find exactly relevant items in the first few pages of search results, especially when the initial query is poorly specified because the user lacks knowledge. We therefore propose relative relevance feedback, which allows users to select relatively relevant and irrelevant items and modifies the query by taking the relativity of the user's feedback into account. Our experimental results show that relative relevance feedback outperforms conventional relevance feedback for image retrieval tasks. © 2012 IEEE.
  • Makoto P. Kato, Hiroaki Ohshima, Katsumi Tanaka
    SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval 811-820 2012  Peer-reviewed
    We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval, including image retrieval and spoken document retrieval, enables a user to input examples as a query and retrieves relevant data based on their similarity to the examples. However, input examples and relevant data can be dissimilar, especially when the domain from which the user selects examples differs from the domain from which the system retrieves data. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to search for relatively inexpensive restaurants serving popular local dishes by means of a content-based retrieval system. Since such restaurants in Beijing and Kyoto are dissimilar owing to differences in average cost and in each area's popular dishes, it is difficult to find relevant restaurants in Kyoto based on examples selected in Beijing. We propose a solution to this problem by assuming that RAPs in different domains correspond: they may be dissimilar but play the same role. A RAP is defined as the expectation of the instances in a domain that are classified into a certain class, e.g., the most expensive restaurant, the average restaurant, or the restaurant serving the most popular dishes. Our proposed method constructs a new feature space based on the RAPs estimated in each domain and bridges the domain difference to improve content-based retrieval in heterogeneous domains. To verify the effectiveness of our proposed method, we evaluated various methods with a test collection developed for content-based geographic object retrieval. Experimental results show that our proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains. This finding suggests that the relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data. © 2012 ACM.
  • Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7651 284-297 2012  Peer-reviewed
    In this paper, we propose a new image search method, called "panoramic image search", and show its application to similar landscape discovery. To perform panoramic image search, we introduce an image ranking method called PanoramaRank, which combines image similarity and image adjacency: image similarity is the retrieval score obtained from the classic vocabulary-tree-based image retrieval framework, and image adjacency is computed using a RANSAC-verified SURF matching process. The proposed notion means searching for images that physically surround the given query image(s). A landscape is a view of an area comprising several geographical features that share a common and meaningful atmosphere. We believe a collection of images is necessary to describe a landscape, and the images in this collection must be roughly similar and roughly adjacent to each other, directly or indirectly. To discover similar landscapes, we (1) find images describing the same landscape as the user-selected query image(s) by employing PanoramaRank; (2) retrieve similar images taken in different locations, where the images belonging to the same location are treated as an insufficient representation of a landscape similar to the original one; (3) apply PanoramaRank once more to find the whole landscape for each location separately; and (4) rank the landscapes by similarity based on several comparison criteria. Moreover, images of landscapes similar to a given landscape image can be found, especially those not presented in results based on an individual pairwise measure. Experimental results and an evaluation are also presented. © 2012 Springer-Verlag.
  • 高橋 侑久, 大島 裕明, 山本 光穂, 岩崎 弘利, 小山 聡, 田中 克己
    IPSJ Journal 52(12) 3542-3557 December 2011  Peer-reviewed
    We propose a method to find historically significant articles in Wikipedia. We treat an article as a historical entity and evaluate the significance of historical entities (people, events, and so on). Here, the significance of a historical entity means how much it affected other historical entities. Our proposed method first calculates the temporal impact of historical entities, since the impact of a historical entity varies with time. We assume that a Wikipedia link between historical entities represents impact propagation: when an entity links to another entity, we regard the former as influenced by the latter. Historical entities in Wikipedia usually have a date and location of occurrence. Our proposed iterative algorithm propagates this initial spatio-temporal information through links in a manner similar to PageRank, so that the spatio-temporal impact scores of all historical entities can be calculated. We assume that a historical entity is significant if it influences many other entities that are far from it temporally or geographically. We also discuss the challenges of realizing a similar method based on spatial impact. We demonstrate a prototype system and report experiments on Wikipedia data, comparing the proposed method with several baseline methods, that show the effectiveness of our method.
  • 川野 悠, 大島 裕明, 田中 克己
    IPSJ Journal 52(12) 3483-3495 December 2011  Peer-reviewed
    In this paper, we propose a method for dynamically generating facets depending on the input query in annotated image search, so that search results are classified and displayed from particular viewpoints. In this study, facets are viewpoints for classifying images in Web image search. Facets are generated by finding hypernyms and coordinate terms of the words that characterize the images. By choosing a facet of interest, a user can browse images classified according to that facet. In the experiment, the average precision of correct facets over all queries was up to 82.0% for the top 3 and up to 75.6% for the top 5 ranked facets.
  • 加藤 誠, 大島 裕明, 小山 聡, 田中 克己
    IPSJ Journal 52(12) 3448-3460 December 2011  Peer-reviewed
    We propose a query-by-example geographic object search method for users who are unfamiliar with the place they are in. Geographic objects, such as restaurants, are often retrieved using an attribute-based or keyword query. These methods, however, are difficult to use for users who have little knowledge of the place where they want to search. The proposed query-by-example method allows users to query by selecting examples in a familiar place in order to retrieve objects in an unfamiliar place, without requiring prior knowledge of the target area. The objects closest to the input are returned based on a distance metric that is dynamically determined by the selected examples. One of the challenges is to predict an effective distance metric, which varies among individuals. Our proposed method robustly estimates the distance metric by amplifying the difference between selected and non-selected examples. To evaluate our method, we developed a restaurant search system using data obtained from a Japanese restaurant Web guide and conducted experiments across multiple areas. Experimental results showed the effectiveness of our method, and its performance exceeded that of a previously proposed method.
  • Makoto P. Kato, Meng Zhao, Kosetsu Tsukuda, Yoshiyuki Shoji, Takehiro Yamamoto, Hiroaki Ohshima, Katsumi Tanaka
    Proceedings of NTCIR-9 Workshop Meeting 202-207 December 2011
  • 高橋 良平, 小山 聡, 大島 裕明, 田中 克己
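    IEICE Transactions (Japanese Edition) J94-A(7) 467-475 July 2011  Peer-reviewed
    On user-contributed recipe sites, contributors use various modifying expressions to make their recipes look more appealing. A modifier attached to a recipe name sometimes describes the recipe's content accurately, but some modifiers are exaggerations. Conversely, when a recipe lacks an accurate modifier describing its content, other users may be unable to find it through search, or may overlook it even when it appears in the search results. In this paper, we address these problems by proposing a method that judges how well the modifiers in a recipe name fit the recipe, by extracting terms that agree with and terms that contradict each modifier from the parts of the recipe that describe its content, such as the ingredients and the cooking steps.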
  • 高橋 侑久, 大島 裕明, 山本 光穂, 岩崎 弘利, 小山 聡, 田中 克己
    DBSJ Journal 10(1) 25-30 June 2011  Peer-reviewed
  • 杉山 祐樹, Makoto Kato, Hiroaki Ohshima, Katsumi Tanaka
    DBSJ Journal 10(1) 19-24 June 2011  Peer-reviewed
  • Mitsuo Yamamoto, Yuku Takahashi, Hirotoshi Iwasaki, Satoshi Oyama, Hiroaki Ohshima, Katsumi Tanaka
    Web and Wireless Geographical Information Systems 6574 21-+ 2011  Peer-reviewed
    We propose techniques for the geographical navigation of historical events described in Web pages, which we call "Virtual History Tour." First, we develop a method for extracting information on historical events from the Web and organizing it into a chronological table. Our method effectively handles ambiguous cases (homonyms and multiple location names in a sentence) by using the number of co-occurrences among events, person names, location names, and addresses on the Web. Next, we propose a method for ranking historical entities according to their impact at a specific time and location. We extend the PageRank algorithm to calculate the temporal and spatial impact of entities. Finally, we introduce a concrete application that demonstrates how users can browse historical events through timeline and map interfaces.
  • Yuku Takahashi, Hiroaki Ohshima, Mitsuo Yamamoto, Hirotoshi Iwasaki, Satoshi Oyama, Katsumi Tanaka
    HT 2011 - Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia 83-92 2011  Peer-reviewed
    We propose a method to evaluate the significance of historical entities (people, events, and so on). Here, the significance of a historical entity means how much it affected other historical entities. Because the impact of a historical entity varies with time and location, our method first calculates the spatiotemporal impact of historical entities. Historical entities are collected from Wikipedia. We assume that a Wikipedia link between historical entities represents impact propagation: when an entity links to another entity, we regard the former as influenced by the latter. Historical entities in Wikipedia usually have a date and location of occurrence. Our iterative algorithm propagates this initial spatiotemporal information through links in a manner similar to PageRank, so that spatiotemporal impact scores can be calculated for all historical entities. We assume that a historical entity is significant if it influences many other entities that are temporally or geographically far from it. We demonstrate a prototype system and report experimental results that show the effectiveness of our method. © 2011 ACM.
  • Ryohei Takahashi, Satoshi Oyama, Hiroaki Ohshima, Katsumi Tanaka
    DBSJ Journal 9(1) 41-46 June 2010  Peer-reviewed
  • Hiroaki Ohshima, Katsumi Tanaka
    Journal of Software 5(2) 195-205 February 2010  Peer-reviewed
    We propose a high-speed method of detecting ontological knowledge from the Web. In this paper, ontological knowledge means terms related to a given term. For example, hypernyms and hyponyms are basic related terms treated in dictionaries. Synonyms and coordinate terms are also well-defined related terms. Topic terms and description terms represent topics of the given term and are only vaguely defined. There are other related terms as well, such as abbreviations and nicknames. The proposed method can detect many kinds of related terms. It extracts related terms only from the text of Web search results, which consists of the titles, snippets, and URLs of Web pages. To extract related terms from the search results, we use two different kinds of lexico-syntactic patterns, called bi-directional lexico-syntactic patterns. The method applies both to languages whose words are separated by spaces, such as English and Korean, and to languages whose words are not, such as Japanese and Chinese. It needs no advanced natural language processing such as morphological analysis or syntactic parsing. It works relatively fast and has excellent precision. It is sometimes difficult to find appropriate patterns for detecting related terms in a particular relationship. We therefore also propose a method of automatically discovering superior bi-directional lexico-syntactic patterns using Web search engines. © 2010 Academy Publisher.
  • Katsumi Tanaka, Satoshi Nakamura, Hiroaki Ohshima, Yusuke Yamamoto, Yusuke Yanbe, Makoto Kato
    Journal of Software 5(2) 154-159 February 2010  Peer-reviewed
    We describe a new concept for improving Web search performance and/or increasing the credibility of search results by using Web 1.0 and Web 2.0 content in a complementary manner. Conventional Web search engines still suffer from low precision and recall, especially when searching multimedia content (images, videos, etc.). Quality control of Web content is generally insufficient because publishing barriers are low. As a result, the Web holds a large amount of mistaken and unreliable information that can have detrimental effects on users. This calls for technology that helps users judge the trustworthiness or credibility of content and the accuracy of the information they encounter on the Web. Such technology should be able to handle a wide range of tasks: extracting credible information related to a given topic, organizing this information, detecting its provenance, and clarifying background, facts, and other related opinions and their distribution. We propose and describe a concept for enhancing the search performance of conventional Web search engines and analyzing the credibility of Web information using the interaction between Web 1.0 and Web 2.0 content. We also give an overview of our recent research on Web search and information credibility based on this concept. © 2010 Academy Publisher.
  • Makoto P. Kato, Satoshi Oyama, Hiroaki Ohshima, Katsumi Tanaka
    Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication (ICUIMC '10) 311-320 2010  Peer-reviewed
    We propose a method of searching for geographic entities in an unknown place with a query by example in a known place. Many Web sites, such as shopping and restaurant sites, offer geographic entity search. Most of these sites provide classic attribute-based or keyword-based interfaces for entity retrieval. However, specifying each desired attribute is time-consuming, and keywords are not effective for representing users' complex intentions. The proposed query-by-example method, used in a map interface, allows users to query intuitively by selecting entities in places they know well. The method returns the entities most similar to the input, using a similarity measure that varies between individuals. Our method estimates this similarity robustly. It uses not only the selected examples but also implicit negative feedback, which is predicted from how the user selects examples as a query in the map interface. Experimental results showed the effectiveness of our method, and its performance exceeded that of a previously proposed method. © 2010 ACM.
  • Katsumi Tanaka, Hiroaki Ohshima, Adam Jatowt, Satoshi Nakamura, Yusuke Yamamoto, Kazutoshi Sumiya, Ryong Lee, Daisuke Kitayama, Takayuki Yumoto, Yukiko Kawai, Jianwei Zhang, Shinsuke Nakajima, Yoichi Inagaki
    Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication (ICUIMC '10) 147-156 2010  Peer-reviewed
    We describe a new concept and method for evaluating the credibility of Web information. Quality control of information (text, images, video, etc.) on the Web is generally insufficient because publishing barriers are low. As a result, the Web holds a large amount of mistaken and unreliable information that can have detrimental effects on users. This calls for technology that helps users judge the credibility (expertise and trustworthiness) of Web content and the accuracy of the information they encounter on the Web. Such technology should be able to handle a wide range of tasks:
    - extracting credibility-related features from the target Web content;
    - extracting reputation-related information for the target content, such as hyperlinks and social bookmarks, and evaluating its distribution;
    - evaluating features of the content's authors.
    We propose and describe three methodologies for analyzing the credibility of Web information: (1) content analysis, (2) social support analysis, and (3) author analysis. We give an overview of our recent research on Web information credibility evaluation based on these methodologies. © 2010 ACM.
  • Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    Database Systems for Advanced Applications, Pt I, Proceedings 5981 491-+ 2010  Peer-reviewed
    We propose an extensible platform for bridging private databases and Web services. Our main idea is to expose a Web service and its results as a set of virtual tables in a relational database (RDB) environment. Private data that cannot be disclosed is stored in private RDBs, so these virtual tables act as a bridge between private RDBs and Web services.
  • Ryohei Takahashi, Satoshi Oyama, Hiroaki Ohshima, Katsumi Tanaka
    Web-Age Information Management, Proceedings 6184 429-+ 2010  Peer-reviewed
    To make online advertisements or user-generated content more attractive, people often use modifiers such as "authentic," "impressive," and "special." Some of these are exaggerations: modifiers attached to Web entities sometimes do not represent the content appropriately. In this paper, we propose a method to evaluate the truthfulness of modifiers attached to Web entity names by extracting relevant and conflicting terms from the content texts.
  • Natsuki Takata, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    Web-Age Information Management, Proceedings 6184 441-+ 2010  Peer-reviewed
    We propose a method of discovering alternative answers on the Web to a question posted on a Web question-and-answer (Q&A) site, that is, answers that differ from the existing answers to the question on that site. Our method first automatically generates queries for conventional Web search engines to collect Web pages that may contain answers to the target question. Each collected Web page is evaluated with two kinds of scores. One represents the probability that the page contains information answering the question in the Q&A content. The other represents the possibility that the page contains an alternative answer. We implemented the method and evaluated the system using actual Q&A content.
  • Makoto P. Kato, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    International Conference on Information and Knowledge Management, Proceedings 1541-1544 2010  Peer-reviewed
    We propose a query-by-example geographic object search method for users who are unfamiliar with the place they are in. Geographic objects, such as restaurants, are often retrieved using an attribute-based or keyword query. Such queries, however, are difficult to use for users who have little knowledge of the place where they want to search. The proposed query-by-example method allows users to query by selecting examples in familiar places to retrieve objects in unfamiliar places. One of the challenges is to predict an effective distance metric, which varies between individuals. Another challenge is to calculate the distance between objects in heterogeneous domains, for example restaurants in Japan and in China, while accounting for the feature gap between them. Our method robustly estimates the distance metric by amplifying the difference between selected and non-selected examples. Using this distance metric, each object in the familiar domain is evenly assigned to one in the unfamiliar domain to eliminate the difference between the domains. To evaluate our method, we developed a restaurant search system using data obtained from a Japanese restaurant Web guide. © 2010 ACM.
  • Makoto Kato, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    IPSJ Transactions (CD-ROM) 2009(1) Database Vol. 2, No. 2, 110-125 November 15, 2009
  • Makoto Kato, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    IPSJ Transactions on Databases 2(2) 110-125 June 2009  Peer-reviewed
    In this paper, we propose a method that searches for object names based on their similarity to a relation given as input. For example, a conventional search engine cannot easily answer "what is to New Zealand as Yatsuhashi is to Kyoto," for two reasons. First, users need considerable knowledge of New Zealand. Second, users must express the relation between Kyoto and Yatsuhashi in words or values. Object name search based on relational similarity removes both restrictions. Given A, B, and C, it finds a term D such that the set of relations holding between C and D is similar to the set holding between A and B. We propose a method that realizes this using the difference between distributions of co-occurring terms. The method consists of two steps. First, it finds terms in the text of Web search results that strongly connect A and B. Second, it searches the Web with those terms together with C and discovers a term D that is to C as B is to A. In experiments with 33 relationships and 854 test sets, we compared the proposed method with a baseline method and a lexical pattern-based method. The proposed method was more precise than both, and 49.8% of the test sets had a correct answer in the top 20 results.
  • Makoto Kato, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    DBSJ Journal 8(1) 11-16 June 2009  Peer-reviewed
  • Takuya Kobayashi, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    WICOW '09 67-74 2009  Peer-reviewed
    The value of a brand name is an important factor that consumers often consider when making purchasing decisions. However, it is difficult for users to evaluate the value of a brand name correctly, especially when they encounter it for the first time. In reality, a brand's description or use is sometimes deliberately manipulated to give an impression of high value. In other cases, a non-existent brand name may be used to attract consumers. We call such names "glorified terms." In this paper, we propose a method for evaluating a brand's value from texts on the Web. To this end, we first acquire candidate attributes useful for judging whether a term is a brand name or a glorified term. The candidates are evaluated based on the idea that explanations of a real brand name often contain attributes describing its quality. We implemented a prototype system specifically for agricultural and livestock products. The system judges from several viewpoints whether a given term is a glorified term or a well-known brand name. In preliminary experiments, it achieved accuracy rates of 74% to 85%.
  • Hiroaki Ohshima, Katsumi Tanaka
    Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, ICUIMC'09 441-449 2009  Peer-reviewed
    We propose a method for quickly detecting terms related to a given term using a conventional Web search engine. There are many kinds of related terms. For example, hypernyms and hyponyms are basic related terms treated in dictionaries. Synonyms and coordinate terms are also well-defined related terms. Topic terms and description terms represent topics of the given term and are only vaguely defined. There are other related terms as well, such as abbreviations and nicknames. The proposed method can be used to detect these many kinds of related terms. It extracts related terms only from the text of Web search results, which consists of the titles, snippets, and URLs of Web pages. To extract related terms from the search results, we use two different kinds of lexico-syntactic patterns, called bi-directional lexico-syntactic patterns. The method applies both to languages whose words are separated by spaces, such as English and Korean, and to languages whose words are not, such as Japanese and Chinese. It needs no advanced natural language processing such as morphological analysis or syntactic parsing. It works relatively fast with good precision. Copyright 2009 ACM.
  • Takuya Kobayashi, Hiroaki Ohshima, Satoshi Oyama, Katsumi Tanaka
    Proceedings of the ACM Symposium on Applied Computing 1316-1317 2009  Peer-reviewed
    In this paper, we model review information in order to assess its credibility. We believe users need supporting information when evaluating the credibility of reviews. This paper presents a method for detecting reviewers' activity areas and biases. We also discuss the problem of glorified terms in online reviews. Because these terms cause cognitive bias, users need supporting information that enables accurate understanding. Copyright 2009 ACM.
  • Makoto Nakatani, Adam Jatowt, Hiroaki Ohshima, Katsumi Tanaka
    Database Systems for Advanced Applications, Proceedings 5463 570-584 2009  Peer-reviewed
    In Web search, it is often difficult for users to judge which page to choose among the search results and which page provides high-quality and credible content. For example, some results may describe query topics from narrow or biased viewpoints, or may contain only shallow information. Many factors influence the perceived quality of search results. We propose two important aspects that determine their usefulness: "topic coverage" and "topic detailedness." Topic coverage is the extent to which a page covers typical topics related to the query terms, while topic detailedness measures how many special topics a Web page discusses. We propose a method to discover typical topic terms and special topic terms for a search query from the structural features of Wikipedia, the free encyclopedia. We also propose an application that calculates the topic coverage and topic detailedness of Web search results using terms extracted from Wikipedia.
  • Hiroaki Ohshima, Adam Jatowt, Satoshi Oyama, Satoshi Nakamura, Katsumi Tanaka
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5802 379-386 2009  Peer-reviewed
    Recently, the Web has had a dramatic impact on our lives, becoming a main information source for many people. We believe that continuously studying user needs and search behavior is key for technology to keep pace with changes in society. In this paper, we report the results of a large-scale online questionnaire conducted to investigate how users search the Web and what kinds of needs they have. We analyzed the results based on respondents' attributes such as age and gender. The findings should be treated as hypotheses for further systematic study. © 2009 Springer-Verlag Berlin Heidelberg.
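
The HT 2011 abstract (Takahashi et al.) propagates spatiotemporal information over Wikipedia links "in a manner similar to PageRank." The sketch below illustrates that general idea and does not reproduce the authors' algorithm. It runs weighted PageRank over a link graph in which an edge (a, b) means entity a is influenced by entity b. The edge weight is a hypothetical choice that grows with the gap in years, so entities that influence temporally distant entities score higher. Location is ignored, and the entities and years are illustrative only.

```python
from collections import defaultdict


def impact_rank(links, years, damping=0.85, scale=100.0, iters=200, tol=1e-12):
    """Weighted PageRank over historical entities (illustrative sketch).

    links: (a, b) pairs meaning entity a links to, i.e. is influenced by, entity b.
    years: year of occurrence for every entity that appears in links.
    """
    nodes = sorted(years)
    n = len(nodes)
    out_weight = defaultdict(float)
    weighted = []
    for a, b in links:
        # Hypothetical weight: influence across a larger time gap counts more.
        w = 1.0 + abs(years[a] - years[b]) / scale
        weighted.append((a, b, w))
        out_weight[a] += w

    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Entities with no outgoing links spread their score uniformly.
        dangling = sum(score[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for a, b, w in weighted:
            # Impact flows from the influenced entity a to the influencing entity b.
            new[b] += damping * score[a] * w / out_weight[a]
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


links = [
    ("Meiji Restoration", "Perry Expedition"),
    ("Boshin War", "Perry Expedition"),
    ("Boshin War", "Meiji Restoration"),
]
years = {"Perry Expedition": 1853, "Meiji Restoration": 1868, "Boshin War": 1868}
scores = impact_rank(links, years)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

The scores always sum to 1. The entity that is linked to from both others, across the largest time gap, ranks first.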
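
The related-term papers by Ohshima and Tanaka (Journal of Software 2010 and ICUIMC'09) extract terms from search-result snippets with "bi-directional lexico-syntactic patterns." The abstracts do not spell out the patterns themselves. This sketch assumes one plausible instance for coordinate terms:
- a forward pattern "X or Y" applied to snippets returned for the query "X or";
- a backward pattern "Y or X" applied to snippets for "or X".

Only candidates found in both directions are kept. The snippets are hard-coded stand-ins for search results, since no search API is called. The regular expressions handle only space-separated languages, whereas the abstracts say the method also works for Japanese and Chinese.

```python
import re
from collections import Counter


def forward_matches(term, snippets, connective="or"):
    """Words that follow "<term> <connective>", e.g. "iPod or Zune" -> "zune"."""
    pattern = re.compile(
        rf"\b{re.escape(term)}\s+{re.escape(connective)}\s+(\w+)", re.IGNORECASE
    )
    return Counter(m.group(1).lower() for s in snippets for m in pattern.finditer(s))


def backward_matches(term, snippets, connective="or"):
    """Words that precede "<connective> <term>", e.g. "Zune or iPod" -> "zune"."""
    pattern = re.compile(
        rf"(\w+)\s+{re.escape(connective)}\s+{re.escape(term)}\b", re.IGNORECASE
    )
    return Counter(m.group(1).lower() for s in snippets for m in pattern.finditer(s))


def bidirectional_terms(term, forward_snippets, backward_snippets, connective="or"):
    """Keep candidates found in both directions, scored by the weaker direction."""
    fwd = forward_matches(term, forward_snippets, connective)
    bwd = backward_matches(term, backward_snippets, connective)
    common = fwd.keys() & bwd.keys()
    return sorted(((c, min(fwd[c], bwd[c])) for c in common), key=lambda x: (-x[1], x[0]))


# Hypothetical snippets for the queries "iPod or" and "or iPod".
forward_snippets = [
    "Should I buy an iPod or Zune?",
    "iPod or Walkman: which is better",
    "an iPod or a phone",
]
backward_snippets = ["Zune or iPod comparison", "a Walkman or iPod", "laptop or iPod"]
print(bidirectional_terms("iPod", forward_snippets, backward_snippets))
```

Requiring both directions filters out one-sided noise. Here that removes "a" (forward only) and "laptop" (backward only) and leaves "walkman" and "zune".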
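
The query-by-example restaurant search papers (Kato et al.) estimate a distance metric "by amplifying the difference between selected and non-selected examples," but the abstracts do not give the formula. The sketch below substitutes a simple Fisher-style per-feature weight, which is an assumption and not the authors' method. A feature gets a large weight when the selected examples agree on it and differ on it from the non-selected ones. Candidates in the unfamiliar area are then ranked by weighted Euclidean distance to the centroid of the selected examples. The two features (price level and rating) and all values are hypothetical.

```python
import math


def feature_weights(selected, non_selected, eps=1e-6):
    """Fisher-style weight per feature (a stand-in for "difference amplification")."""
    weights = []
    for j in range(len(selected[0])):
        sel = [x[j] for x in selected]
        non = [x[j] for x in non_selected]
        mu_sel = sum(sel) / len(sel)
        mu_non = sum(non) / len(non)
        var_sel = sum((v - mu_sel) ** 2 for v in sel) / len(sel)
        # Large when the selected examples agree with each other on feature j
        # but differ on it from the examples the user did not select.
        weights.append((mu_sel - mu_non) ** 2 / (var_sel + eps))
    total = sum(weights) or 1.0  # avoid dividing by zero if no feature separates them
    return [w / total for w in weights]


def query_by_example(candidates, selected, non_selected, k=1):
    """Rank (name, features) candidates by weighted distance to the selected centroid."""
    w = feature_weights(selected, non_selected)
    dims = len(w)
    centroid = [sum(x[j] for x in selected) / len(selected) for j in range(dims)]

    def distance(features):
        return math.sqrt(sum(w[j] * (features[j] - centroid[j]) ** 2 for j in range(dims)))

    return sorted(candidates, key=lambda item: distance(item[1]))[:k]


# Hypothetical features: (price level, rating). In the familiar area the user picked
# the two cheap restaurants and skipped the two expensive ones, regardless of rating.
selected = [(1.0, 4.5), (1.5, 3.0)]
non_selected = [(3.0, 4.0), (3.5, 3.5)]
# Restaurants in the unfamiliar area.
candidates = [("A", (1.25, 1.0)), ("B", (3.0, 3.75))]
print(feature_weights(selected, non_selected))
print(query_by_example(candidates, selected, non_selected))
```

With unweighted Euclidean distance, B would rank first (1.75 versus 2.75 for A). The learned weights put all emphasis on the shared low price, so A ranks first.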
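
The Database Systems for Advanced Applications 2009 abstract (Nakatani et al.) defines two measures:
- topic coverage: the extent to which a page covers typical topic terms for a query;
- topic detailedness: how many special topic terms the page discusses.

Assume the two term lists have already been extracted; the paper derives them from Wikipedia's structure, which is not reproduced here. A minimal sketch of the two scores is then a coverage ratio and a count. It matches single-word terms only, and the page text and term lists are made up.

```python
import re


def topic_scores(page_text, typical_terms, special_terms):
    """Return (topic coverage, topic detailedness) for one page.

    coverage: fraction of typical topic terms that occur in the page.
    detailedness: number of special topic terms that occur in the page.
    Only single-word terms are matched.
    """
    words = set(re.findall(r"\w+", page_text.lower()))
    covered = sum(1 for t in typical_terms if t.lower() in words)
    coverage = covered / len(typical_terms) if typical_terms else 0.0
    detailedness = sum(1 for t in special_terms if t.lower() in words)
    return coverage, detailedness


page = "Kyoto is famous for temples, shrines and yatsuhashi sweets."
typical = ["temples", "shrines", "history", "food"]  # hypothetical typical topic terms
special = ["yatsuhashi", "kinkakuji"]  # hypothetical special topic terms
print(topic_scores(page, typical, special))
```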

MISC

 205

Books and Other Publications

 4

Presentations

 4

Research Projects (Joint Research and Competitive Funding)

 19

Industrial Property Rights

 3

Academic Contribution Activities

 2