Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka
IPSJ Transactions on Databases 9(2), pp. 1-11, June 2016 (peer-reviewed)
Although long queries still make up only a small share of the queries submitted to Web search engines, their use is gradually increasing. However, retrieval effectiveness decreases as query length increases, and long queries often return few Web pages. We target sentential queries, a type of long query, and propose a method called sentential query paraphrasing to improve their retrieval performance, especially recall. Our motivation is the assumption that a sentence is an indivisible whole: removing terms or phrases from it would lead to missing information or query drift. In this paper, we paraphrase sentential queries to avoid missing information and thereby preserve the completeness of the information. Take the sentential query "apples pop a powerful pectin punch," for example. Its meaning changes if one or more terms are removed, and conventional search engines return few Web pages for it. In contrast, querying with its paraphrases, such as "apples contain a lot of pectin" or "apples are rich in pectin," retrieves more Web pages. The experimental results show that our method can acquire more paraphrases from the noisy Web. In addition, with the help of paraphrases, more Web pages can be retrieved, especially for sentential queries that returned no answers in their original wording.
------------------------------
This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.24 (2016) No.4 (online)
------------------------------
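As a toy illustration of the recall argument in the abstract (not the authors' paraphrase-acquisition method), the Python sketch below compares querying with the original sentence alone against taking the union of results from the original sentence and its paraphrases. It assumes a hypothetical in-memory corpus, and it uses exact, case-insensitive phrase matching as a stand-in for a Web search engine; the names `CORPUS`, `phrase_search`, and `search_with_paraphrases` are hypothetical.

```python
# Toy illustration only (hypothetical corpus and matching rule, not the authors' method):
# exact-phrase lookup over a small in-memory "Web", showing how querying with
# paraphrases in addition to the original sentence can raise recall.
from typing import Dict, List, Set

# Hypothetical toy corpus: page id -> page text.
CORPUS: Dict[str, str] = {
    "p1": "Nutrition facts: apples contain a lot of pectin and fiber.",
    "p2": "Did you know apples are rich in pectin?",
    "p3": "Apples are rich in pectin, which may help digestion.",
    "p4": "Pears and apples are popular autumn fruits.",
}


def phrase_search(query: str, corpus: Dict[str, str]) -> Set[str]:
    """Return ids of pages containing the query as an exact, case-insensitive phrase."""
    q = query.lower()
    return {pid for pid, text in corpus.items() if q in text.lower()}


def search_with_paraphrases(original: str, paraphrases: List[str],
                            corpus: Dict[str, str]) -> Set[str]:
    """Union the pages retrieved by the original query and by each paraphrase."""
    hits = phrase_search(original, corpus)
    for p in paraphrases:
        hits |= phrase_search(p, corpus)
    return hits


original = "apples pop a powerful pectin punch"
paraphrases = ["apples contain a lot of pectin", "apples are rich in pectin"]

print(sorted(phrase_search(original, CORPUS)))                                # []
print(sorted(search_with_paraphrases(original, paraphrases, CORPUS)))  # ['p1', 'p2', 'p3']
```

In this toy setting, the original sentence matches no page, while its paraphrases together retrieve three pages. Because the paraphrases keep the whole meaning of the sentence instead of dropping terms, the unrelated page `p4` is not retrieved.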