Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/67157
Title: | 由搜尋引擎使用記錄探勘與時序相關的多義查詢 Identifying Time Sensitive Ambiguous Query from Query Logs of Search Engines |
Authors: | 徐國獻 Hsu, Kuo Hsien |
Contributors: | 沈錳坤 Shan, Man Kwan 徐國獻 Hsu, Kuo Hsien |
Keywords: | 搜尋引擎 搜尋意圖 與時序相關的查詢關鍵詞 多義的查詢關鍵詞 search engine search intent time sensitive query ambiguous query |
Date: | 2012 |
Issue Date: | 2014-07-01 12:14:57 (UTC+8) |
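Abstract: | When using a search engine, the same query term can carry different meanings. For example, the query term “apple” (蘋果) can refer to Apple Daily (蘋果日報) or to Apple Computer (蘋果電腦); this is what is called an “ambiguous query.” In addition, the meanings of a query term are used in different proportions in different time periods. For example, when users search for “apple” in the morning, a higher proportion of them want Apple Daily and a smaller proportion want Apple Computer; when they search for “apple” in the afternoon, they are more likely to want products and websites related to Apple Computer. We call this kind of query term a “time sensitive ambiguous query.”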
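The main goal of this study is to mine such query terms from search engine query logs. We propose two methods. The first computes an ambiguity degree and a time sensitivity degree for each query term and selects the query terms that exceed a set threshold. The second computes, for the search result pages of each query term, the pairwise ambiguity distance and time sensitivity distance, uses a hierarchical clustering algorithm to build an ambiguity hierarchy tree and a time sensitivity hierarchy tree, and then analyzes the similarity of the two trees to find queries that are both ambiguous and time sensitive.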
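The experimental results show that the second method performs better than the first. The baseline, which re-ranks the whole search engine using all query terms directly, yields an improvement of 0.75%. Re-ranking with the query terms detected by the second method yields an improvement of 5.61%, which is higher than the baseline and well above the 3.34% of the first method. The second method detects 632 query terms in total, which fall into four types: “portal site,” “quick link site,” “corporate group site,” and “same-name site.”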
The detected query terms can be applied to tuning search engine performance and to optimizing search engine advertising, and can also help a search engine decide which Direct Displays (搜尋快現) to show and in what order.
In a search engine, a query term can indicate different intents. For example, the query term “apple” could stand for the Apple store website or the Apple Daily newspaper website. Such queries are called “ambiguous queries.” By exploring search engine usage, we found that the different intents of the same query term may be distributed across time intervals with different usage ratios. For example, more users search “apple” for the Apple Daily newspaper website in the morning and for the Apple store website in the afternoon. We categorize this kind of query as a “time sensitive ambiguous query.”
In this thesis, we aim to discover “time sensitive ambiguous queries” and propose two methods, MATISD and SHATIS. The first method, MATISD, evaluates the ambiguity degree and the time sensitivity degree of each query term separately and selects the time sensitive ambiguous query terms by applying a threshold to both degrees. The second method, SHATIS, computes the distance between each pair of search results belonging to the same query term and uses these distances to build an ambiguity hierarchy and a time sensitivity hierarchy for each query term. We then identify time sensitive ambiguous queries by the similarity of the tree structures of the ambiguity hierarchy and the time sensitivity hierarchy.
According to the experimental results, SHATIS performs better than MATISD. The baseline re-ranks search results by the order of user clicks using all query terms, and its improvement is 0.75%. Re-ranking with the queries detected by SHATIS gives an improvement of 5.61%, which is better than both the baseline (0.75%) and MATISD (3.34%). In total, SHATIS identifies 632 time sensitive ambiguous query terms, most of which are related to big websites, such as “yahoo”, “google” and “pchome”. Some are queries related to big company groups, such as “Far East group.”
Identifying time sensitive ambiguous queries could be used to improve the ranking performance of search engines and of search advertising. It could also be used to improve the slotting and triggering of Direct Display on search engines. |
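The threshold-based idea behind the first method (MATISD) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulas, so everything concrete here is an assumption made for illustration: the function names, the use of click entropy as the ambiguity degree, the use of the largest KL divergence between a time slot's click distribution and the overall distribution as the time sensitivity degree, the two threshold values, and the toy log with its URL labels.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a Counter of click counts."""
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def kl_divergence(slot_counts, overall_counts):
    """KL(slot || overall) in bits. It is finite because every URL clicked
    in a slot also appears in the overall counts."""
    slot_total = sum(slot_counts.values())
    overall_total = sum(overall_counts.values())
    result = 0.0
    for url, c in slot_counts.items():
        p = c / slot_total
        q = overall_counts[url] / overall_total
        result += p * math.log2(p / q)
    return result


def score_queries(log):
    """Map each query to (ambiguity, time_sensitivity).

    `log` is an iterable of (query, time_slot, clicked_url) tuples.
    Both measures are illustrative stand-ins, not the thesis's definitions.
    """
    by_query = defaultdict(lambda: defaultdict(Counter))
    for query, slot, url in log:
        by_query[query][slot][url] += 1

    scores = {}
    for query, slots in by_query.items():
        overall = Counter()
        for slot_counts in slots.values():
            overall.update(slot_counts)
        ambiguity = entropy(overall)
        sensitivity = max(kl_divergence(c, overall) for c in slots.values())
        scores[query] = (ambiguity, sensitivity)
    return scores


def detect(log, min_ambiguity=0.5, min_sensitivity=0.1):
    """Return queries whose two degrees both reach the (assumed) thresholds."""
    return sorted(
        q for q, (a, s) in score_queries(log).items()
        if a >= min_ambiguity and s >= min_sensitivity
    )


# Toy click log: "apple" shifts between two sites by time of day,
# "news" is ambiguous but stable, "weather" is unambiguous.
sample_log = (
    [("apple", "morning", "appledaily")] * 3
    + [("apple", "morning", "applestore")] * 1
    + [("apple", "afternoon", "appledaily")] * 1
    + [("apple", "afternoon", "applestore")] * 3
    + [("news", "morning", "site_a")] * 2
    + [("news", "morning", "site_b")] * 2
    + [("news", "afternoon", "site_a")] * 2
    + [("news", "afternoon", "site_b")] * 2
    + [("weather", "morning", "forecast")] * 2
    + [("weather", "afternoon", "forecast")] * 2
)

print(detect(sample_log))
```

On this toy log only “apple” passes both thresholds: “news” is split evenly between two sites in every slot, so it is ambiguous but not time sensitive, and “weather” has a single clicked site, so it is not ambiguous at all.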
Reference: | [1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, Diversifying Search Results, 2nd ACM International Conference on Web Search and Data Mining WSDM, Barcelona, Spain, 2009.
[2] A. Bernstein, E. Kaufmann, C. Bürki, and M. Klein, How Similar Is It? Towards Personalized Similarity Measures in Ontologies, Wirtschaftsinformatik 2005, Physica-Verlag HD, 2005.
[3] G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri, Efficient Diversification of Search Results using Query Logs, International Conference on World Wide Web WWW, Hyderabad, India, 2011.
[4] J. G. Carbonell, and J. Goldstein, The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries, ACM SIGIR International Conference on Information Retrieval, 1998.
[5] P. Clough, M. Sanderson, M. Abouammoh, S. Navarro, and M. Paramita, Multiple Approaches to Analyzing Query Diversity, ACM SIGIR International Conference on Information Retrieval, 2009.
[6] W. Dakka, L. Gravano, and P. G. Ipeirotis, Answering General Time-Sensitive Queries, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, Issue 2, 2011.
[7] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet Advertising and The Generalized Second Price Auction: Selling Billions of Dollars Worth of Keywords, American Economic Review, American Economic Association, vol. 97(1), 2007.
[8] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2nd Edition, 2006.
[9] R. Jones, and F. Diaz, Temporal Profiles of Queries, ACM Transactions on Information Systems, Vol. 25, No. 3, 2007.
[10] Y. Kalfoglou, and M. Schorlemmer. Ontology Mapping: the State of the Art. Knowledge Engineering Review. Vol. 18, No. 1, 2003.
[11] A. Kulkarni, J. Teevan, K. M. Svore, S. T. Dumais, Understanding Temporal Query Dynamics, 4th ACM International Conference on Web Search and Data Mining WSDM, Hong Kong, China, 2011.
[12] S. Kullback, and R. A. Leibler, On Information and Sufficiency, Annals of Mathematical Statistics. Vol. 22, No. 1, 1951.
[13] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
[14] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani, Adwords and Generalized Online Matching. Journal of the ACM, Vol. 54, No. 5, 2007.
[15] D. Metzler, R. Jones, F. Peng, and R. Zhang, Improving Search Relevance for Implicitly Temporal Queries, ACM SIGIR International Conference on Information Retrieval, 2009.
[16] U. Priss, Formal Concept Analysis in Information Science. ARIST. Vol.40, No.1, 2006.
[17] D. Rose, and D. Levinson. Understanding User Goals in Web Search, International Conference on World Wide Web WWW, 2004.
[18] R. Song, Z. Luo, J. Y. Nie, Y. Yu, and H. W. Hon, Identification of Ambiguous Queries in Web Search, Information Processing and Management, Vol. 45, Issue. 2, 2009.
[19] M. J. Welch, J. Cho, and C. Olston, Search Result Diversity for Information Queries, International World Wide Web Conference WWW, Hyderabad, India, 2011.
[20] J. Yang, and J. Leskovec. Patterns of Temporal Variation in Online Media, 4th ACM International Conference on Web Search and Data Mining WSDM, 2011.
[21] Jaccard Distance http://en.wikipedia.org/wiki/Jaccard_index
[22] Spearman’s rank correlation coefficient http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient |
Description: | Master's degree; National Chengchi University; Department of Computer Science; 99971001; 101 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0999710011 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses |
Files in This Item:
File | Size | Format | |
index.html | 0Kb | HTML2 | 203 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.