National Chengchi University Institutional Repository (NCCUR): Item 140.119/67157
    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/67157


    Title: 由搜尋引擎使用記錄探勘與時序相關的多義查詢
    Identifying Time Sensitive Ambiguous Query from Query Logs of Search Engines
    Author: 徐國獻
    Hsu, Kuo Hsien
    Contributors: 沈錳坤
    Shan, Man Kwan
    徐國獻
    Hsu, Kuo Hsien
    Keywords: search engine
    search intent
    time sensitive query
    ambiguous query
    Date: 2012
    Uploaded: 2014-07-01 12:14:57 (UTC+8)
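    Abstract: When using a search engine, the same query term can carry different meanings. For example, the query term "蘋果" (apple) can stand for Apple Daily (蘋果日報) or for Apple Computer (蘋果電腦); this is what is called an "ambiguous query." In addition, the multiple meanings of a query term are used in different proportions in each time period. For example, when users search "apple" in the morning, a higher proportion want to look up Apple Daily and a smaller proportion want to see Apple Computer; when they search "apple" in the afternoon, they are more likely to want products and websites related to Apple Computer. We call this kind of query term a "time sensitive ambiguous query."
    The main purpose of this study is to mine such query terms from search engine query logs. We propose two methods. The first method computes an ambiguity degree and a time sensitivity degree separately for each query term, and sets thresholds to find the query terms that exceed them. The second method computes, for the search result pages of each query term, the pairwise ambiguity distance and time sensitivity distance, then uses a hierarchical clustering algorithm to build an ambiguity hierarchy tree and a time sensitivity hierarchy tree, and finally analyzes the similarity of the two trees to find queries that are both ambiguous and time sensitive.
    The experimental results show that the second method performs better than the first. The baseline, which re-ranks the whole search engine using all query terms directly, yields an improvement of 0.75%. Re-ranking with the query terms detected by the second method yields an improvement of 5.61%, which is higher than the baseline and also much higher than the 3.34% of the first method. The second method detects 632 query terms in total, which can be grouped into four types: "portal sites," "quick-link sites," "corporate group sites," and "same-name sites."
    The detected query terms can be applied to tuning search engine performance and to optimizing search engine advertising, and they can also help a search engine decide which Direct Display (搜尋快現) results to show and in what order.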
    In a search engine, a single query term can indicate different intents. For example, the query term “apple” could stand for the Apple store website or for the Apple Daily newspaper website. Such queries are called “ambiguous queries.” By exploring search engine usage, we found that the different intents of the same query term may be distributed across time intervals with different usage ratios. For example, more users search “apple” for the Apple Daily newspaper website in the morning, and for the Apple store website in the afternoon. We categorize this kind of query as a “time sensitive ambiguous query.”
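    To make the "apple" example concrete, the following is a minimal sketch of how per-time-slot intent shares could be read off a click log. The log entries, the site labels, and the split of the day at noon are all assumptions made for illustration; they are not data or definitions from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query, hour of day, clicked site).
# The sites and counts are invented for illustration only.
CLICK_LOG = [
    ("apple", 8, "appledaily"), ("apple", 9, "appledaily"),
    ("apple", 9, "appledaily"), ("apple", 10, "applestore"),
    ("apple", 14, "applestore"), ("apple", 15, "applestore"),
    ("apple", 16, "applestore"), ("apple", 16, "appledaily"),
]


def time_slot(hour):
    """Map an hour (0-23) to a coarse slot; splitting at noon is an assumption."""
    return "morning" if hour < 12 else "afternoon"


def intent_shares(log, query):
    """Return {slot: {site: share of that slot's clicks}} for one query."""
    counts = defaultdict(Counter)
    for q, hour, site in log:
        if q == query:
            counts[time_slot(hour)][site] += 1
    shares = {}
    for slot, counter in counts.items():
        total = sum(counter.values())
        shares[slot] = {site: n / total for site, n in counter.items()}
    return shares


shares = intent_shares(CLICK_LOG, "apple")
for slot in ("morning", "afternoon"):
    print(slot, shares[slot])
```

    In this toy log the dominant intent flips between the two slots, which is the pattern the abstract calls time sensitive ambiguity.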
    In this thesis, we aim to discover time sensitive ambiguous queries and propose two methods, MATISD and SHATIS. The first method, MATISD, evaluates an ambiguity degree and a time sensitivity degree separately for each query term, and finds the time sensitive ambiguous query terms by applying a threshold to both degrees. The second method, SHATIS, computes the distance between each pair of search results belonging to the same query term, and uses these distances to build an ambiguity hierarchy and a time sensitivity hierarchy for each query term. We then identify time sensitive ambiguous queries by the similarity of the tree structures of the two hierarchies.
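    The two-threshold idea behind MATISD can be sketched as follows. The abstract does not say how the two degrees are computed, so this sketch swaps in two common stand-ins: normalized click entropy for the ambiguity degree and Jensen-Shannon divergence between time slots for the time sensitivity degree. The function names, the threshold values, and the sample counts are all hypothetical.

```python
import math


def normalize(counts):
    """Turn {site: clicks} into {site: probability}."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def ambiguity_degree(slot_counts):
    """Assumed measure: click entropy over all slots, scaled to [0, 1]."""
    overall = {}
    for counts in slot_counts.values():
        for site, n in counts.items():
            overall[site] = overall.get(site, 0) + n
    if len(overall) < 2:
        return 0.0  # a single clicked site carries no ambiguity
    return entropy(normalize(overall)) / math.log2(len(overall))


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def time_sensitivity_degree(slot_counts):
    """Assumed measure: largest divergence between any two time slots."""
    dists = [normalize(c) for c in slot_counts.values()]
    return max(
        (js_divergence(a, b) for i, a in enumerate(dists) for b in dists[i + 1:]),
        default=0.0,
    )


def is_time_sensitive_ambiguous(slot_counts, amb_threshold=0.5, time_threshold=0.1):
    """Flag a query only if BOTH degrees reach their (hypothetical) thresholds."""
    return (ambiguity_degree(slot_counts) >= amb_threshold
            and time_sensitivity_degree(slot_counts) >= time_threshold)


# Hypothetical click counts per time slot for two queries.
apple = {"morning": {"appledaily": 30, "applestore": 10},
         "afternoon": {"appledaily": 10, "applestore": 30}}
stable = {"morning": {"a": 20, "b": 20},
          "afternoon": {"a": 10, "b": 10}}

print(is_time_sensitive_ambiguous(apple))
print(is_time_sensitive_ambiguous(stable))
```

    The second query is ambiguous but its intent mix does not move with time, so only the first one passes both thresholds.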
    According to the experimental results, SHATIS performs better than MATISD. The baseline re-ranks search results by the order of user clicks using all query terms, and yields an improvement of 0.75%. Re-ranking with only the queries detected by SHATIS yields an improvement of 5.61%, better than both the baseline (0.75%) and MATISD (3.34%). SHATIS identifies 632 time sensitive ambiguous query terms in total, most of which are related to big websites, such as “yahoo”, “google” and “pchome”. Some are queries related to big company groups, such as “Far East group.”
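    The hierarchy comparison that SHATIS relies on can be illustrated with a small sketch. The abstract does not give the distance definitions, the linkage rule, or the tree similarity measure, so everything concrete here is an assumption: Jaccard distance over the sets of users who clicked each result stands in for the ambiguity distance, total variation distance between click-by-slot profiles stands in for the time sensitivity distance, the trees are built with average linkage, and they are compared through the Spearman correlation of their cophenetic (merge height) distances. All names and data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def profile_distance(p, q):
    """Total variation distance between two click-by-slot count profiles."""
    sp, sq = sum(p), sum(q)
    return 0.5 * sum(abs(x / sp - y / sq) for x, y in zip(p, q))


def cophenetic(items, dist):
    """Average-linkage clustering; return {pair of items: merge height}."""
    def linkage(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in items]
    heights = {}
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        h = linkage(c1, c2)
        for a in c1:
            for b in c2:
                heights[frozenset((a, b))] = h
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return heights


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / norm if norm else 0.0


def hierarchy_similarity(items, dist_a, dist_b):
    """Spearman correlation between the cophenetic distances of two trees."""
    ha, hb = cophenetic(items, dist_a), cophenetic(items, dist_b)
    pairs = list(ha)
    return spearman([ha[p] for p in pairs], [hb[p] for p in pairs])


# Hypothetical data for four results of one query.
USERS = {"r1": {"u1", "u2", "u3"}, "r2": {"u2", "u3", "u4"},
         "r3": {"u7", "u8"}, "r4": {"u8", "u9"}}
ALIGNED = {"r1": [9, 1], "r2": [8, 2], "r3": [1, 9], "r4": [2, 8]}    # [morning, afternoon]
UNALIGNED = {"r1": [9, 1], "r2": [1, 9], "r3": [9, 1], "r4": [1, 9]}

items = sorted(USERS)
amb = lambda a, b: jaccard_distance(USERS[a], USERS[b])
sim_aligned = hierarchy_similarity(
    items, amb, lambda a, b: profile_distance(ALIGNED[a], ALIGNED[b]))
sim_unaligned = hierarchy_similarity(
    items, amb, lambda a, b: profile_distance(UNALIGNED[a], UNALIGNED[b]))
print(sim_aligned, sim_unaligned)
```

    Under these assumptions, a high similarity means the results group the same way by intent and by time of day, which is the signature of a time sensitive ambiguous query; the second profile set groups results differently by time and scores much lower.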
    Identifying time sensitive ambiguous queries could be used to improve the ranking performance of search engines and of advertisements. It could also be used to improve the slotting and triggering of Direct Display on search engines.
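    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    99971001
    101
    Source: http://thesis.lib.nccu.edu.tw/record/#G0999710011
    Type: thesis
    Appears in collections: [Department of Computer Science] Theses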

    Files in this item:

    File         Size   Format   Views
    index.html   0Kb    HTML     2202


    All items in NCCUR are protected by their original copyright.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It provides free access to academic research and public education for non-commercial use. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. The NCCU Institutional Repository is made to protect the interests of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.