National Chengchi University Institutional Repository (NCCUR): Item 140.119/60196


    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/60196


    Title: Research Intelligence Involving Information Retrieval (資訊檢索之學術智慧)
    Author: Tu, Yi-Ning (杜逸寧)
    Contributors: Seng, Jia-Lang (諶家蘭)
    Lin, Woo-Tsong (林我聰)
    Tu, Yi-Ning (杜逸寧)
    Keywords: Topic discovery and tracking
    data mining
    information retrieval
    academic intelligence
    Bayesian estimation
    novelty index
    published volume index
    citation analysis
    Date: 2009
    Uploaded: 2013-09-04 16:55:05 (UTC+8)
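    Abstract: Detecting emerging topics is an important problem for researchers: with limited time and resources, investigating an emerging topic in a field yields greater contribution and impact than working on a topic that has already matured. This study aims to help researchers detect emerging research topics with future potential and to extract, from academic papers, research intelligence that is useful in conducting research. In searching for topics with research potential, we assume that such topics are published by the more influential authors and publications in a field. The study therefore uses Bayesian estimation to estimate the influence that researchers and academic publications have on their field, and uses this information to identify candidate emerging topics with future potential. In addition, to our knowledge the topic detection literature still lacks effective and widely used tools for deciding whether a topic has matured or is novel and still has research potential. This study therefore attempts to develop effective measurement tools that assess whether a topic, given its position in its own development life cycle, still merits further academic investment.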
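    This study selected papers on data mining and information retrieval from several major databases and verified that the topics covered by conference papers lead similar topics in journal papers in the following years. It also combines several existing algorithms into a detection procedure that helps researchers detect leading trends in academic papers and uncover research intelligence. Using Bayesian estimation, the study constructs the prior probability and likelihood function for author and publication influence from publication records and citation records, and computes the influence of the important authors and publications in a field. Papers published by these authors and in these publications are comparatively worth watching, which makes it possible to test whether the candidate emerging topics do become emerging topics. Although the important topics identified this way narrow the search space, some may be mature topics that influential authors and publications all have to discuss, so indices or tools for assessing a topic's future potential are needed. Existing methods for assessing topic maturity, however, focus only on how frequently a topic appears and ignore novelty, which is also an important indicator; other methods look only for new topics without considering whether those topics have future potential. More importantly, a curve of occurrence frequency alone can confirm that a topic is important only after the topic has matured, which makes it a lagging indicator.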
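    The record does not give the thesis's actual prior or likelihood, so the following is only a minimal sketch of how a Bayesian estimate of author influence can be built from publication and citation counts. It assumes a Beta prior on the chance that a paper gets cited and a Binomial likelihood for the number of cited papers; the `AuthorRecord` type, the function names, and the choice of model are hypothetical illustrations, not the author's method.

```python
from dataclasses import dataclass


@dataclass
class AuthorRecord:
    """Hypothetical per-author counts within one research field."""
    name: str
    papers: int        # papers the author published in the field
    cited_papers: int  # of those, papers cited at least once


def posterior_impact(record: AuthorRecord,
                     prior_alpha: float = 1.0,
                     prior_beta: float = 1.0) -> float:
    """Posterior mean of an assumed Beta-Binomial model.

    Prior: Beta(prior_alpha, prior_beta) on the chance that a paper is cited.
    Likelihood: Binomial(papers, p) for the number of cited papers.
    """
    if not 0 <= record.cited_papers <= record.papers:
        raise ValueError("cited_papers must lie between 0 and papers")
    alpha = prior_alpha + record.cited_papers
    beta = prior_beta + (record.papers - record.cited_papers)
    return alpha / (alpha + beta)


def rank_authors(records, prior_alpha: float = 1.0, prior_beta: float = 1.0):
    """Return (name, posterior mean) pairs, highest estimated impact first."""
    scored = [(r.name, posterior_impact(r, prior_alpha, prior_beta))
              for r in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

    Calling `rank_authors` on a list of `AuthorRecord` values returns the authors ordered by posterior mean. The prior pulls authors with very few papers toward the prior mean, which is one reason a Bayesian estimate can be preferable to a raw citation ratio; the same idea could be applied to publications instead of authors.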
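    This study proposes indices that address these difficulties and develops them into a method for measuring the potential of emerging topics. The indices are the novelty index, the published volume index, and the detection point. Together with their curves, they provide more leading information for emerging topic detection and help researchers establish detection criteria for emerging topics in their own fields. The detection point is not the exact date on which a topic began to emerge; it marks the point in the topic's life cycle at which the topic has the greatest research potential and value. The detection point therefore shifts later in time when a topic continues to flourish, which shows that our indices can detect the continuation of a topic's vitality. Whereas the traditional frequency distribution curve shows the rise and decline of a topic, the published volume index uses the life-cycle concept to show a topic's development potential at each point in time. We hope that the research intelligence uncovered in this process helps researchers establish topic detection criteria for their own fields and saves considerable time and effort in exploring emerging topics. The proposed method not only addresses the shortcomings of the impact factor but also uses author and publication influence to assess the potential of a paper that has not yet accumulated any citations. This addresses a shortcoming of Google Scholar, which currently can establish a paper's importance only after the paper has been cited many times, whereas scholars want to discover important topics and papers early. We believe that our topic-oriented retrieval approach can more reliably meet researchers' needs when searching for topics or papers.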
    This research seeks to identify emerging topics for researchers and to extract research intelligence from academic papers. It aims to reveal the connection between the topics investigated in conference papers and those investigated in journal papers, which can spare researchers much of the time and effort needed to survey all the academic papers. To detect emerging research topics, the study uses a Bayesian estimation approach to estimate the impact that authors and publications have on a topic, and discovers candidate emerging topics by combining the high-impact authors and publications. Finally, the research develops measurement tools that assess the research potential of these candidate topics in order to find the emerging ones.
    This research selected a large number of papers on data mining and information retrieval from well-known databases and showed that the topics covered by conference papers in one year often lead to similar topics in journal papers in the subsequent year, and vice versa. The study also combines existing algorithms into a new detection procedure that lets researchers detect new trends and obtain academic intelligence from conferences and journals. It uses the Bayesian estimation approach and citation analysis to construct the prior distribution and likelihood function for the authors and publications in a topic. Because topics published by these authors and in these publications attract more attention and are more valuable than others, researchers can assess the potential of the resulting candidate emerging topics. Although the recommended topics narrow the search space, some may simply be so popular that all of the high-impact authors and publications discuss them, so measurement tools or indices are needed. Current methods either focus only on the frequency of subjects and ignore their novelty, which is critical and goes beyond frequency analysis, or focus on only one of the two without considering the potential of the topics. Methods that rely only on the curve of publication frequency yield a lagging index. This research addresses that inadequacy by proposing a set of new indices for emerging topic detection: the novelty index (NI) and the published volume index (PVI). These indices are then used to determine the detection point (DP) of an emerging topic. The detection point is not the actual time at which the topic starts to emerge; it represents the time in the topic's life cycle at which the topic has the highest research potential, in terms of both novelty and hotness.
    Unlike the absolute frequency method, which can find the exact emerging period of a topic, the PVI uses the cumulative relative frequency and tries to detect the time in the life cycle at which the topic has research potential. Following the detection points, the intersection decides the worthiness of a new topic. Readers who follow the algorithms presented in this thesis will be able to decide the novelty and life span of an emerging topic in their field. The proposed methods can overcome the limitations of the impact factor proposed by ISI. In addition, they use the impact of the authors and publications in a topic to measure the impact of a paper before it has actually become a high-impact paper, which overcomes a limitation of Google Scholar's approach. We suggest that the topic-oriented thinking behind our methods can help researchers solve the problem of searching for valuable topics.
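    The abstract states that the PVI is based on cumulative relative frequency but does not give its formula. The sketch below therefore shows only the cumulative relative frequency of a topic's yearly paper counts, not the PVI itself; the function name and the input layout (a mapping from year to paper count) are assumptions made for illustration.

```python
from itertools import accumulate


def cumulative_relative_frequency(yearly_counts):
    """Map {year: paper count} to {year: share of all papers published up to that year}."""
    years = sorted(yearly_counts)
    total = sum(yearly_counts[year] for year in years)
    if total == 0:
        raise ValueError("topic has no publications")
    # Running totals in chronological order, then normalised by the grand total.
    running = accumulate(yearly_counts[year] for year in years)
    return {year: count / total for year, count in zip(years, running)}
```

    For a topic with 1, 3, 4, and 2 papers in four consecutive years, the function returns the shares 0.1, 0.4, 0.8, and 1.0. Unlike the raw yearly counts, this curve places each year relative to the topic's whole observed life cycle, which is the life-cycle view the abstract attributes to the PVI.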
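    Description: Doctoral dissertation
    National Chengchi University
    Graduate Institute of Management Information Systems
    94356509
    98
    Source: http://thesis.lib.nccu.edu.tw/record/#G0094356509
    Type: thesis
    Appears in collections: [Department of Management Information Systems] Theses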

    Files in this item:

    File         Description   Size     Format      Views
    650901.pdf                 1918Kb   Adobe PDF   2482
    650902.pdf                 1918Kb   Adobe PDF   2408

