    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/157128


    Title: Application of Hierarchical Dirichlet Process Mixture Model in Text Analysis
    Authors: 楊立行
    Contributors: Department of Psychology
    Keywords: Text Analysis; Topic Model; LDA; Hierarchical Dirichlet Process Mixture Model
    Date: 2022-03
    Issue Date: 2025-05-28 14:06:55 (UTC+8)
    Abstract: Advances in probability models and computational algorithms have made text analysis a powerful research tool. Beyond extracting the structural features of texts (e.g., keywords), text analysis can extract features of their content, namely topics, from the co-occurrence of words, giving researchers an understanding of texts at the level of meaning. The models used to extract topics are called topic models. The best-known topic model, LDA (Latent Dirichlet Allocation), places Dirichlet priors on its distributions and treats each text as a mixture of topics and each topic as a probability distribution over words, with different topics corresponding to different distributions. The parameters of these distributions are estimated by Bayesian inference so as to best predict how words are distributed across the texts. However, LDA cannot compare the topics of different sets of texts. To extend the usefulness of topic models in the social sciences, this proposal suggests substituting the hierarchical Dirichlet process mixture model (HDPMM) for LDA, because its hierarchical structure allows different data sets to share the same base models. To examine how well HDPMM, used as a topic model, supports comparisons of topics between text sets, three goals are proposed. First, HDPMM should reveal differences in topics between text sets. Second, it should reveal consistency in topics between text sets. Third, it should capture how topics change over time within the same text set. A pilot study supports this idea, showing that HDPMM can distinguish between the topics of two text sets. Building on the pilot study, HDPMM will be applied to three different databases to examine whether each goal can be met, with the aim of recommending improvements to tools for analyzing text data.
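The key idea behind HDPMM as described above, that several text sets share the same topics (base models) while each set weights them differently, can be illustrated with a small forward simulation. The sketch below is an assumption-laden illustration, not the project's method: it uses NumPy, truncates the infinite model at K topics, and the function and parameter names (`truncated_gem`, `simulate_hdp_text_sets`, `gamma`, `alpha`, `eta`) are hypothetical. It only generates data from the model; it does not perform the Bayesian inference the abstract refers to.

```python
import numpy as np


def truncated_gem(gamma, K, rng):
    """Global topic weights beta ~ GEM(gamma), truncated to K sticks (assumption)."""
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0  # last stick takes all remaining mass, so the weights sum to 1
    # Mass left before each break: 1, (1 - v1), (1 - v1)(1 - v2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def simulate_hdp_text_sets(n_sets, words_per_set, vocab_size,
                           K=20, gamma=1.0, alpha=5.0, eta=0.1, seed=0):
    """Forward-simulate words for several text sets from a truncated HDP mixture.

    Hypothetical helper: every text set draws from the SAME K topics
    (shared base models) but mixes them with its own weights pi_j,
    which are centred on the global weights beta.
    """
    rng = np.random.default_rng(seed)
    # Shared topics: each row is a probability distribution over the vocabulary.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=K)
    beta = truncated_gem(gamma, K, rng)
    weights, assignments, words = [], [], []
    for _ in range(n_sets):
        # pi_j ~ DP(alpha, beta); with beta on K atoms this is exactly Dirichlet(alpha * beta).
        pi_j = rng.dirichlet(alpha * beta)
        z = rng.choice(K, size=words_per_set, p=pi_j)  # topic assigned to each word
        w = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])  # word drawn from its topic
        weights.append(pi_j)
        assignments.append(z)
        words.append(w)
    return topics, beta, weights, assignments, words


# Usage: compare which shared topics dominate in two simulated text sets.
topics, beta, weights, z_sets, word_sets = simulate_hdp_text_sets(
    n_sets=2, words_per_set=500, vocab_size=50)
for j, z in enumerate(z_sets):
    usage = np.bincount(z, minlength=len(beta)) / len(z)
    print(f"text set {j}: most used topics {np.argsort(usage)[::-1][:3]}")
```

Because the topic matrix is shared, differences between text sets show up only in their topic weights, which is the property the three goals above rely on when comparing sets or tracking change over time.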
    Relation: Ministry of Science and Technology (MOST), MOST109-2410-H004-078, 109.08-110.07 (ROC calendar: August 2020 to July 2021)
    Data Type: report
    Appears in Collections: [Department of Psychology] National Science and Technology Council (NSTC) Research Projects



    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.

