References

[1] Hong, L., & Davison, B. D. (2010, July). Empirical study of topic modeling in Twitter. In Proceedings of the First Workshop on Social Media Analytics (pp. 80-88). ACM.
[2] Everitt, B. S. (2013). An Introduction to Latent Variable Models. Springer Science & Business Media.
[3] Landauer, T. K., McNamara, D. S., Dennis, S., & Kintsch, W. (Eds.). (2013). Handbook of Latent Semantic Analysis. Psychology Press.
[4] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
[5] Hofmann, T. (2000). Learning the similarity of documents: An information-geometric approach to document retrieval and categorization. In Advances in Neural Information Processing Systems (pp. 914-920).
[6] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
[7] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
[8] PTT. (1995). Retrieved December 23, 2019, from https://www.ptt.cc/bbs/index.html
[9] Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing. Pearson Education India.
[10] Nenkova, A., & McKeown, K. (2012). A survey of text summarization techniques. In Mining Text Data (pp. 43-76). Springer, Boston, MA.
[11] Chaffar, S., & Inkpen, D. (2011, May). Using a heterogeneous dataset for emotion analysis in text. In Canadian Conference on Artificial Intelligence (pp. 62-67). Springer, Berlin, Heidelberg.
[12] 廖經庭. (2007). Constructing Hakka ethnic identity on a BBS site: The case of the PTT "Hakka Dream" board. Master's thesis, National Central University, Taoyuan, Taiwan.
[13] 蔣佳峰. (2017). A disaster event extraction system for PTT. Master's thesis, National Central University, Taoyuan, Taiwan.
[14] 陳弘君. (2017). The echo chamber effect among netizens on political issues in social media: The case of the PTT Gossiping board. Master's thesis, Yuan Ze University, Taoyuan, Taiwan.
[15] Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2), 945-959.
[16] Heller, K. A., & Ghahramani, Z. (2005). Bayesian hierarchical clustering. In Proceedings of the 22nd International Conference on Machine Learning (pp. 297-304).
[17] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22.
[18] Jensen, J. L. W. V. (1906). Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica, 30, 175-193.
[19] Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.
[20] 沈裕傑. (2008). Sentence-based latent Dirichlet allocation for text summarization. Master's thesis, National Cheng Kung University, Tainan, Taiwan.
[21] Rajaraman, A., & Ullman, J. D. (2011). Mining of Massive Datasets. Cambridge University Press.
[22] Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010, June). Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 100-108).
[23] Moody, C. E. (2016). Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv preprint arXiv:1605.02019.
[24] Wang, X., Wei, F., Liu, X., Zhou, M., & Zhang, M. (2011, October). Topic sentiment analysis in Twitter: A graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (pp. 1031-1040). ACM.
[25] Quercia, D., Askham, H., & Crowcroft, J. (2012, June). TweetLDA: Supervised topic classification and link prediction in Twitter. In Proceedings of the 4th Annual ACM Web Science Conference (pp. 247-250). ACM.
[26] Ramage, D., Hall, D., Nallapati, R., & Manning, C. D. (2009, August). Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 248-256). Association for Computational Linguistics.
[27] Pavitt, C., & Johnson, K. K. (1999). An examination of the coherence of group discussions. Communication Research, 26(3), 303-321.
[28] Li, W., Xu, J., He, Y., Yan, S., & Wu, Y. (2019). Coherent comment generation for Chinese articles with a graph-to-sequence model. arXiv preprint arXiv:1906.01231.
[29] Gensim. (n.d.). Retrieved December 23, 2019, from https://radimrehurek.com/gensim/models/word2vec.html
[30] Beautiful Soup. (1996). Retrieved December 24, 2019, from https://www.crummy.com/software/BeautifulSoup/
[31] MongoDB. (2009). Retrieved December 31, 2019, from https://www.mongodb.com/
[32] Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 159-165.
[33] Jieba. (n.d.). Retrieved December 31, 2019, from https://github.com/fxsjy/jieba
[34] Wikipedia. (2001). Retrieved May 22, 2020, from https://dumps.wikimedia.org/zhwiki/20200501/
[35] Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin, 24(4), 35-43.
[36] Gibbs, N. E., Poole Jr., W. G., & Stockmeyer, P. K. (1975). A comparison of several bandwidth and profile reduction algorithms (No. TR-6). College of William and Mary, Williamsburg, VA.
[37] Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems (pp. 288-296).
[38] Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010, June). Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 100-108).
[39] Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011, July). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 262-272).
[40] Sievert, C., & Shirley, K. (2014, June). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces (pp. 63-70).