Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/145072
|
Title: | 應用主題建模技術探討數位媒體經營策略 Exploring digital media management strategies using topic modeling techniques |
Authors: | 賴冠州 Lai, Kuan-Chou |
Contributors: | 鄭宇庭 Cheng, Yu-Ting 賴冠州 Lai, Kuan-Chou |
Keywords: | Digital media; Natural language processing; Document clustering; Topic modeling; Dimensionality reduction |
Date: | 2023 |
Issue Date: | 2023-06-02 11:42:08 (UTC+8) |
Abstract: | With the advancement and spread of modern technology, more and more people rely on the internet to obtain the information they need, which has changed how people acquire information. In this era of abundant information, understanding the structure, content, and thematic composition of information has become very important. This study applies the LDA topic model to approximately 563,000 articles from digital media published from 2018 to 2022, in order to gain insight into the thematic composition of the articles and the distribution of each topic, and to explore the applications and implications of topic modeling for business operations.
The study found that vocabulary size directly affects the performance of the LDA topic model: the larger the vocabulary, the worse the model performed, and the best vocabulary size was 1,000. Experiments also showed that the choice of the number of topics is critical, with the best number falling between 20 and 30. In summary, a vocabulary size of 1,000 and 20 topics can effectively perform the topic modeling task.
On the other hand, the original article categories provide limited information and do not support effective analysis of article performance. By contrast, the LDA model captures the thematic composition of articles in finer detail, and this topic information more faithfully reflects shifts in business strategy and social trends. In terms of business strategy, digital media can use the information provided by the LDA model to make more informed decisions and enhance readers' reading experience. Notably, the results show that the top three topics by average page views per article are entertainment, family, and Taiwan's international relations, business insights that were previously unavailable. These findings provide a valuable basis for decision-making in digital media business strategy.
Finally, the LDA model opens up many application scenarios, including related-reading recommendation and article retrieval systems. It can also be combined with visitor browsing behavior data for audience topic preference analysis, similar-audience search, personalized recommendation, and targeted advertising, improving the operational efficiency of digital media. |
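The abstract ranks topics by average page views per article but does not say how articles were attributed to topics. A minimal sketch of one plausible approach follows, assuming each article is assigned to its dominant topic (the topic with the largest proportion). The function name, the toy topic proportions, and the page-view counts are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def mean_views_by_dominant_topic(doc_topics, page_views):
    """Average page views per article, grouped by each article's dominant topic."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for proportions, views in zip(doc_topics, page_views):
        # Assumption: an article belongs to the topic with the largest proportion.
        topic = max(range(len(proportions)), key=proportions.__getitem__)
        totals[topic] += views
        counts[topic] += 1
    means = {topic: totals[topic] / counts[topic] for topic in totals}
    # Sort topics from highest to lowest mean page views.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up data: 4 articles described by proportions over 3 topics.
doc_topics = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
page_views = [1200, 300, 800, 500]

ranking = mean_views_by_dominant_topic(doc_topics, page_views)
print(ranking)
```

An alternative would be to weight each article's page views by its topic proportions instead of using a hard assignment; which variant the study used is not stated in the abstract.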
Description: | Master's thesis; National Chengchi University; Graduate Institute of Business Administration (MBA Program); 106363079 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106363079 |
Data Type: | thesis |
Appears in Collections: | [MBA Program] Theses
|