National Chengchi University Institutional Repository (NCCUR): Item 140.119/145717
    Please use this permanent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/145717


    Title: 應用主題建模技術探討數位媒體經營策略
    Exploring digital media management strategies using topic modeling techniques
    Author: 賴冠州
    Lai, Kuan-Chou
    Contributors: 鄭宇庭
    Cheng, Yu-Ting
    賴冠州
    Lai, Kuan-Chou
    Keywords: Digital media
    Natural language processing
    Document clustering
    Topic modeling
    Dimensionality reduction
    Date: 2023
    Upload time: 2023-07-06 15:19:12 (UTC+8)
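    Abstract: As modern technology advances and spreads, more and more people rely on the internet to obtain the information they need, which has changed how people acquire information. In an era saturated with information, understanding its structure, content, and thematic composition has become very important. This study applies the LDA topic model to approximately 563,000 digital media articles published from 2018 to 2022, in order to gain insight into the topic composition of the articles and the distribution of each topic, and to explore the applications and implications of topic modeling for media management.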

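    The study found that, when using the LDA topic model, vocabulary size directly affects model performance: the larger the vocabulary, the worse the model performs. The best vocabulary size was therefore 1,000. Experiments also showed that the choice of the number of topics is critical, with the best number falling between 20 and 30. In summary, a vocabulary of 1,000 words and 20 topics allow the topic modeling task to be carried out effectively.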
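The configuration described above (a vocabulary capped at 1,000 words and 20 topics) could be set up roughly as follows. This is a minimal sketch, not the thesis's actual pipeline: it assumes scikit-learn, assumes the articles have already been segmented into space-separated words, and uses a hypothetical toy corpus with only 3 topics so that it runs quickly. The helper name `fit_lda` is illustrative.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: each article is already tokenized and space-joined.
docs = [
    "singer concert album tour fans",
    "movie actor premiere award fans",
    "parents children school family home",
    "family mother baby care home",
    "trade diplomacy summit alliance visit",
    "diplomacy minister visit alliance trade",
]

VOCAB_SIZE = 1000  # vocabulary cap reported in the abstract
N_TOPICS = 20      # topic count reported in the abstract


def fit_lda(texts, vocab_size=VOCAB_SIZE, n_topics=N_TOPICS, seed=0):
    """Fit LDA on bag-of-words counts limited to the most frequent terms."""
    # max_features keeps only the vocab_size most frequent terms.
    # token_pattern treats every whitespace-separated token as a word (assumption).
    vectorizer = CountVectorizer(max_features=vocab_size, token_pattern=r"(?u)\S+")
    counts = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    # Each row of doc_topics is one article's topic proportions (sums to 1).
    doc_topics = lda.fit_transform(counts)
    return vectorizer, lda, doc_topics


# The toy corpus is tiny, so 3 topics are used here instead of 20.
vectorizer, lda, doc_topics = fit_lda(docs, n_topics=3)
```

Comparing vocabulary sizes or topic counts, as the study reports doing, would amount to calling `fit_lda` with different `vocab_size` and `n_topics` values and scoring each fit with whatever evaluation metric is chosen; the abstract does not state which metric was used.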

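    On the other hand, the articles' original categories provide limited information and do not support effective analysis of article performance. By contrast, the LDA model captures finer-grained topic composition, and this topic information more faithfully reflects shifts in management strategy and social trends. In terms of management strategy, digital media can use the information provided by the LDA model to make better-informed decisions and thereby improve readers' reading experience. Notably, the results show that the three topics with the highest average page views per article are entertainment, family, and Taiwan's international relations, a business insight that was previously unavailable. These findings provide a valuable basis for decisions about digital media management strategy.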
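One way to compute average page views per article for each topic is sketched below. The abstract does not say how articles were attributed to topics, so this example assumes each article is assigned to its single highest-probability (dominant) topic; the matrix, the page-view counts, and the function name are all hypothetical.

```python
import numpy as np

# Hypothetical document-topic matrix: one row per article, rows sum to 1.
doc_topics = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
])
# Hypothetical page views for the same four articles.
page_views = np.array([1200, 800, 500, 300])


def mean_views_by_dominant_topic(doc_topics, page_views):
    """Average page views per article, grouped by each article's dominant topic."""
    dominant = doc_topics.argmax(axis=1)  # assumption: one topic per article
    n_topics = doc_topics.shape[1]
    totals = np.bincount(dominant, weights=page_views, minlength=n_topics)
    counts = np.bincount(dominant, minlength=n_topics)
    # Topics with no assigned articles yield NaN instead of a division error.
    return np.divide(totals, counts, out=np.full(n_topics, np.nan), where=counts > 0)


mean_views = mean_views_by_dominant_topic(doc_topics, page_views)
ranking = np.argsort(-mean_views)  # topic indices, highest average first
```

An alternative design would weight each article's page views by its topic proportions rather than assigning it to a single topic; which variant matches the study is not stated in the abstract.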

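    Finally, the LDA model opens up many possible applications, including further-reading recommendations and article retrieval systems. It can also be combined with visitor browsing data for audience topic-preference analysis, similar-audience search, personalized recommendation, and targeted advertising, improving the operational efficiency of digital media.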
    With the advancement and popularization of modern technology, more and more people are relying on the internet to obtain the information they need. In this era of abundant information, it has become very important to understand the structure, content, and thematic components of information. This study aims to use topic modeling techniques to analyze a total of approximately 563,000 articles from digital media published from 2018 to 2022, in order to gain insights into the representation of thematic components and the distribution of each topic in the articles, and to explore the applications and implications of topic modeling in business.

    The study found that a vocabulary size of 1,000 and 20 topics allow the topic modeling task to be performed effectively. In addition, the LDA model not only captures the topics of articles but also describes their thematic proportions in more detail, reflecting changes in business strategies and social trends. In terms of business strategy, digital media can use the information provided by the LDA model to make more informed decisions and enhance readers' reading experience. Notably, the results show that the three topics with the highest average page views are entertainment, family, and Taiwan's international relations. These findings provide a valuable basis for decisions about the business strategies of digital media.

    Finally, the LDA model provides many possibilities for applications, including recommender systems, article retrieval systems, audience thematic preference analysis, etc., enhancing the operational efficiency of digital media.
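As an illustration of the recommendation use case, articles can be compared by the similarity of their topic proportions. The sketch below uses cosine similarity over a hypothetical document-topic matrix; the similarity measure and the function name `recommend_similar` are assumptions for illustration, not the method described in the thesis.

```python
import numpy as np


def recommend_similar(doc_topics, article_idx, k=2):
    """Return indices of the k articles whose topic mix is closest to the given one."""
    norms = np.linalg.norm(doc_topics, axis=1, keepdims=True)
    unit = doc_topics / np.clip(norms, 1e-12, None)  # guard against zero rows
    sims = unit @ unit[article_idx]                  # cosine similarity to the query
    sims[article_idx] = -np.inf                      # never recommend the article itself
    return np.argsort(-sims)[:k].tolist()


# Hypothetical topic proportions for four articles over three topics.
doc_topics = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
])

similar_to_first = recommend_similar(doc_topics, 0, k=2)
```

The same topic vectors could be averaged over the articles a visitor has read to approximate that visitor's topic preferences, which is one way the audience analysis mentioned above might be approached.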
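    Description: Master's
    National Chengchi University
    Graduate Institute of Business Administration (MBA Program)
    106363079
    Source: http://thesis.lib.nccu.edu.tw/record/#G0106363079
    Type: thesis
    Appears in collections: [Graduate Institute of Business Administration (MBA Program)] Theses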

    Files in this item:

    There are no files associated with this item.


