政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/111784
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/111784


    Title: 結合中文斷詞系統與雙分群演算法於音樂相關臉書粉絲團之分析:以KKBOX為例
    Combining Chinese text segmentation system and co-clustering algorithm for analysis of music-related Facebook fan page: A case of KKBOX
    Authors: 陳柏羽
    Chen, Po Yu
    Contributors: 徐國偉
    Hsu, Kuo Wei
    陳柏羽
    Chen, Po Yu
    Keywords: 雙分群
    中文斷詞
    臉書粉絲專頁貼文
    Co-clustering
    Chinese text segmentation system
    Facebook fan page
    Date: 2017
    Issue Date: 2017-08-10 09:58:23 (UTC+8)
    Abstract: 近年智慧型手機與網路的普及,使得社群網站與線上串流音樂蓬勃發展。臉書(Facebook)用戶截至去年止每月總體平均用戶高達18.6億人 ,粉絲專頁成為公司企業特別關注的行銷手段。粉絲專頁上的貼文能夠在短時間內經過點閱、分享傳播至用戶的頁面,達到比起電視廣告更佳的效果,也節省了許多的成本。本研究提供了一套針對臉書粉絲專頁貼文的分群流程,考量到貼文字詞的複雜性,除了抓取了臉書粉絲專頁的貼文外,也抓取了與其相關的KKBOX網頁資訊,整合KKBOX網頁中的資料,對中文斷詞系統(Jieba)的語料庫進行擴充,以提高斷詞的正確性,接著透過雙分群演算法(Minimum Squared Residue Co-Clustering Algorithm)對貼文進行分群,並利用鑑別率(Discrimination Rate)與凝聚率(Agglomerate Rate)配合主成份分析(Principal Component Analysis)所產生的分佈圖來對分群結果進行評估,選出較佳的分群結果進一步去分析,進而找出分類的根據。在結果中,發現本研究的方法能夠有效的區分出不同類型的貼文,甚至能夠依據使用字詞、語法或編排格式的不同來進行分群。
    In recent years, the spread of smartphones and Internet access has driven rapid growth in social networking sites and online music streaming services. As of last year, Facebook averaged 1.86 billion monthly users, and the Facebook fan page has become a marketing tool that companies watch closely. Posts on a fan page can reach millions of people in a short time as users like and share them, which can be more effective than television advertising and costs far less. This study presents a process for clustering posts on Facebook fan pages. Because the wording of posts is complex, we collected not only the fan page posts but also related information from the KKBOX website, and used the KKBOX data to expand the text corpus of the Chinese text segmentation system Jieba and so improve segmentation accuracy. We then clustered the posts with the Minimum Squared Residue Co-Clustering Algorithm and evaluated the clustering results using the Discrimination Rate and Agglomerate Rate together with distribution plots produced by Principal Component Analysis. The better clustering results were selected for further analysis to identify the basis on which the posts were grouped. The results show that the method effectively separates different types of posts, and can even cluster posts by differences in wording, syntax, or layout.
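The abstract names the co-clustering objective but does not give its formula. As a minimal sketch, the code below assumes the residue definition from Cho et al. [18], h_ij = a_ij - a_iJ - a_Ij + a_IJ (entry minus its row mean and column mean within the co-cluster, plus the co-cluster mean), and scores a given assignment of posts (rows) and terms (columns). The function name `sum_squared_residue` and the toy matrix are hypothetical illustrations, not taken from the thesis, and the thesis implementation may differ.

```python
from collections import defaultdict


def sum_squared_residue(matrix, row_labels, col_labels):
    """Score a co-clustering by its total squared residue (lower is better).

    Assumed residue: h_ij = a_ij - a_iJ - a_Ij + a_IJ, computed inside
    each (row cluster, column cluster) block.
    """
    # Group row and column indices by their cluster label.
    rows_of = defaultdict(list)
    cols_of = defaultdict(list)
    for i, label in enumerate(row_labels):
        rows_of[label].append(i)
    for j, label in enumerate(col_labels):
        cols_of[label].append(j)

    total = 0.0
    for rows in rows_of.values():
        for cols in cols_of.values():
            # Mean of the whole block, and per-row / per-column means within it.
            block_mean = sum(matrix[i][j] for i in rows for j in cols) / (
                len(rows) * len(cols)
            )
            row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
            col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
            for i in rows:
                for j in cols:
                    h = matrix[i][j] - row_mean[i] - col_mean[j] + block_mean
                    total += h * h
    return total


# Made-up post-by-term count matrix: posts 0-1 use terms 0-1, posts 2-3 use terms 2-3.
posts_by_terms = [
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
]

# An assignment that matches the block structure, and one that cuts across it.
good = sum_squared_residue(posts_by_terms, [0, 0, 1, 1], [0, 0, 1, 1])
bad = sum_squared_residue(posts_by_terms, [0, 1, 0, 1], [0, 1, 0, 1])
print(good, bad)
```

In this toy case the assignment that matches the block structure has zero residue, because every block is constant, while the assignment that mixes the two groups scores higher. A full algorithm would search for the row and column assignments that minimize this score, which the sketch does not attempt.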
    Reference: [1] 蕭世平,“台灣地區線上音樂會員使用狀況與業者行銷策略研究”,南臺科技大學資訊傳播研究所碩士論文,2007。
    [2] 鄭博元,“設計與實作一個臉書粉絲頁資料抓取器”,政治大學資訊科學研究所碩士論文,2015。
    [3] 陳稼興, 謝佳倫, & 許芳誠,“以遺傳演算法為基礎的中文斷詞研究”,資訊管理研究第二卷第二期,pp. 27-44,2000。
    [4] 王瑞平,“應用平行語料建構中文斷詞組件”,政治大學資訊科學研究所碩士論文,2012。
    [5] Tsai, Y. F., & Chen, K. J.,“Reliable and Cost-Effective Pos-Tagging”, International Journal of Computational Linguistics & Chinese Language Processing, Vol. 9 #1, pp. 83-96, 2004.
    [6] Ma, W. Y., & Chen, K. J.,“A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction”, Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 31-38, 2003.
    [7] Ma, W. Y., & Chen, K. J.,“Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff”, Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 168-171, 2003.
    [8] 黃俊堯,“看懂,然後知輕重。「互聯網+」的10堂必修課”,pp. 21-29,台北:先覺出版社,2015。
    [9] 張家寧,“以概念萃取為基礎之文件分群與視覺化”,交通大學資訊科學與工程研究所碩士論文,2006。
    [10] 徐俊傑,“網際網路資訊應用研究”,台灣科技大學資訊管理系行政院國家科學委員會專題研究計畫,2007。
    [11] Hartigan, J. A.,“Direct Clustering of a Data Matrix”, Journal of the American Statistical Association Volume 67, Issue 337, 1972.
    [12] 陳貫中,“以雙分群方法分析基因微矩陣資料”,交通大學資訊科學與工程研究所碩士論文,2006。
    [13] 張智愷,“基於動態調整權重之co-cluster演算法”,交通大學資訊科學與工程研究所碩士論文,2011。
    [14] Mirkin, B.,“Mathematical Classification and Clustering”, Kluwer Academic Publishers,1996.
    [15] Dhillon, I. S.,“Co-clustering documents and words using bipartite spectral graph partitioning”, in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’01, pp. 269–274, 2001.
    [16] Dhillon, I. S., Mallela, S., & Modha, D. S.,“Information-theoretic co-clustering”, in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 89–98, 2003.
    [17] Kwon, B., & Cho, H.,“Scalable Co-Clustering Algorithm”, Algorithms and Architectures for Parallel Processing, Lecture Notes in Computer Science, Vol. 6081, pp. 32–43, 2010.
    [18] Cho, H., Dhillon, I. S., Guan, Y., & Sra, S.,“Minimum sum-squared residue co-clustering of gene expression data”, in Proceedings of the fourth SIAM international conference on data mining, 2004.
    [19] Cho, H., & Dhillon, I. S.,“Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 5, NO. 3, 2008.
    [20] Cheng, Y., & Church, G. M., “Biclustering of Expression Data”, in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, Vol. 8, pp. 93-103, 2000.
    [21] Martínez, A. M., & Kak, A. C.,“PCA versus LDA”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 2, pp. 228-233, 2001.
    [22] Zhang, Y., & Wu, L.,“An MR brain images classifier via principal component analysis and kernel support vector machine”, Progress In Electromagnetics Research 130, pp. 369-388, 2012.
    [23] 林育臣,“群聚技術之研究”,朝陽科技大學資訊管理研究所碩士論文,2002。
    [24] 陳榮昌,“群聚演算法及群聚參數的分析與探討”,朝陽科技大學資訊管理研究所碩士論文,2003。
    [25] 吳振銘, “應用改良式 K-means 分群法於個人化音樂推薦服務系統之實現”,高雄應用科技大學電子工程系研究所碩士論文,2012。
    [26] Mihalcea, R., & Tarau, P.,“TextRank: Bringing Order into Texts”, Proceedings of the Conference on Empirical Methods in Natural Language Processing, Vol. 4, pp. 404-411, 2004.
    [27] De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E.,“Predicting depression via social media”, In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, 13, pp. 1-10, 2013.
    [28] Yin, J., Lampert, A., Cameron, M., Robinson, B., & Power, R.,“Using social media to enhance emergency situation awareness”, IEEE Intelligent Systems, 27(6), pp. 52-59, 2012.
    [29] Benson, E., Haghighi, A., & Barzilay, R.,“Event discovery in social media feeds”, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, Association for Computational Linguistics, pp. 389-398, 2011.
    [30] Girvan, M., & Newman, M. E.,“Community structure in social and biological networks”, Proceedings of the national academy of sciences, 99(12), pp. 7821-7826, 2002.
    [31] Pohl, D., Bouchachia, A., & Hellwagner, H.,“Online indexing and clustering of social media data for emergency management”, Neurocomputing, 172, pp. 168-179, 2016.
    [32] Papadopoulos, S., Kompatsiaris, Y., Vakali, A., & Spyridonos, P., “Community detection in social media”, Data Mining and Knowledge Discovery, 24(3), pp.515-554, 2012.
    [33] Azizifard, N., “Social Network Clustering”, International Journal of Information Technology and Computer Science, 6(1), 76, 2013.
    [34] Reuter, T., Cimiano, P., Drumond, L., Buza, K., & Schmidt-Thieme, L., “Scalable Event-Based Clustering of Social Media Via Record Linkage Techniques”, In Fifth International AAAI Conference on Weblogs and Social Media, 2011.
    [35] 吳怡瑾,方友杉, & 喻欣凱,“運用文件分群與概念關聯分析技術協助網誌瀏覽: 任務導向評估方法”,輔仁大學資訊管理學研究所,圖書資訊學研究,4(1), pp. 133-164, 2009.
    [36] Becker, H., Naaman, M., & Gravano, L.,“Learning similarity metrics for event identification in social media”, In Proceedings of the third ACM international conference on Web search and data mining, pp. 291-300, 2010.
    [37] 蔡宜龍,“特殊領域文件分群之系統設計與研究--以佛學資料為例”,國立臺灣大學資訊工程研究所碩士論文,未出版論文,2002。
    [38] Ferrara, E., JafariAsbagh, M., Varol, O., Qazvinian, V., Menczer, F., & Flammini, A.,“Clustering memes in social media”, In Advances in social networks analysis and mining, IEEE/ACM international conference on pp. 548-555, 2013.
    [39] Wang, X., Tang, L., Gao, H., & Liu, H.,“Discovering overlapping groups in social media”, In Data Mining, 2010 IEEE 10th International Conference on pp. 569-578, 2010.
    [40] 尹其言, & 楊建民,“應用文件分群與文字探勘技術於機器學習領域趨勢分析以 SSCI 資料庫為例”, 長榮大學學報, 14(2), pp. 1-16, 2010.
    [41] Steinbach, M., Karypis, G., & Kumar, V.,“A comparison of document clustering techniques”, In KDD workshop on text mining, Vol. 400, No. 1, pp. 525-526, 2000.
    [42] Hotho, A., Staab, S., & Stumme, G.,“Ontologies improve text document clustering”, In Data Mining, ICDM 2003. Third IEEE International Conference on pp. 541-544, 2003
    [43] 黃純敏,陳聰宜, & 詹雅筑,“新聞事件偵測與追蹤之分群分類演算法研究”, 資訊科技國際期刊, 8(1), pp. 1-9, 2014
    [44] Ting, X., & Jufang, L.,“A Comparative Study between Single-Pass Algorithm and K-means Algorithm in Web Topic Detection.”, 中國國防科學技術大學信息系統與管理學院, 2014.
    [45] Willett, P.,“Recent trends in hierarchic document clustering: a critical review”, Information Processing & Management, 24(5), pp. 577-597, 1988.
    [46] Yan, Y., Chen, L., & Tjhi, W. C.,“Fuzzy semi-supervised co-clustering for text documents”, Fuzzy Sets and Systems, 215, pp. 74-89, 2012.
    [47] 詹欣逸,“利用WordNet判斷字詞包含關係-應用於動態階層文件分群”, 國立中央大學資訊管理研究所碩士論文, 2013.
    [48] 謝昆霖,楊義清,林俊男, & 林育弘,“模糊群聚分析程序於生物 DNA 序列之研究”, Journal of Information Technology and Applications, 2(1), pp. 17-22, 2007.
    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    102753012
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0102753012
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Size      Format
    301201.pdf    6846Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.



    Copyright Announcement
    1. The digital content of this website is held in the National Chengchi University Institutional Repository and is provided free of charge for non-commercial purposes such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will promptly take remedial measures, such as removing the work from the repository, and investigate your claim.