    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/54564


    Title: 運用kNN文字探勘分析智慧型終端App群集之研究
    The study of analyzing smart handheld device App's clusters by using kNN text mining
    Authors: 曾國傑
    Tseng, Kuo Chieh
    Contributors: 楊建民
    曾國傑
    Tseng, Kuo Chieh
    Keywords: App
    kNN
    Clustering
    Text Mining
    Date: 2011
    Issue Date: 2012-10-30 11:21:18 (UTC+8)
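    Abstract: As smart handheld devices become increasingly widespread, user demand for Apps has grown steadily, and major companies have consequently developed a new form of interactive marketing. At the same time, the enormous business opportunity created by App downloads has drawn many developers into App development, causing the number of Apps to grow explosively, so that users facing so many kinds of Apps cannot choose efficiently. This study therefore uses text mining and kNN clustering to analyze App recommendation articles posted by Internet users and to cluster the Apps. By adjusting parameters and evaluating quality measures, it seeks the best-quality clustering to serve as a reference for users selecting Apps.
    To cluster a large number of Apps and thereby address users' "information overload" problem, this study takes game Apps in the App Store as its subject. It collected 439 App recommendation articles and, according to whether articles recommended the same App, merged them into 357 App recommendation articles. Text mining techniques were then used to convert the articles into a mutually comparable vector space model, and kNN clustering was applied to group them. The best-quality clustering was sought by adjusting the parameter combination of the k value and the document similarity threshold, and clustering quality was evaluated with measures such as mean intra-cluster similarity. To improve clustering quality, the study adopts "multi-stage clustering", using the number of articles in each cluster after clustering to decide whether to re-cluster or to merge clusters.
    The results show that the first-stage clustering achieves the best quality when k is 10 and the document similarity threshold is 0.025. In subsequent stages, because the number of articles in each cluster decreases, k is lowered and the document similarity threshold is gradually raised to keep the clustering effective. After the second stage, key terms can be extracted from the clusters that have met the stopping condition, yielding 6 App types such as "baseball/shooting" and "throwing/flying"; later stages following the same clustering rules yield 14 App types such as "tower defense". When clustering ends, there are 36 clusters and 20 App types in total. During the process, mean intra-cluster similarity gradually increases while mean inter-cluster similarity gradually decreases, and the clustering quality measure rises from 12.65% after the first stage to 75.81% at the end of the fifth stage.
    This study shows that after clustering, highly similar Apps gradually gather into groups, and the resulting cluster names can serve as a reference for users selecting Apps. App developers can also learn from each cluster's key terms which game elements users value, and improve App content to better meet user needs. Building on these results, future research could improve clustering quality by building a specialized lexicon, use document summarization to strengthen users' understanding of each cluster, or build an App recommendation system.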
    As smart handheld devices grow in popularity, demand for Apps is spreading. The number of developers devoting themselves to this opportunity is also rising, so the total number of Apps is growing rapidly. Faced with this situation, users cannot efficiently choose the Apps they need. This research uses text mining and kNN clustering to analyze App recommendation reviews written by netizens and then clusters the App recommendation articles. By adjusting parameters and evaluating the measurement indicators, we aim to obtain the best-quality clustering as a basis for users selecting Apps.
    To address information overload for users, we analyzed Apps in the “Games” category of the App Store and consolidated the collected reviews into 357 App recommendation articles as our analysis target. We then used text mining to process the articles and kNN clustering analysis to group them. At the same time, we tuned the parameters, guided by the measurement indicators, to find the optimal clustering. This research uses a multi-phase clustering technique to assure the quality of each cluster.
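The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of one plausible reading: documents become TF-IDF vectors, similarity is cosine similarity, and each document is linked to at most k nearest neighbours whose similarity meets a threshold, with connected components taken as clusters. The function names, the toy documents, and the linking rule itself are assumptions made for illustration, not the thesis's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into sparse TF-IDF dicts (vector space model)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_clusters(vectors, k, threshold):
    """Assumed scheme: link each document to those of its k nearest neighbours
    whose similarity is at least `threshold`; clusters are the connected
    components of the resulting graph (tracked with union-find)."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in sims[:k]:
            if sim >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical, already-tokenised recommendation articles.
docs = [
    ["baseball", "pitch", "bat"],
    ["baseball", "bat", "homerun"],
    ["tower", "defense", "enemy"],
    ["tower", "defense", "upgrade"],
]
clusters = knn_clusters(tfidf_vectors(docs), k=2, threshold=0.025)
print(clusters)  # [[0, 1], [2, 3]]
```

Raising the threshold makes links rarer and clusters smaller, which is one way to picture why tuning k and the threshold together changes the clustering quality.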
    We identified 36 clusters and 20 categories from the clustering results. During the clustering process, the mean intra-cluster similarity increases gradually; in contrast, the mean inter-cluster similarity decreases. The “Cluster Quality” measure increases significantly, from 12.65% to 75.81%. In conclusion, similar Apps are gradually grouped together, and each cluster’s name can serve as a reference for users. App developers can also learn from each cluster’s key phrases which game elements users pay the most attention to, and tailor their content to match users’ needs. Future work could build a specialized App term database to improve clustering quality, use summarization techniques to strengthen users’ understanding of each cluster, or build an App recommendation system on the basis of these results.
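The two similarity measures named above can be illustrated with a short sketch. The abstract does not define them precisely, nor does it give the formula behind the “Cluster Quality” percentage, so the definitions below (mean pairwise similarity within clusters, and mean similarity over pairs split across clusters) are assumptions, and the combined quality score is deliberately left out. The similarity matrix is hypothetical.

```python
from itertools import combinations


def mean_intra_similarity(sim, clusters):
    """Average, over clusters with at least two members, of the mean
    pairwise similarity inside each cluster (assumed definition)."""
    per_cluster = []
    for members in clusters:
        pairs = list(combinations(members, 2))
        if pairs:
            per_cluster.append(sum(sim[i][j] for i, j in pairs) / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


def mean_inter_similarity(sim, clusters):
    """Mean similarity over all document pairs that fall in different
    clusters (assumed definition)."""
    values = [
        sim[i][j]
        for a, b in combinations(clusters, 2)
        for i in a
        for j in b
    ]
    return sum(values) / len(values) if values else 0.0


# Hypothetical symmetric similarity matrix for four documents.
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.0],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.0, 0.6, 1.0],
]
clusters = [[0, 1], [2, 3]]
intra = mean_intra_similarity(sim, clusters)
inter = mean_inter_similarity(sim, clusters)
print(round(intra, 2), round(inter, 2))  # 0.7 0.15
```

A good clustering pushes the first number up and the second down, which matches the trend the abstract reports across the clustering phases.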
    Description: Master's
    National Chengchi University
    Graduate Institute of Management Information Systems
    99356010
    100
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0099356010
    Data Type: thesis
    Appears in Collections: [Department of Management Information Systems] Theses

    Files in This Item:

    File          Size      Format
    601001.pdf    2008Kb    Adobe PDF


    All items in 政大典藏 are protected by copyright, with all rights reserved.

