National Chengchi University Institutional Repository (NCCUR): Item 140.119/61199
    Please use this persistent URL to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/61199


    Title: Design and Implementation of a Twitter Data Collection and Management Service
    (Chinese title: 實作推特社群媒體的資料蒐集與管理服務)
    Author: Chou, Yu Chun (周玉駿)
    Contributors: Chen, Kung (陳恭); Chou, Yu Chun (周玉駿)
    Keywords: Twitter; social media; social networks; NoSQL
    Date: 2012
    Uploaded: 2013-10-01 13:47:07 (UTC+8)
    Abstract: The rise of social media such as Twitter has significantly changed the way people communicate in modern society. By collecting, storing, and analyzing the massive amount of data that users generate as they interact, researchers can conduct more in-depth work in many areas, including crisis informatics (disaster information dissemination), trend analysis, and social relation (social network) analysis. To let researchers concentrate on analyzing the data, it is necessary to build a robust data collection and management platform that they can use easily.
    In this thesis, we examine the issues and restrictions that current Twitter data collection and large-scale data storage face, define a set of basic system design principles, and present a modular design and implementation of a tweet collection and management platform based on Twitter's API. The platform has two salient features: event-job based collection tasks and an access token pool. Under the event-job model, researchers can launch multiple jobs to collect the tweets related to an event while minimizing duplicate collection criteria and duplicate tweets. By assigning each job its own access token (one job, one access token), multiple jobs run independently and do not consume each other's rate limits. In addition, because storing large volumes of data usually runs into hardware scalability problems, especially during the tweet bursts common in many events, the platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database.
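The event-job model and the "one job, one access token" rule described above can be illustrated with a small in-memory sketch. It is not the thesis's implementation: the class names (`AccessToken`, `TokenPool`, `Event`), the per-window request budget, and the numbers used are hypothetical, and no real Twitter API calls are made. The point it shows is that when each job draws from its own token's budget, one job exhausting its limit leaves the others untouched, and that storing tweets per event keyed by tweet ID keeps overlapping jobs from producing duplicates.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccessToken:
    """One credential with its own request budget per time window (hypothetical model)."""
    name: str
    limit: int        # requests allowed per window (illustrative value)
    window_s: float   # window length in seconds (illustrative value)
    used: int = 0
    window_start: float = field(default_factory=time.monotonic)

    def try_acquire(self, now: float) -> bool:
        """Spend one request from this token's budget; False if it is exhausted."""
        if now - self.window_start >= self.window_s:
            # The window has elapsed: start a new one with a full budget.
            self.window_start, self.used = now, 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


class TokenPool:
    """Binds each job to its own token ("one job, one access token")."""

    def __init__(self, tokens):
        self._free = list(tokens)
        self._by_job = {}

    def assign(self, job_id: str) -> AccessToken:
        if job_id in self._by_job:  # a job keeps the token it already holds
            return self._by_job[job_id]
        if not self._free:
            raise RuntimeError(f"no free access token for {job_id}")
        token = self._free.pop()
        self._by_job[job_id] = token
        return token

    def release(self, job_id: str) -> None:
        """Return a finished job's token to the pool."""
        self._free.append(self._by_job.pop(job_id))


class Event:
    """Groups several collection jobs and keeps each tweet only once."""

    def __init__(self, name: str):
        self.name = name
        self.jobs = {}    # job_id -> search terms for that job
        self.tweets = {}  # tweet id -> tweet, so overlapping jobs cannot duplicate

    def add_job(self, job_id: str, terms: list) -> None:
        self.jobs[job_id] = terms

    def store(self, tweet: dict) -> bool:
        """Store a tweet; return False if the event already has it."""
        if tweet["id"] in self.tweets:
            return False
        self.tweets[tweet["id"]] = tweet
        return True


if __name__ == "__main__":
    pool = TokenPool([AccessToken("t1", 2, 900), AccessToken("t2", 2, 900)])
    event = Event("typhoon")
    event.add_job("job-a", ["typhoon"])
    event.add_job("job-b", ["flood"])
    tok_a, tok_b = pool.assign("job-a"), pool.assign("job-b")
    now = time.monotonic()
    print([tok_a.try_acquire(now) for _ in range(3)])  # job-a exhausts its own budget
    print(tok_b.try_acquire(now))                      # job-b's token is unaffected
    print(event.store({"id": 1}), event.store({"id": 1}))  # the second copy is rejected
```

The trade-off of this design is that the pool needs at least as many tokens as concurrently running jobs; `assign` simply fails when none are free, whereas a real scheduler might queue the job instead.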
    To evaluate the platform, we conducted several data collection experiments and compared it with two other tweet collection tools used by many researchers. The preliminary results show that our platform has certain advantages over them.
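The abstract's storage path (buffer collected tweets in HBase, then migrate them to a relational database) can be sketched as a batched copy. This is only an illustration under stated assumptions: a Python dict stands in for an HBase table (row key mapped to `family:qualifier` columns), the `<event>:<tweet id>` row-key layout and the `tweets` table schema are hypothetical, and the standard-library `sqlite3` module stands in for whichever relational database the thesis used.

```python
import sqlite3

# Hypothetical stand-in for an HBase table: row key -> {"family:qualifier": value}.
# Row keys are "<event>:<tweet id>", an assumed key design for this sketch.
hbase_buffer = {
    "typhoon:1001": {"t:user": "alice", "t:text": "Roads closed", "t:created": "2012-08-01T10:00:00"},
    "typhoon:1002": {"t:user": "bob", "t:text": "Stay safe", "t:created": "2012-08-01T10:01:00"},
}


def _flush(conn: sqlite3.Connection, rows: list) -> int:
    """Insert one batch in a transaction; return how many rows were actually new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        # INSERT OR IGNORE makes re-running the migration idempotent.
        conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)", rows)
    inserted = conn.total_changes - before
    rows.clear()
    return inserted


def migrate(buffer: dict, conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Copy buffered rows into a relational table in batches; return rows inserted."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               event TEXT NOT NULL,
               tweet_id INTEGER NOT NULL,
               screen_name TEXT,
               tweet_text TEXT,
               created TEXT,
               PRIMARY KEY (event, tweet_id))"""
    )
    rows, inserted = [], 0
    for row_key in sorted(buffer):  # visit rows in key order, as an HBase scan would
        event, tweet_id = row_key.split(":", 1)
        cols = buffer[row_key]
        rows.append((event, int(tweet_id), cols.get("t:user"),
                     cols.get("t:text"), cols.get("t:created")))
        if len(rows) >= batch_size:
            inserted += _flush(conn, rows)
    inserted += _flush(conn, rows)  # write the final partial batch
    return inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(hbase_buffer, conn))  # first run copies every buffered row
    print(migrate(hbase_buffer, conn))  # a rerun inserts nothing new
```

Batching bounds how much work each transaction does, and the composite primary key plus `INSERT OR IGNORE` means an interrupted migration can simply be run again.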
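    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    99971018
    101
    Source: http://thesis.lib.nccu.edu.tw/record/#G0099971018
    Type: thesis
    Appears in Collections: [Department of Computer Science] Theses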
