Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/61199
|
Title: | 實作推特社群媒體的資料蒐集與管理服務 Design and Implementation of a Twitter Data Collection and Management Service |
Authors: | 周玉駿 Chou, Yu Chun |
Contributors: | 陳恭 Chen, Kung 周玉駿 Chou, Yu Chun |
Keywords: | Twitter; Social media; Social networks; NoSQL |
Date: | 2012 |
Issue Date: | 2013-10-01 13:47:07 (UTC+8) |
Abstract: | The rise of social media such as Twitter has significantly changed how people communicate in modern society. Collecting, storing, and analyzing the massive amount of data that users produce as they interact lets researchers do more in-depth work in many areas, including crisis informatics (the dissemination of disaster information), trend analysis, and social network analysis. So that researchers can concentrate on analyzing the data, a robust data collection and management platform is needed.
In this thesis, we investigate the issues and restrictions that currently affect tweet collection and large-scale data storage, define some basic system design principles, and present a modular design and implementation of a tweet collection and management platform based on Twitter's API. Two salient features of the platform are event-job based data collection tasks and an access token pool. Under the event-job model, researchers can launch multiple jobs to collect the tweets related to one event while minimizing repeated configuration of collection criteria and reducing duplicate tweets. Under the one-job, one-access-token approach, jobs run separately and do not consume each other's rate limit. In addition, because storing large volumes of data runs into hardware scalability problems and many events produce bursts of tweets, the platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database.
To evaluate the platform, we conducted several data collection experiments and compared it with two other tweet collection tools used by many researchers. The preliminary results show that our platform has certain advantages over them. |
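The event-job model and the one-job, one-access-token rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the class names `Event`, `Job`, and `TokenPool`, the placeholder token strings, and the in-memory bookkeeping are all hypothetical, and no Twitter API call is made.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    """One collection task inside an event (hypothetical structure)."""
    job_id: str
    keywords: List[str]
    token: Optional[str] = None  # access token currently leased to this job


@dataclass
class TokenPool:
    """Leases each access token to at most one job at a time, so that
    one job's requests never count against another job's rate limit."""
    free: List[str]
    leased: Dict[str, str] = field(default_factory=dict)  # job_id -> token

    def acquire(self, job: Job) -> str:
        # A job that already holds a token keeps using the same one.
        if job.job_id in self.leased:
            return self.leased[job.job_id]
        if not self.free:
            raise RuntimeError("no free access token; the job must wait")
        token = self.free.pop()
        self.leased[job.job_id] = token
        job.token = token
        return token

    def release(self, job: Job) -> None:
        # Return the token to the pool when the job stops.
        token = self.leased.pop(job.job_id, None)
        if token is not None:
            self.free.append(token)
            job.token = None


@dataclass
class Event:
    """Groups the jobs that collect tweets about one event."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def add_job(self, job_id: str, keywords: List[str]) -> Job:
        job = Job(job_id, keywords)
        self.jobs.append(job)
        return job


# Placeholder tokens and keywords, chosen only for the example.
pool = TokenPool(free=["token-A", "token-B"])
event = Event("example-event")
j1 = event.add_job("j1", ["keyword-one"])
j2 = event.add_job("j2", ["keyword-two"])
t1 = pool.acquire(j1)
t2 = pool.acquire(j2)
```

In this sketch a third job would raise `RuntimeError` until `pool.release(...)` frees a token, which is one simple way to guarantee that no two running jobs share a rate limit.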
Reference: | 1. Twitter Team. Twitter Blog. March 2012. https://blog.twitter.com/2012/twitter-turns-six
2. Mike Melanson. Twitter Kills the API Whitelist: What it Means for Developers & Innovation. February 2011. http://readwrite.com/2011/02/11/twitter_kills_the_api_whitelist_what_it_means_for
3. Maggie Shiels. Web slows after Jackson's death. BBC News, June 26, 2009.
4. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 591-600, 2010.
5. M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), May 2010.
6. Hrishikesh Bakshi. Framework for Crawling and Local Event Detection Using Twitter Data. Master's thesis, May 2011.
7. Mike Melanson. Twitter Kills the API Whitelist: What it Means for Developers & Innovation. February 2011. http://readwrite.com/2011/02/11/twitter_kills_the_api_whitelist_what_it_means_for
8. A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pages 56-65, 2007.
9. A. H. Wang. Don't follow me: Spam detection in Twitter. In Proceedings of the International Conference on Security and Cryptography (SECRYPT), July 2010.
10. Matko Bošnjak, Eduardo Oliveira, José Martins, Eduarda Mendes Rodrigues, and Luís Sarmento. TwitterEcho - A Distributed Focused Crawler to Support Open Research with Twitter Data.
11. Kenneth M. Anderson and Aaron Schram. Design and Implementation of a Data Analytics Infrastructure in Support of Crisis Informatics Research (NIER Track), 2011.
12. Kenneth M. Anderson and Aaron Schram. MySQL to NoSQL Data Modeling Challenges in Supporting Scalability, page 3, 2012.
13. Twitter API. https://dev.twitter.com/docs/streaming-apis
14. Cosimo Streppone. Gentle introduction to OAuth. November 3, 2010. http://dev.opera.com/articles/view/gentle-introduction-to-oauth/
15. E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 1970.
16. Adam Lith and Jakob Mattsson. Investigating storage solutions for large data, page 63, 2010.
17. Rick Cattell. Scalable SQL and NoSQL Data Stores, page 10, 2011.
18. Kenneth M. Anderson and Aaron Schram. MySQL to NoSQL Data Modeling Challenges in Supporting Scalability, page 1, 2012. |
Description: | Master's; National Chengchi University; Department of Computer Science; 99971018; 101 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0099971018 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|