National Chengchi University Institutional Repository (NCCUR): Item 140.119/61199

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/61199

Title:	Design and Implementation of a Twitter Data Collection and Management Service (實作推特社群媒體的資料蒐集與管理服務)
Authors:	Chou, Yu Chun (周玉駿)
Contributors:	Chen, Kung (陳恭); Chou, Yu Chun (周玉駿)
Keywords:	Twitter social media; Social networks; NoSQL
Date:	2012
Issue Date:	2013-10-01 13:47:07 (UTC+8)
Abstract:	The rise of social media such as Twitter has significantly changed the mode of communication in modern society. By collecting, storing, and analyzing the massive amount of data produced by user interactions on social media, researchers can conduct more in-depth work in many areas, such as crisis informatics (the dissemination of disaster information), trend analysis, and social network analysis. To let researchers concentrate on analyzing the data, it is necessary to build a robust data collection and management platform. In this thesis, we investigate the issues and restrictions that current tweet collection and large-scale data storage face, define a set of basic system design principles, and present the modular design and implementation of a tweet collection and management platform based on Twitter's API. Two salient features of the platform are event-job based data collection tasks and an access token pool. Specifically, researchers can launch multiple jobs under one event to collect the tweets related to that event, which reduces the duplicate collection criteria users must set up and the number of duplicate tweets collected. By adopting a one-job, one-access-token approach, multiple jobs run independently, so that the rate limit of one job does not affect the others. In addition, because storing large volumes of data usually runs into hardware scalability problems, and because tweet bursts are common during many events, the platform first stores the collected data in HBase, a popular NoSQL system, and then quickly migrates it to a standard relational database. To evaluate the platform, we conducted several data collection experiments and compared it with two other tweet collection tools used by many researchers. The preliminary results show that our platform has certain advantages over them.
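The abstract names three mechanisms: events that group several collection jobs, a pool that binds one access token to each job so rate limits stay independent, and a NoSQL staging store whose contents are later migrated to a relational database. The Python sketch below shows one way these pieces could fit together. It is not the thesis's implementation, and every name in it (`TokenPool`, `RateLimiter`, `Event`, `Job`, `stage`, `migrate`) is hypothetical. Assumptions: a fixed-window request counter stands in for Twitter's actual rate-limit accounting, an in-memory dict keyed by tweet id stands in for the HBase table, and an SQLite table stands in for the relational database.

```python
import sqlite3
import time
from dataclasses import dataclass, field


class TokenPool:
    """One job, one token: each access token is bound to at most one job."""

    def __init__(self, tokens):
        self._free = list(tokens)
        self._assigned = {}  # job name -> token

    def acquire(self, job_name):
        if not self._free:
            raise RuntimeError(f"no free access token for job {job_name!r}")
        token = self._free.pop(0)
        self._assigned[job_name] = token
        return token

    def release(self, job_name):
        self._free.append(self._assigned.pop(job_name))


class RateLimiter:
    """Assumed fixed-window request budget; each job owns its own instance."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit, self.window_s, self.clock = limit, window_s, clock
        self._start, self._used = clock(), 0

    def try_acquire(self):
        now = self.clock()
        if now - self._start >= self.window_s:  # new window: reset the budget
            self._start, self._used = now, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


@dataclass
class Job:
    name: str
    keywords: list
    token: str
    limiter: RateLimiter


@dataclass
class Event:
    name: str
    jobs: list = field(default_factory=list)

    def add_job(self, name, keywords, pool, limit, window_s, clock=time.monotonic):
        # Each job gets its own token and its own limiter, so one job
        # exhausting its budget leaves the other jobs' budgets untouched.
        job = Job(name, keywords, pool.acquire(name),
                  RateLimiter(limit, window_s, clock))
        self.jobs.append(job)
        return job


def stage(staging, job, tweet):
    """Write a tweet into the staging store (a dict standing in for HBase).

    Keying by tweet id means a tweet matched by several jobs is stored once.
    """
    staging[tweet["id"]] = {**tweet, "job": job.name}


def migrate(staging, conn):
    """Copy staged tweets into a relational table, then empty the stage."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id TEXT PRIMARY KEY, job TEXT, text TEXT)"
    )
    rows = [(t["id"], t["job"], t["text"]) for t in staging.values()]
    conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?)", rows)
    conn.commit()
    staging.clear()
    return len(rows)
```

For example, two jobs created under one event with `limit=2` can each make two requests in the same window: after the first job's `try_acquire()` starts returning `False`, the second job's limiter still grants requests, because the budgets are tracked per job and per token. Separating a fast write path (`stage`) from a batch `migrate` step mirrors the abstract's motivation of absorbing tweet bursts before loading the data into a relational schema.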
Reference:
1. Twitter Team. Twitter Blog. https://blog.twitter.com/2012/twitter-turns-six, March 2012.
2. Mike Melanson. Twitter Kills the API Whitelist: What it Means for Developers & Innovation. http://readwrite.com/2011/02/11/twitter_kills_the_api_whitelist_what_it_means_for, February 2011.
3. Maggie Shiels. Web slows after Jackson's death. BBC News, June 26, 2009.
4. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 591-600, 2010.
5. M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), May 2010.
6. Hrishikesh Bakshi. Framework for Crawling and Local Event Detection Using Twitter Data. Master's thesis, May 2011.
7. Mike Melanson. Twitter Kills the API Whitelist: What it Means for Developers & Innovation. http://readwrite.com/2011/02/11/twitter_kills_the_api_whitelist_what_it_means_for, February 2011.
8. A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pages 56-65, 2007.
9. A. H. Wang. Don't follow me: Spam detection in Twitter. In Proceedings of the International Conference on Security and Cryptography (SECRYPT), July 2010.
10. Matko Bošnjak, Eduardo Oliveira, José Martins, Eduarda Mendes Rodrigues, and Luís Sarmento. TwitterEcho: A Distributed Focused Crawler to Support Open Research with Twitter Data.
11. Kenneth M. Anderson and Aaron Schram. Design and Implementation of a Data Analytics Infrastructure in Support of Crisis Informatics Research (NIER Track), 2011.
12. Kenneth M. Anderson and Aaron Schram. MySQL to NoSQL Data Modeling Challenges in Supporting Scalability, page 3, 2012.
13. Twitter API. https://dev.twitter.com/docs/streaming-apis
14. Cosimo Streppone. Gentle introduction to OAuth. http://dev.opera.com/articles/view/gentle-introduction-to-oauth/, November 3, 2010.
15. E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 1970.
16. Adam Lith and Jakob Mattsson. Investigating storage solutions for large data, page 63, 2010.
17. Rick Cattell. Scalable SQL and NoSQL Data Stores, page 10, 2011.
18. Kenneth M. Anderson and Aaron Schram. MySQL to NoSQL Data Modeling Challenges in Supporting Scalability, page 1, 2012.
Description:	Master's thesis, Department of Computer Science, National Chengchi University; 99971018; academic year 101 (ROC calendar)
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0099971018
Data Type:	thesis
Appears in Collections:	[Department of Computer Science] Theses

All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.