    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/69229


    Title: 資源感知之社群媒體資料搜集平台:以推特為例
    A resource-aware data collection platform for Twitter
    Authors: 許矢勇
    Shiu, Shih Yung
    Contributors: 陳恭
    Chen, Kung
    許矢勇
    Shiu, Shih Yung
    Keywords: Twitter
    Resource-aware
    Social media
    Date: 2013
    Issue Date: 2014-08-25 15:21:49 (UTC+8)
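    Abstract: In recent years, social media such as Twitter, Facebook, and Sina Weibo have grown rapidly. Their user bases keep expanding, and they have become an important channel through which people interact with friends and obtain information in daily life. For communication and sociology scholars, the massive data held by the social media giants is an important resource for research on related topics. Although the major social media platforms all provide application programming interfaces (APIs) for data retrieval to some degree, they also impose various restrictions on data collectors, which makes data collection difficult. In short, researchers must optimize the quality and quantity of the data sets they can obtain within the limited resources these platforms provide. This study therefore targets Twitter and implements a resource-aware social media data collection platform to help scholars collect tweets.
    First, the platform adopts an event-job concept: for each event of interest, users select different keywords for data collection, and each keyword corresponds to a job in the system. Second, each job must hold an access token in order to collect tweets, and each token can retrieve only a limited number of tweets within a given period of time, so tokens are the platform's primary resource. To cope with the common situation in which tweets surge when a special event occurs, the platform provides a token pool mechanism that lets many jobs share token resources. It also makes use of the access options of the Twitter API so that users can optimize the number of tweets retrievable depending on when the data is collected. For the system core, this study proposes the concept of a "Mansion Household Service": through the division of labor among the minions in the service group, the system can still run multiple different jobs concurrently with limited resources, effectively reducing the impact that Twitter's restrictions have on tweet collection. We also validate the platform's tweet collection capability empirically.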
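    The token pool idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `Token` and `TokenPool`, the quota values, and the policy of handing out the token with the most remaining quota are all assumptions made for illustration. The only behavior taken from the abstract is that each token allows a limited amount of retrieval per time window and that jobs share the tokens.

```python
import time
from dataclasses import dataclass


@dataclass
class Token:
    """One access token with a per-window quota (all values are illustrative)."""
    name: str
    quota: int              # requests allowed per window (assumed unit)
    window: float           # window length in seconds
    used: int = 0
    window_start: float = 0.0


class TokenPool:
    """Hypothetical pool that lets many jobs share rate-limited tokens."""

    def __init__(self, tokens, clock=time.monotonic):
        self._tokens = list(tokens)
        self._clock = clock
        now = clock()
        for token in self._tokens:
            token.window_start = now

    def acquire(self):
        """Return a token with quota left, or None if every token is exhausted."""
        now = self._clock()
        for token in self._tokens:
            # Start a fresh window once the previous one has elapsed.
            if now - token.window_start >= token.window:
                token.used = 0
                token.window_start = now
        # Assumed policy: prefer the token with the most remaining quota.
        best = max(self._tokens, key=lambda t: t.quota - t.used)
        if best.used >= best.quota:
            return None
        best.used += 1
        return best
```

    A job would call `acquire()` before each request and wait or yield when it returns `None`. Passing a custom `clock` makes the window logic easy to exercise without real waiting.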
    In recent years, social media such as Twitter, Facebook, and Weibo have developed rapidly, and people now use them as a major channel for interpersonal communication and as a daily source of information. For scholars in the social sciences and humanities, the digital footprints that people leave on these platforms are a rich resource for studying human behavior. However, the platforms usually impose resource restrictions, such as rate limiting, on how scholars may use their APIs to retrieve data. We therefore design and implement a resource-aware data collection platform for Twitter that helps scholars retrieve historical tweets effectively and efficiently.
    Our platform uses an event-job approach to help users organize collection tasks and the tweets to be collected. Because each job requires an access token to fetch tweets, the platform provides a pool of tokens that system jobs share, so that access tokens are used as fully as possible. In addition, we leverage the tweet-id options in the Twitter API so that users can optimize the number of tweets collected depending on when the collection takes place. For the core of the tweet collection system, we propose a “Mansion Household System” in which four minions cooperate to launch different jobs simultaneously, alleviating the impact of the restrictions that Twitter imposes via access tokens. To validate our design, we conducted a series of experiments, and the results are satisfactory.
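    As a rough illustration of collecting with tweet-id bounds, the sketch below pages backwards through search results. It assumes the "tweet-id options" are lower and upper id bounds in the style of the Search API's `since_id` and `max_id` parameters; the abstract does not name them. `collect_tweets`, `fetch_page`, and `fake_fetch` are hypothetical names, and no real Twitter call is made: the fetch function is injected so the paging logic can be shown in isolation.

```python
def collect_tweets(fetch_page, since_id=None, max_pages=10):
    """Walk backwards through results using tweet-id bounds.

    fetch_page(max_id=..., since_id=...) is a hypothetical callable that
    returns a list of dicts with an integer "id", newest first, limited to
    tweets with since_id < id <= max_id.
    """
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id=max_id, since_id=since_id)
        if not page:
            break
        collected.extend(page)
        # The next page must be strictly older than the oldest tweet seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected


# Stand-in data source for demonstration only: ids 10 down to 1.
_STORE = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch(max_id=None, since_id=None, page_size=3):
    """Return up to page_size stored tweets that fall inside the id bounds."""
    hits = [
        t for t in _STORE
        if (max_id is None or t["id"] <= max_id)
        and (since_id is None or t["id"] > since_id)
    ]
    return hits[:page_size]
```

    Calling `collect_tweets(fake_fetch, since_id=4)` walks the stand-in store in pages of three until no tweet newer than id 4 remains. Each page would cost one request against a token's quota, which is why bounding the id range matters under rate limits.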
    Description: Master's
    National Chengchi University
    Department of Computer Science
    100971001
    102
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0100971001
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Size      Format
    100101.pdf    2670Kb    Adobe PDF    2445

