    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/68266


    Title: Using R and Hadoop/MapReduce for FOAF-based Social Network Analytics
    (Chinese title: 整合R與Hadoop/MapReduce來分析FOAF社群網路)
    Authors: Sun, Jhao Siang (孫肇祥)
    Contributors: Hu, Yuh Jong (胡毓忠)
    Sun, Jhao Siang (孫肇祥)
    Keywords: RDF(S)
    R and Hadoop/MapReduce
    FOAF
    Hadoop
    MapReduce
    Social network analytics
    Date: 2013
    Issue Date: 2014-08-06 11:47:06 (UTC+8)
    Abstract: Decentralized online social networks encode users' personal data and social relationships in the RDF(S)-based FOAF format, and these FOAF datasets are stored on a trusted third-party Hadoop cluster. At this volume of social network data, traditional analysis techniques run into serious processing and storage bottlenecks. This study applies three techniques for integrating R with Hadoop/MapReduce to analyze high-volume FOAF data: R + Hadoop Streaming (RHS), R + MySQL (RMS), and R + Hive (RH). The FOAF datasets are first ingested into the Hadoop cluster and pre-processed with distributed MapReduce jobs, which digest most of the data before it reaches R. This avoids the problem that a single R process cannot hold very large files in memory. R is then used for the subsequent analysis, which overcomes MapReduce's own limitations for in-depth social network analysis. Because the data is decomposed in advance, the approach scales to larger FOAF datasets, and it applies to both structured and unstructured data. Given the daily growth of social network data, future work will examine tighter integration of R and Hadoop/MapReduce, including the use of HBase or existing parallel R packages, for high-volume big data analytics.
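    To make the RHS pre-processing step more concrete, the following is a minimal sketch, not the thesis's own code, of a Hadoop Streaming-style mapper and reducer written in Python. It assumes the FOAF data has been serialized as N-Triples (one triple per line) and counts how many people each subject declares through foaf:knows, the kind of aggregate MapReduce can shrink to a small table before R performs the deeper analysis. The function names `mapper` and `reducer` and the `map`/`reduce` command-line switch are hypothetical.

```python
import sys
from itertools import groupby

# Assumption: FOAF data serialized as N-Triples, one triple per line.
FOAF_KNOWS = "<http://xmlns.com/foaf/0.1/knows>"


def mapper(lines):
    """Emit 'subject<TAB>1' for every foaf:knows triple (hypothetical helper)."""
    for line in lines:
        parts = line.strip().split(None, 2)  # subject, predicate, rest of triple
        if len(parts) == 3 and parts[1] == FOAF_KNOWS:
            yield f"{parts[0]}\t1"


def reducer(lines):
    """Sum the counts per subject; assumes input is sorted by key, as after the shuffle."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for subject, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{subject}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Hypothetical CLI: `python3 degree.py map` or `python3 degree.py reduce`,
    # reading records from stdin and writing tab-separated results to stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

    Hadoop Streaming sorts mapper output by key before it reaches the reducer, which is why `reducer` can group adjacent lines with `itertools.groupby`. The script would be passed to the streaming job through its `-mapper` and `-reducer` options (degree.py above is a hypothetical file name), and the resulting per-person counts are small enough to load into R.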
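    Description: Master's thesis
    National Chengchi University
    Department of Computer Science
    Student ID: 95971012
    Academic year: 102 (ROC calendar, i.e., 2013-2014)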
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0095971012
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item: 101201.pdf (9081 KB, Adobe PDF)


    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.