|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/68266
|
Title: | 整合R與Hadoop/MapReduce來分析FOAF社群網路 Using R and Hadoop/MapReduce for FOAF-based Social Network Analytics |
Authors: | 孫肇祥 Sun, Jhao Siang |
Contributors: | 胡毓忠 Hu, Yuh Jong 孫肇祥 Sun, Jhao Siang |
Keywords: | RDF(S); R and Hadoop/MapReduce; FOAF; Hadoop; MapReduce; Social network analytics |
Date: | 2013 |
Issue Date: | 2014-08-06 11:47:06 (UTC+8) |
Abstract: | Decentralized online social networks encode users' personal data and social relationships in the RDF(S)-based FOAF format and store these datasets on a trusted third-party Hadoop cluster. At this volume, traditional analysis techniques run into data processing and storage bottlenecks. This study combines R with Hadoop/MapReduce and proposes three integration approaches for analyzing high-volume FOAF data: R + Hadoop Streaming (RHS), R + MySQL (RMS), and R + Hive (RH). We first ingest the FOAF datasets into the Hadoop cluster and use distributed MapReduce computation to pre-process most of the data, then apply R to the reduced output. This addresses two problems at once: a single-machine R installation does not have enough memory to load large FOAF files, and MapReduce alone cannot carry out in-depth social network analysis. Decomposing the data in advance lets the approach scale to larger FOAF datasets, and it applies to both structured and unstructured data. As social network data grows daily, future work includes integrating R and Hadoop/MapReduce more closely and leveraging HBase or existing parallel R packages for high-volume data analytics. |
Description: | Master's thesis; National Chengchi University; Department of Computer Science; 95971012; 102 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0095971012 |
Data Type: | thesis |
Appears in Collections: | [Department of Computer Science] Theses
|
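The abstract's pre-processing idea (reduce the FOAF data with MapReduce first, then analyze the much smaller result in R) can be illustrated with a minimal local sketch. This is not the thesis's implementation: it assumes the input is N-Triples or N-Quads style text with one statement per line, and the function names `map_knows`, `reduce_counts`, and `out_degrees` are hypothetical. It mimics a Hadoop Streaming mapper and reducer in plain Python and counts how many `foaf:knows` statements each person asserts.

```python
import re
from itertools import groupby

# The foaf:knows property IRI from the FOAF vocabulary.
FOAF_KNOWS = "<http://xmlns.com/foaf/0.1/knows>"

# Assumed input shape: "<subject> <predicate> <object> [<graph>] ."
# Subjects and objects may be IRIs or blank nodes; literal objects are not matched.
STATEMENT = re.compile(
    r"^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(<[^>]*>|_:\S+)\s*(<[^>]*>)?\s*\.\s*$"
)


def map_knows(lines):
    """Map step: emit (person, 1) for every foaf:knows statement."""
    for line in lines:
        match = STATEMENT.match(line)
        if match and match.group(2) == FOAF_KNOWS:
            yield match.group(1), 1


def reduce_counts(pairs):
    """Reduce step: sum counts per key; sorting stands in for the shuffle phase."""
    for key, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def out_degrees(lines):
    """Run the map and reduce steps locally and return {person: out-degree}."""
    return dict(reduce_counts(map_knows(lines)))


if __name__ == "__main__":
    sample = [
        "<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .",
        "<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/c> .",
        "<http://example.org/b> <http://xmlns.com/foaf/0.1/knows> <http://example.org/a> .",
        '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "A" .',
    ]
    for person, degree in sorted(out_degrees(sample).items()):
        print(person, degree)
```

The output is a compact person-to-degree table, which is the kind of reduced result that fits in memory on a single machine for follow-up analysis.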
Files in This Item:
File | Size | Format | |
101201.pdf | 9081Kb | Adobe PDF2 | 270 |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|