Loading...
|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/56888
|
Title: | 基於MapReduce之雲端運算下具地域特性之動態排程 Dynamic locality driven scheduler for mapreduce based cloud computing |
Authors: | 陳耀宗 Chen, Yao Chung |
Contributors: | 連耀南 Lien, Yao Nan 陳耀宗 Chen, Yao Chung |
Keywords: | 雲端運算 動態排程 Cloud Computing Dynamic Locality Driven Scheduler |
Date: | 2011 |
Issue Date: | 2013-02-01 16:53:37 (UTC+8) |
Abstract: | MapReduce 是目前最熱門的雲端技術之一,用來處理大量資料,不論資料探勘、非結構化的紀錄檔、網頁索引處理及其他需要大量資料處理的科學研究,都可透過 MapReduce 得到極佳的執行效率。MapReduce 為一分散式批次資料處理程式框架,將一個工作分解為許多較小的 map 任務以及 reduce 任務,由map 處理每個小問題,再由reduce將問題彙整,得到最終的結果。
Hadoop 是一個開放原始碼的 MapReduce 架構,並且被廣泛地應用在以大規模資料運算為主的雲端計算。Hadoop有一個非常重要的元件稱為scheduler ,是 hadoop的中樞,負責調度、指派任務和資源分配的優先順序。Scheduler的任務選擇與分配方式,將會影響 MapReduce 工作的執行效率與整個叢集的使用率,目前Hadoop預設的scheduler是將任務以先進先出(FIFO)的方式進行排程。提升MapReduce運算效能的挑戰之一為如何適當的分配Mapper 和 Reducer給雲端裡的每個節點來執行。儘管過去已經有許多改善MapReduce運算效能的研究,但是大部分的方法在實際的運作中,仍存在很多的問題,如工作節點的動態負載、data locality的問題,計算節點的異質性等等。我們發現目前Hadoop對於這些問題並沒有妥善處理,並且在相關的情況下,整體效能仍有改進空間。
我們提出Data Locality Driven Scheduler(DLDS)的方法,並實踐在 Hadoop上,試圖提高scheduler的效能。我們設計不同的實驗,比較DLDS在不同狀況下和其他的排程演算法的差異。實驗結果顯示,透過提高資料的地域性,平均可提昇10% 至 15% 的效能。 MapReduce is programming model for processing large data set. It is typically used to do distributed computing on clusters of computers such as Cloud computing platform. Examples of bit data set include unstructured logs, web indexing, scientific data, surveillance data, etc.
MapReduce is a distributed processing program framework, a computing job is broken down into many smaller Map tasks and a Reduce task.Each Map task processes a partition of the given data set and Reduce aggregates the results of Maps to produce final result.
Hadoop is an open-source MapReduce architecture, and is widely used in many cloud-based services.To best utilize computing resource in a cloud server, a task scheduler is essential to assign tasks to appropriate processors as well as to prioritize resource allocation. The default scheduler of Hadoop is first-in-first-out (FIFO) scheduler which is simple but has a performance inefficiency yet to be improved. Although there have been many researches aiming to improve the performance of MapReduce platform in the past year, there still have many issues hindering the performance improvement, such as dynamic load balance, data locality, and heterogeneity of computing nodes.
To improve data locality, we propose a new scheduler called Data Locality Driven Scheduler (DLDS) based on Hadoop platform. DLDS improve Hadoop`s performamce by allocating Map tasks as close as possible to the data block they are to process. We evaluated the proposed DLDS against several other schedulers by simulation on an 8 nodes real Hadoop system. Experimental results show that DLDS can improve data locality by 10-15%, which results in a significant performamce improvement. |
Reference: | [1] Apache Software Foundation. FairScheduler, http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html.
[2] Apache Software Foundation. CapacityScheduler http://hadoop.apache.org/mapreduce/docs/r0.21.0/capacity_scheduler.html
[3] Big data,http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues
[4] Jeffrey Dean , Sanjay Ghemawat,"MapReduce: simplified data processing on large clusters, Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation", p.10-10, San Francisco, CA,December 06-08, 2004
[5] Jinling Du, and Dalian Liu,"Hybrid Genetic Algorithm for the Multi-objective Flexible Schedu ling Problem," IEEE International Conference on Computational Intelligence and Security,Nanning, China,Dec.2010.
[6] R.C.Eberhart, and J.Kennedy,"New Optimizer Using Particle Swarm Theory," Proc.Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, Oct. 1995.
[7] Garey, M. and Johnson, D, "Computers and Intractability. A Guide to the Theory of NP-Completeness," Freemann, San Francisco, 1979.
[8] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” 19th ACM Symposium on Operating Systems Principles(SOSP), 2003
[9] Hadoop,http://hadoop.apache.org/
[10] Hadoop Distributed File System (HDFS) Architecture. [Online] Available: http://hadoop.apache.org/core/docs/current/hdfs design.html.
[11] HBase, http://hadoop.apache.org/hbase/
[12] Bahareh Jalili,and Mehrdad Dianati,"Application of Taboo Search and Genetic Algorithm in planning and optimization of UMTS radio networks," ACM International Wireless Communications and Mobile Computing Conference 6th, New York, USA, June 2010.
[13] Karp, R.M.,”Reducibility among combinatorial,” in complexity of computer computations, R.E.Miller and J.W. Thatcher(eds),Plenum Press,NY 85-103,1972
[14] Andréa Matsunaga, Maurício Tsugawa and José Fortes,“Programming Abstractions for Data Intensive Computing on Clouds and Grids,” IEEE Fourth International Conference on eScience, pp.489-493, 2008.
[15] Peter Mell and Tim Grance, "The NIST Definition of Cloud Computing," National Institute of Standards and Technology, Information Technology Laboratory, Version 15, Oct.2009.
[16] Chris Miceli, Michael Miceli, Shantenu Jha, Hartmut Kaiser, Andre Merzky, "Programming Abstractions for Data Intensive Computing on Clouds and Grids,” IEEE/ACM International Symposium on Cluster Computing and the Grid,pp.480-483, 2009.
[17] María Luisa Santamaría, and Sebastià Galmé,"Multi-objective Simulated Annealing Approachfor Optimal Routing in Time-Driven Sensor Networks,"IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, Singapore, July 2011.
[18] OpenPBS.org. Torque resource manager. http://www.clusterresources.com/pages/products/torque-resource-manager.php.
[19] Ren Qing-dao-er-ji, and Yuping Wang, Xiaojing Si, "An Improved Genetic Algorithm For Job Shop Scheduling Problem,"IEEE International Conference on Computational Intelligence and Security, Nanning, China, Dec. 2010.
[20] Douglas Thain, Todd Tannenbaum, and Miron Livny, "Distributed Computing in Practice: The Condor Experience" Concurrency and Computation: Practice and Experience, Vol. 17, No. 2-4, pages 323-356, February-April, 2005.
[21] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments, " In OSDI’08: 8th USENIX Symposium on Operating Systems Design and Implementation, 2008.
[22] M.Zaharia, D.Borthankur, J. Sarma, K. Elmellegy, S.Shenker, and I. Stoica, "Delay Scheduling:A Simple Technique for Achieving Locality and Fairness in cluster Scheduling," In EuroSys 2010, pp. 265-278. ACM, New York, 2010. |
Description: | 碩士 國立政治大學 資訊科學學系 97971010 100 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0097971010 |
Data Type: | thesis |
Appears in Collections: | [資訊科學系] 學位論文
|
Files in This Item:
File |
Size | Format | |
index.html | 0Kb | HTML2 | 391 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|