National Chengchi University Institutional Repository (NCCUR): Item 140.119/56888
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/56888


    Title: Dynamic locality driven scheduler for MapReduce-based cloud computing (Chinese title: 基於MapReduce之雲端運算下具地域特性之動態排程)
    Authors: Chen, Yao Chung (陳耀宗)
    Contributors: Lien, Yao Nan (連耀南); Chen, Yao Chung (陳耀宗)
    Keywords: Cloud Computing; Dynamic Scheduling; Dynamic Locality Driven Scheduler
    Date: 2011
    Issue Date: 2013-02-01 16:53:37 (UTC+8)
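    Abstract: MapReduce is currently one of the most popular cloud technologies for processing large volumes of data. Data mining, unstructured log files, web indexing, and other scientific research that requires large-scale data processing can all achieve excellent execution efficiency with MapReduce. MapReduce is a distributed batch data-processing framework that breaks a job down into many smaller map tasks and reduce tasks: each map task handles one small subproblem, and the reduce tasks combine the partial results into the final result.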
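    Hadoop is an open-source MapReduce framework and is widely used in cloud computing centered on large-scale data processing. One of Hadoop's most important components is the scheduler. It is the hub of Hadoop and is responsible for dispatching and assigning tasks and for prioritizing resource allocation. The way the scheduler selects and assigns tasks affects both the execution efficiency of MapReduce jobs and the utilization of the whole cluster. Hadoop's current default scheduler schedules tasks in first-in-first-out (FIFO) order. One challenge in improving MapReduce performance is how to assign Mappers and Reducers appropriately to the nodes in the cloud. Many studies have tried to improve MapReduce performance. However, most of these approaches still face problems in practice, such as dynamic load on worker nodes, data locality, and the heterogeneity of compute nodes. We found that Hadoop does not currently handle these problems well, and that its overall performance in these situations still has room for improvement.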
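    We propose the Data Locality Driven Scheduler (DLDS) and implement it on Hadoop in order to improve scheduler performance. We designed several experiments to compare DLDS with other scheduling algorithms under different conditions. Experimental results show that increasing data locality improves performance by 10% to 15% on average.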
    MapReduce is a programming model for processing large data sets. It is typically used for distributed computing on clusters of computers, such as cloud computing platforms. Examples of big data sets include unstructured logs, web indexes, scientific data, and surveillance data.
    MapReduce is a distributed data-processing framework in which a computing job is broken down into many smaller Map tasks and one or more Reduce tasks. Each Map task processes one partition of the input data set, and the Reduce tasks aggregate the Map outputs to produce the final result.
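The Map/Reduce split described above can be illustrated with a minimal, single-process word-count sketch. It is not Hadoop code. The function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) are hypothetical, and the in-memory grouping step stands in for the shuffle that Hadoop performs across nodes between the two phases.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_task(partition: list[str]) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one partition of the input."""
    for line in partition:
        for word in line.split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (done by the framework between phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: aggregate all intermediate values for one key."""
    return key, sum(values)


def run_job(partitions: list[list[str]]) -> dict[str, int]:
    # One map task per partition; a real cluster would run these in parallel.
    intermediate = [pair for part in partitions for pair in map_task(part)]
    grouped = shuffle(intermediate)
    return dict(reduce_task(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job([["big data big"], ["data cloud"]]))
    # {'big': 2, 'data': 2, 'cloud': 1}
```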
    Hadoop is an open-source MapReduce framework and is widely used in many cloud-based services. To make the best use of the computing resources in a cloud, a task scheduler is essential. It assigns tasks to appropriate processors and prioritizes resource allocation. Hadoop's default scheduler is a first-in-first-out (FIFO) scheduler, which is simple but leaves room for performance improvement. Many studies in recent years have aimed to improve the performance of the MapReduce platform. Even so, several issues still hinder further improvement, such as dynamic load balancing, data locality, and the heterogeneity of computing nodes.
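As a rough illustration of first-in-first-out ordering, the sketch below models a scheduler that always serves the oldest submitted job that still has pending tasks. It is a simplified model built on stated assumptions and is not Hadoop's actual default scheduler code. The `Job` and `FifoScheduler` classes are hypothetical. The sketch deliberately ignores which node is asking for work, so it does not account for node load, heterogeneity, or data placement.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    pending_tasks: deque = field(default_factory=deque)


class FifoScheduler:
    """Toy FIFO model: serve the oldest job that still has pending tasks."""

    def __init__(self) -> None:
        self.queue: deque[Job] = deque()

    def submit(self, job: Job) -> None:
        self.queue.append(job)  # jobs are kept in submission order

    def assign(self, node: str) -> tuple[str, str] | None:
        """Called when `node` has a free slot; returns (job, task) or None.

        `node` is intentionally unused: this policy only looks at job order.
        """
        while self.queue:
            head = self.queue[0]
            if head.pending_tasks:
                return head.name, head.pending_tasks.popleft()
            self.queue.popleft()  # head job has no pending tasks left; drop it
        return None


if __name__ == "__main__":
    sched = FifoScheduler()
    sched.submit(Job("job1", deque(["m1", "m2"])))
    sched.submit(Job("job2", deque(["m1"])))
    print([sched.assign(n) for n in ["nodeA", "nodeB", "nodeC", "nodeD"]])
    # [('job1', 'm1'), ('job1', 'm2'), ('job2', 'm1'), None]
```

A later job cannot start until every task of the jobs ahead of it has been handed out. This head-of-line ordering is what makes FIFO simple, and it also explains why FIFO has no way to favor work suited to a particular node.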
    To improve data locality, we propose a new scheduler for the Hadoop platform called the Data Locality Driven Scheduler (DLDS). DLDS improves Hadoop's performance by allocating Map tasks as close as possible to the data blocks they process. We evaluated DLDS against several other schedulers in experiments on a real 8-node Hadoop system. Experimental results show that DLDS can improve data locality by 10-15%, which results in a significant performance improvement.
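The idea of placing Map tasks close to their input blocks can be sketched as a node-local-first selection rule. This is not the DLDS algorithm from the thesis, whose details are not given in this record. It only illustrates the general principle under these assumptions:

- `pending` is a hypothetical map from task ID to the set of nodes holding a replica of that task's input block.
- A node asks for work when it has a free map slot.

The `locality_rate` helper computes the fraction of Map tasks that ran on a node holding their data. This is one simple way to express data locality.

```python
from __future__ import annotations


def pick_map_task(node: str, pending: dict[str, set[str]]) -> tuple[str, bool] | None:
    """Pick a Map task for `node`, preferring one whose input block is stored on it.

    `pending` maps task id -> nodes holding a replica of its input block
    (hypothetical model). The chosen task is removed from `pending`.
    Returns (task_id, is_node_local), or None if no task is pending.
    """
    # First choice: a node-local task, so its block is read from local disk.
    local = next((task for task, replicas in pending.items() if node in replicas), None)
    if local is not None:
        del pending[local]
        return local, True
    # Fallback: any pending task; its input must be fetched over the network.
    if pending:
        task = next(iter(pending))
        del pending[task]
        return task, False
    return None


def locality_rate(assignments: list[tuple[str, bool]]) -> float:
    """Fraction of assigned Map tasks that ran on a node holding their data."""
    if not assignments:
        return 0.0
    return sum(is_local for _, is_local in assignments) / len(assignments)


if __name__ == "__main__":
    pending = {"t1": {"n1", "n2"}, "t2": {"n2", "n3"}, "t3": {"n3"}}
    done = [pick_map_task(n, pending) for n in ["n3", "n1", "n1"]]
    print(done)                           # [('t2', True), ('t1', True), ('t3', False)]
    print(round(locality_rate(done), 2))  # 0.67
```

In this example, nodes ask for work in the order n3, n1, n1, and two of the three tasks run node-local. A rule that simply took the first pending task would have read every block remotely here.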
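    Description: Master's thesis, Department of Computer Science, National Chengchi University; student ID 97971010; academic year 100 (ROC calendar)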
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0097971010
    Data Type: thesis
    Appears in Collections: [Department of Computer Science] Theses


