Loading...
|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/152034
|
Title: | MCSH-2P:一種具有標準Lock API的高效MCS Lock演算法 MCSH-2P:An Efficient MCS Lock Algorithm with Standard Lock API |
Authors: | 李庭慶 Li, Ting-Ching |
Contributors: | 張宏慶 Jang, Hung-Chin 李庭慶 Li,Ting-Ching |
Keywords: | 自旋鎖 隊列鎖 MCS鎖 鎖介面 吞吐量 延遲 可擴展性 Spin locks Queued locks MCS lock Locking API Throughput Latency Scalability |
Date: | 2024 |
Issue Date: | 2024-07-01 12:30:03 (UTC+8) |
Abstract: | Concurrency以及Lock演算法是現代計算機科學的基礎,在多執行緒的系統下都免不了使用Lock演算法。MCS演算法是由Mellor-Crummy以及Scott提出高效率並且廣泛運用的Lock演算法。MCS演算法有First-Come First-Served以及Cache-Friendly等優秀性質,然而MCS演算法實作上會需要額外的參數來管理Lock的狀態,在API設計上也不符合標準Lock API。因此先後分別由不同研究者提出MCS-K42以及MCSH等標準Lock API演算法,這些演算法在吞吐量以及延遲的表現都劣於MCS演算法。因此本論文旨在提出新的演算法不僅有標準Lock API性質甚至能在吞吐量與延遲的表現皆能比肩MCS演算法。為了達成目標,本論文結合MCSH、MCS-K42以及QSpinlock等標準Lock API演算法的特性提出MCSH-2P演算法,提出的演算法在acquire方法中會類似MCSH演算法的acquire方法,release方法則會類似QSpinlock的release方法只有單純的store指令,而Lock結構則類似於MCS-K42演算法的兩個指標結構。為驗證本論文算法是否有達到預期目標,模擬實驗採用C++進行模擬,模擬實驗設計不同的執行緒競爭環境來確保真實的Lock演算法使用情形,從實驗結果可以發現本論文相較於其他有標準Lock API演算法整體上有更高的吞吐量以及更低的延遲,甚至與MCS演算法在吞吐量以及延遲的有相近的效能。最後提出本論文研究可改善之部分,將其歸納給未來研究學者可繼續往更高吞吐量、更低延遲及符合現實執行緒競爭環境等方向研究。 Concurrency and lock algorithms are fundamental in modern computer science, and the use of lock algorithms is inevitable in systems with multiple threads. The MCS algorithm, proposed by Mellor-Crummey and Scott, is an efficient and widely used lock algorithm. The MCS algorithm has excellent properties such as First-Come First-Served and Cache-Friendly. However, the implementation of the MCS algorithm requires additional parameters to manage the lock's state, and its API design does not conform to standard lock APIs. Therefore, researchers have proposed standard lock API algorithms such as MCS-K42 and MCSH, but these algorithms perform worse than the MCS algorithm in terms of throughput and latency. This paper aims to propose a new algorithm that not only has the properties of a standard lock API but also can match the performance of the MCS algorithm in terms of both throughput and latency. To achieve this goal, this paper combines the characteristics of standard lock API algorithms such as MCSH, MCS-K42, and QSpinlock to propose MCSH-2P lock algorithm. The proposed algorithm has an acquire method similar to the MCSH algorithm's acquire method and a release method similar to the QSpinlock's release method, which only uses a simple store instruction. The lock structure is similar to the two-pointer structure of the MCS-K42 algorithm. To verify whether the algorithm proposed in this paper achieves the expected goals, simulation experiments are conducted using C++. Different thread competition environments are designed in the simulation experiments to ensure a realistic use of lock algorithms. The experimental results show that compared to other standard lock API algorithms, the algorithm proposed in this paper has higher overall throughput and lower latency, and even has performance similar to the MCS algorithm in terms of both throughput and latency. Finally, suggestions for improvement are proposed, which can be summarized for future research scholars to continue research in the directions of higher throughput, lower latency, and compliance with realistic thread competition environments. |
Reference: | [1] M. Abadi and L. Lamport, "The existence of refinement mappings," Theoretical Computer Science, vol. 82, no. 2, pp. 253-284, 1991. [2] H. Akkan, M. Lang, and L. Ionkov, "HPC runtime support for fast and power efficient locking and synchronization," in 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013, pp. 1-7. [3] M. Auslander, D. Edelsohn, O. Krieger, B. Rosenburg, and R. Wisniewski, "Enhancement to the MCS lock for increased functionality and improved programmability," ed, 2003. [4] S. Boyd-Wickizer, M. F. Kaashoek, R. Morris, and N. Zeldovich, "Non-scalable locks are dangerous," in Proceedings of the Linux Symposium, 2012, pp. 119- 130. [5] D. Dice, "Malthusian locks," in Proceedings of the Twelfth European Conference on Computer Systems, 2017, pp. 314-327. [6] D. Dice, D. Hendler, and I. Mirsky, "Lightweight contention management for efficient compare-and-swap operations," in European Conference on Parallel Processing, 2013, pp. 595-606. [7] D. Dice and A. Kogan, "Compact NUMA-aware locks," in Proceedings of the Fourteenth EuroSys Conference 2019, 2019, pp. 1-15. [8] D. Dice and A. Kogan, "TWA--Ticket Locks Augmented with a Waiting Array," in Euro-Par 2019: Parallel Processing: 25th International Conference on Parallel and Distributed Computing, G{\"o}ttingen, Germany, August 26--30, 2019, Proceedings 25, 2019, pp. 334-345. [9] D. Dice and A. Kogan, "Fissile locks," in International Conference on Networked Systems, 2020, pp. 192-208. [10] D. Dice and A. Kogan, "Hemlock: Compact and scalable mutual exclusion," in Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021, pp. 173-183. [11] D. Dice, V. J. Marathe, and N. Shavit, "Lock cohorting: a general technique for designing NUMA locks," ACM SIGPLAN Notices, vol. 47, no. 8, pp. 247-256, 2012. [12] E. W. Dijkstra, "Cooperating Sequential Processes, Technical Report EWD123," 1965. [13] E. W. Dijkstra, "Solution of a problem in concurrent programming control," Commun. ACM, vol. 8, no. 9, p. 569, sep 1965. [Online]. Available: https://doi.org/10.1145/365559.365617. [14] U. Drepper, "Futexes are tricky," Futexes are Tricky, Red Hat Inc, Japan, vol. 4, 2005. [15] R. Dvir and G. Taubenfeld, "Mutual exclusion algorithms with constant RMR complexity and wait-free exit code," in 21st International Conference on Principles of Distributed Systems (OPODIS 2017), 2018. [16] B. Falsafi, R. Guerraoui, J. Picorel, and V. Trigonakis, "Unlocking energy," in 2016 USENIX Annual Technical Conference (USENIX ATC 16), 2016, pp. 393- 406. [17] R. Guerraoui, H. Guiroux, R. Lachaize, V. Quma, and V. Trigonakis, "Lock-- unlock: Is that all? a pragmatic analysis of locking in software systems," ACM Transactions on Computer Systems (TOCS), vol. 36, no. 1, pp. 1-149, 2019. [18] W. H. Hesselink and P. A. Buhr, "MCSH, a lock with the standard interface," ACM Transactions on Parallel Computing, vol. 10, no. 2, pp. 1-23, 2023. [19] P. Jayanti, S. Jayanti, and S. Jayanti, "Towards an ideal queue lock," in Proceedings of the 21st International Conference on Distributed Computing and Networking, 2020, pp. 1-10. [20] S. Kashyap, I. Calciu, X. Cheng, C. Min, and T. Kim, "Scalable and practical locking with shuffling," in Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019, pp. 586-599. [21] L. Lamport, "A fast mutual exclusion algorithm," ACM Transactions on Computer Systems (TOCS), vol. 5, no. 1, pp. 1-11, 1987. [22] H. Lee, Local-spin mutual exclusion algorithms on the DSM model using fetch store objects. 2003. [23] W. Long, "qspinlock: Introducing a 4-byte queue spinlock implementation," Articles/561775, 2013. [24] P. Magnusson, A. Landin, and E. Hagersten, "Queue locks on cache coherent multiprocessors," in Proceedings of 8th International Parallel Processing Symposium, 1994, pp. 165-171. [25] J. M. Mellor-Crummey and M. L. Scott, "Algorithms for scalable synchronization on shared-memory multiprocessors," ACM Transactions on Computer Systems (TOCS), vol. 9, no. 1, pp. 21-65, 1991. [26] Y. Patel, L. Yang, L. Arulraj, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. M. Swift, "Avoiding scheduler subversion using scheduler-cooperative locks," in Proceedings of the Fifteenth European Conference on Computer Systems, 2020, pp. 1-17. [27] D. P. Reed and R. K. Kanodia, "Synchronization with eventcounts and sequencers," Communications of the ACM, vol. 22, no. 2, pp. 115-123, 1979. [28] I. Rhee, "Optimizing a FIFO, scalable spin lock using consistent memory," in 17th IEEE Real-Time Systems Symposium, 1996, pp. 106-114. [29] L. Rudolph and Z. Segall, "Dynamic decentralized cache schemes for MIMD parallel processors," in Proceedings of the 11th Annual International Symposium on Computer Architecture, 1984, pp. 340-347. [30] M. L. Scott, "Non-blocking timeout in scalable queue-based spin locks," in Proceedings of the twenty-first annual symposium on Principles of distributed computing, 2002, pp. 31-40. [31] M. L. Scott, Shared-memory synchronization. Morgan \& Claypool Publishers, 2013. [32] M. L. Scott and W. N. Scherer, "Scalable queue-based spin locks with timeout," ACM SIGPLAN Notices, vol. 36, no. 7, pp. 44-52, 2001. [33] L. B. Torvalds, "No nuances, just buggy code (was: related to Spinlock implementation and the Linux Scheduler)," ed: \url{https://www.realworldtech.com/forum/?threadid=189711&curpostid=189 723}, 2020. [34] T. Wang, M. Chabbi, and H. Kimura, "Be my guest: MCS lock now welcomes guests," in Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016, pp. 1-12 |
Description: | 碩士 國立政治大學 資訊科學系 109753209 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0109753209 |
Data Type: | thesis |
Appears in Collections: | [資訊科學系] 學位論文
|
Files in This Item:
File |
Description |
Size | Format | |
320901.pdf | | 2545Kb | Adobe PDF | 1 | View/Open |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|