dc.description.abstract | The microprocessor industry has hit the limits of sequential processing power due to power-efficiency and heat constraints. As integrated-circuit technology continues to advance, chip multithreading has become the trend, increasing parallel processing power instead. This shift in focus has resulted in the vast majority of supercomputers being built from chip multiprocessors.
While the high-performance computing community has long written parallel applications using libraries such as MPI, performance characteristics have changed in the move from traditional single-core clusters to the current generation of multi-core clusters: more communication takes place between processes on the same node, and processes run on cores that share hardware resources such as caches and the memory bus.
We explore ways of optimizing a widely used MPI implementation, Open MPI, to minimize the overhead of communication between processes running on a single node. We take three approaches to optimization: First, we measure the message-passing latency between the different cores and reduce the latency of large messages by keeping the sender and receiver synchronized. Second, we increase scalability with two new queue designs that reduce the number of communication queues that must be polled to receive messages. Third, we experiment with mapping a parallel application's processes to different cores on a single node. The mapping is done dynamically at runtime, with no prior knowledge of the application's communication pattern.
Our results show that message-passing latency can be significantly reduced for large messages sent between cores that share a cache. Results from running the NAS Parallel Benchmarks with the new queue designs show that Open MPI scales better when running more than 64 processes on a single node. Our dynamic mapper performs close to our manual mapping, but rarely improves performance.
The experimental results show that the three techniques improve performance in different scenarios. Combining techniques like these with others may be key to unlocking parallel performance for a broader range of parallel applications. | en