Throughput optimization techniques for heterogeneous architectures

Derumigny, Nicolas, author; Pouchet, Louis-Noël, advisor; Rastello, Fabrice, advisor; Hack, Sebastian, committee member; Rohou, Erven, committee member; Malaiya, Yashwant, committee member; Ortega, Francisco, committee member; Pétrot, Frédéric, committee member; Wilson, James, committee member; Zaks, Ayal, committee member

dc.contributor.author	Derumigny, Nicolas, author
dc.contributor.author	Pouchet, Louis-Noël, advisor
dc.contributor.author	Rastello, Fabrice, advisor
dc.contributor.author	Hack, Sebastian, committee member
dc.contributor.author	Rohou, Erven, committee member
dc.contributor.author	Malaiya, Yashwant, committee member
dc.contributor.author	Ortega, Francisco, committee member
dc.contributor.author	Pétrot, Frédéric, committee member
dc.contributor.author	Wilson, James, committee member
dc.contributor.author	Zaks, Ayal, committee member
dc.date.accessioned	2024-05-27T10:32:46Z
dc.date.available	2024-05-27T10:32:46Z
dc.date.issued	2024
dc.description	Abstract in English and French.
dc.description.abstract	Over the past 40 years, Moore's Law has allowed the transistor density of integrated circuits to increase exponentially. As a result, computing devices ranging from general-purpose processors to dedicated accelerators have become increasingly complex, owing to the specialization and multiplication of their compute units. Both low-level program optimization (e.g., assembly-level programming and code generation) and accelerator design must therefore solve the problem of efficiently mapping the computations of an input program onto the various capabilities of a chip. However, real-world chip blueprints are not openly accessible in practice, and their documentation is often incomplete. Given the diversity of available CPUs (Intel, AMD, and Arm microarchitectures), in this manuscript we tackle the problem of automatically inferring a performance model applicable to fine-grain throughput optimization of regular programs. Furthermore, when order-of-magnitude performance gains over generic accelerators are needed, domain-specific accelerators must be considered, which raises the same question of how many dedicated units to provide and what functionality each should have. To address this issue, we present two complementary approaches: on the one hand, the study of single-application specialized accelerators with an emphasis on hardware reuse, and, on the other hand, the generation of semi-specialized designs suited to a user-defined set of applications.
dc.format.medium	born digital
dc.format.medium	doctoral dissertations
dc.identifier	Derumigny_colostate_0053A_18202.pdf
dc.identifier.uri	https://hdl.handle.net/10217/238465
dc.identifier.uri	https://doi.org/10.25675/3.02989
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2020-
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.title	Throughput optimization techniques for heterogeneous architectures
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (Ph.D.)

Files

Original bundle

Name: Derumigny_colostate_0053A_18202.pdf
Size: 2.93 MB
Format: Adobe Portable Document Format
