Browsing by Author "Wilson, James, committee member"
Now showing 1 - 5 of 5
- Results Per Page
- Sort Options
Item Open Access A compiler for hierarchical task-based programming on distributed-memory(Colorado State University. Libraries, 2022) Dubois, Alexandre, author; Pouchet, Louis-Noël, advisor; Rajopadhye, Sanjay, committee member; Wilson, James, committee memberToday, computation intensive applications are run on heterogeneous clusters of machines and use the Message Passing Interface (MPI), which provides a library interface for message-passing between non-shared memory computational resources but comes at a high application development cost. Task-based programming, such as the Concurrent Collection (CnC) model, makes parallelism implicit by only describing task dependencies. This model has recently been extended to model programs with a hierarchical task-based representation, which allows to view tasks at different levels of decomposition, allowing to dispatch tasks of different level optimally to the different available architectures. This thesis main work was to design an algorithm following a graph-based approach to transform a restricted class of single-level regular kernel to a two-level representation in the CnC model extended with hierarchical concepts. This transformation will alleviate performance boost by reducing communication cost and allowing the use of optimized tasks implementation at a coarse level. After describing the CnC programming model concepts, the structure of the proposed graph based tiling algorithm will be developed. Then, the compiler implementing this algorithm on an Intermediate Representation representing a CnC program and generating a C++ version of the kernel using a new hierarchical CnC runtime. Finally, the overhead of this runtime on a shared memory system for a 3D synthetic benchmark representing a classic linear-algebra dependency pattern. This characterization is done to help the user choose the target volume of tasks tile that has to be given as input of the tiling algorithm. The main recommendation is to target a minimization of the number of super- tasks in the runtime while keeping the number of sub-tasks in every super-task under the order of 10000 sub-tasks.Item Open Access Group action on neighborhood complexes of Cayley graphs(Colorado State University. Libraries, 2014) Hughes, Justin, author; Hulpke, Alexander, advisor; Peterson, Chris, advisor; Berger, Bruce, committee member; Cavalieri, Renzo, committee member; Wilson, James, committee memberGiven G a group generated by S ≐ {g1, …, gn}, one can construct the Cayley Graph Cayley (G,S). Given a distance set D ⊂ Z≥0 and Cayley (G,S) one can construct a D-neighborhood complex. This neighborhood complex is a simplicial complex to which we can associate a chain complex. The group G acts on this chain complex and this leads to an action on the homology of the chain complex. These group actions decompose into several representations of G. This thesis uses tools from group theory, representation theory, homo-logical algebra, and topology to further our understanding of the interplay between generated groups (i.e. a group together with a set of generators), corresponding representations on their associated D-neighborhood complexes, and the homology of the D-neighborhood complexes.Item Open Access Identification of regular patterns within sparse data structures(Colorado State University. Libraries, 2020) Augustine, Travis, author; Pouchet, Louis-Noël, advisor; Rajopadhye, Sanjay, committee member; Bohm, Anton, committee member; Wilson, James, committee memberSparse matrix-vector multiplication (SpMV) is an essential computation in linear algebra. There is a well-known trade-off between operating on a dense or a sparse structure when performing SpMV. In the dense version of SpMV, useless operations are performed but the computation is amenable SIMD vectorization. In the sparse version, only useful operations are executed. However, an indirection array must be used, thus hindering the compiler's ability to perform optimizations that exploit the vector units available on the majority of modern processors. Our process automatically builds sets of regular sub-computations from the irregular sparse data structure. We mine for regular regions in the irregular data structure, grouping together non-contiguous points from the reorderable set of coordinates representing the sparse structure. The coordinates become partitioned into groupings of coordinates of pre-defined shapes using polyhedra. This partition models the exact same points from the input set of coordinates in a way that is specialized to the input's sparsity pattern. Once we have obtained a partition of the points into sets of polyhedra, we then scan these polyhedra to synthesize code that does not store any coordinates of zero-valued elements and does not require any indirection array to access data, thus making it amenable to SIMD vectorization.Item Open Access Moduli spaces of rational graphically stable curves(Colorado State University. Libraries, 2021) Fry, Andy J., author; Cavalieri, Renzo, advisor; Shoemaker, Mark, committee member; Wilson, James, committee member; Tavani, Daniele, committee memberWe use a graph to define a new stability condition for the algebraic and tropical moduli spaces of rational curves. Tropically, we characterize when the moduli space has the structure of a balanced fan by proving a combinatorial bijection between graphically stable tropical curves and chains of flats of a graphic matroid. Algebraically, we characterize when the tropical compactification of the compact moduli space agrees with the theory of geometric tropicalization. Both characterization results occur only when the graph is complete multipartite.Item Open Access Throughput optimization techniques for heterogeneous architectures(Colorado State University. Libraries, 2024) Derumigny, Nicolas, author; Pouchet, Louis-Noël, advisor; Rastello, Fabrice, advisor; Hack, Sebastian, committee member; Rohou, Erven, committee member; Malaiya, Yashwant, committee member; Ortega, Francisco, committee member; Pétrot, Frédéric, committee member; Wilson, James, committee member; Zaks, Ayal, committee memberMoore's Law has allowed during the past 40 years to exponentially increase transistor density of integrated circuits. As a result, computing devices ranging from general-purpose processors to dedicated accelerators have become more and more complex due to the specialization and the multiplication of their compute units. Therefore, both low-level program optimization (e.g. assembly-level programming and generation) and accelerator design must solve the issue of efficiently mapping the input program computations to the various chip capabilities. However, real-world chip blueprints are not openly accessible in practice, and their documentation is often incomplete. Given the diversity of CPUs available (Intel's / AMD's / Arm's microarchitectures), we tackle in this manuscript the problem of automatically inferring a performance model applicable to fine-grain throughput optimization of regular programs. Furthermore, when order of magnitude of performance gain over generic accelerators are needed, domain-specific accelerators must be considered; which raises the same question of the number of dedicated units as well as their functionality. To remedy this issue, we present two complementary approaches: on one hand, the study of single-application specialized accelerators with an emphasis on hardware reuse, and, on the other hand, the generation of semi-specialized designs suited for a user-defined set of applications.