Browsing by Author "Rajopadhye, Sanjay, committee member"
Now showing 1 - 13 of 13
Results Per Page
Sort Options
Item Open Access A compiler for hierarchical task-based programming on distributed-memory(Colorado State University. Libraries, 2022) Dubois, Alexandre, author; Pouchet, Louis-Noël, advisor; Rajopadhye, Sanjay, committee member; Wilson, James, committee memberToday, computation intensive applications are run on heterogeneous clusters of machines and use the Message Passing Interface (MPI), which provides a library interface for message-passing between non-shared memory computational resources but comes at a high application development cost. Task-based programming, such as the Concurrent Collection (CnC) model, makes parallelism implicit by only describing task dependencies. This model has recently been extended to model programs with a hierarchical task-based representation, which allows to view tasks at different levels of decomposition, allowing to dispatch tasks of different level optimally to the different available architectures. This thesis main work was to design an algorithm following a graph-based approach to transform a restricted class of single-level regular kernel to a two-level representation in the CnC model extended with hierarchical concepts. This transformation will alleviate performance boost by reducing communication cost and allowing the use of optimized tasks implementation at a coarse level. After describing the CnC programming model concepts, the structure of the proposed graph based tiling algorithm will be developed. Then, the compiler implementing this algorithm on an Intermediate Representation representing a CnC program and generating a C++ version of the kernel using a new hierarchical CnC runtime. Finally, the overhead of this runtime on a shared memory system for a 3D synthetic benchmark representing a classic linear-algebra dependency pattern. This characterization is done to help the user choose the target volume of tasks tile that has to be given as input of the tiling algorithm. The main recommendation is to target a minimization of the number of super- tasks in the runtime while keeping the number of sub-tasks in every super-task under the order of 10000 sub-tasks.Item Open Access A domain-protocol mapping based middleware for distributed application development(Colorado State University. Libraries, 2014) Mandalaparty, Sai Pradeep, author; France, Robert, advisor; Rajopadhye, Sanjay, committee member; Young, Peter, committee memberDistributed systems such as Internet of Things, Sensor Networks and Networked Control Systems are being used in various application domains, including industrial, environmental, medical and energy management domains. A distributed application in these domains may need to access data from different devices, where they may all be of the same type or a combination of different types. In addition, these devices may communicate through standardized protocols or proprietary interfaces. The development of such a distributed application may also require a team of developers with expertise in different disciplines. Therefore, the application development that involves heterogeneous devices and multidisciplinary teams can be made more effective by introducing an interface layer that shields developers from aspects of software and hardware heterogeneity. This work proposes a 'domain-protocol mapping' technique that is implemented as a middleware framework. The proposed mapping method maps the application data schema represented as object-oriented domain object to the appropriate communication protocol packet data and also updates the domain object from the response packet data. The middleware provides APIs for the domain experts to read the data from the device or to write the data to the device. The marshalling and unmarshalling process of the domain objects are hidden from the domain expert who may or may not be a software engineer. The use of the developed middleware is illustrated in two case-studies, one involving a simulation of distributed network controls for power system and the other involving integration of different types of power meters in power monitoring application.Item Open Access A performance evaluation of the coupling infrastructure within the Community Earth System Model™(Colorado State University. Libraries, 2018) Mickelson, Sheri A., author; Pouchet, Louis-Noel, advisor; Rajopadhye, Sanjay, committee member; Randall, David, committee memberEarth System models (ESMs) are complex software packages comprised of millions of lines of code used to simulate many different Earth processes. ESMs simulate the dynamical and physical states of the atmosphere, land, ocean, sea ice, rivers, and glaciers and coordinate the interactions between them. Many computational challenges exist within these models and future systems are putting more pressure on these challenges. In order to alleviate some of the pressure, it is important to study the performance challenges that exist within the models in order to understand the optimizations that need to be performed as we move to exascale systems. This work studies the performance of the coupling infrastructure between the modeling components within the Community Earth System Model. The coupler is responsible for the data exchanges between the different modeling components and while it has a small computational footprint, it has the potential to have a large impact on performance if the component resources are dispersed in incorrect proportions. This work explains and addresses this issue and provides easy solutions for users to save thousands of core cpu hours.Item Open Access Classifying simplicial dissections of convex polyhedra with symmetry(Colorado State University. Libraries, 2021) Mukthineni, Tarun, author; Betten, Anton, advisor; Peterson, Chris, committee member; Rajopadhye, Sanjay, committee memberA convex polyhedron is the convex hull of a finite set ofpoints in R3. A triangulation of a convex polyhedron is a decomposition into a finite number of 3-simplices such that any two intersect in a common face or are disjoint. A simplicial dissection is a decomposition into a finite number of 3-simplices such that no two share an interior point. We present an algorithm to classify the simplicial dissections of a regular polyhedron under the symmetry group of the polyhedron.Item Open Access Design and analysis of energy-efficient hierarchical electro-photonic network-on-chip architectures(Colorado State University. Libraries, 2015) Desai, Srinivas, author; Pasricha, Suddep, advisor; Rajopadhye, Sanjay, committee member; Malaiya, Yashwant K., committee memberFuture applications running on chip multiprocessors (CMPs) with tens to hundreds of cores on a chip will require an efficient inter-core communication strategy to achieve high performance. With recent demonstrations of feasibility in fabricating photonic components for on-chip communication, researchers are now focusing on photonic communication based on-chip networks for future CMPs. Photonic interconnects offer several benefits over conventional electrical on-chip interconnects, such as (1) high-bandwidth support by making use of dense wavelength division multiplexing, (2) distance independent power consumption, (3) significantly lower latency, and (4) improved performance-per-watt. Owing to these advantages, photonic interconnects are being considered as worthy alternatives for existing electrical networks. In this thesis, we design and explore a hierarchical electro-photonic network-on-chip (NoC) architecture called NOVA. NOVA aims to optimize several key design metrics such as throughput, latency, energy-delay-product, and power, which determine the overall system performance of a CMP. NOVA has three levels of communication hierarchy. The first level has a broadband-resonator based photonic switch. The second level consists of a low-loss, silicon-nitride arrayed waveguide grating based router. The last level of the hierarchy is made up of photonic ring waveguides. We have modeled and simulated multiple configurations of the proposed architecture with different designs of the photonic switch and several arbitration techniques on the photonic rings. This comprehensive analysis of NOVA allows us to arrive at an optimal configuration of the network for a given set of input applications and CMP platform. Finally, experimental results are strong indicators for considering the proposed architecture, as the improvements achieved were up to 6.1×, 55%, 5×, and 5.9× in terms of throughput, latency, energy-delay-product, and power compared to other state-of-the-art photonic NoC architectures.Item Open Access Element rearrangement for action classification on product manifolds(Colorado State University. Libraries, 2013) Kadappan, Karthik, author; Beveridge, J. Ross, advisor; Maciejewski, Anthony A., committee member; Peterson, Chris, committee member; Rajopadhye, Sanjay, committee memberConventional tensor-based classification algorithms unfold tensors into matrices using the standard mode-k unfoldings and perform classification using established machine learning algorithms. These methods assume that the standard mode-k unfolded matrices are the best 2-dimensional representations of N-dimensional structures. In this thesis, we ask the question: "Is there a better way to unfold a tensor?" To address this question, we design a method to create unfoldings of a tensor by rearranging elements in the original tensor and then applying the standard mode-k unfoldings. The rearrangement of elements in a tensor is formulated as a combinatorial optimization problem and tabu search is adapted in this work to solve it. We study this element rearrangement problem in the context of tensor-based action classification on product manifolds. We assess the proposed methods using a publicly available video data set, namely Cambridge-Gesture data set. We design several neighborhood structures and search strategies for tabu search and analyze their performance. Results reveal that the proposed element rearrangement algorithm developed in this thesis can be employed as a preprocessing step to increase classification accuracy in the context of action classification on product manifolds method.Item Open Access Generalized full sparse tiling of loop chains(Colorado State University. Libraries, 2013) Krieger, Christopher D., author; Strout, Michelle Mills, advisor; Böhm, Wim, committee member; Rajopadhye, Sanjay, committee member; Mueller, Jennifer, committee memberComputer and computational scientists are tackling increasingly large and complex problems and are seeking ways of improving the performance of their codes. The key issue faced is how to reach an effective balance between parallelism and locality. In trying to reach this balance, a problem commonly encountered is that of ascertaining the data dependences. Approaches that rely on automatic extraction of data dependences are frequently stymied by complications such as interprocedural and alias analysis. Placing the dependence analysis burden upon the programmer creates a significant barrier to adoption. In this work, we present a new programming abstraction, the loop chain, that specifies a series of loops and the data they access. Given this abstraction, a compiler, inspector, or runtime optimizer can avoid the computationally expensive process of formally determining data dependences, yet still determine beneficial and legal data and iteration reorderings. One optimization method that has been previously applied to irregular scientific codes is full sparse tiling. Full sparse tiling has been used to improve the performance of a handful of scientific codes, but in each case the technique had to be applied from scratch by an expert after careful manual analysis of the possible data dependence patterns. The full sparse tiling approach was extended and generalized as part of this work to apply to any code represented by the loop chain abstraction. Using only the abstraction, the generalized algorithm can produce a new data and iteration ordering as well as a parallel execution schedule. Insight into tuning a generalized full sparse tiled application was gained through a study of the different factors influencing tile count. This work lays the foundation for an efficient autotuning approach to optimizing tile count.Item Open Access Identification of regular patterns within sparse data structures(Colorado State University. Libraries, 2020) Augustine, Travis, author; Pouchet, Louis-Noël, advisor; Rajopadhye, Sanjay, committee member; Bohm, Anton, committee member; Wilson, James, committee memberSparse matrix-vector multiplication (SpMV) is an essential computation in linear algebra. There is a well-known trade-off between operating on a dense or a sparse structure when performing SpMV. In the dense version of SpMV, useless operations are performed but the computation is amenable SIMD vectorization. In the sparse version, only useful operations are executed. However, an indirection array must be used, thus hindering the compiler's ability to perform optimizations that exploit the vector units available on the majority of modern processors. Our process automatically builds sets of regular sub-computations from the irregular sparse data structure. We mine for regular regions in the irregular data structure, grouping together non-contiguous points from the reorderable set of coordinates representing the sparse structure. The coordinates become partitioned into groupings of coordinates of pre-defined shapes using polyhedra. This partition models the exact same points from the input set of coordinates in a way that is specialized to the input's sparsity pattern. Once we have obtained a partition of the points into sets of polyhedra, we then scan these polyhedra to synthesize code that does not store any coordinates of zero-valued elements and does not require any indirection array to access data, thus making it amenable to SIMD vectorization.Item Open Access Modern considerations for the use of naive Bayes in the supervised classification of genetic sequence data(Colorado State University. Libraries, 2021) Lakin, Steven M., author; Abdo, Zaid, advisor; Rajopadhye, Sanjay, committee member; Stenglein, Mark, committee member; Stewart, Jane, committee memberGenetic sequence classification is the task of assigning a known genetic label to an unknown genetic sequence. Often, this is the first step in genetic sequence analysis and is critical to understanding data produced by molecular techniques like high throughput sequencing. Here, we explore an algorithm called naive Bayes that was historically successful in classifying 16S ribosomal gene sequences for microbiome analysis. We extend the naive Bayes classifier to perform the task of general sequence classification by leveraging advancements in computational parallelism and the statistical distributions that underlie naive Bayes. In Chapter 2, we show that our implementation of naive Bayes, called WarpNL, performs within a margin of error of modern classifiers like Kraken2 and local alignment. We discuss five crucial aspects of genetic sequence classification and show how these areas affect classifier performance: the query data, the reference sequence database, the feature encoding method, the classification algorithm, and access to computational resources. In Chapter 3, we cover the critical computational advancements introduced in WarpNL that make it efficient in a modern computing framework. This includes efficient feature encoding, introduction of a log-odds ratio for comparison of naive Bayes posterior estimates, description of schema for parallel and distributed naive Bayes architectures, and use of machine learning classifiers to perform outgroup sequence classification. Finally in Chapter 4, we explore a variant of the Dirichlet multinomial distribution that underlies the naive Bayes likelihood, called the beta-Liouville multinomial. We show that the beta-Liouville multinomial can be used to enhance classifier performance, and we provide mathematical proofs regarding its convergence during maximum likelihood estimation. Overall, this work explores the naive Bayes algorithm in a modern context and shows that it is competitive for genetic sequence classification.Item Open Access Number of 4-cycles of the genus 2 superspecial isogeny graph(Colorado State University. Libraries, 2024) Sworski, Vladimir P., author; Pries, Rachel, advisor; Hulpke, Alexander, committee member; Rajopadhye, Sanjay, committee member; Shoemaker, Mark, committee memberThe genus 2 superspecial degree-2 isogeny graph over a finite field of size p2 is a network graph whose vertices are constructed from genus 2 superspecial curves and whose edges are the degree 2 isogenies between them. Flynn and Ti discovered 4-cycles in the graph, which pose problems for applications in cryptography. Florit and Smith constructed an atlas which describes what the neighborhood of each vertex looks like. We wrote a program in SageMath that can calculate neighborhoods of these graphs for small primes. Much of our work is motivated by these computations. We examine the prevalence of 4-cycles in the graph and, motivated by work of Arpin, et al. in the genus 1 situation, in the subgraph called the spine. We calculate the number of 4-cycles that pass through vertices of 12 of the 14 kinds possible. This also resulted in constructing the neighborhood of all vertices two steps or fewer away for three special types of curves. We also establish conjectures about the number of vertices and cycles in small neighborhoods of the spine.Item Open Access Pruning and acceleration of deep neural networks(Colorado State University. Libraries, 2020) Thivagara Sarma, Janarthanan, author; Pouchet, Louis-Noël, advisor; Rajopadhye, Sanjay, committee member; Pasricha, Sudeep, committee member; Anderson, Chuck, committee memberDeep neural networks are computational and memory intensive applications. Many network pruning and compression solutions has been introduced to deploy inference of large trained models in limited memory and time critical systems. We proposed a new pruning methodology that assigns significance rank to the operations in the inference program and for a given capacity and operation budget, generate only the important operations to do the inference. Our approach has shown that, in many classical feed forward classification networks we can maintain almost the same accuracy as the original inference by executing less than half of the operations in the original program. We also proposed a methodology to improve the effective implementation of the output sparse computation, controllable by a threshold variable.Item Open Access The conjugacy extension problem(Colorado State University. Libraries, 2021) Afandi, Rebecca, author; Hulpke, Alexander, advisor; Achter, Jeff, committee member; Pries, Rachel, committee member; Rajopadhye, Sanjay, committee memberIn this dissertation, we consider R-conjugacy of integral matrices for various commutative rings R. An existence theorem of Guralnick states that integral matrices which are Zp-conjugate for every prime p are conjugate over some algebraic extension of Z. We refer to the problem of determining this algebraic extension as the conjugacy extension problem. We will describe our contributions to solving this problem. We discuss how a correspondence between Z-conjugacy classes of matrices and certain fractional ideal classes can be extended to the context of R-conjugacy for R an integral domain. In the case of integral matrices with a fixed irreducible characteristic polynomial, this theory allows us to implement an algorithm which tests for conjugacy of these matrices over the ring of integers of a specified number field. We also describe how class fields can be used to solve the conjugacy extension problem in some examples.Item Embargo The shape of sound: rendering interactive six-degrees-of-freedom audio in software(Colorado State University. Libraries, 2024) Rehberg, Daniel, author; Ortega, Francisco Raul, advisor; Rajopadhye, Sanjay, committee member; Malinin, Laura, committee memberSix-degrees-of-freedom (6DoF) audio is an area of growing interest in interactive software, but it has faced several challenges: it does not easily conform to object-based rendering when achieved with arrays of ambisonics microphones; prior studies rely on subjective metrics which do not clearly indicate how this additional audio interaction might aid a human in a localization task (an indication of enhanced spatial awareness of a sound event); and the ambisonics technique requires specialized equipment and recording space, as well as audio engineering expertise for setup and calibration to work properly. These factors limit the accessibility of 6DoF audio to be implemented in research experiments or within commercial products like videogames. My work has involved taking an interdisciplinary approach to design, prototype, and validate (with human subjects) an inherently object-based 6DoF rendering method. This method exploits computational geometry techniques and follows a rendering paradigm inspired by the programmable graphics pipeline to create 3D audio meshes which can be transformed in real time to dynamically render monaural audio samples – meaning the output of the method can still be input into contemporary audio filtering and spatialization functions/tools, like a head-related transfer function. This work includes two studies performed with human subjects as well as a breakdown of the rendering method and its prototype implementation. The results of the human-subject studies indicate clear advantages to localizing a spatial sound in 3D space compared to the contemporary three-degrees-of-freedom approach.