Implications of storage subsystem interactions on processing efficiency in data intensive computing

Koneru, Hanisha, author; Pallickara, Shrideep, advisor; Pallickara, Sangmi, committee member; Arabi, Mazdak, committee member

dc.contributor.author	Koneru, Hanisha, author
dc.contributor.author	Pallickara, Shrideep, advisor
dc.contributor.author	Pallickara, Sangmi, committee member
dc.contributor.author	Arabi, Mazdak, committee member
dc.date.accessioned	2016-01-11T15:13:37Z
dc.date.available	2016-01-11T15:13:37Z
dc.date.issued	2015
dc.description.abstract	Processing frameworks such as MapReduce allow development of programs that operate on voluminous on-disk data. These frameworks typically include support for multiple file/storage subsystems. This decoupling of processing frameworks from the underlying storage subsystem provides a great deal of flexibility in application development. However, as we demonstrate, this flexibility often exacts a price: performance. Given the data volumes, storage subsystems (such as HDFS, MongoDB, and HBase) disperse datasets over a collection of machines. Storage subsystems manage complexity relating to preservation of consistency, redundancy, failure recovery, throughput, and load balancing. Preserving these properties involves message exchanges between distributed subsystem components, updates to in-memory data structures, data movements, and coordination as datasets are staged and system conditions change. Storage subsystems prioritize these properties differently, leading to vastly different network, disk, memory, and CPU footprints for staging and accessing the same dataset. This thesis proposes a methodology for comparing and identifying the storage subsystem suited for the processing that is being performed on a dataset. We profile the network I/O, disk I/O, memory, and CPU costs introduced by a storage subsystem during data staging, data processing, and generation of results. We perform this analysis with different storage subsystems and applications with different disk-I/O to CPU processing ratios.
dc.format.medium	born digital
dc.format.medium	masters theses
dc.identifier	Koneru_colostate_0053N_13265.pdf
dc.identifier.uri	http://hdl.handle.net/10217/170296
dc.identifier.uri	https://doi.org/10.25675/3.024338
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2000-2019
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject	big data
dc.subject	distributed storage systems
dc.subject	Hadoop MapReduce
dc.subject	HBase
dc.subject	HDFS
dc.title	Implications of storage subsystem interactions on processing efficiency in data intensive computing
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.S.)

Files

Original bundle

Name: Koneru_colostate_0053N_13265.pdf
Size: 417.07 KB
Format: Adobe Portable Document Format

Collections

2000-2019
Theses and Dissertations