
Determining disease outbreak influence from voluminous epidemiology data on enhanced distributed graph-parallel system




Shah, Naman, author
Pallickara, Sangmi Lee, advisor
Pallickara, Shrideep, committee member
Turk, Daniel E., committee member


Historically, large-scale disease outbreaks in livestock populations have had catastrophic consequences. Preparing for these inevitable disasters is critical, and preparation primarily involves the efficient use of limited available resources. Determining the relative influence of the entities involved in a large-scale outbreak is therefore essential. Planning for outbreaks often involves executing compute-intensive disease spread simulations. To capture the probabilities of various outcomes, these simulations are executed many times over a collection of representative input scenarios, producing voluminous data. The resulting datasets contain valuable insights, including the sequences of events that lead to extreme outbreaks. However, discovering and leveraging such information is itself computationally expensive. This thesis proposes a distributed approach for aggregating and analyzing voluminous epidemiology data to measure the influence of each entity in a disease outbreak using the PageRank algorithm. Using the Disease Transmission Network (DTN) established in this research, planners and analysts can allocate limited resources, such as vaccinations and field personnel, more effectively by comparing the relative influence of the entities. To improve the performance of the analysis pipeline, the thesis also proposes an extension to the Apache Spark GraphX distributed graph-parallel system.
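
As a rough illustration of how PageRank can rank entities in a transmission network, the following single-machine Python sketch runs power-iteration PageRank over a tiny directed graph. It is not the thesis's implementation, which targets Apache Spark GraphX. The herd names, the edge list, and the choice to point each edge from an infected unit to the unit that infected it (so that rank accumulates at sources of infection) are assumptions made only for this example.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a directed edge list of (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Every other node splits its damped rank evenly among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical transmission events, written as (infected unit, infecting unit).
# Assumed orientation: edges point toward the source of infection.
transmissions = [
    ("herd_B", "herd_A"),
    ("herd_C", "herd_A"),
    ("herd_D", "herd_B"),
]

ranks = pagerank(transmissions)
for unit in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{unit}: {ranks[unit]:.4f}")
```

Under this assumed orientation, a unit that directly or indirectly infected many others receives a higher score, which is the kind of relative ordering a planner could use when prioritizing limited resources.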



distributed analytics
epidemiological PageRank
NAADSM influential analysis
enhanced distributed graph-parallel system
disease propagation network
extended Apache Spark GraphX

