LEVERAGING SPARK PERFORMANCE WITH BOTTLENECK-AWARE TUNING, CACHING AND SCHEDULING
Driven by the rapid increase in the scale of data to be processed, massive clusters running data analytics frameworks such as Spark operate around the clock to reduce operational cost, achieve higher resource utilization, and improve processing throughput. However, several performance issues remain. First, workload resource requirements are highly diverse; running Spark with its default parameter configuration wrongly assumes that every workload behaves the same, yet tuning Spark is both time consuming and requires expert knowledge. Second, Spark performance is heavily dependent on in-memory computation to ...