Tuning Apache Spark Jobs the Easy Way: Web UI Stage Detail View

Apache Spark has proven to be an efficient and accessible platform for distributed computation. In some areas, it almost approaches the Holy Grail of making parallelization “automagic” – something we human programmers appreciate precisely because we are rarely good at it ourselves.

Nonetheless, although it is easy to get something running on Spark, it is not always easy to tell whether it is running optimally, nor – if we get a sense that something isn’t right – how to fix it. For example, a classic Spark puzzler is the batch job that runs the same code on the same sort of cluster against similar data night after night … but every so often it takes much longer to finish. What could be going on?

Author: Adam Breindel
