Introduction to Apache Spark: Big Data Analytics Simplified

Initially developed at U.C. Berkeley’s AMPLab in 2009, Apache Spark is a “lightning-fast unified analytics engine” for large-scale data processing. It can run on cluster managers such as Hadoop YARN, Apache Mesos, and Kubernetes, or be deployed as a standalone cluster.

It can also access data from a wide variety of sources, including the Hadoop Distributed File System (HDFS), Cassandra, and Hive. In this article, we’ll dive into Spark, its libraries, and why it has grown into one of the most popular distributed processing frameworks in the industry. If you’re new to the world of Big Data, I highly recommend reading up on the Hadoop ecosystem first to get an idea of how Spark fits into a Big Data analytics stack.

Author: Yoshitaka Shiotsu
