Introduction to Apache Spark: Big Data Analytics Simplified

Initially developed at UC Berkeley’s AMPLab in 2009, Apache Spark is a “lightning-fast unified analytics engine” for large-scale data processing. It can run on cluster managers such as Hadoop YARN, Mesos, or Kubernetes, or as a standalone cluster deployment.

It can also access data from a wide variety of sources, including the Hadoop Distributed File System (HDFS), Cassandra, and Hive. In this article, we’ll dive into Spark, its libraries, and why it has grown into one of the most popular distributed processing frameworks in the industry. If you’re new to the world of Big Data, I highly recommend reading up on the Hadoop ecosystem first to get an idea of how Spark fits into a Big Data analytics stack.

Source: business2community.com
Author: Yoshitaka Shiotsu
