With support for the R statistical language, along with a range of new features, the latest update to the in-memory data-processing engine Apache Spark is now out.
By providing access to the popular R statistical programming language, the latest iteration of the fast-growing cluster analytics framework Spark aims to make life easier for data scientists. Spark 1.4, which is now generally available, adds support for Python 3 and lets R users work directly on large datasets through the SparkR API. “Because SparkR uses Spark’s parallel engine underneath, operations take advantage of multiple cores or multiple machines, and can scale to data sizes much larger than standalone R programs,” says Patrick Wendell.
Author: Toby Wolpe