IBM, Cloudera join RStudio to create R interface to Apache Spark

R users can now use the popular dplyr package to tap into Apache Spark big data.

The new sparklyr package is a native dplyr interface to Spark, according to RStudio. After installing the package, users can “interactively manipulate Spark data using both dplyr and SQL (via DBI),” the company said in a blog post, as well as “filter and aggregate Spark data sets then bring them into R for analysis and visualization.” The package also provides access to Spark's distributed machine-learning algorithms.
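As a rough illustration of that workflow, here is a minimal sketch in R. It is not taken from the RStudio post. It assumes sparklyr, dplyr and DBI are installed and that a local Spark installation is available (sparklyr's spark_install() can download one). The mtcars data set, the table name "mtcars" and the variable names are placeholders chosen for the example.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; run spark_install() first if it is not.
sc <- spark_connect(master = "local")

# Copy a small built-in data frame into Spark so there is something to query.
cars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# The dplyr verbs below are translated to Spark SQL and executed inside Spark.
local_summary <- cars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()  # bring the aggregated result into R as a local data frame

# The same connection can also be queried with plain SQL through DBI.
sql_summary <- DBI::dbGetQuery(
  sc,
  "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl"
)

spark_disconnect(sc)
```

Only the small aggregated result crosses from Spark into the R session when collect() is called, which is what makes the "filter and aggregate, then bring into R" pattern practical for data that would not fit in local memory.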

Author: Sharon Machlis

