Big Data University Series – Part I: Big Data


Part I of this series highlights three “Big Data” courses that may be of interest to BI, Analytics & Data Science professionals.

What is Big Data University?

As stated on the Big Data University website, Big Data University is an IBM initiative aimed at spreading big data literacy. Big Data University’s mission is to democratize access to practical skills for working with data by removing two big impediments: money and time. To that end, they have made everything you need to learn free. Free courses, free access to all the tools, free data – free everything, and not for a few days or weeks, but forever. Big Data University courses are self-paced, allowing you to take as long as you need to complete a course.

Big Data Courses:

  • Big Data Fundamentals – This course presents a holistic view of Big Data, taking both a top-down and a bottom-up approach to questions such as: What is Big Data? How do we tackle Big Data? Why are we interested in it? What is a Big Data platform? The course emphasizes that we study Big Data to gain insight that helps people throughout the enterprise run the business better and provide better service to customers. Rather than an implementation of a single open-source system such as Hadoop, the course recommends that Big Data be processed on a platform that can handle the variety, velocity, and volume of data by using a family of components that require integration and data governance. Big Data is NoHadoop (“not only Hadoop”) as well as NoSQL (“not only SQL”).
  • Hadoop Fundamentals I – Hadoop Fundamentals I teaches you the basics of Apache Hadoop and the concept of Big Data. This Hadoop course is entirely free, and so are the materials and software provided. This is the third version of Big Data University’s most popular Hadoop course. Since Version 2 was published, several more detailed courses covering topics such as MapReduce, Hive, HBase, Pig, Oozie, and ZooKeeper have been added. They recommend you start here and then dig deeper into the specific Hadoop technology you wish to learn more about. The Hadoop tutorial first describes what Big Data is and the need for Hadoop to process that data in a timely manner. It then covers the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS), both from the command line and using the BigInsights Console that is supplied with InfoSphere BigInsights.
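As a conceptual aside (not part of the course materials), the MapReduce model that Hadoop implements can be sketched in plain Python: the map step emits key/value pairs, a shuffle groups them by key, and the reduce step aggregates each group. The function names here are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's list of values into a single count.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "hadoop processes big data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In a real Hadoop job the map and reduce functions run in parallel across a cluster, and the shuffle moves data between nodes; the single-process version above only shows the shape of the computation.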
  • Spark Fundamentals I – Apache Spark is an open-source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is the alternative. Spark performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for lightning-fast speed and supports Java, Scala, and Python APIs for ease of development. Spark seamlessly combines SQL, streaming, and complex analytics in the same application to handle a wide range of data processing scenarios. Spark runs on Hadoop, on Mesos, standalone, or in the cloud, and can access diverse data sources such as HDFS, Cassandra, HBase, or S3.
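Much of Spark’s speed comes from lazy, chainable transformations over in-memory datasets: transformations are only recorded until an action asks for a result. As a rough, hypothetical illustration in plain Python (a toy stand-in, not the actual Spark API), the idea can be sketched like this:

```python
class MiniRDD:
    """Toy stand-in for a Spark RDD: transformations are lazy, actions execute."""

    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []  # pending transformations, not yet run

    def map(self, fn):
        # Lazily record a map transformation; nothing is computed yet.
        return MiniRDD(self._data, self._ops + [("map", fn)])

    def filter(self, fn):
        # Lazily record a filter transformation.
        return MiniRDD(self._data, self._ops + [("filter", fn)])

    def collect(self):
        # Action: apply the recorded pipeline in one in-memory pass.
        result = list(self._data)
        for kind, fn in self._ops:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result

rdd = MiniRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [0, 4, 16, 36, 64]
```

Real Spark additionally partitions the data across a cluster and can cache intermediate results in memory, which is what makes iterative algorithms so much faster than rereading from disk on every MapReduce pass.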

Part II of this series will highlight three “Data Science” courses that may be of interest to BI, Analytics & Data Science professionals.

You can follow us on Twitter, Facebook & Google+.
