When it comes to streaming data, it’s tough to find a company operating on a more massive scale than Netflix, which streams more than 125 million hours of TV shows and movies — per day. Netflix captures billions of pieces data about those viewing sessions and routes it into various analytic engines using Apache Flink. However, some additional work was required to get it all working smoothly and efficiently at scale in the AWS cloud.
Netflix is no stranger to big data tech. The company has been an eager adopter of Hadoop, NoSQL, and related projects for years, and has done pioneering work in optimizing the technologies to run in public cloud platforms. Netflix runs everything in AWS, its primary cloud platform, while it keeps some VMs in standby on the Google Cloud Platform as the backup. Lately the company has been working on its real-time stream processing system, which it uses to drive various applications to improve personalization, provide operational insight, and detect fraud.
Author: Alex Woodie