Pandora’s Journey from Hadoop to MemSQL

Pandora Media, the music streaming service, transitioned two years ago from heavy reliance on Hadoop for distributed processing to MemSQL as its primary database. By 2016, the company could no longer keep up with evolving requirements and new features on the advertising-based side of the streaming service, which prompted the switch.

“Answers about Monday weren’t available until Wednesday,” the company noted in a blog post detailing its transition to MemSQL. Query performance itself was sufficient, but keeping up with “24-hour period” data was not: processing ran with a roughly 20-hour delay on a large Hadoop cluster. Performance declined further as the daily volume of data grew and required more processing time. As data backed up, Pandora engineers said they considered 30 different platforms for delivering query results in real time.

Author: George Leopold
