Derrick Martins

Q&A: Seven Questions with Andy Palmer, Co-Founder and CEO, Tamr


Q: For readers unfamiliar with Tamr, how would you describe the company and its value proposition?
A: Businesses have mission-critical questions to ask. They have the data assets they need to answer them. They’ve invested heavily in big data analytics — $44 billion in 2014 alone, according to Gartner. But they still can’t consume about 90% of their collected data, which remains scattered in silos and disparate sources across the organization.

Tamr connects the dots. Our data unification platform catalogs, connects and curates hundreds or thousands of internal and external data sources, using a combination of machine learning algorithms and human expert guidance. This radically reduces the cost, time and effort of preparing data for analysis.

By making a relatively small investment in unifying their data using Tamr, enterprises can realize the promise of big data and the full value of their existing investments.

Q: One of Tamr’s objectives is to empower organizations to “Leverage All Data.” How does Tamr accomplish this lofty goal?
A: Tamr radically simplifies and speeds the availability of an enterprise’s data for analytics and downstream applications through three stages of data unification: catalog, connect, consume.

Catalog: Create a Central Inventory of Enterprise Metadata

  • Tamr Catalog creates a logical map for all of an enterprise’s information, automatically cataloging all metadata available to the enterprise in a central, platform-neutral place. Through a combination of machine learning and human guidance, the metadata is organized by logical entities (what the data represents), rather than where the data is physically stored – making it easier for enterprises to find data necessary to answer critical business questions. For example, if an enterprise has customer data spread across multiple CRM, financial and service-record databases, Tamr Catalog would enable any analyst or executive to immediately find all of the data and attributes related to the entity ‘customer,’ without having to resort to sending one-off emails to individual data managers or compiling spreadsheets of complicated queries. By making it easier to find data, Tamr promotes greater transparency of attributes, sources, and ownership of data within the enterprise.
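To make "organizing metadata by logical entity" concrete, here is a minimal Python sketch. It is not Tamr Catalog's implementation. The source, table and column names are hypothetical, and a small hand-written hint table (`ENTITY_HINTS`, also hypothetical) stands in for the combination of machine learning and human guidance described above. The sketch groups physical columns from several systems under the entity they describe, so a lookup for "customer" returns every related attribute no matter where it is stored.

```python
from collections import defaultdict

# Hypothetical metadata harvested from several systems: (source, table, column).
METADATA = [
    ("crm_emea", "accounts", "cust_name"),
    ("crm_us", "clients", "customer_id"),
    ("finance", "invoices", "cust_id"),
    ("finance", "invoices", "amount"),
    ("service", "tickets", "customer_email"),
]

# Hypothetical name fragments mapped to logical entities. They are checked in
# insertion order, so the first fragment found in "table.column" decides.
ENTITY_HINTS = {
    "cust": "customer",
    "client": "customer",
    "invoice": "invoice",
    "amount": "invoice",
}


def build_catalog(metadata, hints):
    """Group physical columns under the logical entity they most likely describe."""
    catalog = defaultdict(list)
    for source, table, column in metadata:
        text = f"{table}.{column}".lower()
        entity = next(
            (name for fragment, name in hints.items() if fragment in text),
            "unclassified",
        )
        catalog[entity].append(f"{source}.{table}.{column}")
    return dict(catalog)


catalog = build_catalog(METADATA, ENTITY_HINTS)
print(catalog["customer"])
```

Here `catalog["customer"]` lists the four customer-related columns from the CRM, finance and service systems, while the invoice amount lands under "invoice". In a production setting the fixed hint table would presumably give way to learned classifiers plus expert review.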

Connect: Easily Unify Data across Silos

  • Tamr drastically reduces the time and effort required to connect and integrate siloed data for analysis. Advanced machine learning algorithms automatically match attributes and entities across your full range of data sources, whether large or small; structured, semi-structured or unstructured; internal or third party, often accomplishing up to 90% of the task without human intervention. When human intervention is necessary, Tamr generates questions for data experts, aggregates their responses and immediately feeds them back into the system, without requiring programmers to intervene and without a complex or costly implementation to maintain. This bottom-up, probabilistic approach scales to thousands of potential data sources and enables Tamr to continuously improve its accuracy and speed over time.
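The split between automatic matching and expert questions can be illustrated with a small Python sketch. It is an assumption-laden illustration, not Tamr's algorithm. Standard-library `difflib.SequenceMatcher` stands in for a learned matching model, the column names are hypothetical, and the two thresholds (`accept=0.85`, `ask=0.6`) are arbitrary choices. Confident pairs are matched automatically, uncertain pairs become questions for a data expert, and confirmed answers are folded back in.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude name similarity in [0, 1]; a stand-in for a learned matching model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_matches(left, right, accept=0.85, ask=0.6):
    """Score every attribute pair and sort it into one of three buckets.

    score >= accept        -> matched automatically
    ask <= score < accept  -> turned into a question for a data expert
    score < ask            -> treated as a non-match
    """
    accepted, questions = [], []
    for a in left:
        for b in right:
            score = similarity(a, b)
            if score >= accept:
                accepted.append((a, b, score))
            elif score >= ask:
                questions.append((a, b, score))
    return accepted, questions


def apply_expert_answers(accepted, questions, answers):
    """Return the accepted pairs plus the questions experts confirmed."""
    merged = list(accepted)
    for a, b, score in questions:
        if answers.get((a, b)):
            merged.append((a, b, score))
    return merged


# Hypothetical column names from two source systems.
accepted, questions = propose_matches(
    ["customer_name", "cust_id"], ["customer_nm", "customer_id"]
)
# An expert confirms one question; unanswered or False answers are rejected.
final = apply_expert_answers(accepted, questions, {("cust_id", "customer_id"): True})
```

With these inputs, `customer_name`/`customer_nm` is accepted automatically, `customer_name`/`customer_id` and `cust_id`/`customer_id` are routed to an expert, and `cust_id`/`customer_nm` falls below the question threshold. The three-bucket design keeps expert effort focused on the genuinely ambiguous pairs.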

Consume: Turn Any Data Set into a Service

  • Tamr delivers a consolidated view of entities and records wherever analysts and others need it most: from spreadsheets to business intelligence platforms and next-generation visualization tools. We do this through RESTful APIs that deliver data as part of a "data flow," or through a plug-in for spreadsheets and other BI platforms. Our spreadsheet plug-in enables analysts to find, map and match external data with their internal data in Google Sheets or Excel. Tamr handles the mechanics of data matching and enrichment, prompting users with suggestions and then auto-populating their spreadsheets with new and changed data based on their choices. Because they can manipulate external data as easily as if it were their own, business analysts can use it to resolve ambiguities, fill in gaps, enrich their data with additional columns and fields, and more. The benefit of this approach extends throughout the enterprise: any time someone puts together a central set of entities, it can easily be shared with business analysts and other groups in the organization.
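The "fill in gaps, add columns" enrichment step can be sketched in a few lines of Python. This is a hypothetical illustration, not the plug-in's actual behavior or API. The field names and records are invented, and rows are joined on an exact key, on the assumption that a matching step like the one under "Connect" has already decided which external record belongs to which row.

```python
def enrich(rows, external, key):
    """Return copies of `rows` with gaps filled and columns added from `external`.

    Records are joined on an exact `key` value; matching is assumed to have
    happened upstream.
    """
    lookup = {record[key]: record for record in external}
    enriched = []
    for row in rows:
        merged = dict(row)  # copy so the analyst's original rows are untouched
        match = lookup.get(row.get(key))
        if match is not None:
            for field, value in match.items():
                # Fill only missing or empty cells; never overwrite existing values.
                if merged.get(field) in (None, ""):
                    merged[field] = value
        enriched.append(merged)
    return enriched


# Hypothetical internal spreadsheet rows and an external reference data set.
internal = [
    {"supplier_id": "S-001", "name": "Acme Corp", "country": ""},
    {"supplier_id": "S-002", "name": "Globex"},
]
external = [{"supplier_id": "S-001", "country": "US", "employees": 120}]

for row in enrich(internal, external, key="supplier_id"):
    print(row)
```

The first row gains a filled-in `country` and a new `employees` column, while the second row, which has no external counterpart, passes through unchanged. Refusing to overwrite non-empty cells mirrors the idea that users stay in control and accept suggestions rather than having their data silently replaced.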

Q: What is the value to an organization of leveraging all of its data?
A: Good business decisions come not from big data but from the right data, and you can't be sure you have the right data unless you understand all the data that is available to you. Tamr helps you see beyond your own silo so that you know everything that is available. A great example of breaking down information silos to generate business value is one of our largest customers, which does business around the world through many retail locations. Before implementing Tamr, each retail location had visibility only into the data it collected directly from its own customers, even though those customers often also interacted with other parts of the organization. By using all of the customer data the organization possesses (millions of records) to create a unified view of customers, the company as a whole and each retail location can now understand every customer interaction. This makes marketing, upsell and cross-sell efforts much more effective, ultimately driving revenue growth.

Q: How does Tamr utilize Machine Learning to help organizations create a unified view of data across the organization? How does Tamr use human intervention to augment the data unification process?
A: Our approach draws on the best of machine and human learning to connect hundreds or thousands of data sources. Advanced algorithms automatically connect the vast majority of the sources while resolving duplicates, errors and inconsistencies among source attributes and records. This bottom-up, probabilistic solution is reminiscent of Google's full-scale approach to web search and connection. When the Tamr system can't resolve connections automatically, it calls for human expert guidance, asking people in the organization who are familiar with the data to weigh in on the mapping and improve its quality and integrity. This integration of machine and human learning makes Tamr smarter over time.
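One simple way a system can get "smarter over time" from expert answers is to re-tune its decision threshold against the labels experts have supplied. The Python sketch below is illustrative only and makes no claim about how Tamr's models are actually trained. It assumes each candidate pair already carries a similarity score, and the feedback values are hypothetical; the function picks the cut-off that agrees with the most expert yes/no answers.

```python
def fit_threshold(labeled):
    """Pick the score cut-off that agrees with the most expert yes/no labels.

    `labeled` is a list of (similarity_score, is_match) pairs. Each distinct
    score is tried as a threshold (predict "match" when score >= threshold);
    on ties the lowest such threshold is kept. Returns 1.0 if `labeled` is empty.
    """
    best_t, best_correct = 1.0, -1
    for t in sorted({score for score, _ in labeled}):
        correct = sum((score >= t) == is_match for score, is_match in labeled)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Hypothetical expert feedback gathered from earlier review rounds.
feedback = [(0.95, True), (0.80, True), (0.72, False), (0.60, False)]
threshold = fit_threshold(feedback)
```

With this feedback the chosen threshold is 0.80, which classifies all four labeled pairs correctly. As more expert answers accumulate, re-running the fit lets the automatic cut-off track what the people who know the data actually consider a match.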

Q: What are two recent success stories of Tamr customers?
A: Here are two — one for procurement optimization and one for customer data integration:

  • The first customer is the sourcing organization of a large global manufacturing company with thousands of suppliers spanning dozens of semi-autonomous business units. The company has hundreds of ERP systems, and each business unit's supplier systems have their own idiosyncrasies. The customer saw great value in a unified view of its suppliers across businesses, because it could then re-engage with common suppliers to negotiate part pricing and payment terms. The company first used Tamr to identify supplier overlap between two of its core businesses, whose combined supplier spend totaled billions of dollars. Tamr found that, on average, approximately one-third of the suppliers used by each of these two businesses also had a relationship with at least one other business unit. This unified view of suppliers allowed the customer to negotiate more favorable part prices and payment terms, driving millions of dollars in cost savings per year.
  • The second customer is a top financial services company serving high-net-worth individuals. It needed a clean, unified view of its clients for consumption by a new customer onboarding application, designed to replace a 10-year-old legacy customer data repository. Client data is entered by one or more of the company's 400 relationship managers and flows through a third-party clearing house before being incorporated into the company's systems. Over time, the data had become increasingly difficult to work with, riddled with duplicate records, missing fields and erroneous values. An internally developed solution (code that grouped records by account holder and account type before sending them to the application) was proving ineffective at reducing duplicates, the main challenge. It required exact matches on customer name, mailing address, date of birth and so on, which is completely unrealistic given the idiosyncrasies of diverse data sources. In a runoff against the internal solution, Tamr, with its "fuzzy matching" capabilities, accurately identified 16 records in the repository associated with a particular customer, compared with only 2 records identified by the internal solution. With Tamr deployed, the company then gave the relationship managers enriched spreadsheets containing the unified customer information for further review and curation, closing the loop in Tamr's human-guided machine learning approach and delivering only clean, unified records to the new onboarding application.
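Why exact matching undercounts can be shown with a small Python sketch. Everything in it is hypothetical: the records, the normalization rules in `normalize_name` and `normalize_address`, and the 0.85 similarity threshold. It does not reproduce Tamr's matching model or the 16-versus-2 result above. It simply contrasts "every field identical" with a comparison on normalized, approximately similar fields, using the standard-library `difflib.SequenceMatcher` as the similarity measure.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table used to normalize street addresses.
ABBREVIATIONS = {"street": "st", "road": "rd", "avenue": "ave"}


def normalize_name(name):
    """Lowercase, drop punctuation, and sort tokens ("SMITH, JONATHAN" -> "jonathan smith")."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


def normalize_address(address):
    """Lowercase, drop punctuation, and abbreviate common street suffixes."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()


def exact_matches(target, records):
    """Exact-match rule: every field must be identical."""
    return [r for r in records if r == target]


def fuzzy_matches(target, records, threshold=0.85):
    """Same date of birth plus sufficiently similar normalized name and address."""
    name = normalize_name(target["name"])
    address = normalize_address(target["address"])
    return [
        r
        for r in records
        if r["dob"] == target["dob"]
        and similarity(name, normalize_name(r["name"])) >= threshold
        and similarity(address, normalize_address(r["address"])) >= threshold
    ]


records = [
    {"name": "Jonathan Smith", "address": "12 Oak Street", "dob": "1970-01-02"},
    {"name": "Jonathan Smith", "address": "12 Oak Street", "dob": "1970-01-02"},
    {"name": "SMITH, JONATHAN", "address": "12 Oak St.", "dob": "1970-01-02"},
    {"name": "Jonathan Smyth", "address": "12 Oak Street", "dob": "1970-01-02"},
    {"name": "Maria Garcia", "address": "9 Elm Road", "dob": "1985-05-05"},
]

target = records[0]
print(len(exact_matches(target, records)), len(fuzzy_matches(target, records)))
```

For this sample customer, exact matching finds only the 2 identical records, while the fuzzy comparison also picks up the reordered "SMITH, JONATHAN" entry with an abbreviated address and the "Smyth" typo, for 4 in total. The unrelated customer is excluded in both cases.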

Q: How does Tamr enable companies to find value in Big Data?
A: Enterprises have invested an estimated $3-4 trillion in IT over the last 20-plus years. Most of it has gone into developing and deploying single-vendor systems, each built for a particular application, function or geography, to automate and optimize key business processes.

The result of all of this disparate activity? Data silos, schema proliferation, and radical data heterogeneity. With companies now investing heavily in big data analytics, this dynamic is becoming even more complex and costly to manage.

The complexity is best seen when enterprises attempt to ask “simple” questions across many business silos — be they divisions, geographies or functions. These questions often go unanswered because current top-down, deterministic data unification approaches (such as ETL, ELT and MDM) weren’t designed to scale to the variety of hundreds, thousands or tens of thousands of data silos. These approaches depend on highly trained architects developing “the one (master) schema to rule them all.” This is a red herring.

The fundamental diversity and mutability of enterprise data and semantics should naturally lead enterprises toward a new bottom-up, probabilistic approach to connecting data across the organization and exploiting big data variety. Tamr’s data unification platform provides this approach. It finds and connects siloed data into a unified view in a way that looks more like a Google search circa 2015 than a Yahoo index crawl circa 1995.

By enabling their organizations to dynamically catalog, connect and curate ALL of their enterprise information sources, Tamr’s “next gen” unification platform helps enterprises embrace variety — and transform it from a roadblock into ROI.

Q: How does Tamr support the work of Data Scientists & Analysts?
A: By getting rid of the drudgery of connecting, cleaning and preparing data, which consumes far too much of their time: an estimated 80%, versus 20% spent on analytics. And by eliminating the frustration of not being able to access all the data they want: IDC has stated that up to 90% of big data is "dark," making it far too complex and time-consuming to locate all the data relevant to what you're analyzing. Even with great data scientists and analysts, most enterprises can consume only 10-12% of their available data for analytics and downstream applications.

Most enterprises are finding that data heterogeneity is a massive roadblock to effectively using the state-of-the-art analytics and visualization tools they've invested in so heavily. While it's not sexy to spend time cleaning and preparing data, it's necessary. In fact, providing unified data is as essential to enterprise analytics as reliable water treatment is to providing the population with clean drinking water.

Tamr ensures long-term success for data scientists and analysts. Our data unification platform radically reduces the cost, time and effort of preparing data for business intelligence and analytics. Tamr is…

  • Powerful enough to identify and catalog ALL of the available data in the enterprise
  • Comprehensive enough that 100% of data is consumable for analytics apps and analysis
  • Efficient enough that 80+% of data scientist time is spent on analysis and answering big questions
  • Flexible enough for solutions ranging from supply chain management to clinical trials management to customer data integration

About Andy Palmer

Andy Palmer is co-founder and CEO of Tamr, Inc. Palmer co-founded Tamr with fellow serial entrepreneur Michael Stonebraker, PhD, adjunct professor at MIT CSAIL and the 2014 ACM Turing Award winner for his contributions to database technology, innovation and commercialization; Ihab Ilyas of the University of Waterloo; and others. Previously, Palmer was co-founder and founding CEO of Vertica Systems, a pioneering big data analytics company (acquired by HP). During his career as an entrepreneur, Palmer has served as founding investor, board member or advisor to more than 50 start-up companies in technology, healthcare and the life sciences.