
Big Data Glossary – Part 1

The big data analytics space is evolving at tremendous speed, and so is the terminology used in it. This terminology can be really hard to understand. Here is a list of some big data terms, simplified:

1. Anonymization: Comes from the word anonymous. This means making data unidentifiable by removing all data points that could be used to identify a person. It basically severs the links between people in a database and their records to prevent the discovery of the source.

2. Automatic Identification and Capture: This refers to any means by which data on items is automatically identified and captured, and then stored in a computer system. For example, a barcode scanner or an RFID reader can automatically identify and collect data on a product as it is shipped, collected and sold.

3. Biometrics: The physical and behavioural characteristics of a person, like fingerprints, eye retinas and irises, voice patterns, facial patterns, hand measurements and DNA, that information technologies measure and analyse to authenticate that person. Used generally when access to confidential information must be restricted.

4. Brontobytes: The word probably comes from brontosaurus, one of the largest dinosaurs. A brontobyte is an informal unit of approximately 1,000 yottabytes, that is, 10^27 bytes (a 1 followed by 27 zeros). It is the size of the digital universe of tomorrow.

5. Clojure: Clojure is a dynamic programming language targeting the Java Virtual Machine (JVM). It is a dialect of Lisp and shares with Lisp the code-as-data philosophy and a powerful macro system.

6. Clickstream Analytics: Clickstream analytics, also known as clickpath analytics, works on the clickstream: a list of all pages viewed by a visitor, presented in the order in which the pages were viewed, along with the time spent on each page and when and where the visitor left. It analyses users’ web activity through the items and pages they click.

7. DBaaS: DBaaS, or Database as a Service, is a cloud-based service for the storage and management of databases. The database is hosted in the cloud and sold on a metered basis.

8. Data Profiling: Data profiling, also called data archaeology, is the statistical analysis and assessment of the quality of data values within a data set for consistency, uniqueness and logic. It is the process of collecting statistics and information about data in an existing source.

9. ETL: ETL (Extract, Transform and Load) is a process used with databases, especially in data warehousing systems, that extracts data from various sources, transforms it to fit operational needs and loads it into the target database.

10. Elasticsearch: Elasticsearch is an open source search server based on Apache Lucene. It is designed to take data from any source and search, analyse, and visualize it in real time, helping people make sense of data.

11. Failover: This is a backup operational mode in which the functions of a system component (processor, server, network or database) are automatically switched to a standby component or node should the primary one fail.

12. Fault Tolerant Design: This enables a system to continue its intended operations in the event of one or more faults within some of its components.

13. Grid Computing: This is a collection of computer resources from multiple locations, connected over a network to achieve a common goal. Grid computing enables the sharing, selection and aggregation of a wide variety of computational resources (such as supercomputers, compute clusters, storage systems, data sources, instruments, people) and presents them as a single unified resource for problem solving.

14. Key Value Databases: These store each record against a primary key that uniquely identifies it, which makes the record easy and fast to look up. The value stored against a key is normally some kind of primitive of the programming language, such as a string or a number.

15. MapReduce: This is an algorithm design pattern that originated in the functional programming world, and a framework popularised by Google. It breaks up a problem into pieces that are then distributed across multiple computers on the same network or cluster, or across a grid of disparate and possibly geographically separated systems (map), and then collects all the results and combines them into a report (reduce).
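
To make the map and reduce steps a little more concrete, here is a minimal single-machine sketch in Python that counts words. It is only an illustration of the pattern, not how Google's framework or Hadoop is actually implemented: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the "chunks" stand in for pieces of data that a real cluster would hand to different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Assumption: on a real cluster each chunk would be mapped on a different
    # machine; here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["big data is big", "data is everywhere"]
    print(word_count(chunks))
    # prints {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each chunk is mapped independently and each key is reduced independently, both steps could in principle run in parallel, which is what makes the pattern a good fit for clusters and grids.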

For more, see Big Data Glossary – Part 2


Author: Richa Kapoor
