Getting Started with Big Data Testing Strategy and Framework

October 15, 2018 

What is Big Data?

Big Data is defined as a large volume of structured or unstructured data. The data may exist in any format, such as flat files, images, or videos. The primary characteristics of Big Data are the three V's: volume, velocity, and variety. Volume is the size of the data collected from sources such as sensors and transactions, velocity is the speed at which the data is handled and processed, and variety is the range of data formats.

A primary example of Big Data is E-commerce sites such as Amazon, Flipkart, and Snapdeal, which have millions of visitors and products. Other examples include -

  • Social Media Sites
  • Healthcare

Big Data Testing

Testing is required in several areas of a Big Data project. The main types are Database testing, Infrastructure testing, Performance testing, and Functional testing.


How Does Big Data Testing Work?

 

Data Ingestion Testing

In this stage, data is collected from multiple sources such as CSV files, sensors, logs, and social media, and then stored in HDFS. The primary goal of this testing is to verify that the data is extracted completely and loaded correctly into HDFS. The tester has to ensure that the data is ingested according to the defined schema and that there is no data corruption. To validate correctness, the tester takes a small sample of source data and, after ingestion, compares the source data with the ingested data. The tester also verifies that the data is loaded into the desired HDFS locations.

Tools - Zookeeper, Kafka, Sqoop, Flume.
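The sample comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV strings stand in for a source file and its ingested copy, and the helper names (`row_fingerprints`, `compare_sample`) are hypothetical, not part of any Hadoop tool. Each row is hashed so that the comparison ignores row order but still catches a corrupted value.

```python
import csv
import hashlib
import io


def row_fingerprints(csv_text):
    """Return a sorted list of SHA-256 digests, one per CSV data row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    digests = []
    for row in reader:
        # Join fields with a separator that is unlikely to appear in the data.
        canonical = "\x1f".join(field.strip() for field in row)
        digests.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    return sorted(digests)


def compare_sample(source_text, ingested_text):
    """Compare a source sample with its ingested copy, ignoring row order."""
    source = row_fingerprints(source_text)
    ingested = row_fingerprints(ingested_text)
    return {
        "source_rows": len(source),
        "ingested_rows": len(ingested),
        "match": source == ingested,
    }


# Hypothetical sample data standing in for a source file and its ingested copy.
source_sample = "id,amount\n1,10.5\n2,20.0\n3,7.25\n"
ingested_ok = "id,amount\n3,7.25\n1,10.5\n2,20.0\n"
ingested_bad = "id,amount\n1,10.5\n2,2O.0\n3,7.25\n"  # one corrupted value

report_ok = compare_sample(source_sample, ingested_ok)
report_bad = compare_sample(source_sample, ingested_bad)
print(report_ok)
print(report_bad)
```

In a real project the two inputs would be read from the source system and from HDFS; the comparison logic stays the same.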
 

Data Processing Testing

In this type of testing, the primary focus is on aggregated data. Once the ingested data has been processed, validate whether the business logic is implemented correctly, and then confirm it by comparing the output files with the input files.

Tools - Hadoop, Hive, Pig, Oozie
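One way to compare output with input is to recompute the aggregation independently and check it against what the job produced. The sketch below assumes a simple business rule (total amount per region) and uses in-memory records; the rule, the data, and the function names are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def expected_totals(input_records):
    """Recompute the aggregation independently of the job under test."""
    totals = defaultdict(float)
    for region, amount in input_records:
        totals[region] += amount
    return dict(totals)


def validate_aggregation(input_records, job_output, tolerance=1e-9):
    """Return a list of mismatches; an empty list means the output agrees."""
    expected = expected_totals(input_records)
    problems = []
    for key in sorted(set(expected) | set(job_output)):
        if key not in job_output:
            problems.append(f"missing key in output: {key}")
        elif key not in expected:
            problems.append(f"unexpected key in output: {key}")
        elif abs(expected[key] - job_output[key]) > tolerance:
            problems.append(
                f"{key}: expected {expected[key]}, got {job_output[key]}"
            )
    return problems


# Hypothetical input records and two candidate job outputs.
input_records = [("north", 10.0), ("south", 5.0), ("north", 2.5)]
good_output = {"north": 12.5, "south": 5.0}
bad_output = {"north": 10.0, "east": 1.0}

good_problems = validate_aggregation(input_records, good_output)
bad_problems = validate_aggregation(input_records, bad_output)
print(good_problems)
print(bad_problems)
```

The independent recomputation is deliberately simple so that it is easy to trust; it should not share code with the job it checks.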
 

Data Storage Testing

The output is stored in HDFS or another data warehouse. The tester verifies that the output data is loaded correctly into the warehouse by comparing the output data with the warehouse data.

Tools - HDFS, HBase
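The output-versus-warehouse comparison can be sketched as a set difference. The example below uses an in-memory SQLite database purely as a stand-in for the warehouse (an assumption made so the example runs anywhere); the table name `sales` and both helper functions are hypothetical.

```python
import sqlite3


def load_into_warehouse(conn, rows):
    """Hypothetical load step: write output rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT INTO sales (id, total) VALUES (?, ?)", rows)
    conn.commit()


def diff_against_warehouse(conn, output_rows):
    """Return rows missing from the warehouse and rows that should not be there."""
    stored = set(conn.execute("SELECT id, total FROM sales").fetchall())
    expected = set(output_rows)
    return {
        "missing_in_warehouse": sorted(expected - stored),
        "unexpected_in_warehouse": sorted(stored - expected),
    }


# Hypothetical processing output that should end up in the warehouse.
output_rows = [(1, 12.5), (2, 5.0), (3, 8.0)]

conn = sqlite3.connect(":memory:")
load_into_warehouse(conn, output_rows[:2])  # simulate a partial load
partial = diff_against_warehouse(conn, output_rows)
print(partial)

load_into_warehouse(conn, output_rows[2:])  # load the remaining row
complete = diff_against_warehouse(conn, output_rows)
print(complete)
conn.close()
```

For data sets too large to hold in memory, the same check is usually done with row counts and per-partition checksums instead of full sets.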
 

Data Migration Testing

Data migration is mainly needed when an application moves to a different server or when there is a technology change. It is the process of moving all of the user's data from the old system to the new system. Data Migration testing verifies that this move happens with minimal downtime and no data loss. It is essential for a smooth, defect-free migration.

There are different phases of migration testing -

  • Pre-Migration Testing - In this phase, the scope of the data sets is defined: what data is included and what is excluded. The number of tables and the record counts are noted down.
  • Migration Testing - This is the actual migration of the application. In this phase, all hardware and software configurations are checked against the requirements of the new system, and the connectivity between all components of the application is verified.
  • Post-Migration Testing - In this phase, check whether all the data has migrated to the new application, whether there is any data loss, and whether any functionality has changed.
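The pre-migration counts and the post-migration check fit together naturally: record the counts before, record them again after, and compare. The sketch below assumes each system can be represented as a mapping from table name to rows; the table names and helper functions are hypothetical.

```python
def snapshot_counts(tables):
    """Record the row count of every table (used before and after migration)."""
    return {name: len(rows) for name, rows in tables.items()}


def compare_snapshots(before, after):
    """List tables that are missing or whose row counts changed."""
    findings = []
    for name, count in sorted(before.items()):
        if name not in after:
            findings.append(f"{name}: table missing after migration")
        elif after[name] != count:
            findings.append(f"{name}: {count} rows before, {after[name]} after")
    return findings


# Hypothetical old and new systems; one order row was lost during migration.
old_system = {
    "users": [("u1",), ("u2",), ("u3",)],
    "orders": [("o1",), ("o2",), ("o3",), ("o4",)],
}
new_system = {
    "users": [("u1",), ("u2",), ("u3",)],
    "orders": [("o1",), ("o2",), ("o3",)],
}

pre_counts = snapshot_counts(old_system)    # pre-migration phase
post_counts = snapshot_counts(new_system)   # post-migration phase
findings = compare_snapshots(pre_counts, post_counts)
print(findings)
```

Matching counts do not prove that the content is identical, so a count check like this is normally a first pass before record-level comparison.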

Performance Testing Overview

All Big Data applications process large amounts of data in a very short interval of time, which requires vast computing resources. Architecture also plays an important role in such projects, because any architectural issue can lead to performance bottlenecks. Performance Testing is therefore necessary to find and avoid these bottlenecks. Performance Testing focuses mainly on the following points -

  • Data Loading and Throughput - In this area, the rate at which data is consumed from different sources and the rate at which data is created in the data store are observed.
  • Data Processing Speed
  • Sub-System Performance - In this area, the performance of the individual components that make up the overall application is tested. Testing components in isolation is sometimes necessary to identify bottlenecks.

Functional Testing / Integration Testing

Functional Testing is performed by testing the front-end application against the user requirements: the results produced by the front-end application are compared with the expected results. This process tests the complete workflow from Data Ingestion to Data Visualization.


How to Adopt Big Data Testing?

Implement Live Integration - Live integration is important because data comes from different sources. Perform end-to-end testing.

Data Validation - It involves validating the data loaded into the Hadoop Distributed File System (HDFS). It includes comparing the source data with the loaded data.

Process Validation - After the comparison, process validation covers MapReduce validation, business logic validation, data aggregation and segregation, and checks on key-value pair generation.

Output Validation - It involves checking for data corruption, confirming successful data loading, maintaining data integrity, and comparing HDFS data with the target data.
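To make the key-value pair checks in process validation more concrete, here is a minimal in-process sketch of a word-count style map and reduce step with assertions on the intermediate pairs. It does not use Hadoop; `map_phase` and `reduce_phase` are hypothetical stand-ins for a real mapper and reducer, and the input lines are made up.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def reduce_phase(pairs):
    """Group pairs by key and sum the values for each key."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)


lines = ["big data testing", "data validation", "Big Data"]

pairs = list(map_phase(lines))

# Key-value pair check: every pair has a non-empty string key and a value of 1.
assert all(isinstance(k, str) and k and v == 1 for k, v in pairs)
# One pair per input word: nothing dropped or duplicated by the mapper.
assert len(pairs) == sum(len(line.split()) for line in lines)

counts = reduce_phase(pairs)
# Aggregation check: the totals across keys must equal the number of pairs.
assert sum(counts.values()) == len(pairs)
print(counts)
```

The same invariants (pair shape, pair count, and conservation of totals through the reducer) can be asserted against the output of a real MapReduce job on a small, known input.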


Top 5 Benefits of Big Data Testing

  • Data Accuracy
  • Improved Business Decisions
  • Minimizes losses and increases revenues
  • Quality Cost
  • Improved market targeting and Strategizing

Why Does Big Data Testing Matter?

 

Big Data Testing plays a vital role in Big Data systems. If Big Data systems are not tested appropriately, the business is affected, and it becomes hard to understand an error, the cause of a failure, and where it occurs. As a result, finding a solution to the problem also becomes difficult. If Big Data Testing is performed correctly, it prevents wasted resources in the future.


Big Data Testing Best Practices

  • Testing based on requirements
  • Prioritize the fixing of bugs
  • Stay connected with the context
  • To save time, automate it
  • Test objective should be clear
  • Communication
  • Technical skills

Key Big Data Testing Tools

 

There are various Big Data tools/components -

  • HDFS (Hadoop Distributed File System)
  • Hive
  • HBase