A Complete Guide to RDD in Apache Spark

Overview of RDD in Apache Spark

Every Spark application consists of a driver program that runs the user's main function and launches various parallel operations on the cluster. The primary abstraction in Spark is the RDD, which Spark uses to achieve faster and more efficient MapReduce-style operations.

A Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark. RDDs are immutable, distributed collections of objects of any type. As the name suggests, an RDD is a resilient (fault-tolerant) collection of records that resides on multiple nodes.

Each RDD is divided into logical partitions across the cluster, and these partitions can be operated on in parallel on different nodes of the cluster.

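RDDs are created through deterministic operations on either data in stable storage or other RDDs. In practice, this means parallelizing an existing collection in the driver program (for example, a Scala collection), referencing an external file in HDFS (or any other supported file system), or transforming an existing RDD.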

Users can opt to persist an RDD in memory so that it can be reused multiple times efficiently. RDDs can also recover automatically from faults in the system.

Features of Resilient Distributed Dataset (RDD)

Lazy Evaluation 

All transformations in Apache Spark are lazy, which means that they do not compute their results at the point where the transformation is stated. Instead, Spark keeps track of the requested transformations in a DAG (Directed Acyclic Graph). Spark computes these transformations only when an action requires a result to be returned to the driver program.
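The laziness described above can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark's implementation: `LazyDataset` and the helper `square` are hypothetical names that only mimic the shape of `map`, `filter`, and `collect` on an RDD.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source      # the original data
        self._steps = steps        # recorded (kind, function) pairs, i.e. the lineage

    def map(self, func):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, pred):
        # Transformation: nothing is computed here either.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now are the recorded steps applied, in order.
        data = list(self._source)
        for kind, func in self._steps:
            if kind == "map":
                data = [func(x) for x in data]
            else:
                data = [x for x in data if func(x)]
        return data


calls = []  # records every element the mapped function actually sees


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)   # no transformation has run yet
result = evens_squared.collect()   # the action triggers the computation
print(calls_before_action, result)
```

Until `collect()` is called, `square` is never invoked; the two transformations exist only as recorded steps.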

In-Memory Computation 

Spark uses in-memory computation to reduce total processing time. With in-memory computation, the data is kept in RAM (random access memory) instead of on slower disk drives. This avoids repeated disk reads and makes workloads such as pattern detection and the analysis of large datasets more efficient. The main methods for this are cache() and persist().
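To see why caching matters when a dataset is reused, here is a small plain-Python sketch. It is a toy model rather than Spark code: `ToyDataset`, `make_loader`, and the `builds` counter are hypothetical names, and the only behaviour borrowed from Spark is that `cache()` merely marks the dataset and the data is stored when the first action runs.

```python
class ToyDataset:
    """Toy stand-in for an RDD that is rebuilt on every action unless cached."""

    def __init__(self, build):
        self._build = build        # function that recomputes the data from scratch
        self._use_cache = False
        self._cached = None

    def cache(self):
        # Only marks the dataset; the data is stored when the first action runs.
        self._use_cache = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        data = self._build()
        if self._use_cache:
            self._cached = data
        return data

    def count(self):
        return len(self._materialize())

    def sum(self):
        return sum(self._materialize())


builds = {"uncached": 0, "cached": 0}  # how often each dataset was recomputed


def make_loader(label):
    def load():
        builds[label] += 1
        return [x * 2 for x in range(1000)]
    return load


uncached = ToyDataset(make_loader("uncached"))
uncached_results = (uncached.count(), uncached.sum())

cached = ToyDataset(make_loader("cached")).cache()
cached_results = (cached.count(), cached.sum())

print(builds)
```

Both datasets return the same answers, but the uncached one is rebuilt once per action while the cached one is built a single time.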

Fault Tolerance 

RDDs are fault-tolerant because they track data lineage information, which allows lost data to be rebuilt automatically on failure. Replication is an additional, optional mechanism: when an RDD is persisted with a replicated storage level such as MEMORY_ONLY_2, each partition is stored on two cluster nodes.
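Lineage-based recovery can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not Spark's recovery code: the source data is assumed to be always available, and `source_partitions`, `lineage`, `compute_partition`, and `get_partition` are hypothetical names.

```python
# Source data in "stable storage", already split into three partitions.
source_partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Lineage: the deterministic transformations applied to every element, in order.
lineage = [lambda x: x + 1, lambda x: x * 10]


def compute_partition(index):
    """Rebuild one partition from its source by replaying the lineage."""
    data = source_partitions[index]
    for func in lineage:
        data = [func(x) for x in data]
    return data


# Materialize all partitions once, as if they were held by executors.
materialized = {i: compute_partition(i) for i in range(len(source_partitions))}

# Simulate the failure of the executor that held partition 1.
del materialized[1]

recomputed = []  # which partitions had to be rebuilt


def get_partition(index):
    if index not in materialized:
        materialized[index] = compute_partition(index)
        recomputed.append(index)
    return materialized[index]


recovered = [get_partition(i) for i in range(len(source_partitions))]
print(recomputed, recovered)
```

Only the lost partition is recomputed; the surviving partitions are reused as they are.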

Immutability 

Immutable data is easy to share safely among several processes, which makes it a sound design choice. Immutability rules out many potential problems caused by concurrent updates from different threads. However, RDDs are not just immutable; they are also deterministic functions of their inputs, which makes it possible to recreate any part of an RDD at any time.

We can think of an RDD not only as a collection of data but also as a recipe for building new data from other data.

Partitioning 

RDDs are generally collections of data items so large that they cannot fit on a single node and have to be partitioned across multiple nodes. Spark performs this partitioning automatically and distributes the partitions across different nodes.

Key points related to these partitions are:

  • Each node in a Spark cluster contains one or more partitions.
  • Partitions do not span multiple machines.
  • The number of partitions in Spark is configurable and should be chosen carefully.
  • By increasing the number of executors on the cluster, parallelism can be increased in the system.
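One common way to decide which partition a record belongs to is to hash its key. The sketch below shows the idea in plain Python; it is not Spark's partitioner, and `partition_for`, `hash_partition`, and the use of `zlib.crc32` as the hash function are assumptions made for the example.

```python
import zlib


def partition_for(key, num_partitions):
    # Deterministic hash, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(pairs, num_partitions):
    """Distribute (key, value) pairs over a configurable number of partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


pairs = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 4), ("banana", 5)]
partitions = hash_partition(pairs, 3)

for index, part in enumerate(partitions):
    print(index, part)
```

Every record lands in exactly one partition, and all records with the same key end up together, which is what later key-based operations rely on.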

Location Setup capability 

RDDs can define placement preferences for computing their partitions. A placement preference is information about where the data for a partition is located. The DAG scheduler uses this information to place each task as close as possible to the data it needs, which speeds up computation.
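The idea of placing a task near its data can be sketched as follows. This is a deliberately simplified plain-Python model, not Spark's scheduling logic: the node names, `preferred_locations`, and `choose_node` are hypothetical, and the fallback rule (pick the first free node alphabetically) is an assumption made for the example.

```python
def choose_node(preferred, available):
    """Pick a node for a task, favouring nodes that already hold its data."""
    for node in preferred:
        if node in available:
            return node, "local"
    # No preferred node is free: run elsewhere and fetch the data remotely.
    return sorted(available)[0], "remote"


# Hypothetical placement preferences: partition index -> nodes holding its data.
preferred_locations = {0: ["node-a", "node-b"], 1: ["node-b"], 2: ["node-d"]}
available_nodes = {"node-a", "node-b", "node-c"}

placement = {
    partition: choose_node(nodes, available_nodes)
    for partition, nodes in preferred_locations.items()
}
print(placement)
```

Partitions 0 and 1 run where their data already lives, while partition 2 has to run remotely because its preferred node is not available.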

Comparison: RDD vs Dataset vs DataFrame

RDD APIs

An RDD, or Resilient Distributed Dataset, is the fundamental data structure of Apache Spark. RDDs are immutable (read-only) collections of objects of varying types, which are computed on the different nodes of a given cluster. They make it possible to perform in-memory computations on large clusters in a fault-tolerant manner. Every RDD is partitioned across many servers so that it can be computed efficiently on different nodes of the cluster.

Dataset APIs

In Apache Spark, a Dataset is a data structure in Spark SQL that is strongly typed, object-oriented, and mapped to a relational schema. It represents a structured query with encoders and is an extension of the DataFrame API. Datasets are both serializable and queryable, and can therefore be persisted. The API provides a single interface for both Scala and Java, which also reduces the burden on library developers.

DataFrame APIs 

DataFrames are Datasets organized into named columns. They are very similar to tables in a relational database. The idea is to allow processing of large amounts of structured data. A DataFrame contains rows with a schema, where the schema describes the structure of the data. The API provides memory management and optimized execution plans.

Ways to Create RDDs in Apache Spark 

There are three ways to create an RDD in Apache Spark:

Parallelizing collection (Parallelized) 

We take an already existing collection in the program and pass it to SparkContext's parallelize() method. This is the simplest method and is handy for creating RDDs quickly in the Spark shell and trying out operations on them. It is rarely used outside testing and prototyping, because it requires the entire dataset to be on one machine.
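Conceptually, parallelizing a collection means cutting it into slices that can be handed to different nodes. The plain-Python sketch below shows one way to cut a collection into contiguous, near-equal slices; `slice_collection` is a hypothetical helper written for this example and makes no claim to match Spark's internal slicing in every detail.

```python
def slice_collection(items, num_slices):
    """Split a collection into num_slices contiguous, near-equal slices."""
    items = list(items)
    n = len(items)
    return [
        items[(i * n) // num_slices:((i + 1) * n) // num_slices]
        for i in range(num_slices)
    ]


slices = slice_collection(range(10), 3)
print(slices)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Note that the whole collection has to exist in the calling program before it can be sliced, which is exactly the limitation mentioned above.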

Referencing External Dataset 

In Spark, RDDs can be formed from any data source supported by Hadoop, including local file systems, HDFS, HBase, Cassandra, etc. Here, data is loaded from an external dataset. We can use SparkContext's textFile method to create a text file RDD. It takes the URL of the file and reads it as a collection of lines. The URL can also be a local path on the machine itself.

Creating RDD from existing RDD 

A transformation derives one RDD from another, and applying a transformation is the way to create an RDD from an existing RDD. This creates a difference between Apache Spark and Hadoop MapReduce. A transformation works like a function that takes an RDD and produces a new one. The input RDD does not change; because RDDs are immutable, each operation generates a new RDD.

Operations on RDD

There are two kinds of operations on Apache Spark RDDs: transformations and actions. A transformation is a function that produces a new RDD from existing RDDs. It takes an RDD as input and generates one or more RDDs as output. Applying a transformation always creates a new RDD; the input RDDs cannot be changed, since RDDs are immutable. Some key points:

  • Applying a transformation makes no change on the cluster; nothing is computed until an action is called.
  • Produces a DAG which keeps track of which RDD was created when in the life cycle.
  • Examples: map(func), filter(func), reduceByKey(func), intersection(dataset), distinct(), groupByKey(), union(dataset), mapPartitions(func), flatMap(func).

Types of Transformations 

  • Narrow Transformations: All the elements required to compute the records in a single partition live in a single partition of the parent RDD. Only a limited subset of partitions is used to calculate the result. Narrow transformations are the result of operations such as map() and filter().
  • Wide Transformations: The elements required to compute the records in a single partition may live in many partitions of the parent RDD. Wide transformations are the result of operations such as groupByKey() and reduceByKey().
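The difference between the two types becomes visible when partitions are modelled as plain Python lists. The sketch below is a toy model, not Spark code: `map_values` and `group_by_key` are hypothetical helpers, and the partitioner (`ord(key) % num_partitions`) is an assumption that only works for the single-letter keys used here.

```python
from collections import defaultdict

# A parent "RDD" of (key, value) pairs spread over three partitions.
parent = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("a", 6)],
]


def map_values(partitions, func):
    # Narrow: each output partition is built from exactly one parent partition.
    return [[(k, func(v)) for k, v in part] for part in partitions]


def group_by_key(partitions, num_partitions):
    # Wide: each output partition may need records from every parent partition,
    # so records are redistributed (shuffled) by key first.
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in partitions:
        for k, v in part:
            target = ord(k) % num_partitions  # toy partitioner for 1-letter keys
            shuffled[target][k].append(v)
    return [dict(p) for p in shuffled]


doubled = map_values(parent, lambda v: v * 2)
grouped = group_by_key(parent, 2)
print(doubled)
print(grouped)
```

In `doubled`, the partition boundaries of the parent are preserved. In `grouped`, the values for key "a" were collected from all three parent partitions, which is why a wide transformation requires data to move between nodes.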

Spark Actions 

  • Transformations in Apache Spark create RDDs from each other, but to work with the actual data we perform actions. An action does not form a new RDD; it returns non-RDD values, which are delivered to the driver or written to an external storage system. Actions are what set the lazily recorded transformations in motion.
  • Actions are a means of sending data from the executors to the driver. The executors are responsible for executing tasks, while the driver is a JVM process that coordinates the workers and the execution of tasks.
  • Some examples include: count(), collect(), take(n), top(n), countByValue(), reduce(), fold(), aggregate(), foreach().
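The way an action brings a single non-RDD value back to the driver can be sketched with an aggregate-style operation. This is a plain-Python illustration with hypothetical names; it follows the general shape of an aggregate action (a zero value, a per-partition function, and a merge function) but runs everything in one process.

```python
from functools import reduce

# Three partitions, as if held by three different executors.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def aggregate(partitions, zero, seq_op, comb_op):
    # Each "executor" folds its own partition, starting from the zero value.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # The "driver" then merges the per-partition results into one value.
    return reduce(comb_op, partials, zero)


# Compute (sum, count) in one pass, then derive the mean on the driver.
total, count = aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
mean = total / count
print(total, count, mean)  # 45 9 5.0
```

Only the small per-partition results travel to the driver, not the data itself, which is why actions such as count() or reduce() scale better than collecting everything.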

The flow of RDD in the Spark Architecture 

  • Spark creates a graph when you enter code in the Spark console.
  • When an action is called on a Spark RDD, Spark submits the graph to the DAG scheduler.
  • The DAG scheduler divides the operators into stages of tasks.
  • The stages are passed on to the task scheduler, which launches the tasks through the cluster manager.
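The step in which operators are divided into stages can be sketched as follows. This is a strong simplification in plain Python, not the DAG scheduler itself: the job is assumed to be a linear chain of operator names rather than a general graph, the `WIDE` set and `split_into_stages` are hypothetical, and the rule applied is simply that every wide operation (one that needs a shuffle) starts a new stage.

```python
# Operations assumed to require a shuffle in this sketch.
WIDE = {"groupByKey", "reduceByKey", "distinct", "intersection"}


def split_into_stages(operations):
    """Cut a linear chain of operations into stages at shuffle boundaries."""
    stages, current = [], []
    for op in operations:
        if op in WIDE and current:
            stages.append(current)   # close the stage before the shuffle
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


job = ["textFile", "flatMap", "map", "reduceByKey", "filter", "collect"]
stages = split_into_stages(job)

for number, stage in enumerate(stages):
    print(number, stage)
```

The narrow operations before the shuffle are grouped into one stage and can run back to back on each partition; everything from `reduceByKey` onward forms the next stage.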

Limitations of the RDD in Apache Spark

No automatic Optimization 

In Apache Spark, RDDs have no built-in option for automatic optimization. They cannot make use of Spark's advanced optimizers, such as the Catalyst optimizer and the Tungsten execution engine, so RDD code can only be optimized manually. This is overcome in Dataset and DataFrame, both of which use Catalyst to generate optimized logical and physical query plans, providing space and speed efficiency.

No static and Runtime type safety 

RDD does not provide static or runtime type safety and does not allow the user to check for errors at runtime. Dataset, by contrast, provides compile-time type safety for building complex data workflows. This helps detect errors at compile time and thus makes code safer.

The Problem of Overflow 

RDD performance degrades when there is not enough memory available to store the RDD in memory or on disk. Here, the partitions that overflow from RAM may be stored on disk, but reading them back is slower than reading from memory, so they do not provide the same level of performance. We need to increase the RAM and disk size to overcome this problem.

Overhead of serialization and garbage collection (Performance limitation) 

Because RDDs hold in-memory objects, they incur the overhead of garbage collection and Java serialization, which becomes expensive as the data grows. To reduce this cost, we can use data structures with fewer objects or persist objects in serialized form.

No Schema View of Data 

RDDs have a problem handling structured data, because they do not provide a schema view of the data and have no provision for one. Dataset and DataFrame provide a schema view: each is a distributed collection of data organized into named columns.

Conclusion

Hadoop MapReduce had a lot of shortcomings. To overcome them, the Spark RDD was introduced. It offered in-memory processing, immutability, and the other functionality described above, which gave users a better option. But RDDs also had limitations that kept Spark from being more versatile. Thus, the concepts of DataFrame and Dataset evolved.
