Apache Solr Search Engine and Architecture


Overview of Apache Solr

Apache Solr is an open-source search server written in Java that runs as a web application inside a bundled Jetty servlet container. It uses the Apache Lucene library internally to generate indexes and to execute searches, and exposes both through user-friendly HTTP APIs. Solr is scalable, highly reliable and fault tolerant, providing replication, distributed indexing, load-balanced querying, recovery and automated failover, centralized configuration, and more. Solr powers the search and navigation features of many large internet sites.
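The distributed features above come from SolrCloud mode, where a collection is split into shards and each shard is replicated across nodes. A minimal sketch of creating such a collection through the Collections API, assuming a SolrCloud node reachable at http://solr-node-1:8983/solr (the hostname used in the installation steps) and an illustrative collection name; the helper name create_collection is hypothetical:

```shell
# Assumption: a SolrCloud node is reachable at SOLR_URL (override if needed).
SOLR_URL="${SOLR_URL:-http://solr-node-1:8983/solr}"

# create_collection NAME [SHARDS] [REPLICAS] [MAX_SHARDS_PER_NODE]
# Hypothetical helper around the Collections API CREATE action.
create_collection() {
  local name="$1" shards="${2:-2}" replicas="${3:-2}"
  # Solr 7 defaults maxShardsPerNode to 1, which is too low for
  # 2 shards x 2 replicas on a two-node cluster.
  local max_per_node="${4:-2}"
  # -G appends the url-encoded pairs to the URL as a GET query string.
  curl -sS -G "$SOLR_URL/admin/collections" \
    --data-urlencode "action=CREATE" \
    --data-urlencode "name=$name" \
    --data-urlencode "numShards=$shards" \
    --data-urlencode "replicationFactor=$replicas" \
    --data-urlencode "maxShardsPerNode=$max_per_node"
}
```

Usage: `create_collection products 2 2`. Because no collection.configName is passed, Solr 7 uses its `_default` configset for the new collection.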

Apache Solr Features

Apache Solr has the following features -
  • Highly Scalable and Fault Tolerant
  • Easy Monitoring
  • Comprehensive Administration Interfaces
  • Optimized for High Volume Traffic
  • Advanced Full-Text Search Capabilities
  • Extensible Plugin Architecture
  • Near Real-Time Indexing
  • Flexible and Adaptable with easy configuration
  • Standards-Based Open Interfaces - XML, JSON and HTTP
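The JSON/HTTP interface and near real-time indexing can be tried with plain curl. A minimal sketch, assuming a running node at http://solr-node-1:8983/solr, an existing collection, and illustrative field names that exist in its schema; the helper names index_docs and solr_search are hypothetical. The commitWithin parameter asks Solr to make the new documents searchable within the given number of milliseconds instead of waiting for an explicit commit:

```shell
# Assumption: SOLR_URL points at a running Solr node (override if needed).
SOLR_URL="${SOLR_URL:-http://solr-node-1:8983/solr}"

# index_docs COLLECTION JSON_ARRAY
# POST a JSON array of documents; commitWithin=1000 asks Solr to make them
# searchable within about one second (near real-time indexing).
index_docs() {
  local collection="$1" json_docs="$2"
  curl -sS -X POST -H 'Content-Type: application/json' \
    "$SOLR_URL/$collection/update?commitWithin=1000" \
    --data-binary "$json_docs"
}

# solr_search COLLECTION QUERY
# Send a query to the /select handler and ask for a JSON response.
solr_search() {
  local collection="$1" query="$2"
  curl -sS -G "$SOLR_URL/$collection/select" \
    --data-urlencode "q=$query" \
    --data-urlencode "wt=json"
}
```

Usage: `index_docs products '[{"id": "1", "name": "Solr in Action"}]'`, then about a second later `solr_search products 'name:solr'`.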

Apache Solr Architecture Components

Apache Solr's architecture has the following main components -
  • Request Handler - Processes the requests made to Solr. A request might be an index update request or a query request.
  • Search Component - A search feature provided by Solr, such as query processing, spell checking or hit highlighting. A search request handler runs a chain of search components for each query.
  • Query Parser - Parses the queries passed to Solr, checks them for syntax errors and translates them into queries Lucene can execute.
  • Response Writer - Generates the formatted output (for example JSON or XML) returned for user queries.
  • Analyzer/Tokenizer - An analyzer examines the text of a field and generates a token stream; the tokenizer is the part of the analyzer that splits the text into tokens.
  • Update Request Processor - Processes each document before it is indexed, for example to add a field or drop a field.
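A single request can exercise several of these components at once. A minimal sketch, assuming a node at http://solr-node-1:8983/solr and an existing collection; query_with_highlighting and analyze_text are hypothetical helper names. The /select request handler receives the query, defType chooses the query parser (edismax here), hl=true turns on the highlighting search component, and wt chooses the response writer. The /analysis/field request handler shows the token stream that a field type's analyzer produces (text_general is a field type in Solr's default configset):

```shell
# Assumption: SOLR_URL points at a running Solr node (override if needed).
SOLR_URL="${SOLR_URL:-http://solr-node-1:8983/solr}"

# query_with_highlighting COLLECTION QUERY FIELD
query_with_highlighting() {
  local collection="$1" query="$2" field="$3"
  # /select = request handler, defType = query parser,
  # hl/hl.fl = highlighting search component, wt = response writer.
  curl -sS -G "$SOLR_URL/$collection/select" \
    --data-urlencode "q=$query" \
    --data-urlencode "defType=edismax" \
    --data-urlencode "qf=$field" \
    --data-urlencode "hl=true" \
    --data-urlencode "hl.fl=$field" \
    --data-urlencode "wt=json"
}

# analyze_text COLLECTION FIELD_TYPE TEXT
# Ask the field analysis handler how FIELD_TYPE's analyzer tokenizes TEXT.
analyze_text() {
  local collection="$1" field_type="$2" text="$3"
  curl -sS -G "$SOLR_URL/$collection/analysis/field" \
    --data-urlencode "analysis.fieldtype=$field_type" \
    --data-urlencode "analysis.fieldvalue=$text" \
    --data-urlencode "wt=json"
}
```

Usage: `query_with_highlighting products 'solr search' name` and `analyze_text products text_general 'Running Solr on EC2'`. Highlighting only returns snippets for fields that are stored.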

Apache Solr Security

Apache Solr can be secured with the following methods -
  • Authentication and authorization plugins, enabled with a security.json file. In standalone mode the file goes in the Solr home directory (the directory that contains solr.xml); in SolrCloud mode it is uploaded to ZooKeeper. Solr ships with several authentication and authorization plugins, and custom plugins can also be loaded.
  • Securing inter-node requests with the PKI Authentication Plugin
  • Enabling SSL
  • ZooKeeper access control, if using SolrCloud
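For the first method, a minimal sketch of a security.json that enables the BasicAuthPlugin and the RuleBasedAuthorizationPlugin, plus the upload to ZooKeeper. The credential hash is the example from the Solr Reference Guide (user solr, password SolrRocks) and must be replaced before real use; the helper names, the SOLR_DIR variable (the Solr install directory) and the ZooKeeper address solr-node-1:9983 (the ZooKeeper on the first node in the installation steps) are assumptions:

```shell
# write_security_json [OUTPUT_FILE]
# Writes a minimal security.json: Basic authentication for every request
# (blockUnknown) and a rule that only the "admin" role may edit security.
write_security_json() {
  local out="${1:-security.json}"
  cat > "$out" <<'EOF'
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [{ "name": "security-edit", "role": "admin" }],
    "user-role": { "solr": "admin" }
  }
}
EOF
}

# upload_security_json ABSOLUTE_PATH [ZK_HOST]
# SolrCloud mode: copy the file to /security.json in ZooKeeper.
upload_security_json() {
  local file="$1" zk_host="${2:-solr-node-1:9983}"
  "${SOLR_DIR:?set SOLR_DIR to the Solr install directory}/bin/solr" \
    zk cp "file:$file" zk:/security.json -z "$zk_host"
}
```

Usage: `write_security_json "$PWD/security.json" && upload_security_json "$PWD/security.json"`. In standalone mode, copy the file into the Solr home directory instead (server/solr under the install directory by default) and restart Solr.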

Apache Solr Installation on AWS

You can install Apache Solr on AWS EC2 instances and run them as a SolrCloud cluster with the following steps -

Step 1 - Connect to each instance using SSH.

ssh -i /path/to/key-pair.pem ec2-user@ec2-196-41-100-1.compute-1.amazonaws.com

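Step 2 - Configure Java and download Solr

# Solr 7 requires Java 8; check the default Java version on the instance (it may be 1.7)
$ java -version
$ sudo yum install java-1.8.0
$ sudo /usr/sbin/alternatives --config java
# select the Java 1.8 entry
# verify that the default Java version is now 1.8
$ java -version
# download the desired version of Solr
$ wget http://archive.apache.org/dist/lucene/solr/7.2.0/solr-7.2.0.tgz
# untar
$ tar -zxvf solr-7.2.0.tgz
# bin/solr reads SOLR_HOME as the Solr home (data) directory, so use another name for the install directory
$ export SOLR_DIR=$PWD/solr-7.2.0
# to keep the variable across sessions, add this line to ~/.bashrc (e.g. with vim ~/.bashrc)
export SOLR_DIR=/home/ec2-user/solr-7.2.0

Step 3 - Map hostnames instead of public DNS names

On every node, edit /etc/hosts and add one entry per Solr node that maps the node's IP address (typically its private IP inside the VPC) to its hostname:
$ sudo vim /etc/hosts
<ip-of-node-1> solr-node-1
<ip-of-node-2> solr-node-2

Step 4 - Start Solr in SolrCloud mode

$ cd $SOLR_DIR
# on solr-node-2: start Solr on port 8983 in SolrCloud mode (-c) and connect to the ZooKeeper running on the first node
$ bin/solr start -c -p 8983 -h solr-node-2 -z solr-node-1:9983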
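Step 4 assumes ZooKeeper is already running on the first node. One way to get there, sketched under the assumption that SOLR_DIR points at the Solr install directory on each node: when `bin/solr start -c` is run without `-z`, Solr also starts an embedded ZooKeeper on the Solr port plus 1000, which is 9983 here. The function names are hypothetical. Embedded ZooKeeper is not recommended for production; an external ZooKeeper ensemble is the usual choice there.

```shell
# Assumption: SOLR_DIR is the Solr install directory on each node.

# Run on solr-node-1: SolrCloud mode (-c) without -z, so Solr also starts
# an embedded ZooKeeper on port 8983 + 1000 = 9983.
start_first_node() {
  "${SOLR_DIR:?set SOLR_DIR to the Solr install directory}/bin/solr" \
    start -c -p 8983 -h solr-node-1
}

# Run on every other node, passing that node's hostname: join the cluster
# through the ZooKeeper embedded in solr-node-1.
start_joining_node() {
  "${SOLR_DIR:?set SOLR_DIR to the Solr install directory}/bin/solr" \
    start -c -p 8983 -h "$1" -z solr-node-1:9983
}
```

Usage: run `start_first_node` on solr-node-1 first, then `start_joining_node solr-node-2` on solr-node-2.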

Step 5 - Inspect and verify Solr nodes from browser

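Open the Solr Admin UI in a browser (the instance's security group must allow inbound TCP traffic on port 8983). Go to http://ec2-121-3-2-1.us-east-2.compute.amazonaws.com:8983/solr (solr-node-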
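The same check can be scripted. A minimal sketch, assuming the node at http://solr-node-1:8983/solr is reachable from where the script runs; cluster_status and node_is_live are hypothetical helper names. The Collections API CLUSTERSTATUS action returns the cluster state, including live_nodes, whose entries take the form host:port_solr:

```shell
# Assumption: SOLR_URL points at any node of the cluster (override if needed).
SOLR_URL="${SOLR_URL:-http://solr-node-1:8983/solr}"

# Fetch the cluster state as JSON from the Collections API.
cluster_status() {
  curl -sS -G "$SOLR_URL/admin/collections" \
    --data-urlencode "action=CLUSTERSTATUS" \
    --data-urlencode "wt=json"
}

# node_is_live HOST
# Succeeds if HOST (started with -p 8983) appears among the live nodes.
# This is a plain text search of the JSON, not a full JSON parse.
node_is_live() {
  cluster_status | grep -qF "\"$1:8983_solr\""
}
```

Usage: `node_is_live solr-node-2 && echo "solr-node-2 joined the cluster"`.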
