A Deep Dive into Apache Solr Security Measures

Navdeep Singh Gill | 27 November 2024

Apache Solr Security with Kerberos on Kubernetes

Introduction to Apache Solr

Securing your Apache Solr installation is essential to protect sensitive data and ensure the integrity of your search platform. Solr security encompasses multiple layers, including authentication, authorization, encryption, and the proper configuration of key components like ZooKeeper and SolrCloud security. With the right measures, such as TLS/SSL for secure communications and Kerberos for strong authentication, you can safeguard your Solr environment against threats. Additionally, enforcing access control and enabling audit logging helps monitor and manage risks effectively. This guide will cover the most critical aspects of Solr security, common pitfalls to avoid, and best practices for a robust setup.

What is Apache Solr?

Apache Solr is an open-source search platform built on the Lucene Java search library. It serves as a full-text search server, leveraging Lucene’s powerful indexing and search capabilities at its core. Solr is equipped with REST features, including HTTP/XML and JSON APIs, making it easy to integrate with a wide variety of programming languages. Many prominent websites, such as Apple, Cisco, and others, rely on Apache Solr for their search and navigation functionalities.

In essence, Apache Solr is a sub-project of Apache Lucene that was developed using Java. As part of the Lucene project, Solr utilizes the Lucene Java search library to perform searching and indexing operations. This article will focus on Apache Solr security, highlighting the critical measures to safeguard your Solr installation.

Operations Performed by Apache Solr to Search a Document

  • Indexing: The document that needs to be searched is converted into a machine-readable format, a process known as indexing.

  • Querying: Solr then processes the user’s query, identifying essential terms, keywords, and other relevant search parameters.

  • Mapping: The parsed query is then matched against the documents stored in the index to find candidate results.

  • Ranking the Outcome: Solr ranks the search results based on their relevance, returning the most pertinent documents first.

Exploring Apache Solr Security Architecture

The architecture of Apache Solr security integrates several layers of functionality that work together to ensure secure and efficient indexing and searching. Below is a breakdown of Solr’s key components and their roles in maintaining both functionality and security within the system:

  • Indexing and Searching: The core functions of Apache Solr are indexing documents and providing search results. Handlers are used to manage data in specific categories.

  • Update Processor Chain: Whenever data is uploaded, it passes through the update processor chain, which can transform documents before they are indexed, for example by detecting and removing duplicates.

  • Analyzer: The Analyzer examines field data and generates tokens. Each Analyzer has exactly one Tokenizer, which breaks the field data into lexical units. Filters then process the tokens, for example by dropping common stop words such as "is," "am," and "are" so that search results are more relevant.

  • Query Parser: The Query Parser parses the user’s query. DisMax, Lucene, and e-DisMax are some common query parsers, each suited for different query requirements.

  • Index Searcher: The query is handed to the index searcher, and the index reader runs the query on the indexed data to retrieve relevant results.

  • Response Writer: The response writer takes the results produced by the Lucene engine, formats them (for example as JSON or XML), and returns them to the client.

  • Security Measures: The architecture integrates authentication, authorization, and encryption to secure data, ensuring safe communication throughout the indexing and search process.
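To see how the query parser and response writer are chosen in practice, the sketch below builds a request URL for Solr's select handler. The host, port, collection name `techproducts`, search term, and field `name` are illustrative assumptions rather than values from this article; `defType` picks the query parser and `wt` picks the response writer.

```shell
# Hypothetical endpoint and collection; adjust for your cluster.
SOLR_URL="http://localhost:8983/solr"
COLLECTION="techproducts"

# defType=edismax selects the eDisMax query parser, qf names the field to
# search, and wt=json selects the JSON response writer.
QUERY_URL="${SOLR_URL}/${COLLECTION}/select?q=ipod&defType=edismax&qf=name&wt=json"
echo "$QUERY_URL"

# To send the request to a running Solr node:
#   curl "$QUERY_URL"
```

On a secured cluster the same request must also carry credentials, which is where the authentication setup described in the next section comes in.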

How to Implement Kerberos Security in Solr

Authentication is crucial for securing Apache Solr. When implementing Kerberos for Solr, the Kerberos service principal and keytab file are necessary for authenticating Solr with ZooKeeper and between the nodes of the Solr Cluster. Additionally, all clients and users must have a valid Kerberos ticket before they can send a request to Solr. This ensures that only authorized entities can interact with Solr, providing a robust layer of security. By integrating Kerberos into Solr’s security model, you enhance authentication and prevent unauthorized access to sensitive data.

To secure Apache Solr with Kerberos, we will walk through the following steps. Before configuring Solr, make sure the KDC server has a Kerberos service principal for each Solr host and for ZooKeeper. Then generate a keytab file for each principal. In this example, the Solr host is 192.168.10.120 and the home directory is /home/foo/. On the KDC server, the session looks like this:

root@kdc:/# kadmin.local
Authenticating as principal foo/admin@EXAMPLE.COM with password

kadmin.local:  addprinc HTTP/192.168.10.120
WARNING: no policy specified for HTTP/192.168.10.120@EXAMPLE.COM; defaulting to no policy
Enter the password for principal "HTTP/192.168.10.120@EXAMPLE.COM":
Re-enter password for principal "HTTP/192.168.10.120@EXAMPLE.COM":
Principal "HTTP/192.168.10.120@EXAMPLE.COM" created.

kadmin.local:  ktadd -k /tmp/120.keytab HTTP/192.168.10.120
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/tmp/120.keytab.
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:/tmp/120.keytab.
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:/tmp/120.keytab.
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type des-cbc-crc added to keytab WRFILE:/tmp/120.keytab.

kadmin.local:  quit

Copy the keytab file from /tmp/120.keytab on the KDC server to /keytabs/120.keytab on the Solr host. Repeat this step for each Solr node. If ZooKeeper has not yet been set up for Kerberos, follow similar steps to create its service principal and keytab. Next, create a security.json file that enables the Kerberos plugin. In standalone mode, place it in the $SOLR_HOME directory. In SolrCloud mode, upload it to ZooKeeper as follows:

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd put /security.json \
  '{"authentication":{"class": "org.apache.solr.security.KerberosPlugin"}}'

Next, define a JAAS configuration file. It specifies the properties needed for authentication, such as the service principal and the keytab location, along with optional settings like ticket caching. Below is the JAAS configuration file, saved as /home/foo/jaas-client.conf:

Client {
com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/keytabs/120.keytab"
  storeKey=true
  useTicketCache=true
  debug=true
  principal="HTTP/192.168.10.120@EXAMPLE.COM";
};

The path to this file is passed to Solr as a start parameter and is used to authenticate internode requests and ZooKeeper requests. Before starting Solr, several Kerberos parameters must be set, either in solr.in.sh/solr.in.cmd or on the command line with the bin/solr start command. Once the configuration is complete, start Solr with the following command:

bin/solr -c -z server1:2181,server2:2181,server3:2181/solr
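The Kerberos start parameters mentioned above are usually placed in solr.in.sh so that the start command picks them up. The fragment below is a sketch modeled on the Kerberos example in the Solr Reference Guide, adapted to this article's host, keytab, and JAAS file. Treat the cookie domain value as an assumption, and note that the principal must match the one stored in the keytab.

```shell
# solr.in.sh fragment (sketch). Values mirror this article's example host.
SOLR_AUTH_TYPE="kerberos"
SOLR_AUTHENTICATION_OPTS="-Djava.security.auth.login.config=/home/foo/jaas-client.conf \
  -Dsolr.kerberos.cookie.domain=192.168.10.120 \
  -Dsolr.kerberos.cookie.portaware=true \
  -Dsolr.kerberos.principal=HTTP/192.168.10.120@EXAMPLE.COM \
  -Dsolr.kerberos.keytab=/keytabs/120.keytab"
```

Each Solr node needs its own principal and keytab path in these settings, since the service principal is tied to the host name.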

To test the configuration, connect to Solr with the following command:

curl --negotiate -u : "http://192.168.10.120:8983/solr/"
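Because every client needs a valid Kerberos ticket before calling Solr, the test can be wrapped in a small helper that first obtains a ticket. This is a sketch under assumptions: the function name `check_solr_kerberos` is hypothetical, the MIT Kerberos client tools (`kinit`, `klist`) are installed, and `curl` was built with GSS-API/SPNEGO support.

```shell
# Sketch: obtain a ticket, then call Solr with SPNEGO authentication.
# Both arguments are placeholders supplied by the caller.
check_solr_kerberos() {
  user_principal="$1"   # a user principal known to the KDC
  solr_url="$2"         # the Solr base URL to test

  # Request a ticket-granting ticket (prompts for the password).
  kinit "$user_principal" || return 1

  # Show the ticket cache so the ticket and its expiry can be inspected.
  klist || return 1

  # --negotiate enables SPNEGO; "-u :" tells curl to take the identity
  # from the ticket cache. Print only the HTTP status code.
  curl --negotiate -u : -s -o /dev/null -w '%{http_code}\n' "$solr_url"
}
```

For example, `check_solr_kerberos user@EXAMPLE.COM "http://192.168.10.120:8983/solr/"` (with a hypothetical user principal) would be expected to print a success status when negotiation works and 401 when the request is rejected as unauthenticated.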

Top Apache Solr Security Best Practices

The following are the best practices to secure Solr from development to production:

Encryption with a TLS Certificate (SSL)

Encrypting traffic to/from Solr and between Solr nodes prevents sensitive data from leaking across the network. TLS is also often required to prevent credential sniffing when using authentication.
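As a rough illustration, TLS is typically switched on through variables in solr.in.sh. The variable names below follow the Solr Reference Guide; the keystore path and passwords are placeholders, and the keystore itself must be created separately (for example with keytool).

```shell
# solr.in.sh fragment (sketch). Paths and passwords are placeholders.
SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.p12
SOLR_SSL_KEY_STORE_PASSWORD=secret
SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.p12
SOLR_SSL_TRUST_STORE_PASSWORD=secret
# Require (need) or request (want) client certificates; both off here.
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false
```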

Authentication, Authorization, and Audit Logging

Authentication ensures that you know the identity of your users. Authorization ensures that only users with the necessary roles/permissions can access a given resource. Audit logging records an audit trail of requests to your cluster, such as users being denied access to administrative APIs.
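All three concerns are configured in security.json. The sketch below writes a file that pairs the Kerberos plugin with rule-based authorization and log-based audit logging. The principal `solradmin@EXAMPLE.COM` and the role name `admin` are hypothetical, and the exact user name Solr sees depends on your Kerberos name rules. Audit logging support also varies by Solr version, so check the reference guide for your release.

```shell
# Write a security.json that combines Kerberos authentication with
# rule-based authorization and audit logging. The principal and role
# names are hypothetical.
cat > security.json <<'EOF'
{
  "authentication": {
    "class": "org.apache.solr.security.KerberosPlugin"
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "security-edit", "role": "admin"},
      {"name": "collection-admin-edit", "role": "admin"}
    ],
    "user-role": {
      "solradmin@EXAMPLE.COM": "admin"
    }
  },
  "auditlogging": {
    "class": "solr.SolrLogAuditLoggerPlugin"
  }
}
EOF

# In SolrCloud mode the file is then uploaded to ZooKeeper, for example
# with the zkcli.sh "put" command; in standalone mode it goes in $SOLR_HOME.
```

Here only users mapped to the `admin` role may change the security configuration or modify collections; other permissions would be added to the same list.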

Enable IP Access Control

Restrict network access to specific hosts by setting SOLR_IP_WHITELIST/SOLR_IP_BLACKLIST through environment variables or in solr.in.sh/solr.in.cmd.
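A minimal sketch of such a restriction in solr.in.sh follows. The addresses are illustrative assumptions, not values from this article. Newer Solr releases rename these variables, so check the reference guide for your version.

```shell
# solr.in.sh fragment (sketch). Addresses are illustrative.
# Only these hosts and networks may connect to Solr.
SOLR_IP_WHITELIST="127.0.0.1, 192.168.10.0/24"
# Alternatively, deny specific addresses:
# SOLR_IP_BLACKLIST="192.168.10.200"
```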

ZooKeeper Traffic Protection

ZooKeeper is a core part of a SolrCloud cluster, so its content must be protected as well. The ZooKeeper Access Control page of the Solr Reference Guide describes how to restrict access to it with credentials and ACLs.
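For orientation, the fragment below sketches the digest-based ZooKeeper credentials and ACL settings as documented for Solr 8.x. The provider class names differ in other Solr versions, and the user names and passwords are placeholders, so treat this as an assumption to verify against the reference guide for your release.

```shell
# solr.in.sh fragment (sketch, Solr 8.x style). Credentials are placeholders.
SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider \
  -DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider \
  -DzkDigestUsername=admin-user -DzkDigestPassword=CHANGEME-ADMIN-PASSWORD \
  -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD"

# Append the ZooKeeper settings to whatever JVM options are already set.
SOLR_OPTS="${SOLR_OPTS:-} $SOLR_ZK_CREDS_AND_ACLS"
```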

Enable Security Manager

Solr can be run in the Java Security Manager sandbox by setting SOLR_SECURITY_MANAGER_ENABLED=true via an environment variable or in solr.in.sh/solr.in.cmd. This feature is not compatible with Hadoop.

Common Pitfalls in Securing Apache Solr

The following are common pitfalls to avoid when securing an Apache Solr installation:

  1. Failure to Enable Strong Authentication: Not configuring robust authentication mechanisms, like Kerberos or basic authentication, can leave your Solr installation vulnerable to unauthorized access. Enforcing secure login methods is crucial to preventing attackers from exploiting weak entry points.
  2. Improper Authorization Configuration: Misconfigured authorization settings can lead to privilege escalation. It's essential to define user roles and permissions accurately to ensure that only authorized users can access or modify sensitive data, avoiding potential security breaches.
  3. Neglecting Encryption (TLS/SSL): Without TLS/SSL encryption, data in transit is exposed to interception. Enabling encryption ensures that communication between Solr and clients is protected, maintaining data protection and privacy.
  4. Inadequate Firewall Configuration: Failing to properly configure firewalls can leave your Solr instance exposed to external attacks. A well-configured firewall acts as a first line of defense, controlling incoming and outgoing traffic and reducing potential threats.
  5. Ignoring Vulnerability Management: Regular vulnerability management is vital, especially monitoring CVE (Common Vulnerabilities and Exposures) advisories that affect Solr. Falling behind on security patches and updates can leave your Solr instance open to compromise through known vulnerabilities.
  6. Overlooking security.json Configuration: An incorrect or incomplete security.json can leave your system susceptible to attacks, such as Cross-Site Request Forgery (CSRF). Properly configuring this file helps mitigate such risks by defining strict access policies and restrictions.

Building a Comprehensive Solr Security Strategy

Apache Solr security is crucial for enterprises seeking to maintain secure, scalable, and high-performance search solutions. While real-time indexing and advanced text search features offer powerful capabilities, ensuring that these functionalities are secure is equally important. SolrCloud security and robust configurations, such as authentication, authorization, and encryption with TLS/SSL, help protect data throughout the entire indexing and search process.

By integrating Kerberos authentication, managing user roles and permissions, and implementing continuous audit logging, businesses can enhance their security posture and reduce vulnerabilities. Additionally, regular vulnerability management, including monitoring CVE listings, coupled with strong firewall configurations, strengthens network security. A well-rounded security strategy will enable enterprises to fully leverage Solr’s capabilities while safeguarding their critical data and systems.

Navdeep Singh Gill

Global CEO and Founder of XenonStack

Navdeep Singh Gill is serving as Chief Executive Officer and Product Architect at XenonStack. He holds expertise in building SaaS Platform for Decentralised Big Data management and Governance, AI Marketplace for Operationalising and Scaling. His incredible experience in AI Technologies and Big Data Engineering thrills him to write about different use cases and its approach to solutions.
