Unified MetaData Management, Data Quality & Governance on EKS

Executive Summary

Customer, a global supply chain enterprise based in Italy, partnered with Xenonstack to transform its data governance, metadata management, and data quality processes. Legacy systems had created silos, poor data visibility, and compliance bottlenecks. Xenonstack implemented a hybrid cloud-native platform, with on-prem Kubernetes for lower environments and Amazon EKS for production, built on OpenMetadata, Apache Ranger, and the Data Quality service. With it, the Customer achieved centralised metadata visibility, real-time data quality validation, and automated governance. This reduced compliance effort by 70% and improved both trust in business data and operational agility.

Customer Challenge

Customer Information

Customer: Confidential
Industry: Supply Chain
Location: Italy
Company Size: 11-50

Business Challenges

Customer, a global supply chain enterprise, was struggling with various business and technical challenges that hindered its ability to manage data efficiently and comply with regulatory requirements. The company had fragmented metadata scattered across departments, creating silos that prevented cohesive data discovery and lineage tracking. Business teams faced delays in generating insights due to inconsistent data structures and manual processes for report generation and quality checks.

On the technical front, legacy systems such as siloed RDS databases and batch-oriented ETL tools could not scale with the increasing volume and complexity of operational data. There was no centralized mechanism to validate data in real-time or to enforce consistent access policies across services. This limited the organization’s ability to respond to dynamic supply chain requirements and increased the risk of data governance failures.

The customer's business goals were to gain unified operational visibility, automate compliance, and establish a scalable, secure analytics infrastructure. However, their existing setup lacked the architectural agility to support streaming data, enforce governance, or allow business users self-service access to trusted data.

In addition, Customer had to meet strict compliance standards such as regional data residency and secure access control. There was pressure to resolve frequent inventory stock-outs and provide real-time visibility during a critical global expansion phase. The urgency to reduce audit response times and streamline metadata workflows made it imperative for Customer to move to a cloud-native, automated, and unified solution.

Technical Challenges

Customer’s legacy data infrastructure was a significant barrier to achieving real-time data operations and governance. The existing architecture relied heavily on siloed RDS instances and traditional ETL tools that only supported batch processing. This setup resulted in delayed insights, redundant data pipelines, and complex maintenance cycles, all of which compounded technical debt. There was no centralized schema registry or lineage tracking, making it difficult to debug data issues or ensure consistency across systems.

Integration challenges also emerged due to the heterogeneous nature of data sources, which included transactional systems, IoT feeds, and third-party partner data. These needed to be unified into a scalable and secure data platform, but legacy systems lacked standard APIs and cloud-native interfaces. The lack of modularity in the architecture hindered scalability and introduced performance bottlenecks during peak loads.

Data quality was another major pain point. Without automated validation pipelines, Customer had limited control over incoming data consistency, which impacted reporting and operational accuracy. Furthermore, security and compliance requirements demanded role-based access control, end-to-end encryption, and audit logs to satisfy regulations such as GDPR. The legacy environment offered limited support for these features, increasing the risk of non-compliance and security breaches. Addressing these challenges required a ground-up redesign of Customer’s data architecture with modern, container-native solutions on AWS.

Partner Solution

Solution Overview

Xenonstack implemented a hybrid cloud-native metadata management and governance platform tailored to Customer’s supply chain requirements. The architecture supported deployment of development and staging environments on on-prem Kubernetes clusters, while production workloads were deployed on Amazon EKS. The design featured modular integration of OpenMetadata, Apache Ranger, Unity Catalog, and Data Quality Service. This enabled centralised metadata discovery, real-time data quality validation, and automated policy enforcement.

Built entirely on containerised microservices orchestrated via Kubernetes, the platform supported flexible scaling and seamless integration with existing systems, including Amazon RDS and S3. CI/CD pipelines with environment-specific overlays ensured consistent, rapid deployments across both on-prem and cloud environments. IAM, IRSA, and Secrets Manager secured access at every layer.
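The environment-specific overlays can be pictured as a shared base configuration that each environment selectively overrides. The following is a minimal sketch in Python, not the project's actual tooling: the configuration keys, their values, and the `apply_overlay` helper are all hypothetical and chosen only for illustration.

```python
from copy import deepcopy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new config in which overlay values override base values.

    Nested dictionaries are merged key by key; any other overlay value
    replaces the base value outright. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base settings shared by every environment.
BASE = {
    "replicas": 1,
    "storage": {"kind": "local", "encrypted": True},
    "cluster": "on-prem",
}

# Hypothetical production overlay: only the differences are listed.
PROD_OVERLAY = {
    "replicas": 3,
    "storage": {"kind": "s3"},
    "cluster": "eks",
}

prod_config = apply_overlay(BASE, PROD_OVERLAY)
print(prod_config)
```

Because the overlay lists only what differs, settings such as `encrypted` carry over to production unchanged, which is one way a pipeline can keep on-prem and EKS deployments consistent.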

AWS Services Used

Amazon EKS: Managed Kubernetes service used to deploy and orchestrate the platform's microservices, providing scalability and high availability.
Amazon S3: Central storage layer for metadata exports, validation outputs, and archival data.
Amazon RDS: Hosted the metadata repository and governance configurations using a PostgreSQL-compatible engine.
AWS IAM: Enforced fine-grained, role-based access control across services, APIs, and users.
AWS Secrets Manager: Managed sensitive credentials and access tokens securely between services.
Amazon CloudWatch: Provided logging, metric collection, and alerting for service health and operations.

Architecture Diagram

Implementation Details

The implementation followed an Agile methodology over an 11-month timeline. It began with collaborative workshops to identify governance requirements, compliance constraints, and key performance indicators (KPIs). Core services were deployed on Amazon EKS and on-prem Kubernetes using Helm, with OpenMetadata connectors established for ingesting metadata from Amazon RDS and third-party sources. Unity Catalog was layered on top to define classification rules and schemas, while Apache Ranger was integrated to manage policy-based access control.

Data quality checks were implemented using the Data Quality service, which was connected to both batch and streaming sources. Rules were codified centrally and run against live pipelines to verify data accuracy and completeness.
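To illustrate what centrally codified rules might look like, here is a minimal Python sketch. It is not the API of the Data Quality tool used in the project: the `Rule` class, the rule names, the column names, and the sample records are hypothetical, and only completeness and range checks are shown.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """A named check applied to one column of a record."""
    name: str
    column: str
    check: Callable[[Any], bool]


def not_null(value: Any) -> bool:
    """Completeness check: the value must be present."""
    return value is not None


def in_range(low: float, high: float) -> Callable[[Any], bool]:
    """Accuracy check: the value must be a number within [low, high]."""
    def check(value: Any) -> bool:
        return isinstance(value, (int, float)) and low <= value <= high
    return check


# Hypothetical central rule set, defined once and reused by every pipeline.
RULES = [
    Rule("sku_present", "sku", not_null),
    Rule("quantity_non_negative", "quantity", in_range(0, 1_000_000)),
]


def validate(record: dict, rules: list[Rule]) -> list[str]:
    """Return the names of the rules that the record fails."""
    return [r.name for r in rules if not r.check(record.get(r.column))]


# Hypothetical inventory records.
records = [
    {"sku": "A-100", "quantity": 25},
    {"sku": None, "quantity": 10},
    {"sku": "B-200", "quantity": -4},
]

for record in records:
    failures = validate(record, RULES)
    print(record, "->", failures or "ok")
```

Keeping the rules as data rather than inline conditions means the same list can be applied to a batch file or to each message of a stream without duplication.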

Security and compliance were core to the deployment: IRSA was used for IAM role assignment per Kubernetes pod in EKS, TLS was enforced across services, and data was encrypted both in transit and at rest using KMS-backed configurations. The architecture is fully aligned with GDPR and supply chain regulatory frameworks.
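IRSA associates an IAM role with a Kubernetes service account through the `eks.amazonaws.com/role-arn` annotation, and pods that run under that service account receive the role's permissions. The Python sketch below builds such a manifest as a plain dictionary and prints it as JSON. The service account name, namespace, AWS account ID, and role name are hypothetical placeholders, not values from the project.

```python
import json


def irsa_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation read by EKS to map this service account to a role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Hypothetical values for illustration only.
manifest = irsa_service_account(
    name="metadata-ingestion",
    namespace="governance",
    role_arn="arn:aws:iam::111122223333:role/metadata-ingestion-role",
)
print(json.dumps(manifest, indent=2))
```

Giving each microservice its own service account and role keeps permissions scoped to what that service needs, rather than sharing one node-level role across all pods.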

Timeline and Major Milestones:

Months 1–2: Stakeholder alignment, IAM/IRSA setup
Months 3–4: Metadata ingestion pipelines, catalog deployment
Month 5: Data quality rules defined and integrated
Months 6–7: Policy enforcement with Apache Ranger
Months 8–11: User onboarding, access controls, and production rollout across hybrid environments

Innovation and Best Practices

The solution adhered to AWS Well-Architected Framework principles, prioritizing security, operational excellence, and performance efficiency. Xenonstack leveraged container-native design patterns on Kubernetes and EKS to build independently scalable services, while infrastructure as code (IaC) and GitOps practices enabled reliable, automated deployments.

A key innovation was the integration of OpenMetadata and Apache Ranger for dynamic governance based on real-time metadata context. This reduced manual overhead in policy enforcement and improved traceability. Unity Catalog enabled scalable schema management with minimal duplication and maximum clarity across teams.
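The idea of governance driven by metadata context can be sketched as access decisions that depend on the tags a catalog attaches to a column, rather than on the column itself. The Python example below is an illustration only and does not use the Apache Ranger or OpenMetadata APIs: the tags, roles, column names, and the `can_read` helper are hypothetical.

```python
# Hypothetical catalog metadata: column name -> set of classification tags.
COLUMN_TAGS = {
    "orders.customer_email": {"PII"},
    "orders.total_amount": {"FINANCIAL"},
    "orders.status": set(),
}

# Hypothetical policies: tag -> roles allowed to read columns with that tag.
TAG_POLICIES = {
    "PII": {"compliance_officer"},
    "FINANCIAL": {"compliance_officer", "finance_analyst"},
}


def can_read(role: str, column: str) -> bool:
    """Allow access only if the role is permitted for every tag on the column.

    Untagged columns are readable by any role. Unknown columns, and tags
    that have no policy, result in a denial.
    """
    if column not in COLUMN_TAGS:
        return False
    return all(
        role in TAG_POLICIES.get(tag, set()) for tag in COLUMN_TAGS[column]
    )


print(can_read("finance_analyst", "orders.total_amount"))
print(can_read("finance_analyst", "orders.customer_email"))

# Re-tagging a column in the catalog changes the decision without
# editing any policy.
COLUMN_TAGS["orders.status"] = {"PII"}
print(can_read("finance_analyst", "orders.status"))
```

In this sketch a classification change in the catalog takes effect on the next access check, which is the kind of reduction in manual policy maintenance the paragraph above describes.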

Results and Benefits

Business Outcomes and Success Metrics

After implementing the unified metadata management and governance solution on Amazon EKS, the customer experienced substantial improvements across key performance and business indicators. The move to a cloud-native architecture delivered a 40% reduction in total cost of ownership (TCO) by replacing legacy license-heavy systems with scalable, open-source tools. Operational reporting latency dropped from 24–36 hours to under 1 hour, allowing stakeholders to make timely, data-driven decisions.

Time-to-market for new analytics initiatives improved by 60%, as the new platform allowed metadata and data quality pipelines to be deployed rapidly with minimal engineering overhead. Additionally, data source onboarding time was reduced from 2–3 weeks to less than 3 days, accelerating insights across procurement, inventory, and logistics teams.
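The before-and-after figures for reporting latency and onboarding time can be restated as percentage reductions. Because the new values are quoted as upper bounds ("under 1 hour", "less than 3 days"), the results below are lower bounds on the improvement. The sketch assumes a week means seven calendar days, which the case study does not specify.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Reporting latency: 24-36 hours before, under 1 hour after.
latency = [percent_reduction(hours, 1) for hours in (24, 36)]

# Data source onboarding: 2-3 weeks before (assumed 7-day weeks),
# less than 3 days after.
onboarding = [percent_reduction(weeks * 7, 3) for weeks in (2, 3)]

print([round(x, 1) for x in latency])
print([round(x, 1) for x in onboarding])
```

Under these assumptions, reporting latency fell by at least 95.8% to 97.2%, and onboarding time by at least 78.6% to 85.7%, depending on which end of the original range is used as the baseline.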

The ability to automate compliance audits and streamline metadata discovery gave Customer a significant competitive edge in responding to shifting global supply chain demands. The platform enabled more accurate demand forecasting, improved supplier negotiation, and reduced the risk of inventory stock-outs.

The customer realised a full ROI within the first year, driven by reduced infrastructure costs, higher productivity, and faster analytics cycles.

Technical Benefits

The solution delivered significant performance enhancements through containerised microservices deployed on Amazon EKS. By shifting from batch processing to real-time ingestion and validation, Customer achieved a 95% improvement in pipeline throughput, minimising data lag and manual corrections.

Scalability was enhanced through pod-level auto-scaling and decoupled service architectures, ensuring optimal resource allocation during peak loads. Reliability increased with high availability across services and resilient failover mechanisms. From a security perspective, integration with AWS IAM and Secrets Manager strengthened access control, while data was encrypted in transit and at rest, meeting GDPR and internal audit requirements. The use of IRSA for Kubernetes service accounts ensured fine-grained identity management per microservice.
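The case study does not name the autoscaling mechanism. Assuming it is the Kubernetes Horizontal Pod Autoscaler, the documented scaling rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The Python sketch below reproduces that rule with hypothetical numbers. It omits the tolerance band and stabilization behaviour that the real controller applies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Apply the HPA scaling rule and clamp the result to the allowed range."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical readings: average CPU utilisation (%) against a 60% target.
print(desired_replicas(4, 90, 60))   # load above target: scale out
print(desired_replicas(4, 30, 60))   # load below target: scale in
print(desired_replicas(4, 300, 60))  # extreme spike: capped at max_replicas
```

With four pods running, the three calls return 6, 2, and 10 respectively, which shows how replica counts follow load during peaks while staying within a fixed ceiling.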

The project also reduced technical debt by retiring monolithic ETL and governance tools and replacing them with open, modular frameworks. Development velocity improved thanks to CI/CD pipelines and GitOps practices, enabling faster iterations and safer deployments.

Lessons Learned

Challenges Overcome

During the implementation, the team faced several significant challenges. One of the most complex issues was integrating Apache Ranger's policy engine with dynamic metadata updates from OpenMetadata. Early stages of development also revealed usability concerns, especially for non-technical users navigating the governance dashboard. In addition, data ingestion pipelines needed to accommodate both batch and real-time sources, requiring custom connectors and transformations.

To overcome these, the team adopted a modular deployment strategy, allowing individual services to be debugged and optimised independently. Mid-project, user feedback loops were introduced to refine dashboard layouts and accessibility. The data ingestion architecture was restructured to support schema evolution, which reduced ingestion failures and manual interventions.
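One way an ingestion step can tolerate schema evolution is to extend the known schema when a new field appears and to fill absent fields with nulls instead of rejecting the record. The Python sketch below illustrates that approach under simplifying assumptions: it is not the project's connector code, the field names and records are hypothetical, and it does not handle type conflicts or field removal.

```python
def evolve_schema(schema: dict[str, type], record: dict) -> dict[str, type]:
    """Return the schema extended with any fields first seen in the record."""
    evolved = dict(schema)
    for field, value in record.items():
        if field not in evolved and value is not None:
            evolved[field] = type(value)
    return evolved


def conform(record: dict, schema: dict[str, type]) -> dict:
    """Project the record onto the schema, filling absent fields with None."""
    return {field: record.get(field) for field in schema}


# Hypothetical starting schema and incoming shipment records.
schema = {"shipment_id": str, "weight_kg": float}
incoming = [
    {"shipment_id": "S1", "weight_kg": 12.5},
    {"shipment_id": "S2", "weight_kg": 8.0, "carrier": "ACME"},  # new field
    {"shipment_id": "S3"},                                       # missing fields
]

rows = []
for record in incoming:
    schema = evolve_schema(schema, record)
    rows.append(conform(record, schema))

print(list(schema))
for row in rows:
    print(row)
```

Records that arrive with an extra or missing field are still ingested, which is the behaviour that reduces ingestion failures and manual intervention when upstream sources change.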

The original plan was adjusted to delay full UI rollout until after extensive stakeholder training and feedback collection. This phased onboarding approach significantly improved adoption and reduced friction during transition.

Best Practices Identified

A key learning was the value of starting with clear KPIs aligned to business outcomes. This kept architectural decisions focused and avoided overengineering. Kubernetes-native deployment on Amazon EKS provided flexibility, resource isolation, and auto-scaling, which are critical for handling variable workloads across departments.

Success was also driven by GitOps practices, including version-controlled infrastructure, automated CI/CD pipelines, and real-time policy deployment. Embracing open-source tools like OpenMetadata and Data Quality enabled quick customisation and avoided vendor lock-in.

These approaches can benefit similar implementations by enabling rapid innovation with security and compliance at the core. Providing a self-service metadata experience through lightweight UIs also empowered business users, reducing reliance on central data teams.

Future Plans

Looking ahead, Customer plans to expand the platform’s capabilities by integrating streaming data sources using Apache Kafka. The next phase will introduce ML-based anomaly detection for proactive data quality monitoring. Additionally, integration with AWS Lake Formation is under evaluation to enhance data cataloging and cross-service permissions.

Future optimization will include query acceleration and dashboard caching for analytics users. Xenonstack and the customer will continue collaborating to enable external audit dashboards, further enhancing regulatory transparency. The partnership will also explore AWS Marketplace integrations for rapid deployment of new governance modules.

Navdeep Singh Gill
