
Technical Challenges
Customer’s legacy data infrastructure was a significant barrier to achieving real-time data operations and governance. The existing architecture relied heavily on siloed RDS instances and traditional ETL tools that only supported batch processing. This setup resulted in delayed insights, redundant data pipelines, and complex maintenance cycles, all of which compounded technical debt. There was no centralized schema registry or lineage tracking, making it difficult to debug data issues or ensure consistency across systems.
Integration challenges also emerged due to the heterogeneous nature of data sources, which included transactional systems, IoT feeds, and third-party partner data. These needed to be unified into a scalable and secure data platform, but legacy systems lacked standard APIs and cloud-native interfaces. The lack of modularity in the architecture hindered scalability and introduced performance bottlenecks during peak loads.
Data quality was another major pain point. Without automated validation pipelines, Customer had limited control over incoming data consistency, which impacted reporting and operational accuracy. Furthermore, security and compliance requirements demanded role-based access control, end-to-end encryption, and audit logs to satisfy regulations such as GDPR. The legacy environment offered limited support for these features, increasing the risk of non-compliance and security breaches. Addressing these challenges required a ground-up redesign of Customer’s data architecture with modern, container-native solutions on AWS.
Partner Solution
Solution Overview
Xenonstack implemented a hybrid cloud-native metadata management and governance platform tailored to Customer’s supply chain requirements. The architecture supported deployment of development and staging environments on on-prem Kubernetes clusters, while production workloads were deployed on Amazon EKS. The design featured modular integration of OpenMetadata, Apache Ranger, Unity Catalog, and Data Quality Service. This enabled centralised metadata discovery, real-time data quality validation, and automated policy enforcement.
Built entirely on containerised microservices orchestrated by Kubernetes, the platform scaled flexibly and integrated with existing systems, including Amazon RDS and Amazon S3. CI/CD pipelines with environment-specific overlays ensured consistent, rapid deployments across both on-prem and cloud environments. AWS IAM, IAM Roles for Service Accounts (IRSA), and AWS Secrets Manager secured access at every layer.
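Environment-specific overlays typically keep one base configuration and layer only each environment's differences on top of it. The case study does not name the overlay tool (Kustomize and Helm values files are common choices), so the following is a minimal Python sketch of the idea under that assumption. `BASE`, `OVERLAYS`, and every field name are hypothetical, and the merge is a plain recursive dictionary merge, not any specific tool's merge semantics.

```python
import copy
import json

# Hypothetical base settings shared by every environment.
BASE = {
    "replicas": 1,
    "image": {"repository": "metadata-service", "tag": "1.0.0"},
    "secrets": {"provider": "kubernetes"},
}

# Hypothetical per-environment overlays: only the fields that differ from BASE.
OVERLAYS = {
    "dev": {},                   # on-prem Kubernetes, base defaults only
    "staging": {"replicas": 2},  # on-prem Kubernetes
    "prod": {                    # Amazon EKS
        "replicas": 3,
        "secrets": {"provider": "aws-secrets-manager"},
    },
}


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict in which overlay values override base values.

    Nested dicts are merged key by key; any other overlay value replaces
    the base value outright. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render(env: str) -> dict:
    """Effective configuration for one environment."""
    return deep_merge(BASE, OVERLAYS[env])


if __name__ == "__main__":
    for env in OVERLAYS:
        print(env, json.dumps(render(env), sort_keys=True))
```

Running the script prints one effective configuration per environment: only `prod` switches the secrets provider, and `dev` inherits the base unchanged. Keeping overlays this small is what makes deployments repeatable, because reviewing an environment means reviewing only its differences.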
AWS Services Used
- Amazon EKS: Managed Kubernetes service used to deploy and orchestrate all microservices, providing scalability and high availability.
- Amazon S3: Central storage layer for metadata exports, validation outputs, and archival data.
- Amazon RDS: Hosted the metadata repository and governance configurations using a PostgreSQL-compatible engine.
- AWS IAM: Enforced fine-grained, role-based access control across services, APIs, and users.
- AWS Secrets Manager: Managed sensitive credentials and access tokens securely between services.
- Amazon CloudWatch: Provided logging, metric collection, and alerting for service health and operations.
Architecture Diagram
Implementation Details
The implementation followed an Agile methodology over an 11-month timeline. It began with collaborative workshops to identify governance requirements, compliance constraints, and key performance indicators (KPIs). Core services were deployed on Amazon EKS and on-prem Kubernetes using Helm, and OpenMetadata connectors were set up to ingest metadata from Amazon RDS and third-party sources. Unity Catalog was layered on top to define classification rules and schemas, while Apache Ranger was integrated to manage policy-based access control.
Data quality checks were implemented with the Data Quality Service, which was connected to both batch and streaming sources. Rules were codified centrally and run against live pipelines to verify data accuracy and completeness.
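Codifying rules centrally usually means expressing each check as data (a name, a target column, and a predicate) and running the same rule set against every batch or micro-batch. The Data Quality Service's actual rule format is not described in this case study, so this is a minimal sketch, assuming records arrive as Python dicts; the `Rule` class, the inventory columns, and the warehouse codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A centrally defined data quality rule (hypothetical structure)."""
    name: str
    column: str
    check: Callable[[Any], bool]  # returns True when the value is acceptable


# Hypothetical rules for an inventory feed: one completeness check, two accuracy checks.
RULES: List[Rule] = [
    Rule("sku_not_null", "sku", lambda v: v is not None and v != ""),
    Rule("quantity_non_negative", "quantity",
         lambda v: isinstance(v, (int, float)) and v >= 0),
    Rule("warehouse_known", "warehouse", lambda v: v in {"EU-1", "US-1", "APAC-1"}),
]


def validate(records: List[Record], rules: List[Rule]) -> List[Dict[str, Any]]:
    """Apply every rule to every record and return one entry per violation."""
    violations = []
    for row, record in enumerate(records):
        for rule in rules:
            value = record.get(rule.column)  # a missing column is treated as None
            if not rule.check(value):
                violations.append({"row": row, "rule": rule.name, "value": value})
    return violations


def pass_rate(records: List[Record], rules: List[Rule]) -> float:
    """Fraction of records that satisfy every rule (1.0 for an empty batch)."""
    if not records:
        return 1.0
    failed_rows = {v["row"] for v in validate(records, rules)}
    return 1 - len(failed_rows) / len(records)


if __name__ == "__main__":
    batch = [
        {"sku": "A-100", "quantity": 5, "warehouse": "EU-1"},
        {"sku": "", "quantity": -2, "warehouse": "EU-1"},
        {"sku": "B-200", "quantity": 1, "warehouse": "MARS"},
    ]
    for violation in validate(batch, RULES):
        print(violation)
    print(f"pass rate: {pass_rate(batch, RULES):.2f}")
```

In the sample batch, the second row breaks two rules and the third names an unknown warehouse, so three violations span two rows and the pass rate is one in three. Because the rules are plain data, the same list can be applied unchanged to a streaming micro-batch or a nightly batch.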
Security and compliance were core to the deployment. In EKS, IRSA assigned IAM roles to Kubernetes service accounts, so each pod received only the permissions of its service account. TLS was enforced between services to encrypt data in transit, and data at rest was encrypted with AWS KMS-backed keys. The architecture was aligned with GDPR and applicable supply chain regulatory frameworks.
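With IRSA, an IAM role is bound to a Kubernetes service account through the `eks.amazonaws.com/role-arn` annotation. Pods running under that service account receive temporary credentials for the role, provided the role's trust policy trusts the cluster's OIDC provider. The sketch below is a minimal Python illustration, not the project's actual manifests: it generates one annotated service account per microservice. The service names, role names, namespace, and the placeholder account ID `123456789012` are all hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Annotation IRSA uses to link a Kubernetes service account to an IAM role.
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"

# Loose sanity check for an IAM role ARN: partition, 12-digit account ID, role name/path.
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")

# Hypothetical microservices, each mapped to its own least-privilege IAM role.
# 123456789012 is a placeholder account ID.
SERVICE_ROLES = {
    "openmetadata-ingestion": "arn:aws:iam::123456789012:role/om-ingestion",
    "data-quality-runner": "arn:aws:iam::123456789012:role/dq-runner",
}


def service_account(name: str, namespace: str, role_arn: str) -> Dict[str, Any]:
    """Build a ServiceAccount manifest annotated for IRSA."""
    if not ROLE_ARN.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {IRSA_ANNOTATION: role_arn},
        },
    }


def manifests(namespace: str) -> List[Dict[str, Any]]:
    """One service account per microservice, so each gets only its own role."""
    return [service_account(svc, namespace, arn) for svc, arn in SERVICE_ROLES.items()]


if __name__ == "__main__":
    # A v1 List can be applied directly with `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": manifests("governance")}, indent=2))
```

Each Deployment then opts in by setting `serviceAccountName` in its pod spec. This scopes AWS permissions to a single microservice rather than to the node it runs on.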
Timeline and Major Milestones:
- Months 1–2: Stakeholder alignment, IAM/IRSA setup
- Months 3–4: Metadata ingestion pipelines, catalog deployment
- Month 5: Data quality rules defined and integrated
- Months 6–7: Policy enforcement with Apache Ranger
- Months 8–11: User onboarding, access controls, and production rollout across hybrid environments
Innovation and Best Practices
The solution adhered to AWS Well-Architected Framework principles, prioritizing security, operational excellence, and performance efficiency. Xenonstack leveraged container-native design patterns on Kubernetes and EKS to build independently scalable services, while infrastructure as code (IaC) and GitOps practices enabled reliable, automated deployments.
A key innovation was integrating OpenMetadata with Apache Ranger so that governance policies could respond dynamically to real-time metadata context. This reduced the manual overhead of policy enforcement and improved traceability. Unity Catalog enabled scalable schema management with minimal duplication and a consistent view of schemas across teams.
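One common way to make access policies react to metadata is tag-based policy. Classification tags live in the catalog, policies are written against tags rather than individual tables, and a tag change immediately changes access decisions. Apache Ranger supports tag-based policies, but the case study does not describe how the OpenMetadata integration was wired, so the following is a minimal, self-contained Python sketch of the decision logic only. It is not Ranger's API; the tag names, roles, asset names, and the `on_metadata_update` hook are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical tag-based policies: classification tag -> roles allowed to read it.
TAG_POLICIES: Dict[str, Set[str]] = {
    "PII": {"privacy-officer", "data-steward"},
    "Supplier.Confidential": {"procurement-analyst", "data-steward"},
}

# Hypothetical local view of classification tags synced from the metadata catalog.
asset_tags: Dict[str, Set[str]] = {
    "rds.orders.customers": {"PII"},
    "rds.orders.shipments": set(),
}


def on_metadata_update(asset: str, tags: Iterable[str]) -> None:
    """Record a tag change pushed from the catalog; no policy is edited."""
    asset_tags[asset] = set(tags)


def can_read(user_roles: Iterable[str], asset: str) -> bool:
    """Allow access only if the user holds an allowed role for every tag on the asset."""
    if asset not in asset_tags:
        return False  # unknown asset: fail closed
    roles = set(user_roles)
    for tag in asset_tags[asset]:
        allowed = TAG_POLICIES.get(tag)
        if allowed is None or not roles & allowed:
            return False  # unknown tag or no matching role: deny
    return True
```

For example, a `logistics-analyst` can read `rds.orders.shipments` while it is untagged, but once the catalog tags it `Supplier.Confidential`, access is denied without anyone editing a policy. Unknown tags and unknown assets fail closed, a deliberate choice for a governance layer.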
Results and Benefits
Business Outcomes and Success Metrics
After implementing the unified metadata management and governance solution on Amazon EKS, the customer experienced substantial improvements across key performance and business indicators. The move to a cloud-native architecture delivered a 40% reduction in total cost of ownership (TCO) by replacing legacy license-heavy systems with scalable, open-source tools. Operational reporting latency dropped from 24–36 hours to under 1 hour, allowing stakeholders to make timely, data-driven decisions.
Time-to-market for new analytics initiatives was cut by 60%, because metadata and data quality pipelines could be deployed rapidly with minimal engineering overhead. Data source onboarding time also fell from 2–3 weeks to less than 3 days, accelerating insights across procurement, inventory, and logistics teams.
The ability to automate compliance audits and streamline metadata discovery gave Customer a significant competitive edge in responding to shifting global supply chain demands. The platform enabled more accurate demand forecasting, improved supplier negotiation, and reduced the risk of inventory stock-outs.
The customer realised a full ROI within the first year, driven by reduced infrastructure costs, higher productivity, and faster analytics cycles.
Technical Benefits
The solution delivered significant performance enhancements through containerised microservices deployed on Amazon EKS. By shifting from batch processing to real-time ingestion and validation, Customer achieved a 95% improvement in pipeline throughput, minimising data lag and manual corrections.
Scalability was enhanced through pod-level auto-scaling and decoupled service architectures, ensuring optimal resource allocation during peak loads. Reliability increased with high availability across services and resilient failover mechanisms. From a security perspective, integration with AWS IAM and Secrets Manager strengthened access control, while data was encrypted in transit and at rest, meeting GDPR and internal audit requirements. The use of IRSA for Kubernetes service accounts ensured fine-grained identity management per microservice.
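Pod-level auto-scaling in Kubernetes is usually handled by the Horizontal Pod Autoscaler. Its documented core rule is `desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`, and no scaling happens while the ratio stays within a tolerance (0.1 by default). The case study does not give the scaling settings used, so this Python sketch reproduces only that formula plus min/max clamping. The replica bounds and metric values are illustrative assumptions, and the real controller also applies stabilisation windows and scaling policies that are not modelled here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA's documented scaling ratio.

    Omits stabilisation windows, scaling policies, and pod readiness handling.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    if abs(1.0 - ratio) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas at 90% CPU against a 60% target, the ratio is 1.5 and the sketch returns 6. At 63%, the ratio of 1.05 is within tolerance, so the count stays at 4. A spike that would demand more than `max_replicas` is capped, which bounds cost during peak loads.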
The project also reduced technical debt by retiring monolithic ETL and governance tools and replacing them with open, modular frameworks. Development velocity improved thanks to CI/CD pipelines and GitOps practices, enabling faster iterations and safer deployments.
Lessons Learned
Challenges Overcome
During the implementation, the team faced several significant challenges. One of the most complex issues was integrating Apache Ranger's policy engine with dynamic metadata updates from OpenMetadata. Early stages of development also revealed usability concerns, especially for non-technical users navigating the governance dashboard. In addition, data ingestion pipelines needed to accommodate both batch and real-time sources, requiring custom connectors and transformations.
To overcome these, the team adopted a modular deployment strategy, allowing individual services to be debugged and optimised independently. Mid-project, user feedback loops were introduced to refine dashboard layouts and accessibility. The data ingestion architecture was restructured to support schema evolution, which reduced ingestion failures and manual interventions.
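Supporting schema evolution typically means accepting backward-compatible changes, such as a new optional column, automatically, while diverting incompatible records instead of failing the whole load. The exact rules used in the project are not stated, so this is a minimal Python sketch under the assumption that schemas are simple column-to-type maps; `evolve`, `ingest`, and the quarantine list are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

Schema = Dict[str, type]
Record = Dict[str, Any]


def evolve(schema: Schema, record: Record) -> Tuple[Schema, List[str]]:
    """Return the schema widened with any new columns, plus incompatibilities found."""
    widened = dict(schema)
    problems: List[str] = []
    for column, value in record.items():
        if value is None:
            continue  # nulls carry no type information
        if column not in schema:
            widened[column] = type(value)  # additive change: accept the new column
        elif not isinstance(value, schema[column]):
            problems.append(
                f"{column}: expected {schema[column].__name__}, got {type(value).__name__}"
            )
    return widened, problems


def ingest(records: List[Record], schema: Schema
           ) -> Tuple[Schema, List[Record], List[Tuple[Record, List[str]]]]:
    """Load compatible records, widening the schema as needed; quarantine the rest."""
    accepted: List[Record] = []
    quarantined: List[Tuple[Record, List[str]]] = []
    for record in records:
        candidate, problems = evolve(schema, record)
        if problems:
            quarantined.append((record, problems))  # divert instead of failing the load
            continue
        schema = candidate
        accepted.append(record)
    # Backfill columns added mid-batch so every accepted row has the same shape.
    rows = [{column: row.get(column) for column in schema} for row in accepted]
    return schema, rows, quarantined
```

For example, if the second record in a batch adds a `lot` column, the schema widens and earlier rows get `lot` backfilled as `None`, while a record carrying `quantity` as a string is quarantined with a readable reason. Only the bad record needs manual attention; the rest of the batch loads.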
The original plan was adjusted to delay the full UI rollout until after extensive stakeholder training and feedback collection. This phased onboarding approach significantly improved adoption and reduced friction during the transition.
Best Practices Identified
A key learning was the value of starting with clear KPIs aligned to business outcomes. This kept architectural decisions focused and avoided over-engineering. Kubernetes-native deployment on Amazon EKS provided flexibility, resource isolation, and auto-scaling, which are critical for handling variable workloads across departments.
Success was also driven by GitOps practices, including version-controlled infrastructure, automated CI/CD pipelines, and real-time policy deployment. Embracing open-source tools such as OpenMetadata and the Data Quality Service enabled quick customisation and avoided vendor lock-in.
These approaches can benefit similar implementations by enabling rapid innovation with security and compliance at the core. Providing a self-service metadata experience through lightweight UIs also empowered business users, reducing reliance on central data teams.
Future Plans
Looking ahead, Customer plans to expand the platform’s capabilities by integrating streaming data sources using Apache Kafka. The next phase will introduce ML-based anomaly detection for proactive data quality monitoring. Additionally, integration with AWS Lake Formation is under evaluation to enhance data cataloging and cross-service permissions.
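The planned ML-based detector is not specified. For comparison, a common non-ML baseline in data quality monitoring flags a metric, such as a table's daily row count, when it falls more than a few standard deviations from its recent history. The Python sketch below implements that rolling z-score baseline, not a machine learning model; the window of 7 and threshold of 3 are arbitrary assumptions.

```python
import statistics
from typing import List, Sequence


def zscore_anomalies(values: Sequence[float], window: int = 7,
                     threshold: float = 3.0) -> List[int]:
    """Indices whose value is more than `threshold` standard deviations away
    from the mean of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a deviation")
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            if values[i] != mean:  # any change from a perfectly flat history
                flagged.append(i)
        elif abs(values[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged
```

Applied to daily row counts of roughly 1,000 with a single day at 400, it flags only that day. One limitation is visible immediately: the anomalous value stays in the history and widens the next windows. An ML-based model would replace the fixed mean-and-deviation rule with a learned expectation, for example one that accounts for weekly seasonality.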
Future optimization will include query acceleration and dashboard caching for analytics users. Xenonstack and the customer will continue collaborating to enable external audit dashboards, further enhancing regulatory transparency. The partnership will also explore AWS Marketplace integrations for rapid deployment of new governance modules.