
Technical Challenges
- Legacy systems and infrastructure challenges: The existing infrastructure relied on outdated ETL tools and lacked support for modern streaming data frameworks.
- Technical debt or architectural limitations: A fragmented architecture with siloed systems introduced inconsistencies and hindered data unification.
- Integration requirements: Integration was needed across multiple AWS services, partner data feeds, IoT sources, and legacy RDS instances.
- Scalability, reliability, or performance issues: The previous solution was not scalable for global operations and suffered from latency during peak usage.
- Data challenges: Inconsistent schemas, lack of data versioning, and minimal observability made data debugging and auditing difficult.
- Security and compliance requirements: The platform had to implement role-based access, encryption for sensitive data, and adherence to regional compliance regulations.
Partner Solution
Solution Overview
Xenonstack designed a two-layer modular architecture on Amazon EKS:
Data Ingestion & Processing Layer
This layer supports both real-time and batch ingestion from diverse sources across the customer supply chain:
- AWS DMS streams transactional data (orders, inventory, shipments) from Amazon RDS into Apache Kafka topics.
- IoT and partner data are also streamed into Kafka for event-driven processing.
- Apache Spark on EKS handles batch transformations and structured stream processing, writing output directly to Apache Iceberg tables on Amazon S3 (a sketch of this streaming path follows this list).
- Apache Airflow orchestrates data pipelines and dependency-based workflows using container-native DAGs.
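As a rough illustration of the streaming path described above, the following PySpark sketch reads CDC events from a Kafka topic and appends them to an Iceberg table on S3. The topic, broker address, bucket, catalog, and schema are hypothetical placeholders rather than the customer's actual values, and the production jobs would include the real transformation logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Iceberg-enabled Spark session backed by S3. Requires the spark-sql-kafka and
# iceberg-spark-runtime packages on the classpath; names below are placeholders.
spark = (
    SparkSession.builder.appName("orders-stream-to-iceberg")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3a://example-lakehouse/warehouse")
    .getOrCreate()
)

# Assumed shape of the order events that DMS publishes to the Kafka topic.
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("sku", StringType()),
    StructField("quantity", StringType()),
    StructField("updated_at", TimestampType()),
])

# Read the CDC stream from Kafka and parse the JSON payload.
orders = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka-broker:9092")
    .option("subscribe", "orders-cdc")
    .load()
    .select(from_json(col("value").cast("string"), order_schema).alias("o"))
    .select("o.*")
)

# Append the parsed events to a partitioned Iceberg table on S3.
query = (
    orders.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://example-lakehouse/checkpoints/orders")
    .toTable("lakehouse.supply_chain.orders")
)
query.awaitTermination()
```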
Lakehouse Storage & Analytics Layer
This layer provides governed, queryable storage and interactive BI dashboards:
- Transformed data is stored in Apache Iceberg tables (partitioned and ACID-compliant) on Amazon S3, supporting schema evolution and time travel.
- Trino, deployed on EKS, serves as the SQL query engine over the Iceberg datasets, enabling fast, federated analytics (see the query sketch after this list).
- Apache Superset dashboards give business users real-time visibility into supply chain KPIs without engineering dependency.
- All services are monitored with Prometheus and Grafana and are integrated into the Kubernetes control plane.
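To illustrate how dashboards and analysts can query the lakehouse, here is a minimal sketch using the Trino Python client against the Iceberg catalog. The coordinator hostname, schema, and table used in the query are assumptions for illustration, not details of the actual deployment.

```python
import trino

# Connect to the Trino coordinator service running on EKS (placeholder hostname).
conn = trino.dbapi.connect(
    host="trino.analytics.svc.cluster.local",
    port=8080,
    user="analytics",
    catalog="iceberg",
    schema="supply_chain",
)
cur = conn.cursor()

# Example KPI query: late shipments per region, read directly from Iceberg on S3.
cur.execute("""
    SELECT region,
           count(*) AS late_shipments
    FROM shipments
    WHERE status = 'LATE'
      AND ship_date >= date '2024-01-01'
    GROUP BY region
    ORDER BY late_shipments DESC
""")
for region, late in cur.fetchall():
    print(region, late)
```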
AWS Services Used
- Amazon RDS: Source for transactional data
- AWS DMS: Data migration and replication from RDS into Kafka (an illustrative endpoint and task configuration follows this list)
- AWS IAM: Role-based access control (RBAC) for scoped access
- Amazon CloudWatch, Prometheus, Grafana: Monitoring and observability
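As a hedged sketch of how the DMS piece can be wired up with boto3, the snippet below creates a Kafka target endpoint and an ongoing replication task from RDS. All identifiers, ARNs, broker addresses, and table mappings are placeholders, not the values used in this engagement.

```python
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-1")  # placeholder region

# Target endpoint pointing DMS at the Kafka cluster; broker and topic are placeholders.
target = dms.create_endpoint(
    EndpointIdentifier="supply-chain-kafka-target",
    EndpointType="target",
    EngineName="kafka",
    KafkaSettings={
        "Broker": "kafka-broker:9092",
        "Topic": "orders-cdc",
    },
)

# Ongoing replication (full load + CDC) from the existing RDS source into Kafka.
dms.create_replication_task(
    ReplicationTaskIdentifier="rds-to-kafka-cdc",
    SourceEndpointArn="arn:aws:dms:placeholder:endpoint:rds-source",       # placeholder ARN
    TargetEndpointArn=target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:placeholder:rep:replication-inst",  # placeholder ARN
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-supply-chain-tables",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```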
Architecture Diagram
Implementation Details
Xenonstack implemented the solution over an 11-month period using Agile methodology and DevOps automation. The team began with stakeholder workshops to identify key operational KPIs, which helped guide the architecture and prioritisation.
- How the solution was implemented: RDS transactional data was streamed into Kafka using AWS DMS. Apache Spark jobs, configured for both batch and streaming, ran on Amazon EKS. Apache Airflow orchestrated the data pipelines (an illustrative DAG follows this list), while Iceberg tables on S3 stored the transformed data. Trino on EKS enabled federated queries, and Superset provided visual dashboards for business users.
- Methodology used: Agile sprints guided iterative development and testing. DevOps best practices were followed, and GitOps pipelines were used for deployment automation.
- Migration approach: The legacy ETL system was replaced in phases, starting with the ingestion pipelines, then the processing logic, and finally the dashboarding tools. This ensured continuity of business operations.
- Integration with existing systems: The platform was integrated with Amazon RDS, AWS IAM, and external data providers. Kafka connected IoT and partner data sources.
- Security and compliance considerations: IAM-based RBAC was enforced at the Kubernetes and data-access levels. Encryption and compliance policies were aligned with GDPR and regional supply chain regulations.
- Deployment and testing strategy: Components were containerised and deployed via Helm on EKS. Integration and load testing were performed using automated test suites. Grafana and Prometheus were configured for observability.
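The following Airflow DAG is a simplified sketch of the container-native orchestration pattern described above, with each task launching a Spark job image on EKS. Image names, namespaces, the schedule, and the operator import path (which varies by Airflow and provider version) are illustrative assumptions rather than the actual pipeline definitions.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Container-native DAG: each task runs a Spark job container on the EKS cluster.
with DAG(
    dag_id="supply_chain_daily_batch",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",          # nightly batch window (placeholder)
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:

    transform_orders = KubernetesPodOperator(
        task_id="transform_orders",
        namespace="data-pipelines",
        image="registry.example.com/spark-jobs:latest",   # placeholder image
        cmds=["spark-submit"],
        arguments=["local:///opt/jobs/transform_orders.py"],
    )

    refresh_forecast = KubernetesPodOperator(
        task_id="refresh_forecast",
        namespace="data-pipelines",
        image="registry.example.com/spark-jobs:latest",
        cmds=["spark-submit"],
        arguments=["local:///opt/jobs/refresh_forecast.py"],
    )

    # The daily forecast refresh depends on the transformed order data being ready.
    transform_orders >> refresh_forecast
```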
Timeline and major milestones:
- Months 1–2: Requirements gathering, DMS-Kafka integration
- Months 3–4: Spark pipeline implementation, Iceberg table design
- Month 5: Trino setup, dashboard prototyping in Superset
- Months 6–7: Performance tuning, role-based access, production deployment
Innovation and Best Practices
The solution adopted several AWS best practices, including modular design, containerization, and CI/CD deployment on EKS.
- How the solution leveraged AWS best practices: Each service (Kafka, Spark, Trino) was containerised and deployed using Helm on EKS with auto-scaling. Logging and monitoring were built in via CloudWatch, Prometheus, and Grafana.
- Innovative approaches or unique aspects of the implementation: Apache Iceberg for ACID-compliant data lakes enabled schema evolution and time travel, simplifying the onboarding of new data sources (see the sketch after these points). A KPI-first approach minimised overengineering and kept efforts aligned with business outcomes.
- Use of AWS Well-Architected Framework principles: Operational Excellence was achieved through GitOps pipelines and observability. Reliability and Performance Efficiency were addressed with resource-tuned deployments and stream processing. Cost optimisation was accomplished via open-source tools and S3-based object storage.
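A minimal sketch of the Iceberg capabilities called out above, assuming the Iceberg-enabled Spark session and catalog shown earlier. The table, column, and timestamp are illustrative placeholders, and the time-travel syntax shown requires a recent Spark release.

```python
from pyspark.sql import SparkSession

# Reuses the Iceberg-enabled session/catalog from the ingestion sketch.
spark = SparkSession.builder.getOrCreate()

# Schema evolution: add a column for a newly onboarded partner feed without
# rewriting existing data files (column name is hypothetical).
spark.sql("""
    ALTER TABLE lakehouse.supply_chain.shipments
    ADD COLUMNS (carrier_scac STRING)
""")

# Time travel: audit what a dashboard would have shown at an earlier point
# by querying a previous table snapshot.
spark.sql("""
    SELECT count(*) AS shipment_count_at_snapshot
    FROM lakehouse.supply_chain.shipments
    TIMESTAMP AS OF '2024-06-01 00:00:00'
""").show()
```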
DevOps, CI/CD, or other modern practices implemented:
- GitOps workflows using Argo CD
- CI/CD pipelines for infrastructure and Spark jobs
- Helm charts for consistent multi-environment deployment
- Prometheus and Grafana dashboards for real-time monitoring
- Started with KPI-first design, avoiding over-architecture
- Used Kubernetes-native services for modular scaling
- Applied Iceberg for schema evolution and unified storage
- Enabled Superset for business user self-service
- Applied AWS Well-Architected principles and container best practices
Results and Benefits
Business Outcomes and Success Metrics
Cost savings
- Achieved a 40% reduction in total cost of ownership by shifting to an EKS-based, open-source lakehouse architecture.
- Reduced infrastructure and license expenditures through pay-as-you-go models and container orchestration.
Revenue increases or new revenue streams
- Enabled faster demand forecasting and order planning, contributing to improved supplier negotiations and reduced inventory carrying costs.
Time-to-market improvements
- Reporting latency reduced from 24–36 hours to under 1 hour, accelerating business decision-making.
- Forecast model refresh cycle improved from weekly (manual) to daily (automated), enhancing agility.
Operational efficiencies
- Fully automated dashboards eliminated the need for manual report generation (saving 8–10 hours/week).
- Onboarding new data sources now takes less than 3 days, compared to the previous 2–3 weeks.
Competitive advantages gained
- Real-time dashboards and unified analytics gave customers visibility into global supply chain metrics, enhancing responsiveness and reducing stock-outs.
ROI and payback period
- A significant reduction in infrastructure overheads and operational delays led to a rapid ROI, with the payback period achieved within the first year of deployment.
Technical Benefits
- Performance improvements: The shift to real-time pipelines and distributed processing significantly improved performance. Reporting latency was reduced from 24–36 hours to under 1 hour, and automated model refresh cycles accelerated from weekly to daily.
- Scalability enhancements: The platform achieved modular scalability by containerising services and deploying them on Amazon EKS. Kafka, Spark, and Trino components could scale independently based on workload demands, ensuring optimal performance during peak hours.
- Reliability and availability improvements: The Kubernetes-native design introduced auto-healing, load balancing, and high availability across services. Data pipelines became resilient to node failures, and real-time ingestion pipelines ensured data continuity.
- Strengthened security posture: IAM-based RBAC and service account policies provided fine-grained access controls. Data stored in Iceberg tables on S3 was encrypted in transit and at rest, satisfying regional compliance requirements.
- Reduced technical debt: Replacing legacy batch ETL and monolithic reporting tools with modern open-source frameworks reduced code complexity, enhanced maintainability, and lowered long-term technical overhead.
- Improved development velocity: GitOps automation and CI/CD pipelines for infrastructure and jobs enabled faster iteration cycles. Teams could test, deploy, and monitor new pipelines or features with minimal manual intervention.
Lessons Learned
Challenges Overcome
During implementation, the team encountered several significant challenges:
- Tuning Apache Spark on Kubernetes for optimal performance was difficult due to large shuffle operations.
- Kafka experienced high latency during peak usage, which impacted data streaming.
- Designing role-based access for multiple departments while ensuring compliance was initially complicated.
- Initial Superset dashboards were too technical for non-technical users.
How these challenges were addressed:
- Spark issues were resolved through executor memory tuning and custom pod resource configurations (see the configuration sketch after this list).
- Adjusting partition strategies, increasing the broker count, and offloading archival workloads to batch pipelines improved Kafka performance.
- RBAC was implemented using IAM and Kubernetes service accounts to enforce scoped access.
- Dashboard usability was improved after feedback sessions with business stakeholders, leading to simpler, more targeted dashboards.
- Additional focus was placed on UI/UX for the BI dashboards.
- Performance benchmarking became an ongoing process to optimise workloads on EKS.
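To make the Spark tuning concrete, the sketch below shows the kind of executor sizing and Kubernetes resource settings that this tuning typically involves. The specific values and the pod template path are illustrative starting points, not the final production configuration.

```python
from pyspark.sql import SparkSession

# Executor sizing and shuffle settings of the kind adjusted during tuning;
# all numbers below are illustrative, not the production values.
spark = (
    SparkSession.builder.appName("orders-batch-transform")
    # Right-size executor pods so large shuffles do not exhaust memory.
    .config("spark.executor.instances", "6")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.memoryOverhead", "2g")
    # Match shuffle parallelism to the data volume to avoid tiny or oversized tasks.
    .config("spark.sql.shuffle.partitions", "400")
    # Align executor pod requests/limits with the EKS node group capacity.
    .config("spark.kubernetes.executor.request.cores", "4")
    .config("spark.kubernetes.executor.limit.cores", "4")
    # Custom pod template for tolerations, node selectors, etc. (placeholder path).
    .config("spark.kubernetes.executor.podTemplateFile",
            "/opt/spark/conf/executor-pod-template.yaml")
    .getOrCreate()
)
```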
Best Practices Identified
Key learnings from the implementation:
- Starting with clear business KPIs helped align the architecture with tangible goals.
- Kubernetes-native deployment offered flexibility in resource allocation and service modularity.
Practices that contributed to success:
- Adoption of open-source, cloud-native tools avoided vendor lock-in.
- GitOps and CI/CD pipelines ensured rapid, consistent, and observable deployments.
- Iceberg's ACID compliance and schema evolution made it easier to onboard new data sources.
Approaches that could benefit other implementations:
- A KPI-first planning methodology aligns IT architecture with measurable business outcomes.
- Empowering business users through self-service BI reduces dependency on data engineering.
- Early investment in observability tools such as Prometheus and Grafana improves operational confidence and reduces mean time to resolution (MTTR).
- KPI-first solution design drove relevance.
- Open source on Kubernetes ensured cost-effectiveness.
- Superset accelerated BI adoption.
- RBAC ensured compliance across teams.