Amundsen Lyft - The New Revelation in DataOps

Chandan Gaur | 18 April 2025

In an age where data drives every strategic decision, enterprises are undergoing a foundational shift—from managing raw data assets to enabling seamless data consumption. This transition requires a reliable framework where data is not only accessible and trustworthy but also discoverable, governed, and actively managed. Welcome to the world of DataOps—a discipline that combines Agile principles, DevOps best practices, and data governance to enable high-quality data flow across organisations.

At the centre of this DataOps revolution stands Amundsen, the open-source data discovery platform developed at Lyft, which democratized metadata management. However, as enterprises scaled and data complexity grew, the limitations of static metadata catalogs surfaced. In response, Amundsen Lyft emerged—a smarter, more connected, and automated evolution of its predecessor. This isn’t just an upgrade. It’s a reimagining of data discovery and metadata orchestration that aligns perfectly with the ethos of modern DataOps.

What is Amundsen Lyft?

Amundsen Lyft is an advanced data discovery and metadata orchestration platform built on the backbone of the original Amundsen, but evolved with enterprise-scale requirements in mind. It retains the core functionalities—search, preview, ownership, and lineage—while introducing enhanced automation, intelligent recommendations, and deeper integration into the modern data stack.

Key Highlights:

  • Smart Metadata Discovery: Powered by machine learning to auto-classify, tag, and recommend datasets.

  • Unified Data Graph: A real-time, connected view of data assets, lineage, and usage patterns.

  • Collaboration-first Design: Brings data engineers, analysts, and domain experts together in a shared knowledge ecosystem.

  • Governance Integration: Built-in hooks for compliance, role-based access, and audit trails.

  • Cloud-native Architecture: Scalable and containerised for modern deployments on Kubernetes, AWS, GCP, and Azure.

From Amundsen to Amundsen Lyft: The Evolution

The original Amundsen project provided a powerful solution for metadata indexing and search. However, as organisations scaled, they needed more:

Feature           | Amundsen (Original)       | Amundsen Lyft
------------------|---------------------------|----------------------------------------
Search Capability | Metadata search & ranking | ML-driven semantic search
Lineage View      | Manual or semi-automated  | Real-time lineage from DAGs
Collaboration     | Basic tagging & ownership | Integrated Slack, comments, and notes
Governance        | Limited metadata roles    | Role-based access control & audit logs
AI Capabilities   | N/A                       | Dataset recommendations & quality scoring

Amundsen Lyft addresses the gaps by introducing intelligence, automation, and extensibility, making it suitable for large-scale deployments across multiple teams, business units, and regions.

Why Amundsen Lyft Elevates DataOps Efficiency

DataOps aims to make data delivery fast, reliable, and collaborative. Here’s how Amundsen Lyft accelerates this transformation:

1. Automated Metadata Ingestion and Classification

No more manual metadata entry or static configurations. Amundsen Lyft ingests metadata from:

  • Data lakes (Amazon S3, Google Cloud Storage)

  • Data warehouses (Snowflake, BigQuery, Redshift)

  • ETL tools (Airflow, dbt, Matillion)

  • BI platforms (Looker, Tableau)

Once ingested, it auto-classifies datasets using natural language processing, infers data types, applies sensitivity labels, and assigns ownership, which reduces the manual cataloguing work that would otherwise fall to engineers.
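To make the classification step concrete, here is a minimal sketch of how sensitivity labels could be attached to ingested columns. It swaps the NLP-based classification described above for simple keyword rules, and every name in it (`SENSITIVITY_RULES`, `ColumnMetadata`, `classify_table`) is a hypothetical illustration, not part of any Amundsen API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical keyword rules (an assumption for illustration). A real
# deployment would tune these or replace them with a trained classifier.
SENSITIVITY_RULES = {
    "PII": re.compile(r"email|phone|ssn|first_name|last_name|address|birth", re.IGNORECASE),
    "FINANCIAL": re.compile(r"iban|card_number|salary|account_number", re.IGNORECASE),
}


@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    labels: List[str] = field(default_factory=list)


def classify_column(column: ColumnMetadata) -> ColumnMetadata:
    """Attach every sensitivity label whose pattern matches the column name."""
    column.labels = [
        label for label, pattern in SENSITIVITY_RULES.items()
        if pattern.search(column.name)
    ]
    return column


def classify_table(columns: List[ColumnMetadata]) -> Dict[str, List[str]]:
    """Return a mapping of column name to its inferred sensitivity labels."""
    return {c.name: classify_column(c).labels for c in columns}


# Example ingested columns (made-up data).
columns = [
    ColumnMetadata("user_email", "varchar"),
    ColumnMetadata("ride_distance_km", "float"),
    ColumnMetadata("card_number", "varchar"),
]
labels = classify_table(columns)
```

Name-based rules are cheap to run at ingestion time, but they miss sensitive columns with opaque names, which is where content sampling or a learned classifier would come in.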

2. Real-Time Data Lineage

Data lineage isn’t just a nice-to-have—it’s necessary for debugging pipelines, tracing data transformations, and complying with regulations. Amundsen Lyft integrates directly with:

  • Orchestration tools (Airflow, Dagster)

  • dbt models

  • Git-based workflows

It dynamically generates lineage graphs, showing upstream/downstream dependencies and change history in real time.
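The idea of upstream and downstream dependencies can be sketched as a traversal over a directed graph. The example below is only an illustration under assumed names: `LineageGraph` and the table identifiers are hypothetical, and in practice the edges would be extracted from orchestrator or dbt metadata rather than added by hand.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of table dependencies (source feeds target)."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, adjacency):
        # Breadth-first traversal collecting every reachable node.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        seen.discard(start)  # never report a table as its own dependency
        return seen

    def downstream(self, table):
        """All tables that directly or indirectly consume `table`."""
        return self._walk(table, self._downstream)

    def upstream(self, table):
        """All tables that `table` directly or indirectly depends on."""
        return self._walk(table, self._upstream)


# Made-up pipeline for illustration.
graph = LineageGraph()
graph.add_edge("raw.rides", "staging.rides")
graph.add_edge("raw.drivers", "staging.drivers")
graph.add_edge("staging.rides", "marts.daily_rides")
graph.add_edge("staging.drivers", "marts.daily_rides")
graph.add_edge("marts.daily_rides", "dashboard.ops")

impacted = graph.downstream("staging.rides")
sources = graph.upstream("marts.daily_rides")
```

Here `impacted` answers "what breaks if this table changes?" and `sources` answers "where did this data come from?", the two questions lineage is typically used for when debugging and auditing.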

3. Collaborative Data Discovery

Data discovery becomes more intuitive and social. Users can:

  • Comment on datasets

  • Tag and categorise with business terms

  • Request access or clarification from the owners

  • Embed Slack threads and JIRA tickets

This transforms data discovery into a living documentation hub instead of a static catalogue.

4. Intelligent Recommendations

Borrowing from e-commerce and content streaming paradigms, Amundsen Lyft provides:

  • “People also used” suggestions

  • Popular datasets within departments

  • Frequently joined tables

  • Recommended dashboards based on similar queries

This fosters a self-service culture where users find relevant data faster and depend less on the data engineering team.
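As one illustration, a "frequently joined tables" suggestion can be approximated by counting how often tables appear together in the same query. This is a minimal sketch, assuming the set of tables per query has already been extracted from query logs; the function name and the sample log are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequently_joined(query_tables, table, top_n=3):
    """Rank tables by how often they appear in the same query as `table`."""
    pair_counts = Counter()
    for tables in query_tables:
        # Count each unordered pair of tables once per query.
        for a, b in combinations(sorted(set(tables)), 2):
            pair_counts[(a, b)] += 1

    partners = Counter()
    for (a, b), count in pair_counts.items():
        if a == table:
            partners[b] += count
        elif b == table:
            partners[a] += count
    return partners.most_common(top_n)


# Made-up query log: each entry lists the tables referenced by one query.
query_log = [
    {"rides", "drivers"},
    {"rides", "drivers", "payments"},
    {"rides", "payments"},
    {"rides", "drivers"},
    {"drivers", "vehicles"},
]
suggestions = frequently_joined(query_log, "rides")
```

Raw co-occurrence counts favour tables that are popular everywhere, so a production recommender would likely normalise by overall table frequency.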

5. Integrated Governance and Compliance

Amundsen Lyft supports role-based metadata access, audit logging, and GDPR-ready data classification. Organisations can define:

  • Who can view PII fields

  • What metadata can be edited

  • How long logs are retained

It is also enterprise-ready: it integrates with IAM platforms such as Okta and with existing RBAC policies.
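The three policy questions above can be expressed as a small role-to-policy mapping. The sketch below is an assumption-laden illustration: the role names, `RolePolicy`, `POLICIES`, and the 365-day retention value are all hypothetical and do not reflect a documented Amundsen configuration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class RolePolicy:
    can_view_pii: bool               # who can view PII fields
    editable_fields: FrozenSet[str]  # what metadata can be edited


# Hypothetical roles and policies for illustration only.
POLICIES: Dict[str, RolePolicy] = {
    "analyst": RolePolicy(False, frozenset({"description"})),
    "data_steward": RolePolicy(True, frozenset({"description", "tags", "owner"})),
}

# How long audit entries are kept (assumed value; enforcement not shown).
AUDIT_RETENTION_DAYS = 365

# Each entry records (role, action, allowed).
audit_log: List[Tuple[str, str, bool]] = []


def visible_columns(role: str, columns: List[dict]) -> List[str]:
    """Return the column names a role may see, hiding PII where required."""
    policy = POLICIES[role]
    return [c["name"] for c in columns if policy.can_view_pii or not c["is_pii"]]


def can_edit(role: str, field_name: str) -> bool:
    """Check an edit permission and record the decision in the audit log."""
    allowed = field_name in POLICIES[role].editable_fields
    audit_log.append((role, f"edit:{field_name}", allowed))
    return allowed


columns = [
    {"name": "ride_id", "is_pii": False},
    {"name": "rider_email", "is_pii": True},
]
analyst_view = visible_columns("analyst", columns)
steward_view = visible_columns("data_steward", columns)
analyst_can_edit_owner = can_edit("analyst", "owner")
```

In a real deployment the role assignments would come from the identity provider rather than a hard-coded dictionary.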

Where Amundsen Lyft Fits into the DataOps Lifecycle

DataOps Stage  | Role of Amundsen Lyft
---------------|--------------------------------------------------------------
Ingestion      | Tracks source systems, captures metadata on ingestion
Orchestration  | Integrates with workflow engines for real-time lineage
Transformation | Monitors dbt/Airflow pipelines for schema changes
Governance     | Applies metadata tagging, data sensitivity, and access control
Consumption    | Enables search, preview, and sharing of trusted data
Observability  | Tracks usage metrics, freshness, and data quality scores

This full-spectrum integration makes Amundsen Lyft the metadata backbone for any DataOps-centric organisation.
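Monitoring pipelines for schema changes, as listed for the transformation stage, amounts to comparing two snapshots of a table's columns. The following is a minimal sketch under that assumption; `diff_schema` and the sample schemas are hypothetical, and how the snapshots are captured is left out.

```python
def diff_schema(old, new):
    """Compare two {column: type} snapshots and report what changed."""
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    changed = {
        col: (old[col], new[col])
        for col in old.keys() & new.keys()
        if old[col] != new[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots taken before and after a pipeline run.
before = {"ride_id": "bigint", "fare": "float", "city": "varchar"}
after = {"ride_id": "bigint", "fare": "decimal(10,2)", "region": "varchar"}

drift = diff_schema(before, after)
```

A non-empty `removed` or `changed` entry is the kind of signal that would typically trigger an alert to downstream dataset owners.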

Use Cases Across Industries

1. Financial Services

Challenge: Regulatory compliance (GDPR, SOX), high-risk data
Solution: Use Amundsen Lyft to tag sensitive data, track data lineage for audits, and limit PII access

2. Healthcare

Challenge: Protected Health Information (PHI) management
Solution: Auto-classify PHI fields, integrate with IAM for role-based access

3. Retail & E-commerce

Challenge: Product, transaction, and customer data spread across platforms
Solution: Create a unified metadata graph for product analytics, marketing, and inventory insights

4. Manufacturing

Challenge: Sensor and machine-generated data with complex lineage
Solution: Real-time tracking of data from IoT devices, lineage mapping of ML-driven quality models

5. Technology & SaaS

Challenge: Democratizing data for developers, PMs, and executives
Solution: Build custom discovery portals using Amundsen Lyft APIs

Extending Amundsen Lyft in the Modern Data Stack

Amundsen Lyft is designed to be modular and pluggable.

Integrations:

  • Orchestration: Airflow, Prefect, Dagster

  • Warehouses: Snowflake, BigQuery, Redshift

  • BI Tools: Looker, Mode, Tableau

  • Catalogs: AWS Glue, Google Data Catalog

  • Observability: Monte Carlo, Soda

API-first Approach:

All components (search, graph, frontend, and metadata service) can be extended with custom APIs, enabling organisations to build proprietary portals, bots, or governance layers on top of Amundsen Lyft.

Challenges and Considerations

While Amundsen Lyft is powerful, a successful implementation requires:

  • Metadata hygiene: Garbage in, garbage out. Quality metadata ingestion is essential.

  • Change management: Adoption among non-engineering teams takes time and training.

  • Scalability planning: Large organisations may need to scale graph databases (e.g., Neptune or Neo4j) and search services (e.g., Elasticsearch).

The Future of Amundsen Lyft and DataOps

As DataOps matures, metadata becomes the control plane for enterprise data ecosystems. Future directions for Amundsen Lyft include:

  • AI-powered metadata bots for anomaly detection and schema drift

  • Natural language interfaces for data search and exploration

  • Tighter integration with LLMs for documentation auto-generation

  • Federated metadata across multi-cloud and hybrid architectures

In many ways, Amundsen Lyft is not just a product; it is an operating system for metadata, setting the stage for the next era of intelligent data operations.

In the battle for data agility and trust, metadata is your greatest ally, and Amundsen Lyft is the weapon of choice. Its enterprise-grade features, real-time lineage, intelligent recommendations, and collaborative discovery capabilities represent a paradigm shift in DataOps tooling. Organisations that embrace Amundsen Lyft position themselves for faster decision-making, stronger data governance, and scalable self-service. As the demands of data consumers grow more complex and real-time, tools like Amundsen Lyft are not just helpful; they are essential. The future of DataOps is metadata-driven, and the future of metadata starts with Amundsen Lyft.
