lakeFS

lakefs.io

About this website

lakeFS is an open-source data lake version control platform developed by Treeverse (CEO Einat Orr). It brings Git-like branching, committing, merging, and rollback to data lake storage systems, including Amazon S3, Google Cloud Storage, and Azure Blob Storage. Data engineering teams can therefore manage versions of data assets much as they manage versions of code, creating isolated branches on the data lake for experimentation and testing without copying any underlying data. lakeFS achieves this zero-copy branching through copy-on-write metadata management: a new branch initially references the same underlying objects as its source, and only changed objects are written anew. Core use cases include data quality assurance (running validation tests against an isolated branch of production data before merging into the main branch), AI model training reproducibility (recording the exact data version used in each training run), data pipeline change testing (validating ETL logic changes on a new branch without affecting production), and data compliance and auditing (automatically recording complete data lineage and audit logs for all changes).

In terms of performance, lakeFS claims an 80 percent reduction in testing time, which it says is validated by multiple customer case studies, and a 3x acceleration in ML model release velocity. Notable customer cases include Arm Holdings (automated data cleaning and governance), Lockheed Martin (AI reproducibility assurance), Netflix (large-scale data lake test isolation), and the United States Department of Energy (data governance for AI model development). Gartner recognized lakeFS as a representative vendor in its 2025 DataOps Tools Market Guide. The project is open source under the Apache 2.0 license and is hosted on GitHub under the treeverse organization.
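The copy-on-write branching described above can be illustrated with a small toy model. This is only a sketch of the general idea, not lakeFS's actual implementation or API: the class `ToyRepo` and its methods are hypothetical names invented for this example, and the in-memory dictionaries stand in for an object store and its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical toy model of zero-copy branching (not the lakeFS API)."""
    objects: dict = field(default_factory=dict)   # object key -> bytes (stands in for the object store)
    branches: dict = field(default_factory=dict)  # branch name -> {path: object key} (the metadata)
    _next_id: int = 0

    def put(self, branch, path, data):
        # Copy-on-write: store a new immutable object and repoint only
        # this branch's metadata entry; other branches are untouched.
        key = f"obj-{self._next_id}"
        self._next_id += 1
        self.objects[key] = data
        self.branches.setdefault(branch, {})[path] = key

    def create_branch(self, name, source):
        # Only the path -> key mapping is copied; no object bytes are duplicated.
        # (Assumption: a real system would also avoid copying the full mapping.)
        self.branches[name] = dict(self.branches[source])

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def merge(self, source, dest):
        # Naive merge: the source branch's pointers win. There is no
        # conflict detection and deletions are not propagated.
        self.branches[dest].update(self.branches[source])


repo = ToyRepo()
repo.put("main", "sales/2024.csv", b"v1")
repo.create_branch("experiment", "main")         # no data copied
repo.put("experiment", "sales/2024.csv", b"v2")  # writes one new object
repo.merge("experiment", "main")                 # main now points at the new object
```

In this sketch, creating the branch leaves the number of stored objects unchanged, and a write on `experiment` is invisible on `main` until the merge, which is the isolation property the description attributes to branching.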
