Apache Airflow Workflow Platform

github.com

About this website

Apache Airflow is a free and open-source platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs) of tasks. Maxime Beauchemin created it at Airbnb in 2014; it was open-sourced in 2015 and became a top-level Apache project in 2019. Airflow has become one of the most popular data orchestration tools, used by companies including Google, Slack, Robinhood, and Walmart.

Key features:

Python-based DAGs: workflows are defined as Python code. Each DAG consists of tasks connected by dependencies. Tasks are instances of Operators (for example BashOperator, PythonOperator, KubernetesPodOperator) that define what to execute. The code-as-configuration approach enables version control, testing, and dynamic DAG generation.
Scheduler: monitors DAGs and triggers tasks once their upstream dependencies are met. Supports cron expressions, timedelta intervals, and dataset-based scheduling.
Executors: pluggable backends that determine where tasks run: SequentialExecutor (development), LocalExecutor (parallel on a single machine), CeleryExecutor (distributed via Celery with Redis or RabbitMQ), KubernetesExecutor (each task in its own pod).
Operators and Hooks: over 200 built-in operators for AWS, GCP, Snowflake, Databricks, PostgreSQL, Spark, Kubernetes, and HTTP APIs. Hooks provide reusable connections to those systems.
XCom: cross-communication mechanism for passing data between tasks.
Templating: Jinja2 templates for dynamic values.
UI: web interface for DAG visualization (graph, Gantt, calendar, and tree views), task inspection, log viewing, manual triggering, and retries.
Resource management: pools and priority weights.
Sensors: tasks that wait for external conditions.

Codebase: Python. License: Apache-2.0.
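The scheduling idea described above (a task runs only once all of its upstream dependencies are met) can be sketched in plain Python. This is a toy model built on the standard-library graphlib module, not Airflow's API or its scheduler implementation; the task names and the run_order helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each name maps to the set of upstream tasks it waits on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def run_order(deps):
    """Return batches of tasks; every task in a batch has all upstreams done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstreams are finished
        batches.append(ready)
        sorter.done(*ready)  # mark them finished, releasing downstream tasks
    return batches


batches = run_order(dependencies)
print(batches)
```

Tasks that land in the same batch have no dependency on each other, which is the kind of work a parallel executor could run concurrently. A cyclic dependency is rejected up front, which is why the graph has to be acyclic.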
