NashTech Blog

Transitioning from Airflow 1 to 2

Migrating from Apache Airflow 1 to 2 is a substantial shift: Airflow 2 brings architectural changes, improved scalability, and many new features, but the upgrade needs a well-planned process. This guide supports a smooth transition by covering the major changes, migration steps, and best practices.

Overview of Key Changes in Airflow 2

Scheduler Improvements

Airflow 1’s single-threaded scheduler could bottleneck larger workflows. Airflow 2 introduces a highly available scheduler with support for multi-threading and parallelism. The scheduler can now run independently of the web server, leading to faster scheduling and separation of concerns.

TaskFlow API

The TaskFlow API, introduced in Airflow 2.0, offers a decorator-based approach to defining tasks and dependencies, making DAGs more readable and modular.

Better Performance and Scalability

Smart Sensors: an early-access Airflow 2.0 feature that consolidates many sensor checks into a few centralized processes, reducing the resource cost of large numbers of long-running sensors. (It was later deprecated in favor of deferrable operators.)
High Availability (HA): multiple schedulers can run simultaneously in an active-active configuration, which reduces downtime and removes the scheduler as a single point of failure.

Comprehensive REST API

Airflow 1's experimental REST API was limited. Airflow 2 offers a fully-fledged, standardized REST API for extensive automation and integration.

Enhanced UI and CLI Changes

The Airflow 2 UI offers better visual cues, more DAG management features, and an overall improved user experience. Several CLI commands have been renamed, so scripts and runbooks need updating.

Step-by-Step Migration Process

Step 1: Prepare the Environment

Version Control: Back up your DAGs, plugins, and configuration files.
Update Python Environment: Ensure Python 3.6+ is installed and active.
Upgrade Dependencies: Many libraries used alongside Airflow may need updates; run compatibility checks against your requirements.

Step 2: Install Airflow 2 Using pip

				
# Pin an exact 2.x release and use the matching constraints file to keep dependencies consistent:
pip install "apache-airflow==2.0.2" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.8.txt"

# Airflow 2 requires a database schema upgrade, which can be applied by:
airflow db upgrade

Step 3: Modify DAGs for TaskFlow API

Refactor DAGs Using TaskFlow: Replace PythonOperator-based task definitions with the TaskFlow API where possible, using decorators for cleaner code:

from airflow.decorators import dag, task

Step 4: Update Sensor Usage

Run long-running sensors in reschedule mode (or migrate them to smart sensors) so they release their worker slot between checks instead of occupying it while idle:

ExternalTaskSensor(..., mode='reschedule')

Step 5: Handle New Scheduler Configurations

Parallelism and HA Scheduler Configurations: Airflow 2 has no dedicated HA flag; high availability is achieved by simply running additional airflow scheduler processes against a supported metadata database (e.g. PostgreSQL 9.6+ or MySQL 8+). Review the [scheduler] section of airflow.cfg and size parallelism settings to your workload.
Scheduler Interval Adjustment: Default intervals for DAG file parsing and status checks have changed; confirm settings such as min_file_process_interval in airflow.cfg.
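The relevant knobs live in the [scheduler] section of airflow.cfg; an illustrative snippet (values should be sized to your own workload):

```ini
[scheduler]
# Number of processes used to parse DAG files (renamed from max_threads in Airflow 1)
parsing_processes = 2
# Minimum seconds between re-parses of the same DAG file
min_file_process_interval = 30
```

For HA, no extra setting is needed: start a second `airflow scheduler` process on another machine pointing at the same metadata database.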

Step 6: Revise CLI Commands

				
airflow dags list              # instead of airflow list_dags
airflow tasks list <dag_id>    # instead of airflow list_tasks <dag_id>

Step 7: Utilize the New REST API

Authentication and Permissions: Configure an authentication backend for REST API access via the auth_backend setting in the [api] section of airflow.cfg (basic auth ships with Airflow; Kerberos and custom backends can be plugged in).
Automated Workflow Triggering: The new API provides stable endpoints for triggering DAG runs, managing tasks, and more:

curl -X POST "http://<host>/api/v1/dags/<dag_id>/dagRuns" \
  -H "Content-Type: application/json" --user "username:password" -d '{}'

Conclusion

Migrating to Airflow 2 requires a solid understanding of new features and architectural shifts, especially around the scheduler and TaskFlow API. With improved scalability and a robust REST API, Airflow 2 offers a great opportunity to streamline data pipelines, automate workflows, and enhance Airflow’s reliability in production settings.