[Airflow-2423] syncing DAGs without scheduler/web-server restart #3318
aditiverma wants to merge 2 commits into apache:master from
@aditiverma thanks for your contribution! We are in the process of abstracting a DAG fetcher, which will allow all kinds of fetchers to extend a common base (see #3138).
@jgao54 thanks for the initiative! Could you also review this PR for inclusion as part of the DAG fetcher?
I want to put this PR on hold and revisit it once the fetcher is implemented, because this will need to extend the BaseDagFetcher. Merging it now would add unnecessary complication to the fetcher implementation. I'd ask you to add some unit tests, but that would be premature optimization, given that the PR will need to adopt the fetcher once it's implemented.
@jgao54 sounds good. Please update me once the BaseDagFetcher is ready to be extended.
See #3138. I'm going to close this PR for now.
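To illustrate the kind of split the reviewer describes (one common base, many fetchers extending it), here is a minimal sketch. `DagFetcher`, `InMemoryDagFetcher`, and the `fetch()` method are hypothetical names chosen for illustration; this is not the BaseDagFetcher interface proposed in #3138, which is not shown in this thread. The toy subclass uses an in-memory dict as its "remote" so the sketch runs without S3.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List


class DagFetcher(ABC):
    """Hypothetical common base: every fetcher syncs DAG files into a local folder."""

    def __init__(self, dag_folder: str) -> None:
        self.dag_folder = dag_folder

    @abstractmethod
    def fetch(self) -> List[str]:
        """Bring the local DAG folder up to date; return the local paths written."""


class InMemoryDagFetcher(DagFetcher):
    """Hypothetical toy fetcher whose 'remote' is a dict of relative path -> source.

    An S3-backed fetcher would list and download objects here instead.
    """

    def __init__(self, dag_folder: str, remote: Dict[str, str]) -> None:
        super().__init__(dag_folder)
        self.remote = remote

    def fetch(self) -> List[str]:
        written = []
        for rel_path, source in sorted(self.remote.items()):
            if not rel_path.endswith(".py"):
                continue  # only Python files are treated as DAG files
            local_path = os.path.join(self.dag_folder, rel_path)
            # Create nested sub-folders so recursive remote layouts are preserved.
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            with open(local_path, "w") as f:
                f.write(source)
            written.append(local_path)
        return written
```

With this split, the scheduler and web-server would only depend on the base class: a periodic job calls `fetch()` and then lets the DAG folder be re-collected, regardless of which fetcher is configured.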
Hi all, I want to implement one piece of logic.
Make sure you have checked all steps below.
JIRA
Description
This PR enables syncing DAGs from a common remote location in S3 without restarting the scheduler or web-server. Syncing DAGs is useful when running Airflow in a distributed setup (e.g. on Mesos), where the hosts or containers running the scheduler and web-server can change over time. It is also useful when restarting the scheduler and web-server for every DAG addition or update should be avoided. This PR periodically syncs DAGs from the S3 location into the local DAG folders of the scheduler and web-server. A DAG newly added or updated in S3 is reflected in the scheduler's and web-server's local directories, and is added to the metastore backend on every call of `collect_dags`.
If the `s3_dags_folder` property is defined in the Airflow config, the `.py` files under the S3 location are scanned recursively. A DAG file is downloaded from S3 only if it is new or its last-modified timestamp is later than that of the corresponding local DAG file.
Tests
Tested locally, and currently in use with an Airflow deployment on Mesos. There is currently no test for `s3_hook`, which this PR depends on, so no test for the S3 DAG sync has been added.
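Since the missing `s3_hook` tests are what block a sync test, one option is to isolate the download rule from the description (recursive `.py` scan, download only if the file is new or the S3 copy is newer) in a pure function that can be tested without S3. The sketch below assumes the caller supplies `(key, last_modified)` pairs, e.g. taken from the `Key` and `LastModified` fields of a paginated boto3 `list_objects_v2` listing of the prefix. `dags_to_download` is a hypothetical name, not the function used in this PR.

```python
import os
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def dags_to_download(
    remote_objects: Iterable[Tuple[str, datetime]],
    s3_prefix: str,
    local_dag_folder: str,
) -> List[str]:
    """Hypothetical helper: return the S3 keys whose DAG file should be downloaded.

    remote_objects: (key, last_modified) pairs from a recursive listing of
    s3_prefix; last_modified must be timezone-aware (S3 reports UTC).
    """
    keys = []
    for key, last_modified in remote_objects:
        if not key.startswith(s3_prefix) or not key.endswith(".py"):
            continue  # outside the DAG prefix, or not a DAG file
        # Mirror the S3 layout under the local DAG folder.
        rel_path = key[len(s3_prefix):].lstrip("/")
        local_path = os.path.join(local_dag_folder, rel_path)
        if not os.path.exists(local_path):
            keys.append(key)  # new DAG file
            continue
        local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
        if last_modified > local_mtime:
            keys.append(key)  # the S3 copy was updated after the local copy
    return keys
```

A freshly downloaded file gets a local mtime at download time, which is later than the object's S3 timestamp, so (assuming the host clock is roughly in sync with S3) the same object is not fetched again until it is re-uploaded.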
Commits
Documentation
Code Quality
git diff upstream/master -u -- "*.py" | flake8 --diff