2 changes: 1 addition & 1 deletion airflow-ctl/docs/cli-and-env-variables-ref.rst
@@ -22,7 +22,7 @@ CLI
'''

airflowctl has a very rich command line interface that allows for
-many types of operation on a DAG, starting services, and supporting
+many types of operation on a Dag, starting services, and supporting
development and testing.

.. note::
8 changes: 4 additions & 4 deletions airflow-ctl/docs/howto/index.rst
@@ -130,19 +130,19 @@ These visual references show the full command syntax, options, and parameters fo
:width: 60%
:alt: airflowctl Connections Command

-**DAGs**
+**Dags**
''''''''
.. image:: ../images/output_dag.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/airflow-ctl/docs/images/output_dag.svg
:width: 60%
-   :alt: airflowctl DAG Command
+   :alt: airflowctl Dag Command

-**DAG Runs**
+**Dag Runs**
''''''''''''
.. image:: ../images/output_dagrun.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/airflow-ctl/docs/images/output_dagrun.svg
:width: 60%
-   :alt: airflowctl DAG Run Command
+   :alt: airflowctl Dag Run Command

**Jobs**
''''''''
4 changes: 2 additions & 2 deletions chart/docs/adding-connections-and-variables.rst
@@ -52,8 +52,8 @@ to override values under these sections of the ``values.yaml`` file.

Variables
---------
-Airflow supports Variables which enable users to craft dynamic dags. You can set Variables in Airflow in three ways - UI,
-command line, and within your DAG file. See :doc:`apache-airflow:howto/variable` for more.
+Airflow supports Variables which enable users to craft dynamic Dags. You can set Variables in Airflow in three ways - UI,
+command line, and within your Dag file. See :doc:`apache-airflow:howto/variable` for more.

With the Helm chart, you can also inject environment variables into Airflow. So in the example ``override.yaml`` file,
we can override values of interest in the ``env`` section of the ``values.yaml`` file.
2 changes: 1 addition & 1 deletion chart/docs/airflow-configuration.rst
@@ -36,6 +36,6 @@ configuration prior to installing and deploying the service.

.. note::

-The recommended way to load example dags using the official Docker image and chart is to configure the ``AIRFLOW__CORE__LOAD_EXAMPLES`` environment variable
+The recommended way to load example Dags using the official Docker image and chart is to configure the ``AIRFLOW__CORE__LOAD_EXAMPLES`` environment variable
in ``extraEnv`` (see :doc:`Parameters reference <parameters-ref>`). The official Docker image has ``AIRFLOW__CORE__LOAD_EXAMPLES=False``
set within the image, so you need to override it with an environment variable when deploying the chart in order for the examples to be present.
62 changes: 31 additions & 31 deletions chart/docs/manage-dag-files.rst
@@ -1,4 +1,4 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
+.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
@@ -16,17 +16,17 @@
under the License.


-Manage dag files
+Manage Dag files
================

-When you create new or modify existing DAG files, it is necessary to deploy them into the environment. This section will describe some basic techniques you can use.
+When you create new or modify existing Dag files, it is necessary to deploy them into the environment. This section will describe some basic techniques you can use.

-Bake dags in docker image
+Bake Dags in docker image
-------------------------

-With this approach, you include your dag files and related code in the Airflow image.
+With this approach, you include your Dag files and related code in the Airflow image.

-This method requires redeploying the services in the helm chart with the new docker image in order to deploy the new DAG code. This can work well particularly if DAG code is not expected to change frequently.
+This method requires redeploying the services in the helm chart with the new docker image in order to deploy the new Dag code. This can work well particularly if Dag code is not expected to change frequently.

.. code-block:: bash

@@ -40,7 +40,7 @@ This method requires redeploying the services in the helm chart with the new doc
.. note::

In Airflow images prior to version 2.0.2, there was a bug that required you to use
-a bit longer Dockerfile, to make sure the image remains OpenShift-compatible (i.e DAG
+a bit longer Dockerfile, to make sure the image remains OpenShift-compatible (i.e Dag
has root group similarly as other files). In 2.0.2 this has been fixed.

.. code-block:: bash
@@ -101,12 +101,12 @@ If you are deploying an image from a private repository, you need to create a se
Using git-sync
--------------

-Mounting dags using git-sync sidecar with persistence enabled
+Mounting Dags using git-sync sidecar with persistence enabled
.............................................................

This option will use a Persistent Volume Claim with an access mode of ``ReadWriteMany``.
-The scheduler pod will sync dags from a git repository onto the PVC every configured number of
-seconds. The other pods will read the synced dags. Not all volume plugins have support for
+The scheduler pod will sync Dags from a git repository onto the PVC every configured number of
+seconds. The other pods will read the synced Dags. Not all volume plugins have support for
``ReadWriteMany`` access mode.
Refer `Persistent Volume Access Modes <https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes>`__
for details.
@@ -121,12 +121,12 @@ for details.
# Please refer to values.yaml for details


-Mounting dags using git-sync sidecar without persistence
+Mounting Dags using git-sync sidecar without persistence
........................................................

This option will use an always running Git-Sync sidecar on every scheduler, webserver (if ``airflowVersion < 2.0.0``)
and worker pods.
-The Git-Sync sidecar containers will sync dags from a git repository every configured number of
+The Git-Sync sidecar containers will sync Dags from a git repository every configured number of
seconds. If you are using the ``KubernetesExecutor``, Git-sync will run as an init container on your worker pods.

.. code-block:: bash
@@ -135,28 +135,28 @@ seconds. If you are using the ``KubernetesExecutor``, Git-sync will run as an in
--set dags.persistence.enabled=false \
--set dags.gitSync.enabled=true
# you can also override the other gitSync values
-    # by setting the dags.gitSync.* values
+    # by setting the dags.gitSync.* values
# Refer values.yaml for details

-When using ``apache-airflow >= 2.0.0``, :ref:`DAG Serialization <apache-airflow:dag-serialization>` is enabled by default,
-hence Webserver does not need access to DAG files, so ``git-sync`` sidecar is not run on Webserver.
+When using ``apache-airflow >= 2.0.0``, :ref:`Dag Serialization <apache-airflow:dag-serialization>` is enabled by default,
+hence Webserver does not need access to Dag files, so ``git-sync`` sidecar is not run on Webserver.

Notes for combining git-sync and persistence
............................................

-While using both git-sync and persistence for dags is possible, it is generally not recommended unless the
+While using both git-sync and persistence for Dags is possible, it is generally not recommended unless the
deployment manager carefully considered the trade-offs it brings. There are cases when git-sync without
-persistence has other trade-offs (for example delays in synchronization of DAGS vs. rate-limiting of Git
+persistence has other trade-offs (for example delays in synchronization of Dags vs. rate-limiting of Git
servers) that can often be mitigated (for example by sending signals to git-sync containers via web-hooks
when new commits are pushed to the repository) but there might be cases where you still might want to choose
git-sync and Persistence together, but as a Deployment Manager you should be aware of some consequences it has.

git-sync solution is primarily designed to be used for local, POSIX-compliant volumes to checkout Git
repositories into. Part of the process of synchronization of commits from git-sync involves checking out
new version of files in a freshly created folder and swapping symbolic links to the new folder, after the
-checkout is complete. This is done to ensure that the whole dags folder is consistent at all times. The way
-git-sync works with symbolic-link swaps, makes sure that Parsing the dags always work on a consistent
-(single-commit-based) set of files in the whole DAG folder.
+checkout is complete. This is done to ensure that the whole Dags folder is consistent at all times. The way
+git-sync works with symbolic-link swaps, makes sure that Parsing the Dags always work on a consistent
+(single-commit-based) set of files in the whole Dag folder.

This approach, however might have undesirable side effects when the folder that git-sync works on is not
a local volume, but is a persistent volume (so effectively a networked, distributed volume). Depending on
@@ -165,7 +165,7 @@ consequences. There are a lot of persistence solutions available for various K8S
them has different characteristics, so you need to carefully test and monitor your filesystem to make sure
those undesired side effects do not affect you. Those effects might change over time or depend on parameters
like how often the files are being scanned by the Dag File Processor, the number and complexity of your
-dags, how remote and how distributed your persistent volumes are, how many IOPS you allocate for some of
+Dags, how remote and how distributed your persistent volumes are, how many IOPS you allocate for some of
the filesystem (usually highly paid feature of such filesystems is how many IOPS you can get) and many other
factors.

@@ -176,15 +176,15 @@ to pretty sudden and unexpected demand increase. Most of the persistence solutio
smaller/shorter burst of traffic, but when they outgrow certain thresholds, you need to upgrade the
networking to a much more capable and expensive options. This is difficult to control and impossible to
mitigate, so you might be suddenly faced with situation to pay a lot more for IOPS/persistence option to
-keep your dags sufficiently synchronized to avoid inconsistencies and delays in synchronization.
+keep your Dags sufficiently synchronized to avoid inconsistencies and delays in synchronization.

The side-effects that you might observe:

* burst of networking/communication at the moment when new commit is checked out (because of the quick
succession of deleting old files, creating new files, symbolic link swapping).
-* temporary lack of consistency between files in DAG folders while DAGS are being synced (because of delays
+* temporary lack of consistency between files in Dag folders while Dags are being synced (because of delays
in distributing changes to individual files for various nodes in the cluster)
-* visible drops of performance of the persistence solution when your DAG number grows, drops that might
+* visible drops of performance of the persistence solution when your Dag number grows, drops that might
amplify the side effects described above.
* some of persistence solutions might lack filesystem functionality that git-sync needs to perform the sync
(for example changing permissions or creating symbolic links). While those can often be mitigated it is
@@ -198,25 +198,25 @@ Synchronizing multiple Git repositories with git-sync
.....................................................

Airflow git-sync integration in the Helm Chart, does not allow to configure multiple repositories to be
-synchronized at the same time. The DAG folder must come from single git repository. However it is possible
+synchronized at the same time. The Dag folder must come from single git repository. However it is possible
to use `submodules <https://git-scm.com/book/en/v2/Git-Tools-Submodules>`_ to create an "umbrella" repository
that you can use to bring a number of git repositories checked out together (with ``--submodules recursive``
option). There are success stories of Airflow users using such approach with 100s of repositories put
together as submodules via such "umbrella" repo approach. When you choose this solution, however,
you need to work out the way how to link the submodules, when to update the umbrella repo when "submodule"
repository change and work out versioning approach and automate it. This might be as simple as always
using latest versions of all the submodule repositories, or as complex as managing versioning of shared
-libraries, dags and code across multiple teams and doing that following your release process.
+libraries, Dags and code across multiple teams and doing that following your release process.

An example of such complex approach can found in this
-`Manage dags at scale <https://s.apache.org/airflow-manage-dags-at-scale>`_ presentation from the Airflow
+`Manage Dags at scale <https://s.apache.org/airflow-manage-dags-at-scale>`_ presentation from the Airflow
Summit.
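The "umbrella" repository pattern described above can be sketched with purely local repositories. This is an illustrative example, not part of the chart: the team repository names and layout are hypothetical, and ``protocol.file.allow=always`` is only needed because the submodules here live on the local filesystem.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Two stand-in "team" repositories, each holding its own Dags.
for team in team-a team-b; do
  git init -q "$team"
  mkdir -p "$team/dags"
  echo "# ${team} example dag" > "$team/dags/example_dag.py"
  git -C "$team" add .
  git -C "$team" -c user.email=ci@example.com -c user.name=ci commit -qm "add dags"
done

# The umbrella repo pulls both in as submodules under dags/.
git init -q umbrella
cd umbrella
git -c protocol.file.allow=always submodule add -q "$workdir/team-a" dags/team-a
git -c protocol.file.allow=always submodule add -q "$workdir/team-b" dags/team-b
git -c user.email=ci@example.com -c user.name=ci commit -qm "combine team repos"
cd ..

# A consumer (as git-sync would with submodules enabled) checks everything out together.
git -c protocol.file.allow=always clone -q --recurse-submodules umbrella checkout
```

The versioning question raised in the text shows up here too: the umbrella repo pins each submodule to a specific commit, so "always latest" requires an automated job that bumps and commits the submodule pointers.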


-Mounting dags from an externally populated PVC
+Mounting Dags from an externally populated PVC
----------------------------------------------

-In this approach, Airflow will read the dags from a PVC which has ``ReadOnlyMany`` or ``ReadWriteMany`` access mode. You will have to ensure that the PVC is populated/updated with the required dags (this won't be handled by the chart). You pass in the name of the volume claim to the chart:
+In this approach, Airflow will read the Dags from a PVC which has ``ReadOnlyMany`` or ``ReadWriteMany`` access mode. You will have to ensure that the PVC is populated/updated with the required Dags (this won't be handled by the chart). You pass in the name of the volume claim to the chart:

.. code-block:: bash

@@ -225,7 +225,7 @@ In this approach, Airflow will read the dags from a PVC which has ``ReadOnlyMany
--set dags.persistence.existingClaim=my-volume-claim \
--set dags.gitSync.enabled=false
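The claim named in ``dags.persistence.existingClaim`` must already exist; a hypothetical manifest is sketched below (the name, access mode, and size are placeholders, and populating the volume remains the responsibility of an external process such as a CI job):

```shell
# Write a sketch of a PVC manifest that an external process keeps populated
# with Dag files; the chart only mounts it.
cat > dags-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-volume-claim
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
EOF

# Applied (outside this sketch) with:
#   kubectl apply -f dags-pvc.yaml --namespace <your-namespace>
```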

-Mounting dags from a private GitHub repo using Git-Sync sidecar
+Mounting Dags from a private GitHub repo using Git-Sync sidecar
---------------------------------------------------------------
Create a private repo on GitHub if you have not created one already.

@@ -270,7 +270,7 @@ Finally, from the context of your Airflow Helm chart directory, you can install

helm upgrade --install airflow apache-airflow/airflow -f override-values.yaml

-If you have done everything correctly, Git-Sync will pick up the changes you make to the dags
+If you have done everything correctly, Git-Sync will pick up the changes you make to the Dags
in your private GitHub repo.
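For reference, a minimal ``override-values.yaml`` for a private repo over SSH might look like the sketch below. The repository URL and secret name are placeholders; the ``dags.gitSync.*`` keys follow the chart's ``values.yaml``, and the commented ``kubectl`` command shows one way the referenced secret could be created from a deploy key.

```shell
# Generate a sketch of the override file used in the helm upgrade above.
cat > override-values.yaml <<'EOF'
dags:
  gitSync:
    enabled: true
    repo: ssh://git@github.com/<username>/<private-repo-name>.git
    branch: main
    subPath: "dags"
    sshKeySecret: airflow-ssh-secret
EOF

# The secret referenced by sshKeySecret would be created beforehand, e.g.:
#   kubectl create secret generic airflow-ssh-secret \
#     --from-file=gitSshKey=/path/to/private_key --namespace <your-namespace>
```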

You should take this a step further and set ``dags.gitSync.knownHosts`` so you are not susceptible to man-in-the-middle
2 changes: 1 addition & 1 deletion chart/docs/production-guide.rst
@@ -262,7 +262,7 @@ Typical scenarios where you would like to use your custom image:
See `Building the image <https://airflow.apache.org/docs/docker-stack/build.html>`_ for more
details on how you can extend and customize the Airflow image.

-Managing DAG Files
+Managing Dag Files
------------------

See :doc:`manage-dag-files`.
6 changes: 3 additions & 3 deletions chart/docs/quick-start.rst
@@ -59,7 +59,7 @@ Install the chart
export RELEASE_NAME=example-release
helm install $RELEASE_NAME apache-airflow/airflow --namespace $NAMESPACE

-Use the following code to install the chart with Example dags:
+Use the following code to install the chart with Example Dags:

.. code-block:: bash

@@ -88,7 +88,7 @@ Extending Airflow Image
-----------------------

The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
-However, when you try it out you want to add your own dags, custom dependencies,
+However, when you try it out you want to add your own Dags, custom dependencies,
packages, or even custom providers.

.. note::
@@ -100,7 +100,7 @@ packages, or even custom providers.

The best way to achieve it, is to build your own, custom image.

-Adding dags to your image
+Adding Dags to your image
.........................

1. Create a project
6 changes: 3 additions & 3 deletions chart/docs/using-additional-containers.rst
@@ -22,9 +22,9 @@ Sidecar Containers
------------------

If you want to deploy your own sidecar container, you can add it through the ``extraContainers`` parameter.
-You can define different containers for the scheduler, webserver, worker, triggerer, DAG processor, flower, create user Job and migrate database Job Pods.
+You can define different containers for the scheduler, webserver, worker, triggerer, Dag processor, flower, create user Job and migrate database Job Pods.

-For example, sidecars that sync dags from object storage.
+For example, sidecars that sync Dags from object storage.

.. code-block:: yaml

@@ -49,7 +49,7 @@ Init Containers
---------------

You can also deploy extra init containers through the ``extraInitContainers`` parameter.
-You can define different containers for the scheduler, webserver, worker, triggerer, DAG processor, create user Job and migrate database Job pods.
+You can define different containers for the scheduler, webserver, worker, triggerer, Dag processor, create user Job and migrate database Job pods.

For example, an init container that just says hello:

2 changes: 1 addition & 1 deletion contributing-docs/03_contributors_quick_start.rst
@@ -511,7 +511,7 @@ Using Breeze
1. Starting the Breeze environment using ``breeze start-airflow`` starts the Breeze environment with last configuration run(
In this case Python version and backend are picked up from last execution ``breeze --python 3.10 --backend postgres``)
It also automatically starts the API server (FastAPI api and UI), triggerer, dag processor and scheduler. It drops you in tmux with triggerer to the right, and
-Scheduler, API server (FastAPI api and UI), DAG processor from left to right at the bottom. Use ``[Ctrl + B] and Arrow keys`` to navigate.
+Scheduler, API server (FastAPI api and UI), Dag processor from left to right at the bottom. Use ``[Ctrl + B] and Arrow keys`` to navigate.

.. code-block:: bash

2 changes: 1 addition & 1 deletion contributing-docs/05_pull_requests.rst
@@ -246,7 +246,7 @@ Airflow Operators might have some fields added to the list of ``template_fields`
set in the constructor (``__init__`` method) of the operator and usually their values should
come from the ``__init__`` method arguments. The reason for that is that the templated fields
are evaluated at the time of the operator execution and when you pass arguments to the operator
-in the DAG, the fields that are set on the class just before the ``execute`` method is called
+in the Dag, the fields that are set on the class just before the ``execute`` method is called
are processed through templating engine and the fields values are set to the result of applying the
templating engine to the fields (in case the field is a structure such as dict or list, the templating
engine is applied to all the values of the structure).
4 changes: 2 additions & 2 deletions contributing-docs/09_testing.rst
@@ -42,7 +42,7 @@ includes:
rendered correctly for various configuration parameters.

* `System tests <testing/system_tests.rst>`__ are automatic tests that use external systems like
-  Google Cloud and AWS. These tests are intended for an end-to-end DAG execution.
+  Google Cloud and AWS. These tests are intended for an end-to-end Dag execution.

* `Task SDK integration tests <testing/task_sdk_integration_tests.rst>`__ are specialized tests that verify
the integration between the Apache Airflow Task SDK package and a running Airflow instance.
@@ -55,7 +55,7 @@ You can also run other kinds of tests when you are developing Airflow packages:
* `Python client tests <testing/python_client_tests.rst>`__ are tests we run to check if the Python API
client works correctly.

-* `DAG testing <testing/dag_testing.rst>`__ is a document that describes how to test DAGs in a local environment
+* `Dag testing <testing/dag_testing.rst>`__ is a document that describes how to test Dags in a local environment
with ``dag.test()``.

------
Expand Down
Loading