
More than 5 years have passed since last update.

Running Airflow in a single Docker container


Goal

  • Build a minimal Airflow setup to better understand how settings such as airflow.cfg work
  • Use puckel/docker-airflow as a reference

Conclusion

diff --git a/script/entrypoint.sh b/script/entrypoint.sh
index fb3f9ad..62d4198 100755
--- a/script/entrypoint.sh
+++ b/script/entrypoint.sh
@@ -70,8 +70,8 @@ fi
 case "$1" in
   webserver)
     airflow initdb
-    if [ "$AIRFLOW__CORE__EXECUTOR" = "LocalExecutor" ]; then
-      # With the "Local" executor it should all run in one container.
+    if [ "$AIRFLOW__CORE__EXECUTOR" != "CeleryExecutor" ]; then
+      # With the "Sequential" or "Local" executor it should all run in one container.
       airflow scheduler &
     fi
     exec airflow webserver

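Note: I sent this as a PR, but the author no longer seems to maintain the repository, so it is unlikely to be merged.
  Also, someone else had opened a similar PR a few hours earlier, which I failed to check beforehand.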
  https://github.com/puckel/docker-airflow/pull/316

Build and start the container with the following commands:

docker build --rm -t puckel/docker-airflow .
# -e LOAD_EX=y enables the example DAGs
docker run -d -p 8080:8080 -e LOAD_EX=y puckel/docker-airflow webserver

About airflow.cfg

default_airflow.cfg and the other files under airflow/config_templates are a good reference.
Comparing them with the settings used by puckel/docker-airflow or Google Cloud Composer should deepen your understanding further.
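
As a rough illustration of how such settings can be overridden: Airflow reads environment variables named AIRFLOW__{SECTION}__{KEY} in preference to airflow.cfg, which is why entrypoint.sh can check AIRFLOW__CORE__EXECUTOR. The sketch below imitates that naming rule with the standard library only. apply_env_overrides and the sample values are hypothetical, and this is not Airflow's own configuration loader.

```python
import configparser

PREFIX = "AIRFLOW__"


def apply_env_overrides(cfg_text, environ):
    """Parse airflow.cfg-style text and overlay AIRFLOW__SECTION__KEY variables."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # AIRFLOW__CORE__EXECUTOR -> section "core", key "executor"
        section, sep, key = name[len(PREFIX):].partition("__")
        if not sep or not section or not key:
            continue
        section, key = section.lower(), key.lower()
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    return parser


# Hypothetical sample values, for illustration only.
sample_cfg = """
[core]
executor = SequentialExecutor
load_examples = False
"""
env = {"AIRFLOW__CORE__EXECUTOR": "LocalExecutor", "HOME": "/usr/local/airflow"}

config = apply_env_overrides(sample_cfg, env)
print(config.get("core", "executor"))       # value taken from the environment
print(config.get("core", "load_examples"))  # value taken from the file
```

Variables without the AIRFLOW__ prefix (HOME here) are ignored, so only the executor setting changes.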

SequentialExecutor and LocalExecutor

puckel/docker-airflow also selects SequentialExecutor as the executor when running as a single Docker container:

By default, docker-airflow runs Airflow with SequentialExecutor :

docker run -d -p 8080:8080 puckel/docker-airflow webserver

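However, as noted above, this does not work unless the patch is applied, because the original entrypoint.sh starts airflow scheduler only when the executor is LocalExecutor.
Until now I had simply used docker-compose-LocalExecutor.yml without questioning this.
Presumably SequentialExecutor needs airflow scheduler as well.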
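
To make the difference concrete, here is a small Python restatement of the two shell conditions from the entrypoint.sh diff. The function names are hypothetical; only the string comparisons come from the diff.

```python
def starts_scheduler_original(executor):
    """Upstream entrypoint.sh: start the scheduler only for LocalExecutor."""
    return executor == "LocalExecutor"


def starts_scheduler_patched(executor):
    """Patched entrypoint.sh: start the scheduler for everything except CeleryExecutor."""
    return executor != "CeleryExecutor"


for executor in ("SequentialExecutor", "LocalExecutor", "CeleryExecutor"):
    print(
        executor,
        "original:", starts_scheduler_original(executor),
        "patched:", starts_scheduler_patched(executor),
    )
```

The only executor whose result changes is SequentialExecutor, which is the case the single-container docker run command relies on.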

airflow/executors/local_executor.py contains the following comment:

LocalExecutor runs tasks by spawning processes in a controlled fashion in different
modes. Given that BaseExecutor has the option to receive a parallelism parameter to
limit the number of process spawned, when this parameter is 0 the number of processes
that LocalExecutor can spawn is unlimited.
The following strategies are implemented:
1. Unlimited Parallelism (self.parallelism == 0): In this strategy, LocalExecutor will
spawn a process every time execute_async is called, that is, every task submitted to the
LocalExecutor will be executed in its own process. Once the task is executed and the
result stored in the result_queue, the process terminates. There is no need for a
task_queue in this approach, since as soon as a task is received a new process will be
allocated to the task. Processes used in this strategy are of class LocalWorker.
2. Limited Parallelism (self.parallelism > 0): In this strategy, the LocalExecutor spawns
the number of processes equal to the value of self.parallelism at start time,
using a task_queue to coordinate the ingestion of tasks and the work distribution among
the workers, which will take a task as soon as they are ready. During the lifecycle of
the LocalExecutor, the worker processes are running waiting for tasks, once the
LocalExecutor receives the call to shutdown the executor a poison token is sent to the
workers to terminate them. Processes used in this strategy are of class QueuedLocalWorker.
Arguably, SequentialExecutor could be thought as a LocalExecutor with limited
parallelism of just 1 worker, i.e. self.parallelism = 1.
This option could lead to the unification of the executor implementations, running
locally, into just one LocalExecutor with multiple modes.

In other words, SequentialExecutor can be regarded as a LocalExecutor whose parallelism is limited to 1.
docker-compose-LocalExecutor.yml likewise defines only two services, postgresql and webserver.
Incidentally, PostgreSQL is used there because sqlite cannot be used with any executor other than SequentialExecutor.
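
The "limited parallelism" strategy and its relation to SequentialExecutor can be sketched as follows. This is a simplified illustration, not Airflow's implementation: it uses threads instead of processes to stay short, and run_with_limited_parallelism, worker, and the sample tasks are all hypothetical names.

```python
import queue
import threading

POISON = object()  # shutdown marker, playing the role of the "poison token"


def worker(task_queue, result_queue):
    """Take tasks from the queue until the poison token arrives."""
    while True:
        task = task_queue.get()
        if task is POISON:
            break
        key, func = task
        result_queue.put((key, func()))


def run_with_limited_parallelism(tasks, parallelism):
    """Run (key, callable) pairs on a fixed number of workers."""
    task_queue, result_queue = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(task_queue, result_queue))
        for _ in range(parallelism)
    ]
    for w in workers:
        w.start()
    for task in tasks:
        task_queue.put(task)
    for _ in workers:
        task_queue.put(POISON)  # one token per worker so every worker stops
    for w in workers:
        w.join()
    return [result_queue.get() for _ in range(len(tasks))]


tasks = [(f"task_{i}", lambda i=i: i * i) for i in range(4)]

# parallelism=1: a single worker drains the queue in submission order,
# which is the SequentialExecutor-like case.
print(run_with_limited_parallelism(tasks, parallelism=1))

# parallelism=3: completion order is not guaranteed, so sort before comparing.
print(sorted(run_with_limited_parallelism(tasks, parallelism=3)))
```

With one worker the results always come back in submission order. With several workers only the set of results is fixed.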

SequentialExecutor has a comment along the same lines:

This executor will only run one task instance at a time, can be used
for debugging. It is also the only executor that can be used with sqlite
since sqlite doesn't support multiple connections.
Since we want airflow to work out of the box, it defaults to this
SequentialExecutor alongside sqlite as you first install it.

In short, it is intended for debugging and is the default right after installation.
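
The rule stated in that comment (sqlite only together with SequentialExecutor) can be written down as a small check. This is a sketch of the rule itself, not Airflow's own validation code. check_executor_and_db and both connection strings are hypothetical placeholders.

```python
def check_executor_and_db(executor, sql_alchemy_conn):
    """Reject sqlite for any executor other than SequentialExecutor."""
    uses_sqlite = sql_alchemy_conn.startswith("sqlite")
    if uses_sqlite and executor != "SequentialExecutor":
        raise ValueError(f"cannot use sqlite with the {executor}")
    return True


# Placeholder connection strings, for illustration only.
SQLITE_CONN = "sqlite:////usr/local/airflow/airflow.db"
POSTGRES_CONN = "postgresql+psycopg2://airflow:airflow@postgres:5432/airflow"

print(check_executor_and_db("SequentialExecutor", SQLITE_CONN))
print(check_executor_and_db("LocalExecutor", POSTGRES_CONN))

try:
    check_executor_and_db("LocalExecutor", SQLITE_CONN)
except ValueError as err:
    print("rejected:", err)
```

This matches the setups seen so far: the single container runs SequentialExecutor, while docker-compose-LocalExecutor.yml adds a PostgreSQL service.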
