Python celery
Quoting authors of the project:
- Celery is "an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. (...) Tasks can execute asynchronously (in the background) or synchronously (wait until ready)."
Installation
Install the package python-celery. As with most Python-based packages, you get a package compatible with Python 3.x. If you need Python 2.x compatibility, install python2-celery instead.
Quoting Celery documentation: "Celery requires a solution to send and receive messages". One of the options is rabbitmq, which can also be installed from the official repositories.
Configuration
Celery
For configuration files the directory /etc/celery/ needs to be created. An example configuration file is provided within Celery documentation.
Start/enable the celery@celery.service.
RabbitMQ
RabbitMQ stores its configuration within /etc/rabbitmq/rabbitmq-env.conf
The default configuration:
NODENAME=rabbit@rakieta
NODE_IP_ADDRESS=0.0.0.0
NODE_PORT=5672
LOG_BASE=/var/log/rabbitmq
MNESIA_BASE=/var/lib/rabbitmq/mnesia
You probably want to replace 0.0.0.0 with 127.0.0.1 so that RabbitMQ listens only on the loopback interface. RabbitMQ does not support Unix sockets.
For simple configurations, you may also want to add HOME=/var/lib/rabbitmq. Read more about environment variables within the RabbitMQ docs.
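As a quick sanity check of the settings described above, the following minimal sketch parses KEY=VALUE lines such as those in rabbitmq-env.conf and reports whether NODE_IP_ADDRESS is a loopback address. The function names and the sample text are hypothetical, and the fallback to 0.0.0.0 when the key is missing is an assumption made for illustration.

```python
import ipaddress


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def listens_on_loopback_only(settings):
    """Return True when NODE_IP_ADDRESS is a loopback address."""
    # Assumption: a missing key is treated like 0.0.0.0 (all interfaces).
    address = settings.get("NODE_IP_ADDRESS", "0.0.0.0")
    return ipaddress.ip_address(address).is_loopback


# Hypothetical sample mirroring the configuration shown above.
SAMPLE = """\
NODENAME=rabbit@rakieta
NODE_IP_ADDRESS=127.0.0.1
NODE_PORT=5672
"""

if __name__ == "__main__":
    conf = parse_env_file(SAMPLE)
    print(conf["NODE_IP_ADDRESS"], listens_on_loopback_only(conf))
```

To check a real installation, read /etc/rabbitmq/rabbitmq-env.conf and pass its contents to parse_env_file instead of SAMPLE.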
Start/enable the rabbitmq.service. 
rabbitmq.service is started as the rabbitmq user, whose home folder is /var/lib/rabbitmq. You may want to make sure the rabbitmq user owns this folder and all subfolders.
Follow the RabbitMQ documentation and add your user and virtual host:
$ cd /var/lib/rabbitmq
$ su rabbitmq -c 'rabbitmqctl add_user myuser mypassword'
$ su rabbitmq -c 'rabbitmqctl add_vhost myvhost'
$ su rabbitmq -c 'rabbitmqctl set_user_tags myuser mytag'
$ su rabbitmq -c 'rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"'
Read the RabbitMQ admin guide to understand the above.
If issuing su rabbitmq -c "rabbitmqctl status" results in badrpc,nodedown, visit this blog post for more information on how to fix the problem.
Running su rabbitmq -c "erl" should give you an Erlang prompt with no errors.
Security
You may want to read the security section of the relevant Celery documentation.
Example task
Celery application
Follow the Celery documentation to create a sample Python task:
$ nano test.py
from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://myuser:mypassword@localhost:5672/myvhost')

@app.task
def add(x, y):
    return x + y
amqp://myuser:mypassword@localhost:5672/myvhost - use the same credentials and vhost you created when configuring RabbitMQ.
backend='amqp' - this parameter is optional. It selects the result backend used to store task results; RabbitMQ is also the default broker utilised by celery.
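The broker URL above is assembled from the user, password, host, port and vhost created earlier. The following minimal sketch builds such a URL and percent-encodes the credentials, which matters when a password contains characters such as "@" or "/". The helper name build_broker_url is hypothetical, and it assumes the URL parser on the Celery side decodes percent-encoded credentials.

```python
from urllib.parse import quote


def build_broker_url(user, password, host, port, vhost):
    """Return an amqp:// URL with percent-encoded user, password and vhost."""
    # safe="" makes quote() also encode "/" so it cannot be mistaken
    # for a URL separator.
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )


if __name__ == "__main__":
    print(build_broker_url("myuser", "mypassword", "localhost", 5672, "myvhost"))
```

The resulting string can be passed as the broker argument when creating the Celery application.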
Test run
While in the same directory as your test.py, you can run:
$ celery -A test worker --loglevel=info
Then from another console (but within same directory) create:
$ nano call.py
from test import add

add.delay(4, 4)
Run it:
$ python call.py
The first console should then log lines similar to the following, showing that the worker received and executed the task (the task name prefix reflects your module name):
Received task: task.add[f4aff99a-7477-44db-9f6e-7e0f9342cd4e]
Task task.add[f4aff99a-7477-44db-9f6e-7e0f9342cd4e] succeeded in 0.0007182330009527504s: 8
Prepare module for Celery service
The procedure below differs slightly from what you will find in the Celery documentation.
Create test_task module:
# mkdir /lib/python3.5/site-packages/test_task
# touch /lib/python3.5/site-packages/test_task/__init__.py
# touch /lib/python3.5/site-packages/test_task/test_task.py
# touch /lib/python3.5/site-packages/test_task/celery.py
# nano /lib/python3.5/site-packages/test_task/celery.py
from __future__ import absolute_import
from celery import Celery
app = Celery('tasks', backend='amqp', broker='amqp://myuser:mypassword@localhost:5672/myvhost')
if __name__ == '__main__':
    app.start()
# nano /lib/python3.5/site-packages/test_task/test_task.py
from __future__ import absolute_import
from test_task.celery import app

@app.task
def add(x, y):
    return x + y
At this point, if you start python in your console, the following import should succeed without any error:
>>> from test_task import celery
In /etc/celery/celery.conf replace:
CELERY_APP="proj"
with the following line:
CELERY_APP="test_task"
Restart the celery@celery.service.
Run tasks periodically
Tasks can be run periodically through Celery Beat. The basic setup is described within the relevant Celery documentation pages.
If you want to specify CELERYBEAT_SCHEDULE within your celery.py, you need to set it with the app.conf prefix (that is, as app.conf.CELERYBEAT_SCHEDULE) so that celery recognises your scheduled tasks. After that, add the --beat --schedule=/var/lib/celery/celerybeat-schedule parameters when you start the celery daemon. The /var/lib/celery directory must also exist within the environment celery runs in and be owned by the user that runs celery.
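The schedule itself is a plain dictionary, so its shape can be sketched without a running broker. In the example below the entry name, the 30 second interval and the arguments are hypothetical, and the task name assumes the add task from test_task/test_task.py is registered under its default module-based name. install_schedule is a hypothetical helper that only shows where the app.conf prefix goes. Newer Celery releases name this setting beat_schedule, so check the documentation for your version.

```python
from datetime import timedelta

# Hypothetical schedule: call add(16, 16) every 30 seconds.
BEAT_SCHEDULE = {
    "add-every-30-seconds": {
        # Assumed default task name: <module path>.<function name>
        "task": "test_task.test_task.add",
        "schedule": timedelta(seconds=30),
        "args": (16, 16),
    },
}


def install_schedule(app):
    """Attach the schedule to a Celery app using the app.conf prefix."""
    app.conf.CELERYBEAT_SCHEDULE = BEAT_SCHEDULE
```

In celery.py this would be called as install_schedule(app) right after the app object is created.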
Run Celery in chroot (experimental)
Installing celery in a chroot adds an additional layer of security. To achieve an advanced security level, the chroot should include only the files needed to run the Celery application and all files should have the most restrictive permissions possible. For example, as much as possible should be owned by root, directories such as /usr/bin should be unreadable and unwriteable, etc.
This section adapts Nginx#Installation in a chroot for creating the Celery chroot.
Create chroot directory and devices
Arch comes with an http user and group by default which we can use to run celery. The chroot will be in /srv/http/apps/celery.
# mkdir -p /srv/http/apps/celery
# cd /srv/http/apps/celery
Celery needs /dev/null and /dev/urandom. Celery will not crash at startup if /dev/random is missing. To install these in the chroot create the /dev/ directory and add the devices with mknod. Avoid mounting all of /dev/ to ensure that, even if the chroot is compromised, an attacker must break out of the chroot to access important devices like /dev/sda1.
Be sure that /srv/http/apps/celery is mounted without the nodev option.
See ls -l /dev/{null,urandom} to better understand the mknod options.
# mkdir /srv/http/apps/celery/dev
# mknod -m 0666 /srv/http/apps/celery/dev/null c 1 3
# mknod -m 0666 /srv/http/apps/celery/dev/random c 1 8
# mknod -m 0444 /srv/http/apps/celery/dev/urandom c 1 9
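The two numbers passed to mknod above are the major and minor device numbers of the corresponding host devices. As a rough Python equivalent of inspecting them with ls -l, the following sketch reads the numbers from an existing character device. The helper name device_numbers is hypothetical.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) for a character device such as /dev/null."""
    info = os.stat(path)
    if not stat.S_ISCHR(info.st_mode):
        raise ValueError("{} is not a character device".format(path))
    return os.major(info.st_rdev), os.minor(info.st_rdev)


if __name__ == "__main__":
    for name in ("null", "urandom"):
        major, minor = device_numbers("/dev/" + name)
        # Print the numbers in the order mknod expects them.
        print("/dev/{}: c {} {}".format(name, major, minor))
```

Comparing the printed values with the mknod arguments is a quick way to confirm the chroot devices match the host.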
Create necessary directories
The original idea was to use python-virtualenv to bring in all necessary Python files. Unfortunately, virtualenv does only half of the work for us, and since we are going to run Celery in a chroot we need to take care of all other dependencies ourselves. We use an environment created by virtualenv as the base for further adjustments:
# virtualenv --always-copy /srv/http/apps/celery
Create required directories:
# cd /srv/http/apps/celery
# mkdir -p {usr,dev,etc,run,tmp,var,proc}
# mv {lib,bin,include} usr
# ln -s usr/lib lib
# ln -s usr/bin bin
# ln -s usr/lib lib64
# ln -s usr/include include
# cd usr/
On a 64 bit system, both lib64 and usr/lib64 need to be symbolic links to usr/lib. The first link was created above; create the second one from within usr/:
# ln -s lib lib64
Celery requires /proc/loadavg, so bind mount it into the chroot. Also mount /srv/http/apps/celery/tmp and /srv/http/apps/celery/run as tmpfs file systems. Their size should be limited to ensure an attacker cannot eat all the RAM:
# touch /srv/http/apps/celery/proc/loadavg
# mount --bind /proc/loadavg /srv/http/apps/celery/proc/loadavg
# mount -t tmpfs none /srv/http/apps/celery/run -o 'noexec,size=1M'
# mount -t tmpfs none /srv/http/apps/celery/tmp -o 'noexec,size=100M'
In order to preserve the mounts across reboots, the following entries should be added to /etc/fstab:
/etc/fstab
tmpfs /srv/http/apps/celery/run tmpfs rw,noexec,relatime,size=1024k 0 0
tmpfs /srv/http/apps/celery/tmp tmpfs rw,noexec,relatime,size=102400k 0 0
/proc/loadavg /srv/http/apps/celery/proc/loadavg none bind
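The fstab entries express the same limits as the mount commands, only in kibibytes: 1M becomes 1024k and 100M becomes 102400k. The following minimal sketch shows that conversion and renders a matching fstab line. The helper names are hypothetical, and only the k, m and g suffixes are handled.

```python
# Multipliers that convert a suffixed size into kibibytes.
UNIT_KIB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def size_to_kib(size):
    """Convert a size such as '100M' into a number of kibibytes."""
    number, suffix = size[:-1], size[-1].lower()
    if suffix not in UNIT_KIB:
        raise ValueError("unsupported size suffix in {!r}".format(size))
    return int(number) * UNIT_KIB[suffix]


def fstab_tmpfs_line(mount_point, size):
    """Render a tmpfs fstab entry with the options used in this article."""
    return "tmpfs {} tmpfs rw,noexec,relatime,size={}k 0 0".format(
        mount_point, size_to_kib(size)
    )


if __name__ == "__main__":
    print(fstab_tmpfs_line("/srv/http/apps/celery/run", "1M"))
    print(fstab_tmpfs_line("/srv/http/apps/celery/tmp", "100M"))
```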
Create a log folder for celery:
# mkdir -p /srv/http/apps/celery/var/log/celery
# chown http:http /srv/http/apps/celery/var/log/celery
Populate chroot
Copy python dependencies:
# cp $(ldd /usr/bin/python | grep /usr/lib | sed -sre 's/(.+)(\/usr\/lib\/\S+).+/\2/g') /srv/http/apps/celery/lib
Do not try to copy linux-vdso.so: it is not a real library and does not exist in /usr/lib. Also, ld-linux-x86-64.so will likely be listed in /lib64 on a 64 bit system.
We are running from a chroot, hence normal virtualenv behavior will not work, and we need to accommodate that by copying the complete Python lib folder (without site-packages):
# mv /srv/http/apps/celery/lib/python3.5/site-packages /tmp
# rm -r /srv/http/apps/celery/lib/python3.5/*
# mv /tmp/site-packages /srv/http/apps/celery/lib/python3.5/
# cp -r -p /usr/lib/python3.5 /tmp
# rm -r /tmp/python3.5/site-packages
# mv /tmp/python3.5/* /srv/http/apps/celery/lib/python3.5/
Install celery:
# source /srv/http/apps/celery/bin/activate
# pip install celery
Celery requires libssl:
# cp $(ldd /usr/lib/libssl.so.1.0.0 | grep /usr/lib | sed -sre 's/(.+)(\/usr\/lib\/\S+).+/\2/g') /srv/http/apps/celery/usr/lib
# cp /usr/lib/libssl.so* /srv/http/apps/celery/usr/lib
Celery requires libgcc_s if you want to use multithreading:
# cp /usr/lib/libgcc_s* /srv/http/apps/celery/usr/lib
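The ldd, grep and sed pipelines used above pick out the /usr/lib/... path from each line of ldd output. The following sketch roughly mirrors that filtering in Python so the selection logic is easier to follow. The function name is hypothetical and the sample output is made up for illustration; it is not captured from a real system.

```python
import re

# Matches a library path under /usr/lib up to the next whitespace.
LIB_PATTERN = re.compile(r"(/usr/lib/\S+)")


def usr_lib_paths(ldd_output):
    """Return the /usr/lib paths found in the text printed by ldd."""
    paths = []
    for line in ldd_output.splitlines():
        match = LIB_PATTERN.search(line)
        if match:
            paths.append(match.group(1))
    return paths


# Hypothetical ldd output; addresses and library names are illustrative.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd5a1f2000)
\tlibpython3.5m.so.1.0 => /usr/lib/libpython3.5m.so.1.0 (0x00007f1c2a000000)
\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007f1c29c00000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a600000)
"""

if __name__ == "__main__":
    for path in usr_lib_paths(SAMPLE):
        print(path)
```

The linux-vdso and /lib64 lines are skipped because they contain no /usr/lib path, matching what the grep step does.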
Celery also requires /bin/getent, which in turn requires libnss_files:
# cp /bin/getent /srv/http/apps/celery/bin
# cp /lib/libnss_files* /srv/http/apps/celery/lib
/bin/env is required to register $HOME after chrooting but before running celery:
# cp /bin/env /srv/http/apps/celery/bin
Copy your task module (in this example the task module is called test_task and is stored within /lib/python3.5/site-packages/test_task):
# cp -r /lib/python3.5/site-packages/test_task lib/python3.5/site-packages
Copy over some miscellaneous but necessary libraries and system files.
# cp -rfvL /etc/{services,localtime,nsswitch.conf,nscd.conf,protocols,hosts,ld.so.cache,ld.so.conf,resolv.conf,host.conf} /srv/http/apps/celery/etc
Create restricted user/group files for the chroot. This way only the users needed for the chroot to function exist in it and none of the system users/groups are leaked to attackers, should they gain access to the chroot.
/srv/http/apps/celery/etc/group
http:x:33:
nobody:x:99:
/srv/http/apps/celery/etc/passwd
http:x:33:33:http:/:/bin/false
nobody:x:99:99:nobody:/:/bin/false
/srv/http/apps/celery/etc/shadow
http:x:14871::::::
nobody:x:14871::::::
/srv/http/apps/celery/etc/gshadow
http:::
nobody:::
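The restricted passwd and group files follow a fixed pattern per account, so they can be generated from a short list. The following minimal sketch renders the two files shown above. The function names and the ACCOUNTS list are hypothetical, and it assumes every account uses / as home directory and /bin/false as shell, as in this article.

```python
# (name, uid, gid) for each account that should exist inside the chroot.
ACCOUNTS = [("http", 33, 33), ("nobody", 99, 99)]


def render_passwd(accounts):
    """Render passwd lines with / as home and /bin/false as shell."""
    lines = [
        "{0}:x:{1}:{2}:{0}:/:/bin/false".format(name, uid, gid)
        for name, uid, gid in accounts
    ]
    return "\n".join(lines) + "\n"


def render_group(accounts):
    """Render group lines with no additional members."""
    lines = ["{}:x:{}:".format(name, gid) for name, _uid, gid in accounts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_passwd(ACCOUNTS), end="")
    print(render_group(ACCOUNTS), end="")
```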
Your package within chroot
Now you can copy your package into chroot.
Assuming your project lives in the your_project directory and its structure looks like the following:
.
|-setup.py
|-CHANGES.txt
|-MANIFEST.in
|-README.txt
|-package_name
  |-__init__.py
  |-celery.py
  |-task_1.py
  |-task_2.py
  |-(...)
Install your package in development mode:
# cd /srv/http/apps/celery/your_project
# source ../bin/activate
# pip install -e .
The recorded paths are relative to your main root rather than to the chroot, so you need to update them to the paths seen from inside the chroot:
$ nano /srv/http/apps/celery/lib/python3.5/site-packages/package_name.egg-link
/your_project
.
$ nano /srv/http/apps/celery/lib/python3.5/site-packages/easy-install.pth
/your_project
(...)
Alternatively, you can use python setup.py develop, in which case you have to update /srv/http/apps/celery/lib/python3.5/site-packages/setuptools.pth:
$ nano /srv/http/apps/celery/lib/python3.5/site-packages/setuptools.pth
/usr/lib/python3.5/site-packages
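The path edits above all apply the same rule: strip the chroot directory from the front of a host path so that it becomes valid inside the chroot. A minimal sketch of that translation follows. The helper name path_inside_chroot is hypothetical.

```python
from pathlib import PurePosixPath


def path_inside_chroot(host_path, chroot_root):
    """Translate a host path into the path seen from inside the chroot."""
    # relative_to() raises ValueError when host_path is outside chroot_root.
    relative = PurePosixPath(host_path).relative_to(chroot_root)
    return str(PurePosixPath("/") / relative)


if __name__ == "__main__":
    print(
        path_inside_chroot(
            "/srv/http/apps/celery/your_project", "/srv/http/apps/celery"
        )
    )
```

A path outside the chroot raises ValueError, which is a useful signal that a dependency has not been copied into the chroot yet.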
Test chroot
Run the following to confirm the chroot is correctly set up:
# /usr/bin/chroot --userspec=root:root /srv/http/apps/celery env -i HOME=/ /usr/bin/python -m celery worker -c 10 -A package_name --uid=33 --gid=33 --pidfile=/run/celery.pid --logfile=/var/log/celery/celery.log --loglevel="INFO"
Celery will be started by root but will then drop to the http user.
- RabbitMQ must be running before you perform the above test.
- Broker credentials within your project must match what was set within RabbitMQ.
- You may wish to ensure your firewall is configured to allow celery/rabbitmq traffic.
systemd chroot unit
Prepare a systemd unit:
# nano /etc/systemd/system/celery.service
[Unit]
Description=Celery Nodes Daemon
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/bin/chroot --userspec=root:root /srv/http/apps/celery /usr/bin/env -i HOME=/ /usr/bin/python -m celery multi start 2 -A package_name --uid=33 --gid=33 --pidfile:1=/run/celery1.pid --pidfile:2=/run/celery2.pid --logfile=/var/log/celery/celery.log
ExecStop=/usr/bin/chroot --userspec=root:root /srv/http/apps/celery /usr/bin/python -m celery multi stopwait 2 --uid=33 --gid=33 --pidfile:1=/run/celery1.pid --pidfile:2=/run/celery2.pid --logfile=/var/log/celery/celery.log --loglevel="INFO"
ExecReload=/usr/bin/chroot --userspec=root:root /srv/http/apps/celery /usr/bin/python -m celery multi restart 2 -A package_name --uid=33 --gid=33 --pidfile:1=/run/celery1.pid --pidfile:2=/run/celery2.pid --logfile=/var/log/celery/celery.log
KillMode=control-group
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Start celery.service.
Troubleshooting
If the systemd chroot unit does not report issues but the celery service is not running, you can start the chrooted celery from a console with a more detailed log level. For example:
# /usr/bin/chroot --userspec=root:root /srv/http/apps/celery /usr/bin/env -i HOME=/ /usr/bin/python -m celery worker -A package_name --uid=33 --gid=33 --pidfile=/run/celery.pid --logfile=/var/log/celery/celery.log --loglevel="INFO"