Listen to PostgreSQL events with pg_notify and Python

With PostgreSQL we can easily publish and listen for events from one connection to another. It’s cool because those notifications are transactional: they’re only delivered when the transaction commits. In this example I’m going to create a small wrapper to help me listen to the events with Python.

To emit events we only need to use the pg_notify function. For example:

select pg_notify('channel', 'xxx')
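We can also trigger the same notification from Python with a plain psycopg2 connection (a minimal sketch; the channel name and payload are just the ones from the example above):

import psycopg2

dsn = "dbname='gonzalo123' user='username' host='localhost' password='password'"

with psycopg2.connect(dsn) as conn:
    with conn.cursor() as cursor:
        # The notification is delivered to the listeners when the transaction commits
        cursor.execute("select pg_notify(%s, %s)", ('channel', 'xxx'))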

To listen for the events:

import psycopg2

from pg_listener import on_db_event

dsn = f"dbname='gonzalo123' user='username' host='localhost' password='password'"

conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

for payload in on_db_event(conn, 'channel'):
    print(payload)

The magic resides in on_db_event. We pass a psycopg2 connection and the channel name, and we can iterate over the generator to retrieve the payload each time someone triggers an event on that channel.

import logging
import select

from psycopg2.extensions import connection

logger = logging.getLogger(__name__)


def on_db_event(conn: connection, channel: str):
    with conn:
        with conn.cursor() as cursor:
            cursor.execute(f"LISTEN {channel};")
            logger.info(f"Waiting for notifications on channel '{channel}'.")

            while True:
                # Wait up to 5 seconds for the connection to become readable
                if select.select([conn], [], [], 5) != ([], [], []):
                    conn.poll()
                    while conn.notifies:
                        notify = conn.notifies.pop(0)
                        yield notify.payload

As I often use Django, and Django uses its own connection wrapper, I need to create a native psycopg2 connection. Maybe it’s possible to retrieve it from the Django connection (let me know if you know how to do it).

def conn_from_django(django_connection):
    db_settings = django_connection.settings_dict
    dsn = f"dbname='{db_settings['NAME']}' " \
          f"user='{db_settings['USER']}' " \
          f"host='{db_settings['HOST']}' " \
          f"password='{db_settings['PASSWORD']}'"

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)

    return conn
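For example, we can start listening from a Django management command, building the native connection from Django’s own connection wrapper (a minimal sketch; the command name is made up and I’m assuming conn_from_django is importable from wherever we placed it):

from django.core.management.base import BaseCommand
from django.db import connection as django_connection

from pg_listener import on_db_event
# assumption: conn_from_django lives in a local helpers module
from app.db import conn_from_django


class Command(BaseCommand):
    help = 'Listen to pg_notify events'

    def handle(self, *args, **options):
        conn = conn_from_django(django_connection)
        for payload in on_db_event(conn, 'channel'):
            self.stdout.write(payload)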

You can install the library using pip

pip install pglistener-gonzalo123

Source code available in my github

Running Python/Django docker containers with non-root user

Running Linux processes as root is not a good idea. One problem or exploit in the process can give the attacker a root shell. When we run a docker container, especially in production, it shouldn’t run as root.

To do that we only need to write a Dockerfile that properly runs our Python/Django application as a non-root user. That’s my boilerplate:

FROM python:3.8 AS base

ENV APP_HOME=/src
ENV APP_USER=appuser

RUN groupadd -r $APP_USER && \
    useradd -r -g $APP_USER -d $APP_HOME -s /sbin/nologin -c "Docker image user" $APP_USER

WORKDIR $APP_HOME

ENV TZ 'Europe/Madrid'
RUN echo $TZ > /etc/timezone && apt-get update && \
    apt-get install -y tzdata && \
    rm /etc/localtime && \
    ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
    dpkg-reconfigure -f noninteractive tzdata && \
    apt-get clean

RUN pip install --upgrade pip

FROM base

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

COPY requirements.txt .
RUN pip install -r requirements.txt

ADD src .

RUN chown -R $APP_USER:$APP_USER $APP_HOME
USER $APP_USER

Also, in a Django application we normally use nginx as a reverse proxy. Nginx usually starts as root and launches its worker processes as a non-root user, but we can also run nginx itself as non-root.

FROM nginx:1.17.4

RUN rm /etc/nginx/conf.d/default.conf
RUN rm /etc/nginx/nginx.conf
COPY nginx.conf /etc/nginx/conf.d
COPY etc/nginx.conf /etc/nginx/nginx.conf

RUN chown -R nginx:nginx /var/cache/nginx && \
    chown -R nginx:nginx /var/log/nginx && \
    chown -R nginx:nginx /etc/nginx/conf.d && \
    chmod -R 766 /var/log/nginx/

RUN touch /var/run/nginx.pid && \
    chown -R nginx:nginx /var/run/nginx.pid && \
    chown -R nginx:nginx /var/cache/nginx

USER nginx

And that’s all. Source code of a boilerplate application in my github

Django logs to ELK using Filebeat

I’ve written a post about how to send Django logs to the ELK stack. You can read it here. In that post I used the Logstash client in a sidecar docker container. The Logstash client works, but it needs too many resources. Nowadays it’s better to use Filebeat as the data shipper instead of the Logstash client. Filebeat is also part of the ELK stack, and it’s a Go binary, much more lightweight than the Logstash client.

The idea is almost the same as in the other post. Here we’ll also build a sidecar container with our Django application logs mounted.

version: '3'
services:
  # Application
  api:
    image: elk:latest
    command: /bin/bash ./docker-entrypoint-wsgi.sh
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      DEBUG: 'True'
    volumes:
      - logs_volume:/src/logs
      - static_volume:/src/staticfiles
  nginx:
    image: elk-nginx:latest
    build:
      context: .docker/nginx
      dockerfile: Dockerfile
    volumes:
      - static_volume:/src/staticfiles
    ports:
      - 8000:8000
    depends_on:
      - api
  filebeat:
    image: filebeat:latest
    build:
      context: .docker/filebeat
      dockerfile: Dockerfile
    volumes:
      - logs_volume:/app/logs
    command: filebeat -c /etc/filebeat/filebeat.yml -e -d "*" -strict.perms=false
    depends_on:
      - api

  ...

With Filebeat we can perform actions to prepare our logs before they are stored in Elasticsearch. But, at least here, it’s much easier to prepare the logs in the Django application:

import logging

import json_log_formatter
from django.conf import settings
from django.utils.timezone import now


class CustomisedJSONFormatter(json_log_formatter.JSONFormatter):
    def json_record(self, message: str, extra: dict, record: logging.LogRecord):
        context = extra
        django = {
            'app': settings.APP_ID,
            'name': record.name,
            'filename': record.filename,
            'funcName': record.funcName,
            'msecs': record.msecs,
        }
        if record.exc_info:
            django['exc_info'] = self.formatException(record.exc_info)

        return {
            'message': message,
            'timestamp': now(),
            'level': record.levelname,
            'context': context,
            'django': django
        }

And in settings.py we use our CustomisedJSONFormatter

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '[%(asctime)s] %(levelname)s|%(name)s|%(message)s',
            'datefmt': '%Y-%m-%d %H:%M:%S',
        },
        "json": {
            '()': CustomisedJSONFormatter,
        },
    },
    'handlers': {
        'applogfile': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': Path(BASE_DIR).resolve().joinpath('logs', 'app.log'),
            'maxBytes': 1024 * 1024 * 15,  # 15MB
            'backupCount': 10,
            'formatter': 'json',
        },
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        }
    },
    'root': {
        'handlers': ['applogfile', 'console'],
        'level': 'DEBUG',
    }
}
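With this configuration in place, any standard logging call in our views ends up as a JSON record that Filebeat can ship as-is (a minimal sketch; the view and the extra field are just examples):

import logging
from random import randint

from django.http import JsonResponse

logger = logging.getLogger(__name__)


def index(request):
    # Everything passed in extra ends up in the "context" key of the JSON record
    logger.info("Hello from log", extra={'random': randint(0, 100)})
    return JsonResponse({'status': 'ok'})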

And that’s all. Our application logs are centralized in ELK and ready to consume with Kibana.

Source code available here

Don’t repeat password validator in Django

Django ships with a set of default password validators in django.contrib.auth.password_validation:

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

Normally those validators aren’t enough (at least for me), but it’s very easy to create custom validators. There are also several third-party validators that we can use, for example these ones.

I normally need to prevent users from repeating passwords (for example the last ten ones). To do that we need to create a custom validator. With this validator we also need a model to store the last passwords (their hashes). The idea is to persist the hash of the password each time the user changes it. As it’s always a good practice not to use the default User model, we’re going to create a CustomUser model:

class CustomUser(AbstractUser):
    def __init__(self, *args, **kwargs):
        super(CustomUser, self).__init__(*args, **kwargs)
        self.original_password = self.password

    def save(self, *args, **kwargs):
        super(CustomUser, self).save(*args, **kwargs)
        if self._password_has_been_changed():
            CustomUserPasswordHistory.remember_password(self)

    def _password_has_been_changed(self):
        return self.original_password != self.password

And now we can create our CustomUserPasswordHistory implementing the remember_password method.

class CustomUserPasswordHistory(models.Model):
    username = models.ForeignKey(CustomUser, on_delete=models.CASCADE)
    old_pass = models.CharField(max_length=128)
    pass_date = models.DateTimeField()

    @classmethod
    def remember_password(cls, user):
        cls(username=user, old_pass=user.password, pass_date=localtime()).save()

Now the validator:

class DontRepeatValidator:
    def __init__(self, history=10):
        self.history = history

    def validate(self, password, user=None):
        for last_pass in self._get_last_passwords(user):
            if check_password(password=password, encoded=last_pass):
                self._raise_validation_error()

    def get_help_text(self):
        return _("You cannot repeat passwords")

    def _raise_validation_error(self):
        raise ValidationError(
            _("This password has been used before."),
            code='password_has_been_used',
            params={'history': self.history},
        )

    def _get_last_passwords(self, user):
        all_history_user_passwords = CustomUserPasswordHistory.objects.filter(username_id=user).order_by('id')

        to_index = all_history_user_passwords.count() - self.history
        to_index = to_index if to_index > 0 else None
        if to_index:
            [u.delete() for u in all_history_user_passwords[0:to_index]]

        return [p.old_pass for p in all_history_user_passwords[to_index:]]
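Finally we need to register the validator in settings.py next to the default ones (the dotted path is an assumption; adjust it to wherever the class lives):

AUTH_PASSWORD_VALIDATORS = [
    # ... Django's default validators ...
    {
        # assumption: the validator lives in app/validators.py
        'NAME': 'app.validators.DontRepeatValidator',
        'OPTIONS': {
            'history': 10,
        },
    },
]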

We can see how it works with the unit tests:

class UserCreationTestCase(TestCase):
    def setUp(self):
        self.user = User.objects.create(username='gonzalo')

    def test_persist_password_to_history(self):
        self.user.set_password('pass1')
        self.user.save()

        all_history_user_passwords = CustomUserPasswordHistory.objects.filter(username_id=self.user)
        self.assertEqual(1, all_history_user_passwords.count())


class DontRepeatValidatorTestCase(TestCase):
    def setUp(self):
        self.user = User.objects.create(username='gonzalo')
        self.validator = DontRepeatValidator()

    def test_validator_with_new_pass(self):
        self.validator.validate('pass33', self.user)
        self.assertTrue(True)

    def test_validator_with_repeated_pass(self):
        for i in range(0, 11):
            self.user.set_password(f'pass{i}')
            self.user.save()

        with self.assertRaises(ValidationError):
            self.validator.validate('pass3', self.user)

    def test_keep_only_10_passwords(self):
        for i in range(0, 11):

            self.user.set_password(f'pass{i}')
            self.user.save()

        self.validator.validate('xxxx', self.user)

        all_history_user_passwords = CustomUserPasswordHistory.objects.filter(username_id=self.user)
        self.assertEqual(10, all_history_user_passwords.count())

Full source code in my github

Monitoring Django applications with Grafana and Kibana using Prometheus and Elasticsearch

When we have an application we need to monitor its logs in one way or another, and not only the server logs (500 errors, response times and things like that). Sometimes the user complains about the application, and without logs we cannot do anything. We can save logs to files and let grep and tail do the magic. That’s acceptable with a single on-premise server, but nowadays, with clouds and docker, it’s a nightmare. We need a central log collector to gather all the application logs, and with it we can create alerts and run complex searches over them.

I normally work with AWS. In AWS we have CloudWatch, and it’s pretty straightforward to connect our application logs to it when we’re using AWS. When we aren’t, we can use the ELK stack. In this example we’re going to send our Django application logs to an Elasticsearch database. Let’s start.

The idea is not to send the logs directly. The idea is to save the logs to log files. We can use this LOGGING configuration to do that:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        "json": {
            '()': CustomisedJSONFormatter,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
        'app_log_file': {
            'level': LOG_LEVEL,
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': os.path.join(LOG_PATH, 'app.log.json'),
            'maxBytes': 1024 * 1024 * 15,  # 15MB
            'backupCount': 10,
            'formatter': 'json',
        },
    },
    'root': {
        'handlers': ['console', 'app_log_file'],
        'level': LOG_LEVEL,
    },
}

Here I’m using a custom JSON formatter:

import json_log_formatter
import logging
from django.utils.timezone import now


class CustomisedJSONFormatter(json_log_formatter.JSONFormatter):
    def json_record(self, message: str, extra: dict, record: logging.LogRecord):
        extra['name'] = record.name
        extra['filename'] = record.filename
        extra['funcName'] = record.funcName
        extra['msecs'] = record.msecs
        if record.exc_info:
            extra['exc_info'] = self.formatException(record.exc_info)

        return {
            'message': message,
            'timestamp': now(),
            'level': record.levelname,
            'context': extra
        }

With this configuration our logs will look like this:

{"message": "Hello from log", "timestamp": "2020-04-26T19:35:59.427098+00:00", "level": "INFO", "app_id": "Logs", "context": {"random": 68, "name": "app.views", "filename": "views.py", "funcName": "index", "msecs": 426.8479347229004}}

Now we’re going to use Logstash as a data shipper to send the logs to Elasticsearch. We need to create a pipeline:

input {
    file {
        path => "/logs/*"
        start_position => "beginning"
        codec => "json"
    }
}

output {
  elasticsearch {
        index => "app_logs"
        hosts => ["elasticsearch:9200"]
    }
}

We’re going to use Docker to build our stack, so our Logstash and Django containers will share the logs volume.

Now we need to visualize the logs. Kibana is perfect for this task. We can set up a Kibana server connected to the Elasticsearch and visualize the logs:

We can also monitor our server performance. Prometheus is the de facto standard for that, and it’s very simple to connect our Django application to it. We only need to add the django-prometheus dependency, install the application and set up two middlewares:

INSTALLED_APPS = [
   ...
   'django_prometheus',
   ...
]

MIDDLEWARE = [
    'django_prometheus.middleware.PrometheusBeforeMiddleware', # <-- this one
    'app.middleware.RequestLogMiddleware',
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'django_prometheus.middleware.PrometheusAfterMiddleware', # <-- this one
]

We also need to set up the application routes:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('p/', include('django_prometheus.urls')), # <-- prometheus routes
    path('', include('app.urls'))
]
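With those routes django-prometheus exposes its default metrics under /p/metrics. If we want to publish custom metrics we can use the prometheus_client library it builds on (a minimal sketch; the counter name and the view are made up):

from django.http import JsonResponse
from prometheus_client import Counter

# Registered once at import time; scraped through the same /p/metrics endpoint
HELLO_REQUESTS = Counter('app_hello_requests_total', 'Number of hello requests', ['status'])


def hello(request):
    HELLO_REQUESTS.labels(status='ok').inc()
    return JsonResponse({'message': 'Hello'})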

The easiest way to visualize the data stored in Prometheus is using Grafana. In Grafana we need to create a datasource pointing to Prometheus and build our custom dashboard. We can also import pre-built dashboards, for example this one: https://grafana.com/grafana/dashboards/9528

Here’s the docker-compose file with the whole project:

version: '3'
services:
  web:
    image: web:latest
    restart: always
    command: /bin/bash ./docker-entrypoint.sh
    volumes:
      - static_volume:/src/staticfiles
      - logs_volume:/src/logs
    environment:
      DEBUG: 'False'
      LOG_LEVEL: DEBUG

  nginx:
    image: nginx:latest
    restart: always
    volumes:
      - static_volume:/src/staticfiles
    ports:
      - 80:80
    depends_on:
      - web
      - grafana

  prometheus:
    image: prometheus:latest
    restart: always
    build:
      context: .docker/prometheus
      dockerfile: Dockerfile

  grafana:
    image: grafana:latest
    restart: always
    depends_on:
      - prometheus
    environment:
      - GF_SECURITY_ADMIN_USER=${GF_SECURITY_ADMIN_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}
      - GF_USERS_DEFAULT_THEME=${GF_USERS_DEFAULT_THEME}
      - GF_USERS_ALLOW_SIGN_UP=${GF_USERS_ALLOW_SIGN_UP}
      - GF_USERS_ALLOW_ORG_CREATE=${GF_USERS_ALLOW_ORG_CREATE}
      - GF_AUTH_ANONYMOUS_ENABLED=${GF_AUTH_ANONYMOUS_ENABLED}

  logstash:
    image: logstash:latest
    restart: always
    depends_on:
      - elasticsearch
    volumes:
      - logs_volume:/logs:ro

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
    restart: always
    environment:
      - discovery.type=single-node
      - http.host=0.0.0.0
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms750m -Xmx750m
    volumes:
      - elasticsearch_volume:/usr/share/elasticsearch/data

  kibana:
    image: kibana:latest
    restart: always
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch_volume:
  static_volume:
  logs_volume:
  grafana_data:

And that’s all. Our Django application is up and running, fully monitored.

Source code in my github

Deploying Django application to AWS EC2 instance with Docker

In AWS we have several ways to deploy Django (and non-Django) applications with Docker. We can use ECS or EKS clusters, but if we don’t have an ECS or Kubernetes cluster already up and running, setting one up can be complex. Today I want to show how to deploy a Django application in production mode on a single EC2 host. Let’s start.

I’m getting too old to provision a host by hand; I prefer to use docker. The idea is to create one EC2 instance (a simple Amazon Linux AMI, an AWS-supported image). This host doesn’t have docker installed, so we need to install it. When we launch an instance, while configuring it, we can specify user data to configure the instance or run a configuration script during launch.

We only need to use this shell script as user data to set up docker:

#! /bin/bash
yum update -y
yum install -y docker
usermod -a -G docker ec2-user
curl -L https://github.com/docker/compose/releases/download/1.25.5/docker-compose-`uname -s`-`uname -m` | sudo tee /usr/local/bin/docker-compose > /dev/null
chmod +x /usr/local/bin/docker-compose
service docker start
chkconfig docker on

rm /etc/localtime
ln -s /usr/share/zoneinfo/Europe/Madrid /etc/localtime

ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

docker swarm init

We also need to attach an IAM role to our instance. This IAM role only needs the following policies:

  • AmazonEC2ContainerRegistryReadOnly (because we’re going to use AWS ECR as container registry)
  • CloudWatchAgentServerPolicy (because we’re going to emit our logs to Cloudwatch)

We also need to set up a security group to allow incoming SSH connections on port 22 and HTTP connections (in our example on port 8000).

When we launch our instance we need to provide a keypair to connect via ssh. I like to put this keypair in my .ssh/config

Host xxx.eu-central-1.compute.amazonaws.com
    User ec2-user
    Identityfile ~/.ssh/keypair-xxx.pem

To deploy our application we need to follow those steps:

  • Build our docker images
  • Push our images to a container registry (in this case ECR)
  • Deploy the application.

I’ve created a simple shell script called deploy.sh to perform all tasks:

#!/usr/bin/env bash

set -a
[ -f deploy.env ] && . deploy.env
set +a

echo "$(tput setaf 1)Building docker images ...$(tput sgr0)"
docker build -t ec2-web -t ec2-web:latest -t $ECR/ec2-web:latest .
docker build -t ec2-nginx -t $ECR/ec2-nginx:latest .docker/nginx

echo "$(tput setaf 1)Pusing to ECR ...$(tput sgr0)"
aws ecr get-login-password --region $REGION --profile $PROFILE |
  docker login --username AWS --password-stdin $ECR
docker push $ECR/ec2-web:latest
docker push $ECR/ec2-nginx:latest

CMD="docker stack deploy -c $DOCKER_COMPOSE_YML ec2 --with-registry-auth"
echo "$(tput setaf 1)Deploying to EC2 ($CMD)...$(tput sgr0)"
echo "$CMD"

DOCKER_HOST="ssh://$HOST" $CMD
echo "$(tput setaf 1)Building finished $(date +'%Y%m%d.%H%M%S')$(tput sgr0)"

This script assumes that there’s a deploy.env file with our personal configuration (the AWS profile, the host of the EC2 instance, the ECR registry and things like that):

PROFILE=xxxxxxx

DOCKER_COMPOSE_YML=docker-compose.yml
HOST=ec2-user@xxxx.eu-central-1.compute.amazonaws.com

ECR=9999999999.dkr.ecr.eu-central-1.amazonaws.com
REGION=eu-central-1

In this example I’m using docker swarm to deploy the application because I also want to play with secrets. This dummy application doesn’t have any sensitive information, but I’ve created one "ec2.supersecret" secret:

echo "super secret text" | docker secret create ec2.supersecret -

That’s the docker-compose.yml file:

version: '3.8'
services:
  web:
    image: 999999999.dkr.ecr.eu-central-1.amazonaws.com/ec2-web:latest
    command: /bin/bash ./docker-entrypoint.sh
    environment:
      DEBUG: 'False'
    secrets:
      - ec2.supersecret
    deploy:
      replicas: 1
    logging:
      driver: awslogs
      options:
        awslogs-group: /projects/ec2
        awslogs-region: eu-central-1
        awslogs-stream: app
    volumes:
      - static_volume:/src/staticfiles
  nginx:
    image: 99999999.dkr.ecr.eu-central-1.amazonaws.com/ec2-nginx:latest
    deploy:
      replicas: 1
    logging:
      driver: awslogs
      options:
        awslogs-group: /projects/ec2
        awslogs-region: eu-central-1
        awslogs-stream: nginx
    volumes:
      - static_volume:/src/staticfiles:ro
    ports:
      - 8000:80
    depends_on:
      - web
volumes:
  static_volume:

secrets:
  ec2.supersecret:
    external: true

And that’s all. Maybe ECS or EKS are better solutions to deploy docker applications in AWS, but we can also deploy easily to a single docker host on an EC2 instance that can be ready within a couple of minutes.

Source code in my github

Django reactive users with Celery and Channels

Today I want to build a prototype. The idea is to create two Django applications. One application will be the master and the other one will be the client. Both applications will have their own User model, but each change in the master’s User model will be propagated to the client (or clients). Let me show you what I’ve got in my mind:

We’re going to create one signal in User model (at Master) to detect user modifications:

  • If certain fields have been changed (for example we’re going to ignore last_login, password and things like that) we’re going to emit an event.
  • I normally work with AWS, so the event will be an SNS event.
  • The idea is to have multiple clients, so each client will be listening to its own SQS queue. Those SQS queues will be subscribed to the SNS topic.
  • To decouple the sending of the SNS message we’re going to send it via a Celery worker.
  • The second application (the client) will have one listener attached to the SQS queue.
  • Each time the listener receives a message it will persist the user information in the client’s User model.
  • It will also emit one message to a Django Channels consumer to be sent via websockets to the browser.

The Master

We’re going to emit the event each time the User model changes (and also when we create or delete one user). To detect changes we’re going to register one signal in pre_save to mark whether the model has been changed, and later, in post_save, we’re going to emit the event via the Celery worker.

@receiver(pre_save, sender=User)
def pre_user_modified(sender, instance, **kwargs):
    instance.is_modified = None

    if instance.is_staff is False and instance.id is not None:
        modified_user_data = UserSerializer(instance).data
        user = User.objects.get(username=modified_user_data['username'])
        user_serializer_data = UserSerializer(user).data

        if user_serializer_data != modified_user_data:
            instance.is_modified = True

@receiver(post_save, sender=User)
def post_user_modified(sender, instance, created, **kwargs):
    if instance.is_staff is False:
        if created or instance.is_modified:
            modified_user_data = UserSerializer(instance).data
            user_changed_event.delay(modified_user_data, action=Actions.INSERT if created else Actions.UPDATE)

@receiver(post_delete, sender=User)
def post_user_deleted(sender, instance, **kwargs):
    deleted_user_data = UserSerializer(instance).data
    user_changed_event.delay(deleted_user_data, action=Actions.DELETE)

We need to register our signals in apps.py

from django.apps import AppConfig

class MasterConfig(AppConfig):
    name = 'master'

    def ready(self):
        from master.signals import pre_user_modified
        from master.signals import post_user_modified
        from master.signals import post_user_deleted

Our Celery task will send the message to the SNS topic:

@shared_task()
def user_changed_event(body, action):
    sns = boto3.client('sns')
    message = {
        "user": body,
        "action": action
    }
    response = sns.publish(
        TargetArn=settings.SNS_REACTIVE_TABLE_ARN,
        Message=json.dumps({'default': json.dumps(message)}),
        MessageStructure='json'
    )
    logger.info(response)

AWS

In AWS we need to create one SNS topic and one SQS queue subscribed to it.
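We can do that from the console or with a couple of boto3 calls (a minimal sketch; the topic and queue names are made up, and in a real setup the queue also needs a policy that allows the topic to send messages to it):

import boto3

sns = boto3.client('sns')
sqs = boto3.client('sqs')

# Create the topic and the queue
topic_arn = sns.create_topic(Name='reactive-users')['TopicArn']
queue_url = sqs.create_queue(QueueName='reactive-users-client')['QueueUrl']
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=['QueueArn'])['Attributes']['QueueArn']

# Subscribe the queue to the topic
sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint=queue_arn)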

The Client

First we need a management command to run the listener:

class Actions:
    INSERT = 0
    UPDATE = 1
    DELETE = 2

switch_actions = {
    Actions.INSERT: insert_user,
    Actions.UPDATE: update_user,
    Actions.DELETE: delete_user,
}

class Command(BaseCommand):
    help = 'sqs listener'

    def handle(self, *args, **options):
        self.stdout.write(self.style.WARNING("starting listener"))
        sqs = boto3.client('sqs')

        queue_url = settings.SQS_REACTIVE_TABLES

        def process_message(message):
            decoded_body = json.loads(message['Body'])
            data = json.loads(decoded_body['Message'])

            switch_actions.get(data['action'])(
                data=data['user'],
                timestamp=message['Attributes']['SentTimestamp']
            )

            notify_to_user(data['user'])

            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message['ReceiptHandle'])

        def loop():
            response = sqs.receive_message(
                QueueUrl=queue_url,
                AttributeNames=[
                    'SentTimestamp'
                ],
                MaxNumberOfMessages=10,
                MessageAttributeNames=[
                    'All'
                ],
                WaitTimeSeconds=20
            )

            if 'Messages' in response:
                messages = [message for message in response['Messages'] if 'Body' in message]
                [process_message(message) for message in messages]

        try:
            while True:
                loop()
        except KeyboardInterrupt:
            sys.exit(0)

Here we persist the model in the client’s database:

def insert_user(data, timestamp):
    username = data['username']
    serialized_user = UserSerializer(data=data)
    serialized_user.create(validated_data=data)
    logging.info(f"user: {username} created at {timestamp}")

def update_user(data, timestamp):
    username = data['username']
    try:
        user = User.objects.get(username=data['username'])
        serialized_user = UserSerializer(user)
        serialized_user.update(user, data)
        logging.info(f"user: {username} updated at {timestamp}")
    except User.DoesNotExist:
        logging.info(f"user: {username} don't exits. Creating ...")
        insert_user(data, timestamp)

def delete_user(data, timestamp):
    username = data['username']
    try:
        user = User.objects.get(username=username)
        user.delete()
        logging.info(f"user: {username} deleted at {timestamp}")
    except User.DoesNotExist:
        logging.info(f"user: {username} don't exits. Don't deleted")

And we also emit one message to the Channels consumer:

def notify_to_user(user):
    username = user['username']
    serialized_user = UserSerializer(user)
    emit_message_to_user(
        message=serialized_user.data,
        username=username, )

Here’s the consumer:

class WsConsumer(AsyncWebsocketConsumer):
    @personal_consumer
    async def connect(self):
        await self.channel_layer.group_add(
            self._get_personal_room(),
            self.channel_name
        )

    @private_consumer_event
    async def emit_message(self, event):
        message = event['message']
        await self.send(text_data=json.dumps(message))

    def _get_personal_room(self):
        username = self.scope['user'].username
        return self.get_room_name(username)

    @staticmethod
    def get_room_name(room):
        return f"{'ws_room'}_{room}"

def emit_message_to_user(message, username):
    group = WsConsumer.get_room_name(username)
    channel_layer = get_channel_layer()
    async_to_sync(channel_layer.group_send)(group, {
        'type': WsConsumer.emit_message.__name__,
        'message': message
    })

Our consumer will only allow the connection if the user is authenticated. That’s one of the reasons why I like Django Channels: this kind of thing is really simple to do (I’ve done similar things with PHP applications connected to a socket.io server and it was a nightmare). I’ve created a couple of decorators to ensure authentication in the consumer.

def personal_consumer(func):
    @wraps(func)
    async def wrapper_decorator(*args, **kwargs):
        self = args[0]

        async def accept():
            value = await func(*args, **kwargs)
            await self.accept()
            return value

        if self.scope['user'].is_authenticated:
            username = self.scope['user'].username
            room_name = self.scope['url_route']['kwargs']['username']
            if username == room_name:
                return await accept()

        await self.close()

    return wrapper_decorator

def private_consumer_event(func):
    @wraps(func)
    async def wrapper_decorator(*args, **kwargs):
        self = args[0]
        if self.scope['user'].is_authenticated:
            return await func(*args, **kwargs)

    return wrapper_decorator

That’s the websocket route

from django.urls import re_path

from client import consumers

websocket_urlpatterns = [
    re_path(r'ws/(?P&amp;lt;username&amp;gt;\w+)$', consumers.WsConsumer),
]

Finally we only need to connect our HTML page to the websocket

{% block title %}Example{% endblock %}
{% block header_text %}Hello <span id="name">{{ request.user.first_name }}</span>{% endblock %}

{% block extra_body %}
  <script>
    var ws_scheme = window.location.protocol === "https:" ? "wss" : "ws"
    var ws_path = ws_scheme + '://' + window.location.host + "/ws/{{ request.user.username }}"
    var ws = new ReconnectingWebSocket(ws_path)
    var render = function (key, value) {
      document.querySelector(`#${key}`).innerHTML = value
    }
    ws.onmessage = function (e) {
      const data = JSON.parse(e.data);
      render('name', data.first_name)
    }

    ws.onopen = function () {
      console.log('Connected')
    };
  </script>
{% endblock %}

Here’s a docker-compose file with the project:

version: '3.4'

services:
  redis:
    image: redis
  master:
    image: reactive_master:latest
    command: python manage.py runserver 0.0.0.0:8001
    build:
      context: ./master
      dockerfile: Dockerfile
    depends_on:
      - "redis"
    ports:
      - 8001:8001
    environment:
      REDIS_HOST: redis
  celery:
    image: reactive_master:latest
    command: celery -A master worker --uid=nobody --gid=nogroup
    depends_on:
      - "redis"
      - "master"
    environment:
      REDIS_HOST: redis
      SNS_REACTIVE_TABLE_ARN: ${SNS_REACTIVE_TABLE_ARN}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION}
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
  client:
    image: reactive_client:latest
    command: python manage.py runserver 0.0.0.0:8000
    build:
      context: ./client
      dockerfile: Dockerfile
    depends_on:
      - "redis"
    ports:
      - 8000:8000
    environment:
      REDIS_HOST: redis
  listener:
    image: reactive_client:latest
    command: python manage.py listener
    build:
      context: ./client
      dockerfile: Dockerfile
    depends_on:
      - "redis"
    environment:
      REDIS_HOST: redis
      SQS_REACTIVE_TABLES: ${SQS_REACTIVE_TABLES}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION}
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}

And that’s all. Here’s a working example of the prototype in action:

Source code in my github.

Building real time Python applications with Django Channels, Docker and Kubernetes

Three years ago I wrote an article about Websockets. In fact I’ve written several articles about Websockets (Websockets and real time communications are something that I’m really passionate about), but today I would like to pick up that article. Nowadays I’m involved in several Django projects, so I want to create a similar working prototype with Django. Let’s start.

In the past I normally worked with libraries such as socket.io to ensure browser compatibility with Websockets. Nowadays, at least in my world, we can assume that our users are using a modern browser with websocket support, so we’re going to use plain Websockets instead of external libraries. Django has great support for Websockets called Django Channels. It allows us to handle Websockets (and other async protocols) thanks to Python’s ASGI specification. In fact it’s pretty straightforward to build applications with real time communication and with shared authentication (something that I have done in the past with a lot of effort; I’m getting old and now I like simple things :)).

The application that I want to build is the following one: a web application that shows the current time with seconds. OK, it’s very simple to do that with a couple of javascript lines, but this time I want a worker that emits an event via Websockets with the current time, and the web application will show that real time update. This kind of architecture always has the same problem: the initial state. When the user opens the browser it must show the current time. We could wait until the next event to fill the initially blank screen, but if the event only arrives every 10 seconds the user would stare at a blank page until then. In our example we’re going to take this situation into account: each time our user connects to the Websocket, the browser will ask the server for the initial state.

Our initial state route will return the current time (stored in Redis). We can protect the route using Django’s standard login_required decorator:

from django.contrib.auth.decorators import login_required
from django.http import JsonResponse
from ws.redis import redis

@login_required
def initial_state(request):
    return JsonResponse({'current': redis.get('time')})
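The ws.redis module is just a shared Redis client (a minimal sketch, assuming the redis-py package and a REDIS_HOST setting):

from django.conf import settings
from redis import Redis

# Shared client: the view reads the value that the Celery worker persists
redis = Redis(host=settings.REDIS_HOST, port=6379, db=0, decode_responses=True)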

We need to configure our Channels routing and define our websocket route:

from django.urls import re_path

from ws import consumers

websocket_urlpatterns = [
    re_path(r'time/tic/$', consumers.WsConsumer),
]
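Those websocket routes are wired into the Channels application together with the authentication middleware (a minimal sketch of the routing module, assuming Channels 2 and that the routes above live in ws/routing.py):

from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter

from ws import routing

application = ProtocolTypeRouter({
    # Plain HTTP keeps being handled by Django's standard views
    'websocket': AuthMiddlewareStack(
        URLRouter(routing.websocket_urlpatterns)
    ),
})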

Thanks to the authentication middleware we can reuse Django’s authentication inside the Channels consumer as well:

import json
from channels.generic.websocket import AsyncWebsocketConsumer


class WsConsumer(AsyncWebsocketConsumer):
    GROUP = 'time'

    async def connect(self):
        if self.scope["user"].is_anonymous:
            await self.close()
        else:
            await self.channel_layer.group_add(
                self.GROUP,
                self.channel_name
            )
            await self.accept()

    async def tic_message(self, event):
        if not self.scope["user"].is_anonymous:
            message = event['message']

            await self.send(text_data=json.dumps({
                'message': message
            }))

We’re going to need a worker that emits the current time every second (to avoid problems we’re going to trigger our event every 0.5 seconds). To perform this kind of action we can use a great tool called Celery: it lets us create workers and scheduled tasks (exactly what we need in our example). To avoid the "initial state" problem our worker will also persist the current time in Redis:

import time

import channels.layers
from asgiref.sync import async_to_sync
from celery import Celery

from ws.consumers import WsConsumer
from ws.redis import redis

app = Celery('config')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()


@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(0.5, ws_beat.s(), name='beat every 0.5 seconds')


@app.task
def ws_beat(group=WsConsumer.GROUP, event='tic_message'):
    current_time = time.strftime('%X')
    redis.set('time', current_time)
    message = {'time': current_time}
    channel_layer = channels.layers.get_channel_layer()
    async_to_sync(channel_layer.group_send)(group, {'type': event, 'message': message})

Finally we need a javascript client to consume our Websockets

let getWsUri = () => {
  let ws_scheme = window.location.protocol === "https:" ? "wss" : "ws"
  return ws_scheme + '://' + window.location.host + "/time/tic/"
}

let render = value => {
  document.querySelector('#display').innerHTML = value
}

let ws = new ReconnectingWebSocket(getWsUri())

ws.onmessage = e => {
  const data = JSON.parse(e.data);
  render(data.message.time)
}

ws.onopen = async () => {
  let response = await axios.get("/api/initial_state")
  render(response.data.current)
}

Basically that’s the source code (plus the usual Django stuff).

Application architecture
The architecture of the application is the following one:

Frontend
The Django application. We can run this application in development with
python manage.py runserver

And in production using an ASGI server (uvicorn in this case):

uvicorn config.asgi:application --port 8000 --host 0.0.0.0 --workers 1

Worker
The Celery worker. In development mode:

celery -A ws worker -l debug

And in production

celery -A ws worker --uid=nobody --gid=nogroup

Scheduler
We need the Celery beat scheduler to emit our event (every 0.5 seconds):

celery -A ws beat

Message Server for Celery
In this case we’re going to use Redis. It also acts as the backend for the Django Channels channel layer.
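The relevant settings glue Celery and Django Channels to that Redis server (a minimal sketch, assuming the channels_redis package and a REDIS_HOST environment variable):

import os

REDIS_HOST = os.environ.get('REDIS_HOST', 'localhost')

# Celery broker (the CELERY_ namespace is configured in the Celery app)
CELERY_BROKER_URL = f"redis://{REDIS_HOST}:6379/0"

# Channel layer used by the consumer and by the worker to broadcast messages
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            'hosts': [(REDIS_HOST, 6379)],
        },
    },
}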

Docker
With this application we can use the same Dockerfile for the frontend, the worker and the scheduler, just using different entrypoints.

Dockerfile:

FROM python:3.8

ENV TZ 'Europe/Madrid'
RUN echo $TZ > /etc/timezone && \
apt-get update && apt-get install -y tzdata && \
rm /etc/localtime && \
ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
dpkg-reconfigure -f noninteractive tzdata && \
apt-get clean

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

ADD . /src
WORKDIR /src

RUN pip install -r requirements.txt

RUN mkdir -p /var/run/celery /var/log/celery
RUN chown -R nobody:nogroup /var/run/celery /var/log/celery

And our whole application within a docker-compose file

version: '3.4'

services:
  redis:
    image: redis
  web:
    image: clock:latest
    command: /bin/bash ./docker-entrypoint.sh
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      - "redis"
    ports:
      - 8000:8000
    environment:
      ENVIRONMENT: prod
      REDIS_HOST: redis
  celery:
    image: clock:latest
    command: celery -A ws worker --uid=nobody --gid=nogroup
    depends_on:
      - "redis"
    environment:
      ENVIRONMENT: prod
      REDIS_HOST: redis
  cron:
    image: clock:latest
    command: celery -A ws beat
    depends_on:
      - "redis"
    environment:
      ENVIRONMENT: prod
      REDIS_HOST: redis

If we want to deploy our application to a K8s cluster we need to translate our docker-compose file into k8s yaml files. I assume that we’ve pushed our docker images to a container registry (such as ECR).

Frontend:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: clock-web-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: clock-web-api
      project: clock
  template:
    metadata:
      labels:
        app: clock-web-api
        project: clock
    spec:
      containers:
        - name: web-api
          image: my-ecr-path/clock:latest
          args: ["uvicorn", "config.asgi:application", "--port", "8000", "--host", "0.0.0.0", "--workers", "1"]
          ports:
            - containerPort: 8000
          env:
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: clock-app-config
                  key: redis.host
---
apiVersion: v1
kind: Service
metadata:
  name: clock-web-api
spec:
  type: LoadBalancer
  selector:
    app: clock-web-api
    project: clock
  ports:
    - protocol: TCP
      port: 8000 # port exposed internally in the cluster
      targetPort: 8000 # the container port to send requests to

Celery worker

apiVersion: apps/v1
kind: Deployment
metadata:
  name: clock-celery
spec:
  replicas: 1
  selector:
    matchLabels:
      app: clock-celery
      project: clock
  template:
    metadata:
      labels:
        app: clock-celery
        project: clock
    spec:
      containers:
        - name: clock-celery
          image: my-ecr-path/clock:latest
          args: ["celery", "-A", "ws", "worker", "--uid=nobody", "--gid=nogroup"]
          env:
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: clock-app-config
                  key: redis.host

Celery scheduler

apiVersion: apps/v1
kind: Deployment
metadata:
  name: clock-cron
spec:
  replicas: 1
  selector:
    matchLabels:
      app: clock-cron
      project: clock
  template:
    metadata:
      labels:
        app: clock-cron
        project: clock
    spec:
      containers:
        - name: clock-cron
          image: my-ecr-path/clock:latest
          args: ["celery", "-A", "ws", "beat"]
          env:
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: clock-app-config
                  key: redis.host

Redis

apiVersion: apps/v1
kind: Deployment
metadata:
  name: clock-redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: clock-redis
      project: clock
  template:
    metadata:
      labels:
        app: clock-redis
        project: clock
    spec:
      containers:
        - name: clock-redis
          image: redis
          ports:
            - containerPort: 6379
              name: clock-redis
---
apiVersion: v1
kind: Service
metadata:
  name: clock-redis
spec:
  type: ClusterIP
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    app: clock-redis

Shared configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: clock-app-config
data:
  redis.host: "clock-redis"

We can deploy our application to our k8s cluster:

kubectl apply -f .k8s/

And see it running inside the cluster locally with a port forward

kubectl port-forward deployment/clock-web-api 8000:8000

And that’s all. Our Django application with Websockets using Django Channels, up and running with docker and also on k8s.

Source code in my github