Named Volumes for ClickHouse Container

How to set up a named volume for ClickHouse server running in a container

Nov 04, 2023

Context and Motivation

ClickHouse is a fast and resource-efficient open-source database for real-time apps and analytics. It is also used by Plausible Analytics for storing event data.

Plausible has very reasonable pricing, so for production loads I’d recommend using their cloud-hosted option unless you have specific requirements about where your data is located.

Both ClickHouse and Plausible can easily be deployed as containers. This is my preferred dev environment setup because it makes it easy to give each developer their own instance of Plausible (or just ClickHouse) that they fully control. This makes for a much nicer developer experience (DevEx).

When following the steps suggested by Plausible at https://plausible.io/docs/self-hosting, with template code at https://github.com/plausible/hosting, all Docker volumes are created with generated names that don’t allow identifying what they are used for.

Here is an example of the volumes list in Docker Desktop, showing what I am referring to:

You can click on those names and then see the content of those volumes. In some cases the purpose is obvious; in others it may leave you wondering. Also, you may be working with multiple clients, so you may have multiple volumes that are used by ClickHouse (or by any other container, such as Postgres for its data).

Unfortunately, there is no universal solution to the problem. Fundamentally, in each case we need to find out which volumes a particular container uses. In the case of ClickHouse, there are two such volumes:

Event data
Logs

Each of these is in a different location. Event data is stored at /var/lib/clickhouse and the logs are stored at /var/log/clickhouse-server.

When we look at the template provided by the Plausible folks, they suggest the following for the volumes:

# other code left out for brevity

volumes:
  # other code left out for brevity
  event-data:
    driver: local

(Source: https://github.com/plausible/hosting/blob/bb6decee4d33ccf84eb235b6053443a01498db53/docker-compose.yml)

This doesn’t solve the problem, though. It just lists the volumes used elsewhere in the docker-compose file.

Solution

To solve this issue, we need to give those volumes specific names in the docker-compose file. Here is an example:

volumes:
   event-data:
      driver: local
      name: optarix-plausible-event-db-data

This solves just one half of the problem. As it turns out, the ClickHouse container uses a second volume for its logs. So let’s add that as well, with a suitable name:

volumes:
   event-data:
      driver: local
      name: optarix-plausible-event-db-data
   clickhouse-logs:
      driver: local
      name: optarix-plausible-clickhouse-logs

Equipped with this, we can now provide ClickHouse with two volumes in the docker-compose file as follows:

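# The "name" key under volumes needs file format 3.4 or later with the legacy
# docker-compose v1 tool; Compose v2 treats the version field as informational only.
version: '3.4'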

services:
   #  other code left out for brevity

   plausible_events_db:
      image: clickhouse/clickhouse-server:23.3.7.5-alpine
      container_name: optarix-plausible_events_db
      hostname: optarix-plausible_events_db.local
      # restart: always
      volumes:
         - event-data:/var/lib/clickhouse
         - clickhouse-logs:/var/log/clickhouse-server
         - ./clickhouse/clickhouse-config.xml:/etc/clickhouse-server/config.d/logging.xml:ro
         - ./clickhouse/clickhouse-user-config.xml:/etc/clickhouse-server/users.d/logging.xml:ro
      ulimits:
         nofile:
            soft: 262144
            hard: 262144
      networks:
         optarix:

The two named volumes are the first two entries in the service’s volumes list: event-data and clickhouse-logs.

Once we restart the set of containers, Docker Desktop shows a much nicer list of volumes.

Now, whenever I’m interested in the content of a volume, it’s much easier to locate. Deleting the correct volume through the UI is much easier to get right, too. The CLI output, for example from docker volume ls, is easier to read as well.

In Closing

Obviously, you can use this technique for other containers as well. Postgres would be such an example.
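
As a rough sketch of what that could look like for Postgres, here is the same pattern applied to a database service. The service name plausible_db, the image tag postgres:14-alpine, the volume key db-data, the volume name optarix-plausible-db-data and the placeholder password are all assumptions for illustration; adjust them to your own setup. The mount path is the data directory used by the official postgres:14 image.

```yaml
# Sketch only: service name, image tag, volume name, and password are assumptions.
services:
   plausible_db:
      image: postgres:14-alpine
      environment:
         # Placeholder value; replace it before using this anywhere real.
         POSTGRES_PASSWORD: postgres
      volumes:
         # Data directory of the official postgres:14 image
         - db-data:/var/lib/postgresql/data

volumes:
   db-data:
      driver: local
      # Explicit name, so the volume is easy to spot in Docker Desktop and the CLI
      name: optarix-plausible-db-data
```

As with ClickHouse, it is worth checking which paths the image actually writes to before deciding how many named volumes you need.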

Feel free to comment or ask questions in response to this post.

For more tips and tricks from commercial software engineering, subscribe to the GeekCoder Journal. Happy coding!
