Named Volumes for ClickHouse Container
How to set up a named volume for ClickHouse server running in a container
Context and Motivation
ClickHouse is a fast and resource-efficient open-source database for real-time apps and analytics. It is also used by Plausible Analytics for storing event data.
Plausible has very reasonable pricing, so for production loads I’d recommend using their cloud-hosted option unless you have specific requirements about where your data is located.
Both ClickHouse and Plausible can easily be deployed as containers. This is my preferred dev environment setup because it makes it easy to give each developer their own instance of Plausible (or just ClickHouse) that they fully control. This makes for a much nicer developer experience (DevEx).
When following the steps suggested by Plausible at https://plausible.io/docs/self-hosting, with template code at https://github.com/plausible/hosting, all Docker volumes are created with generated names that don’t tell you what they are used for.
Here is an example of the volumes list in Docker Desktop, showing what I am referring to:
You can click on those names and then see the content of those volumes. In some cases the purpose is obvious; in others it may leave you wondering. Also, if you work with multiple clients, you may have several volumes that are used by ClickHouse (or by any other container, such as Postgres, for its data).
Unfortunately, there is no universal solution to the problem. Fundamentally, in each case we need to find out which volumes a particular container uses. In the case of ClickHouse, there are two such volumes:
Event data
Logs
Each of these is in a different location. Event data is stored at /var/lib/clickhouse and the logs are stored at /var/log/clickhouse-server.
When we look at the template provided by the Plausible folks, they suggest the following for the volumes:
# other code left out for brevity
volumes:
  # other code left out for brevity
  event-data:
    driver: local
This doesn’t solve the problem, though. It just lists the volumes used elsewhere in the docker-compose file.
Solution
To solve this issue we need to give those volumes specific names in the docker-compose files. Here is an example:
volumes:
  event-data:
    driver: local
    name: optarix-plausible-event-db-data
This solves only half of the problem. As it turns out, the ClickHouse container uses a second volume for its logs. So let’s add that as well, with a suitable name:
volumes:
  event-data:
    driver: local
    name: optarix-plausible-event-db-data
  clickhouse-logs:
    driver: local
    name: optarix-plausible-clickhouse-logs
Equipped with this, we can now provide ClickHouse with two volumes in the docker-compose file as follows:
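version: '3.4'  # the volume-level "name" key needs file format 3.4+ with legacy docker-compose; Compose V2 ignores this field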
services:
  # other code left out for brevity
  plausible_events_db:
    image: clickhouse/clickhouse-server:23.3.7.5-alpine
    container_name: optarix-plausible_events_db
    hostname: optarix-plausible_events_db.local
    # restart: always
    volumes:
      - event-data:/var/lib/clickhouse
      - clickhouse-logs:/var/log/clickhouse-server
      - ./clickhouse/clickhouse-config.xml:/etc/clickhouse-server/config.d/logging.xml:ro
      - ./clickhouse/clickhouse-user-config.xml:/etc/clickhouse-server/users.d/logging.xml:ro
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    networks:
      optarix:
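The two named volumes are the first two entries under volumes (event-data and clickhouse-logs); the remaining two entries are read-only bind mounts for configuration files.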
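If you work with several clients, as mentioned earlier, the prefix in the volume names could be parameterized rather than hard-coded. The following is only a sketch under an assumption: CLIENT is a hypothetical variable (set in your shell or in an .env file next to the compose file) and is not part of the Plausible template. It relies on Compose variable interpolation with a default value.

```yaml
# Sketch only: CLIENT is a hypothetical variable, not part of the Plausible template.
# Compose replaces ${CLIENT:-optarix} with the value of CLIENT and falls back
# to "optarix" when the variable is unset or empty.
volumes:
  event-data:
    driver: local
    name: "${CLIENT:-optarix}-plausible-event-db-data"
  clickhouse-logs:
    driver: local
    name: "${CLIENT:-optarix}-plausible-clickhouse-logs"
```

The service definition stays unchanged because it refers to the volumes by their keys (event-data and clickhouse-logs), not by the names Docker ends up using.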
Once we restart the set of containers, we have a much nicer list of the volumes in Docker Desktop:
Now, whenever I’m interested in the content of a volume, it’s much easier to locate. Deleting the correct volume through the UI is also much easier to get right. The CLI will provide nicer output as well:
In Closing
Obviously, you can use this technique for other containers as well. Postgres would be such an example.
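As a rough sketch of what that could look like for Postgres: the service name, image tag, password, and volume name below are illustrative assumptions, so adjust them to your own compose file. The mount path is the data directory used by the official postgres image for this tag.

```yaml
# Hypothetical Postgres service using the same named-volume technique.
# Service name, image tag, password, and volume name are assumptions for illustration.
services:
  plausible_db:
    image: postgres:14-alpine
    volumes:
      # Data directory of the official postgres image (for this tag)
      - db-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres

volumes:
  db-data:
    driver: local
    # Explicit name, so the volume is easy to recognize in Docker Desktop and the CLI
    name: optarix-plausible-db-data
```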
Feel free to comment or ask questions in response to this post.
For more tips and tricks from commercial software engineering, subscribe to the GeekCoder Journal. Happy coding!