Well, we’ve done it. We have our orchestration layer set up with Airflow, a documentation server, and a notification server.
Now it’s time to add a data warehouse to our stack. As I’ve mentioned before, it doesn’t really matter what you use as long as your data is centralized. Having data scattered across multiple sources only forces you to write a bunch of ETLs to bring it back together again, which is unnecessarily time-consuming. I try to keep all of my development on a single database instance locally and then commit to a single cloud DB instance for analytics.
Why PostgreSQL?
We need a database, and I’m using PostgreSQL. Here’s why:
- SQL-Based: If you’ve worked with any SQL database before, PostgreSQL will feel familiar.
- Actively Maintained: It’s open-source, robust, and backed by a large community.
- Free: No licensing costs.
- Great Documentation: There’s an abundance of guides and forums for troubleshooting.
Setting Up PostgreSQL with Docker Compose
Like the other services in our stack, we’ll use Docker Compose to spin up PostgreSQL. This keeps our setup modular, consistent, and easy to replicate or modify.
The official PostgreSQL image on Docker Hub provides all the details you need, including an example docker-compose.yml file.
Here’s the foundational docker-compose.yml, adapted from the example on the official image page:

```yaml
services:
  db:
    image: postgres
    restart: always
    environment:
      POSTGRES_PASSWORD: example  # for local development only

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
```
Enhancing the Configuration
We’re going to improve this setup by:
- Setting Container Names: To avoid conflicts and make the containers easily identifiable.
- Customizing Ports: To prevent port conflicts.
- Increasing Shared Memory (shm_size): PostgreSQL benefits from additional shared memory, especially if you’re working with larger datasets.
- Specifying Environment Variables: POSTGRES_USER is the database superuser name; POSTGRES_DB is the default database to be created.
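These variables can live in a .env file next to the Compose file, which Docker Compose picks up automatically for variable substitution. The values below are placeholders, not the actual credentials for this stack:

```
POSTGRES_USER=postgres
POSTGRES_PASSWORD=change-me
POSTGRES_DB=warehouse
```

Keep this file out of version control if it holds real credentials.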
Here’s the updated docker-compose.yml (the image tag and exact port mappings here are illustrative; adjust them to your environment):

```yaml
services:
  postgres:
    image: postgres:16
    container_name: sa-postgres
    restart: always
    shm_size: 1gb  # default is 64mb; more helps with larger datasets
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - sa-postgres-data:/var/lib/postgresql/data

  adminer:
    image: adminer
    container_name: sa-adminer
    restart: always
    ports:
      - "5431:8080"  # Adminer UI at http://localhost:5431

volumes:
  sa-postgres-data:
```
Exploring Adminer
Adminer is a lightweight, full-featured database management tool written in PHP. In this setup, it provides a web-based UI for interacting with your PostgreSQL database.
To access Adminer:
- Open your browser and navigate to http://localhost:5431.
- Log in using the POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB values you specified in your .env file or directly in the Compose file.
Starting the Services
Run the following command in the directory where your docker-compose.yml file is located:

```shell
docker compose up -d
```
This will start both the PostgreSQL and Adminer containers in detached mode.
Verifying the Setup
Check Running Containers:

```shell
docker ps
```

You should see sa-postgres and sa-adminer in the list.

Test Database Access: Use a database client like psql or Adminer to connect to PostgreSQL. Use the credentials and ports defined in your Compose file.

Inspect Data Volume: The sa-postgres-data volume stores your database data. Even if the container is removed, your data persists in this volume.
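If you’d rather test access from code than from Adminer, you can assemble a libpq-style connection URL from the same variables the Compose file uses. This is a minimal sketch; the helper name and fallback values are my own, not part of the stack’s configuration:

```python
import os


def postgres_dsn(user: str, password: str, db: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Build a libpq-style connection URL from the credentials
    defined alongside the Compose file."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


# Read the same variables the Compose file substitutes; the
# fallbacks here are illustrative defaults only.
dsn = postgres_dsn(
    user=os.environ.get("POSTGRES_USER", "postgres"),
    password=os.environ.get("POSTGRES_PASSWORD", ""),
    db=os.environ.get("POSTGRES_DB", "postgres"),
)
print(dsn)
```

The resulting URL can be handed straight to psql (`psql "$DSN"`) or to a Python driver such as psycopg.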
Next Steps
With PostgreSQL up and running, I now have the backbone of my data stack. Over the next few articles, we’ll look at how to connect this database to some ETL processes!