Well, we’ve done it. We have our orchestration layer set up with Airflow, a documentation server, and a notification server.

Now it’s time to add a data warehouse to our stack. As I’ve mentioned before, it doesn’t really matter what you use as long as your data is centralized. Having data scattered across multiple sources only forces you to write a bunch of ETLs to bring it back together again, which is unnecessarily time-consuming. I try to keep all of my development on a single database instance locally and then commit to a single cloud DB instance for analytics.

Why PostgreSQL?

We need a database, and I’m using PostgreSQL. Here’s why:

  • SQL-Based: If you’ve worked with any SQL database before, PostgreSQL will feel familiar.
  • Actively Maintained: It’s open-source, robust, and backed by a large community.
  • Free: No licensing costs.
  • Great Documentation: There’s an abundance of guides and forums for troubleshooting.

Setting Up PostgreSQL with Docker Compose

Like the other services in our stack, we’ll use Docker Compose to spin up PostgreSQL. This keeps our setup modular, consistent, and easy to replicate or modify.

The official PostgreSQL image on Docker Hub provides all the details you need, including an example docker-compose.yml file.

Here’s the foundational docker-compose.yml:

version: '3.9'

services:

  db:
    image: postgres
    restart: always
    shm_size: 128mb
    environment:
      POSTGRES_PASSWORD: example

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080

Enhancing the Configuration

We’re going to improve this setup by:

  1. Setting Container Names: To avoid conflicts and make the containers easily identifiable.
  2. Customizing Ports: To prevent port conflicts.
  3. Increasing Shared Memory (shm_size): Docker’s default /dev/shm is only 64 MB, which PostgreSQL can exhaust on parallel queries over larger datasets, so we raise it to 512 MB.
  4. Specifying Environment Variables (via a .env file; see the sketch after this list):
    • POSTGRES_USER: The database superuser name.
    • POSTGRES_DB: The default database to be created.
  5. Adding a Named Volume: So database files survive container removal and recreation.
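
Here’s a minimal .env sketch to pair with the Compose file below. The variable names match those above; the values are placeholders, so substitute your own and keep this file out of version control:

POSTGRES_USER=warehouse
POSTGRES_PASSWORD=change-me
POSTGRES_DB=analytics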

Here’s the updated docker-compose.yml:

version: '3.9'

services:

  db:
    image: postgres
    container_name: sa-postgres
    shm_size: 512mb
    restart: unless-stopped
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - sa-postgres-data:/var/lib/postgresql/data

  adminer:
    image: adminer
    container_name: sa-adminer
    restart: unless-stopped
    ports:
      - "5431:8080"

volumes:
  sa-postgres-data:
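
One optional refinement worth knowing about: a healthcheck on the db service lets dependent containers (your future ETL jobs, for example) wait until PostgreSQL is actually accepting connections, not merely started. Here’s a minimal sketch using the pg_isready utility that ships with the image; it would go under the db service:

  db:
    # ...same configuration as above...
    healthcheck:
      # pg_isready exits 0 once the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5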

Exploring Adminer

Adminer is a lightweight, full-featured database management tool written in PHP. In this setup, it provides a web-based UI for interacting with your PostgreSQL database.

To access Adminer:

  1. Open your browser and navigate to http://localhost:5431.
  2. Log in using the POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB values you specified in your .env file or directly in the Compose file.
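
One detail that trips people up: Adminer runs inside the Compose network, so the Server field should be the db service name, not localhost. With the .env sketch from earlier, the login form would look like this:

System:   PostgreSQL
Server:   db
Username: warehouse
Password: change-me
Database: analytics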

Starting the Services

Run the following command in the directory where your docker-compose.yml file is located:

docker-compose up -d

This will start both the PostgreSQL and Adminer containers in detached mode.
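
To confirm that PostgreSQL finished initializing, tail the db service’s logs; on a healthy first start you should eventually see a “database system is ready to accept connections” line:

docker-compose logs -f db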

Verifying the Setup

  1. Check Running Containers:

    docker ps
    

    You should see sa-postgres and sa-adminer in the list.

  2. Test Database Access: Use a database client like psql or Adminer to connect to PostgreSQL, using the credentials and ports defined in your Compose file (see the example after this list).

  3. Inspect Data Volume: The sa-postgres-data volume stores your database data. Even if the container is removed, your data persists in this volume.
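
As a quick sketch of checks 2 and 3, assuming the container name and .env values from earlier:

# Open a psql shell inside the running container;
# sh -c lets the container expand its own environment variables
docker exec -it sa-postgres sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"'

# List the data volume (Compose prefixes the name with your project directory)
docker volume ls | grep sa-postgres-data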

Next Steps

With PostgreSQL up and running, I now have the backbone of my data stack. Over the next few articles, we’ll look at how to connect this database to some ETL processes!