Introduction

We all have hobbies, right? Some people collect stamps; I like to collect data on small-cap cryptocurrencies. Because nothing says “good use of free time” like tracking obscure digital coins with names like MoonFluff and RocketPoodle.

But in all seriousness, tracking small-cap coins isn’t just about watching volatile charts (though that’s part of the fun). It’s about building a robust data pipeline, exploring how to transform raw API outputs into actionable analytics, and flexing some data engineering muscles along the way.

This article takes you through my process of setting up a pipeline that pulls crypto data, transforms it, and sets the stage for deeper analysis, all while wrestling with the CoinMarketCap API and pretending I totally know what I’m doing.


1. Why Track Small-Cap Coins?

Because it’s fun! And by “fun,” I mean intellectually engaging in that “trying to balance a data pipeline on 10,000 API credits per month” kind of way. Small-cap coins are particularly interesting because they’re chaotic, high-risk, and packed with potential. They’re the cryptocurrency world’s equivalent of indie bands: a little messy but full of possibilities. 🎸

For this project, my goal is to:

  1. Get Data: Start collecting small-cap crypto data to populate my data stack.
  2. Build a Pipeline: Use my existing setup, which includes:
    • Python + Requests: Pull raw data from the API.
    • FastAPI + Pydantic: Transform and validate data in the first stage.
    • Postgres: Store data in a relational database.
    • DBT: Create meaningful data models and analytics-ready transformations.
    • OLAP Design: Combine all this into a system primed for analysis and visualization. 📊

If nothing else, I’ll end up with something pretty in Grafana. And if a small-cap coin I track happens to explode in value? Well, we’ll pretend I had a good financial reason for this exercise all along.


2. To Scrape or Not to Scrape?

Ah, the eternal question. Web scraping might seem like the scrappy underdog, but in practice, it’s a lot like parking in a fire lane: occasionally useful, generally frowned upon, and prone to getting you in trouble. 🚫🚗

Web Scraping: The Rogue’s Path

  • Pros: Free (kind of), unregulated (mostly), and limited only by your HTML-parsing skills.
  • Cons: Violates terms of service. Risk of IP bans. Feels a bit like trying to sneak into a concert by pretending you’re the drummer.

CoinMarketCap’s API is the grown-up choice. It’s designed for data access, and more importantly, it keeps the lawyers happy. Here’s what I discovered:

  • Free Tier: 10,000 credits/month. Generous if you’re not greedy.
  • Endpoints: I’m using cryptocurrency/listings/latest because it’s the buffet of crypto data.
  • Cost: 1 credit per 100 coins. Add a convert option (e.g., BTC), and it’s an extra credit. Do I really need to know the price in Dogecoin? No. So USD it is. (See the quick credit estimate sketch after this list.)
  • Update Frequency: Every 60 seconds. Perfect for tracking volatile markets, or as I call it, “digital gambling.” 🎰
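
Before burning real requests, here’s a tiny back-of-the-envelope helper for that credit math. It’s only a sketch of the pricing rule of thumb above (1 credit per started batch of 100 coins, plus an extra credit per additional convert option); CoinMarketCap’s own docs are the source of truth for billing.

import math

def estimate_credits(num_coins: int, num_converts: int = 1) -> int:
    """Rough per-call credit estimate for listings/latest.

    Mirrors the rule of thumb above: 1 credit per started batch of 100
    coins, plus 1 extra credit for each convert option beyond the first.
    """
    batches = math.ceil(num_coins / 100)
    extra_converts = max(num_converts - 1, 0)
    return batches + extra_converts

print(estimate_credits(500))     # 5 credits for 500 coins, USD only
print(estimate_credits(500, 2))  # 6 credits if I also asked for BTC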

3. Testing the API

First, I tested the API to figure out how many credits I’d burn through and whether I’d accidentally DDoS myself. Here’s the Python script I used to pull the top 500 small-cap coins:

import math

import requests

API_KEY = "YOUR_API_KEY"
URL = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"

PARAMS = {
    "start": 1,
    "limit": 500,               # Top 500 coins
    "convert": "USD",           # USD only, to avoid extra convert credits
    "market_cap_max": int(1e9)  # Small-cap threshold: $1B market cap
}

HEADERS = {
    "Accepts": "application/json",
    "X-CMC_PRO_API_KEY": API_KEY
}

response = requests.get(URL, headers=HEADERS, params=PARAMS)
response.raise_for_status()  # fail loudly on a bad key or rate limit
data = response.json()

coins_returned = len(data["data"])
credits_estimate = math.ceil(coins_returned / 100)  # ~1 credit per 100 coins
print(f"Coins returned: {coins_returned}, credits used: ~{credits_estimate}")

Results

  • Credits Used: 3 credits. It turns out there weren’t quite 500 small-cap coins in the dataset. 🤷‍♂️
  • How Often Can I Query? With 10,000 credits/month, I can query hourly (30 days × 24 hours × 3 credits = 2,160 credits/month) and still have room for error, or my inevitable curiosity. (A minimal hourly polling sketch follows.)
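
To be clear, the real scheduling will live elsewhere in the pipeline; the loop below is just a minimal sketch of what “query hourly” means in practice, reusing URL, HEADERS, and PARAMS from the script above.

import time

import requests

def fetch_listings(url: str, headers: dict, params: dict) -> dict:
    """One listings/latest call; raises on HTTP errors."""
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    return response.json()

# Minimal hourly loop; a real setup would use cron or a proper scheduler,
# plus retries and logging.
while True:
    payload = fetch_listings(URL, HEADERS, PARAMS)  # constants from the script above
    print(f"Pulled {len(payload['data'])} coins at {time.strftime('%H:%M:%S')}")
    time.sleep(60 * 60)  # sleep an hour; roughly 3 credits per iteration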

4. Exploring and Normalizing the Data

The data from the API is… dense. It’s like a buffet where everything looks great until you realize half the options are nested structures. Take the quote field, for example: it’s a nested object keyed by conversion currency, but I’m only using USD, so that’s one more thing to flatten.

Example Data Structure

{
    "id": 20947,
    "name": "Sui",
    "symbol": "SUI",
    "slug": "sui",
    "num_market_pairs": 440,
    "date_added": "2022-07-12T08:03:11.000Z",
    "tags": ["binance-launchpool", "layer-1"],
    "max_supply": 10000000000,
    "circulating_supply": 2927660018.558888,
    "total_supply": 10000000000,
    "quote": {
        "USD": {
            "price": 3.55,
            "volume_24h": 1872104986.2806106,
            "market_cap": 10408819004.025967,
            "percent_change_24h": -4.40,
            "last_updated": "2024-12-04T17:21:00.000Z"
        }
    }
}

Normalizing the Data

Here’s how I split the data into three normalized tables (a Python flattening sketch follows the list):

  1. Coins Table: Static metadata like name, symbol, and tags. 🪙
  2. Market Data Table: Transactional data like price and volume_24h. 📈
  3. Tags Table: A many-to-one table linking tags to the coin’s id. 🏷️
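
To make that split concrete, here’s a rough flattening sketch in plain Python. The column names are my own placeholders based on the example payload, not a finished schema; the production version will sit behind FastAPI and Pydantic in the next stage.

from typing import Any

def flatten_listing(raw: dict[str, Any]) -> tuple[dict, dict, list[dict]]:
    """Split one listings/latest record into coin, market-data, and tag rows."""
    usd = raw["quote"]["USD"]  # only requesting USD, so flatten this level away

    coin_row = {
        "id": raw["id"],
        "name": raw["name"],
        "symbol": raw["symbol"],
        "slug": raw["slug"],
        "date_added": raw["date_added"],
        "max_supply": raw["max_supply"],
    }

    market_row = {
        "coin_id": raw["id"],
        "price_usd": usd["price"],
        "volume_24h": usd["volume_24h"],
        "market_cap": usd["market_cap"],
        "percent_change_24h": usd["percent_change_24h"],
        "last_updated": usd["last_updated"],
    }

    tag_rows = [{"coin_id": raw["id"], "tag": tag} for tag in raw.get("tags", [])]

    return coin_row, market_row, tag_rows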

Future-Proofing: OLAP Planning

While I’m normalizing data now, I’m keeping an eye on OLAP (Online Analytical Processing) design. The goal? Build the smallest possible data structure with the most flexibility for recombining metrics like market cap and price changes into aggregate views.


5. What’s Next?

Now that I’ve explored the API and laid the groundwork for normalizing the data, the next step is to extend my FastAPI setup to:

  1. Validate the incoming data (a rough Pydantic preview follows this list). ✅
  2. Store it in my Postgres database. 🐘
  3. Let DBT work its magic to turn raw numbers into actionable insights. ✨
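
As a teaser for that validation step, here’s a rough Pydantic sketch of what one incoming record might look like. Field names follow the example payload above; the actual models, and the FastAPI wiring around them, are the subject of the next article.

from datetime import datetime

from pydantic import BaseModel

class QuoteUSD(BaseModel):
    price: float
    volume_24h: float
    market_cap: float
    percent_change_24h: float
    last_updated: datetime

class CoinListing(BaseModel):
    id: int
    name: str
    symbol: str
    slug: str
    num_market_pairs: int
    date_added: datetime
    tags: list[str]
    max_supply: float | None = None  # some coins have no max supply
    circulating_supply: float
    total_supply: float
    quote: dict[str, QuoteUSD]  # keyed by conversion currency, e.g. "USD"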

Stay tuned for the next article, where we’ll dive into FastAPI, Pydantic, and Postgres as we build the second stage of this pipeline. Spoiler: it involves more JSON parsing and a lot of coffee. ☕