Introduction
We all have hobbies, right? Some people collect stamps; I like to collect data on small-cap cryptocurrencies. Because nothing says “good use of free time” like tracking obscure digital coins with names like MoonFluff and RocketPoodle.
But in all seriousness, tracking small-cap coins isn’t just about watching volatile charts (though that’s part of the fun). It’s about building a robust data pipeline, exploring how to transform raw API outputs into actionable analytics, and flexing some data engineering muscles along the way.
This article takes you through my process of setting up a pipeline that pulls crypto data, transforms it, and sets the stage for deeper analysis, all while wrestling with the CoinMarketCap API and pretending I totally know what I'm doing.
1. Why Track Small-Cap Coins?
Because it’s fun! And by “fun,” I mean intellectually engaging in that “trying to balance a data pipeline on 10,000 API credits per month” kind of way. Small-cap coins are particularly interesting because they’re chaotic, high-risk, and packed with potential. They’re the cryptocurrency world's equivalent of indie bands: a little messy but full of possibilities.
For this project, my goal is to:
- Get Data: Start collecting small-cap crypto data to populate my data stack.
- Build a Pipeline: Use my existing setup, which includes:
- Python + Requests: Pull raw data from the API.
- FastAPI + Pydantic: Transform and validate data in the first stage.
- Postgres: Store data in a relational database.
- DBT: Create meaningful data models and analytics-ready transformations.
- OLAP Design: Combine all this into a system primed for analysis and visualization.
If nothing else, I'll end up with something pretty in Grafana. And if a small-cap coin I track happens to explode in value? Well, we'll pretend I had a good financial reason for this exercise all along.
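To make that stack a bit more concrete, here is a bird's-eye sketch of how the stages are meant to fit together. The function names and the module boundaries are placeholders of my own, not code from the project; the real versions arrive in this article and the next.

```python
# Illustrative outline of the pipeline stages described above.
# None of these functions exist yet; they're stand-ins for the real stages.

def extract_listings() -> list[dict]:
    """Pull raw small-cap listings from the CoinMarketCap API (Python + Requests)."""
    ...

def validate_and_transform(raw: list[dict]) -> list[dict]:
    """Validate and flatten the payload (the FastAPI + Pydantic stage)."""
    ...

def load_to_postgres(rows: list[dict]) -> None:
    """Insert normalized rows into Postgres."""
    ...

def run_dbt_models() -> None:
    """Let DBT build analytics-ready models on top of the raw tables."""
    ...

if __name__ == "__main__":
    rows = validate_and_transform(extract_listings())
    load_to_postgres(rows)
    run_dbt_models()
```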
2. To Scrape or Not to Scrape?
Ah, the eternal question. Web scraping might seem like the scrappy underdog, but in practice, it's a lot like parking in a fire lane: occasionally useful, generally frowned upon, and prone to getting you in trouble.
Web Scraping: The Rogue's Path
- Pros: Free (kind of), unregulated (mostly), and limited only by your HTML-parsing skills.
- Cons: Violates terms of service. Risk of IP bans. Feels a bit like trying to sneak into a concert by pretending you’re the drummer.
The API: Structured, Reliable, Legal
CoinMarketCap's API is the grown-up choice. It's designed for data access, and more importantly, it keeps the lawyers happy. Here's what I discovered:
- Free Tier: 10,000 credits/month. Generous if you're not greedy.
- Endpoints: I'm using `cryptocurrency/listings/latest` because it's the buffet of crypto data.
- Cost: 1 credit per 100 coins. Add a convert option (e.g., BTC), and it's an extra credit. Do I really need to know the price in Dogecoin? No. So USD it is.
- Update Frequency: Every 60 seconds. Perfect for tracking volatile markets, or as I call it, “digital gambling.”
3. Testing the API
First, I tested the API to figure out how many credits I'd burn through and whether I'd accidentally DDoS myself. Here's the Python script I used to pull the top 500 small-cap coins:
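The original listing of the script got mangled on its way into this post, so here is a minimal sketch of the request it boils down to. Treat the environment variable name, the market-cap band, and the sort choices as illustrative assumptions rather than the exact values I used; the auth header and the `status.credit_count` field come straight from the CoinMarketCap API.

```python
import os

import requests

# Minimal sketch of the credit test, not the exact project script.
# Assumes the API key lives in the CMC_API_KEY environment variable and
# that the market-cap band below is a reasonable "small-cap" cutoff.
URL = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"

params = {
    "limit": 500,                 # up to 500 coins in one call
    "convert": "USD",             # one convert option keeps the credit cost down
    "sort": "market_cap",
    "sort_dir": "asc",            # smallest market caps first
    "market_cap_min": 1_000_000,  # illustrative small-cap band
    "market_cap_max": 50_000_000,
}
headers = {"X-CMC_PRO_API_KEY": os.environ["CMC_API_KEY"]}

response = requests.get(URL, params=params, headers=headers, timeout=30)
response.raise_for_status()
payload = response.json()

# The status block reports how many credits the call actually consumed.
print("coins returned:", len(payload["data"]))
print("credits used:  ", payload["status"]["credit_count"])
```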
Results
- Credits Used: 3 credits. It turns out there weren't quite 500 small-cap coins in the dataset.
- How Often Can I Query? With 10,000 credits/month, I can query hourly (30 days × 24 hours × 3 credits = 2,160 credits/month) and still have room for error, or my inevitable curiosity.
4. Exploring and Normalizing the Data
The data from the API is… dense. It's like a buffet where everything looks great until you realize half the options are nested structures. Take the `quote` field, for example: it's nested, but I'm only using USD, so that's one more thing to flatten.
Example Data Structure
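The payload sample also got lost in translation, so here is a hand-trimmed illustration of the shape a single coin comes back in. The values are invented; the field names follow the listings endpoint and are the ones the normalization below cares about.

```python
# Illustrative shape of one entry in payload["data"]; values are made up.
coin = {
    "id": 123456,
    "name": "MoonFluff",
    "symbol": "FLUFF",
    "slug": "moonfluff",
    "cmc_rank": 1472,
    "tags": ["memes", "small-cap"],
    "date_added": "2023-04-01T00:00:00.000Z",
    "last_updated": "2025-01-15T12:00:00.000Z",
    "quote": {
        "USD": {
            "price": 0.00042,
            "volume_24h": 15234.87,
            "percent_change_24h": -3.2,
            "market_cap": 4_200_000.0,
            "last_updated": "2025-01-15T12:00:00.000Z",
        }
    },
}
```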
Normalizing the Data
Here's how I split the data into three normalized tables (with a quick flattening sketch after the list):
- Coins Table: Static metadata like `name`, `symbol`, and `tags`.
- Market Data Table: Transactional data like `price` and `volume_24h`.
- Tags Table: A many-to-one table linking `tags` to the coin's `id`.
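As promised, here is a rough sketch of that split for a single coin record. The `flatten_coin` helper and the added `fetched_at` timestamp are my own illustrative choices; the real validation and loading happen in the FastAPI/Pydantic stage.

```python
from datetime import datetime, timezone


def flatten_coin(coin: dict) -> tuple[dict, dict, list[dict]]:
    """Split one API record into rows for the coins, market_data, and tags tables.

    Sketch only: column names mirror the normalization described above, and
    `fetched_at` is an extra column so repeated pulls don't collide.
    """
    usd = coin["quote"]["USD"]
    fetched_at = datetime.now(timezone.utc)

    coins_row = {
        "id": coin["id"],
        "name": coin["name"],
        "symbol": coin["symbol"],
        "slug": coin.get("slug"),
    }
    market_data_row = {
        "coin_id": coin["id"],
        "fetched_at": fetched_at,
        "price": usd["price"],
        "volume_24h": usd["volume_24h"],
        "market_cap": usd["market_cap"],
    }
    tags_rows = [{"coin_id": coin["id"], "tag": tag} for tag in coin.get("tags", [])]

    return coins_row, market_data_row, tags_rows
```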
Future Proofing: OLAP Planning
While I'm normalizing data now, I'm keeping an eye on OLAP (Online Analytical Processing) design. The goal? Build the smallest possible data structure with the most flexibility for recombining metrics like market cap and price changes into aggregate views.
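To make “aggregate views” slightly less hand-wavy, here is a tiny pandas sketch of the kind of rollup I have in mind. The column names match the market data table above, but the hourly grain, the invented sample rows, and the specific aggregations are just one possible choice; the real version will live in DBT models on top of Postgres.

```python
import pandas as pd

# Toy market_data rows (invented values) in the shape of the table above.
market_data = pd.DataFrame({
    "coin_id":    [101, 101, 101, 202],
    "fetched_at": pd.to_datetime([
        "2025-01-15 10:05", "2025-01-15 10:35",
        "2025-01-15 11:10", "2025-01-15 10:20",
    ]),
    "price":      [0.00042, 0.00044, 0.00041, 1.45],
    "volume_24h": [15000.0, 16250.0, 15800.0, 98000.0],
    "market_cap": [4.2e6, 4.4e6, 4.1e6, 4.5e7],
})

# Roll everything up to an hourly grain per coin: average price,
# peak reported volume, and the last market cap seen in the hour.
hourly = (
    market_data
    .set_index("fetched_at")
    .groupby("coin_id")
    .resample("1h")
    .agg({"price": "mean", "volume_24h": "max", "market_cap": "last"})
    .reset_index()
)
print(hourly)
```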
5. What's Next?
Now that I've explored the API and laid the groundwork for normalizing the data, the next step is to extend my FastAPI setup to:
- Validate the incoming data (rough Pydantic sketch below).
- Store it in my Postgres database.
- Let DBT work its magic to turn raw numbers into actionable insights.
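As a small teaser for that validation step, here is roughly what the Pydantic side might look like. The model names, the USD-only nesting, and the fields I kept are assumptions on my part; the real models show up in the next article.

```python
from pydantic import BaseModel, Field

# Rough teaser of the validation models; names and fields are my own
# assumptions about the next stage, not the final implementation.

class UsdQuote(BaseModel):
    price: float
    volume_24h: float
    market_cap: float

class Quote(BaseModel):
    USD: UsdQuote

class CoinListing(BaseModel):
    id: int
    name: str
    symbol: str
    tags: list[str] = Field(default_factory=list)
    quote: Quote

# A raw record straight from the API can then be validated in one line
# (Pydantic v2): listing = CoinListing.model_validate(raw_coin_dict)
```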
Stay tuned for the next article, where we'll dive into FastAPI, Pydantic, and Postgres as we build the second stage of this pipeline. Spoiler: it involves more JSON parsing and a lot of coffee.