Cutting EC2 GPU instance cost in practice (a g6e story)
GPU bills usually balloon because a big card was left on and forgotten. Cutting cost isn't a flashy optimization; it comes down to doing three things over and over: pick the right instance, turn it off when idle, and raise utilization. Here's the order that worked on real projects.
1. Classify the workload first
Not all "GPU" is the same.
- Inference (serving) — low latency, steady traffic. Cost-per-throughput is what matters.
- Training/fine-tuning: short and heavy. If the job can tolerate interruption, spot instances cut the cost sharply.
- Batch/experiments: loose deadlines, so this is where scheduling and queues have the most room to raise utilization.
Skip the classification and you'll size everything to the most expensive instance.
2. Instances: newer generations are often cheaper
For inference, moving from the older g5 (A10G) to g6e (L40S) often raises per-card throughput, so you can serve the same traffic with fewer cards. What matters isn't price per hour but price per unit of throughput.
# Compare on cost-per-throughput (example)
price/hr ÷ (requests per second × 3600) × 1000 → cost per 1k requests
g5.xlarge : $1.006 / (120 rps × 3600) × 1000 ≈ $0.0023 / 1k req
g6e.xlarge: $1.861 / (280 rps × 3600) × 1000 ≈ $0.0018 / 1k req ← cheaper
Numbers vary by workload, so always benchmark with your own model. A price sheet alone can't tell you which instance is cheaper per request.
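To make the comparison repeatable, here is a minimal Python sketch that converts an hourly price and a sustained request rate into dollars per 1,000 requests. The function name `cost_per_1k_requests` and the `candidates` table are hypothetical, and the prices and request rates are the illustrative figures from the example above, not benchmark results.

```python
def cost_per_1k_requests(price_per_hour: float, requests_per_second: float) -> float:
    """Dollars per 1,000 requests at a sustained request rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return price_per_hour / requests_per_hour * 1000


# Illustrative (price per hour, sustained rps) pairs; replace with your own benchmark.
candidates = {
    "g5.xlarge": (1.006, 120),
    "g6e.xlarge": (1.861, 280),
}

costs = {
    name: cost_per_1k_requests(price, rps)
    for name, (price, rps) in candidates.items()
}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:.4f} per 1k requests")
print("cheapest:", cheapest)
```

Feed it the request rate you measured at your own latency target; a rate measured at a looser latency budget will flatter whichever card batches better.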
3. Turn it off when idle — the biggest saving
- Schedule dev/experiment instances to auto-stop nights and weekends.
- Set the autoscale floor to 0 or 1 so cards don't idle during quiet hours.
- Report "ownerless GPUs" weekly by tag.
Most savings come not from cheaper cards, but from reducing the hours they're on.
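The arithmetic behind that claim is easy to check. The sketch below, with hypothetical helper names and an illustrative hourly price, compares an always-on instance against one that only runs during working hours on weekdays.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def scheduled_hours_per_week(start_hour: int, stop_hour: int, days_on: int = 5) -> int:
    """Weekly on-hours for an instance that runs start_hour..stop_hour on days_on days."""
    if not 0 <= start_hour < stop_hour <= 24:
        raise ValueError("expected 0 <= start_hour < stop_hour <= 24")
    if not 0 <= days_on <= 7:
        raise ValueError("days_on must be between 0 and 7")
    return (stop_hour - start_hour) * days_on


def weekly_saving(price_per_hour: float, on_hours: int) -> tuple[float, float]:
    """Return (dollars saved per week, fraction saved) versus running 24x7."""
    always_on_cost = price_per_hour * HOURS_PER_WEEK
    scheduled_cost = price_per_hour * on_hours
    return always_on_cost - scheduled_cost, 1 - on_hours / HOURS_PER_WEEK


# Assumed schedule: weekdays, 09:00 to 19:00. The price is illustrative.
on_hours = scheduled_hours_per_week(start_hour=9, stop_hour=19)
saved_dollars, saved_fraction = weekly_saving(price_per_hour=1.861, on_hours=on_hours)
print(f"on {on_hours} of {HOURS_PER_WEEK} hours; "
      f"saves ${saved_dollars:.2f}/week ({saved_fraction:.0%})")
```

Under that assumed schedule the instance is on for 50 of 168 hours, which removes about 70% of its weekly cost before any change of instance type.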
4. Spot + checkpoints
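Interruptible training and batch jobs can often cost 60–70% less on spot. The key is interruption tolerance:
- Periodic checkpoints limit the work lost on a reclaim to at most one checkpoint interval, so restart cost stays near zero.
- A queue (e.g., Kueue) re-queues the job automatically when the spot instance is reclaimed.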
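The checkpoint half of that can be sketched in a few lines of Python. This is a toy job, not a training loop: the state is a small dict saved as JSON, the names `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical, and the `stop_at` parameter only simulates a spot reclaim. Re-queueing is left to the queue and is not shown.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a reclaim never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path: str, total_steps: int, checkpoint_every: int, stop_at=None) -> dict:
    """Toy job that sums step numbers, resuming from the checkpoint at `path`."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] >= stop_at:
            return state  # simulated reclaim: progress since the last save is lost
        state["step"] += 1
        state["total"] += state["step"]
        if state["step"] % checkpoint_every == 0 or state["step"] == total_steps:
            save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.json")
    run(ckpt, total_steps=100, checkpoint_every=10, stop_at=37)  # interrupted at step 37
    resumed_from = load_checkpoint(ckpt)["step"]                 # last saved step
    final = run(ckpt, total_steps=100, checkpoint_every=10)      # restart after re-queue
    print("resumed from step", resumed_from, "and finished with", final)
```

The checkpoint interval is the tuning knob: a shorter interval loses less work per reclaim but spends more time writing state. For this to survive a reclaim in practice, the checkpoint path has to live on storage that outlasts the instance.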
5. Make utilization measurable
"GPUs are expensive" usually means "GPUs are idle." Put real usage on a dashboard with DCGM metrics and make throughput-per-card a KPI, and you'll run more work without buying more cards.
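As a rough illustration of the two numbers worth tracking, the Python sketch below summarizes per-GPU utilization samples and computes throughput per card. It assumes you have already exported utilization readings as percentages (for example from a DCGM utilization gauge); the sample data, the function names, and the 5% idle threshold are all hypothetical.

```python
from statistics import mean


def utilization_report(samples: dict, idle_threshold: float = 5.0) -> dict:
    """Summarize utilization readings (percent) per GPU.

    idle_fraction is the share of readings below idle_threshold.
    """
    report = {}
    for gpu_id, readings in samples.items():
        if not readings:
            continue  # no data for this GPU in the window
        idle = sum(1 for r in readings if r < idle_threshold)
        report[gpu_id] = {
            "mean_util": mean(readings),
            "idle_fraction": idle / len(readings),
        }
    return report


def throughput_per_card(total_requests: int, window_seconds: float, cards: int) -> float:
    """Requests per second per card over a measurement window."""
    if window_seconds <= 0 or cards <= 0:
        raise ValueError("window_seconds and cards must be positive")
    return total_requests / window_seconds / cards


# Made-up readings for two cards over the same window.
samples = {
    "gpu-0": [80.0, 90.0, 70.0, 0.0],
    "gpu-1": [0.0, 0.0, 2.0, 10.0],
}
report = utilization_report(samples)
rps_per_card = throughput_per_card(total_requests=1_008_000, window_seconds=3600, cards=2)

for gpu_id, stats in report.items():
    print(f"{gpu_id}: mean {stats['mean_util']:.0f}% util, idle {stats['idle_fraction']:.0%} of samples")
print(f"throughput per card: {rps_per_card:.0f} rps")
```

A card like the hypothetical gpu-1, idle in most samples, is the first candidate for consolidation or an auto-stop schedule.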
This is part of the cost optimization and GPU scheduling we handle in the Cloud & Infrastructure line. If the bill scares you every month, start with a free cost diagnosis.