
Cutting EC2 GPU instance cost in practice (a g6e story)

GPU bills usually balloon because a big card was left on and forgotten. Cutting cost isn't a flashy optimization; it's repeating three things: pick the right instance, turn it off when idle, and raise utilization. Here's the order that worked on real projects.

1. Classify the workload first

Not all "GPU" is the same.

  • Inference (serving) — low latency, steady traffic. Cost-per-throughput is what matters.
  • Training/fine-tuning — short and heavy. If interruptible, spot is powerful.
  • Batch/experiments — loose deadlines. The biggest room to raise utilization via scheduling and queues.

Skip the classification and you'll size everything to the most expensive instance.

2. Instances: newer generations are often cheaper

For inference, moving from older g5 (A10G) to g6e (L40S) often raises per-card throughput, so you serve the same throughput with fewer cards. What matters isn't price-per-hour but price-per-throughput.

# Compare on cost-per-throughput (example)
price/hr ÷ (requests per second × 3600 s)  →  cost per request
g5.xlarge :  $1.006 / (120 rps × 3600 s)  =  $1.006 / 432k req    =  $0.0023 / 1k req
g6e.xlarge:  $1.861 / (280 rps × 3600 s)  =  $1.861 / 1,008k req  =  $0.0018 / 1k req   ← cheaper

These numbers are illustrative and vary by workload, so always benchmark with your own model. A price sheet alone can't tell you which instance is cheaper per request.
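
To make the comparison repeatable, here is a minimal Python sketch of the same arithmetic. It divides the hourly price by the requests served in an hour (rps × 3600) and scales to 1,000 requests. The function name and the `candidates` table are illustrative, and the prices and rps values are the example figures above, not benchmark results.

```python
def cost_per_1k_requests(price_per_hour: float, requests_per_second: float) -> float:
    """Dollars per 1,000 requests at a sustained throughput."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return price_per_hour / requests_per_hour * 1000


# Example figures only: (on-demand $/hr, measured rps). Replace with your own benchmark.
candidates = {
    "g5.xlarge": (1.006, 120),
    "g6e.xlarge": (1.861, 280),
}

costs = {
    name: cost_per_1k_requests(price, rps)
    for name, (price, rps) in candidates.items()
}
cheapest = min(costs, key=costs.get)

# Print cheapest first.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:.4f} per 1k requests")
print(f"cheapest per request: {cheapest}")
```

With these example inputs the g6e card comes out cheaper per request even though it costs more per hour, which is the point of comparing on throughput rather than on the price sheet.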

3. Turn it off when idle — the biggest saving

  • Schedule dev/experiment instances to auto-stop nights and weekends.
  • Set the autoscale floor to 0 or 1 so cards don't idle during quiet hours.
  • Report "ownerless GPUs" weekly by tag.
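
The first and third items can be sketched in a few lines of Python. This is a hedged illustration, not a finished tool: the working hours, the `owner` tag key, and the sample inventory are assumptions. The instance records follow the `Tags: [{"Key": ..., "Value": ...}]` layout that EC2's describe-instances call returns, so a real inventory could be passed in with little change.

```python
from datetime import datetime

# Assumed working hours (local time); adjust to your team.
WORK_START_HOUR = 8
WORK_END_HOUR = 20


def should_be_running(now: datetime) -> bool:
    """True during weekday working hours, False on nights and weekends."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and WORK_START_HOUR <= now.hour < WORK_END_HOUR


def ownerless(instances: list[dict], owner_key: str = "owner") -> list[str]:
    """Return the IDs of instances that have no non-empty owner tag."""
    missing = []
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if not tags.get(owner_key, "").strip():
            missing.append(inst["InstanceId"])
    return missing


# Hypothetical inventory for illustration.
inventory = [
    {"InstanceId": "i-aaa", "InstanceType": "g6e.xlarge",
     "Tags": [{"Key": "owner", "Value": "ml-team"}]},
    {"InstanceId": "i-bbb", "InstanceType": "g5.xlarge",
     "Tags": [{"Key": "env", "Value": "dev"}]},
    {"InstanceId": "i-ccc", "InstanceType": "g6e.xlarge"},
]

print("ownerless:", ownerless(inventory))
print("run now?", should_be_running(datetime.now()))
```

A scheduled job could call `should_be_running` and stop dev instances when it returns False, while the `ownerless` list feeds the weekly report.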

Most savings come not from cheaper cards, but from reducing the hours they're on.
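
A quick worked example shows how large the effect of on-hours is. The sketch below assumes a dev instance that would otherwise run around the clock and is instead kept on 12 hours a day, 5 days a week; both figures are assumptions for illustration.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_on_hours(hours_per_day: float, days_per_week: int) -> float:
    """Hours per week the instance is actually running."""
    return hours_per_day * days_per_week


def saving_fraction(on_hours: float) -> float:
    """Share of an always-on weekly compute bill avoided by running only on_hours."""
    return 1 - on_hours / HOURS_PER_WEEK


scheduled = weekly_on_hours(hours_per_day=12, days_per_week=5)  # 60 hours
print(f"on {scheduled:.0f} of {HOURS_PER_WEEK} hours, "
      f"saving {saving_fraction(scheduled):.0%}")
# prints: on 60 of 168 hours, saving 64%
```

This counts compute hours only; storage attached to a stopped instance is still billed.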

4. Spot + checkpoints

Interruptible training and batch jobs can cost 60–70% less on spot. The key is interruption tolerance:

  • Periodic checkpoints keep restart cost near zero: an interruption loses at most the work done since the last checkpoint.
  • A queue (e.g., Kueue) re-queues automatically when spot is reclaimed.
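
The checkpoint idea can be sketched in plain Python. This is a toy loop under stated assumptions, not a training framework: the JSON state file, the `train` function, and the `stop_after` parameter (which simulates a spot reclaim) are all hypothetical. The point it illustrates is that a resumed run picks up from the last saved step rather than from zero.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a reclaim mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def train(path: str, total_steps: int, every: int,
          stop_after: Optional[int] = None) -> dict:
    """Run from the last checkpoint; stop_after simulates a spot reclaim."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulated interruption: progress since the last save is lost
        state["step"] += 1
        state["total"] += state["step"]  # stand-in for a real training step
        steps_this_run += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    train(ckpt, total_steps=100, every=10, stop_after=37)  # reclaimed at step 37
    resumed_from = load_checkpoint(ckpt)["step"]           # last saved step
    final = train(ckpt, total_steps=100, every=10)         # restart after re-queue
    print(f"resumed from step {resumed_from}, finished at step {final['step']}")
```

The shorter the checkpoint interval, the less work a reclaim throws away, at the price of more frequent writes.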

5. Make utilization measurable

"GPUs are expensive" usually means "GPUs are idle." Put real usage on a dashboard with DCGM metrics, and make throughput-per-card a KPI — and you'll run more without buying more.

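As a rough sketch of what such a dashboard computes, the Python below averages utilization samples per GPU, flags idle cards, and derives throughput per card. The sample data, the 10% idle threshold, and the function name are assumptions; in practice the samples might come from a gauge such as `DCGM_FI_DEV_GPU_UTIL` published by dcgm-exporter.

```python
from collections import defaultdict
from statistics import mean


def utilization_report(samples, requests_served, idle_below=10.0):
    """Summarize GPU usage.

    samples: iterable of (gpu_id, utilization_percent) readings.
    requests_served: total requests handled over the same window.
    idle_below: average utilization (%) under which a card counts as idle.
    """
    by_gpu = defaultdict(list)
    for gpu_id, util in samples:
        by_gpu[gpu_id].append(util)

    avg_util = {gpu: mean(values) for gpu, values in by_gpu.items()}
    idle_gpus = sorted(gpu for gpu, util in avg_util.items() if util < idle_below)
    per_card = requests_served / len(avg_util) if avg_util else 0.0
    return {
        "avg_util": avg_util,
        "idle_gpus": idle_gpus,
        "throughput_per_card": per_card,
    }


# Hypothetical readings for two cards over one window.
readings = [("gpu0", 80), ("gpu0", 90), ("gpu1", 2), ("gpu1", 4)]
report = utilization_report(readings, requests_served=1000)
print(report)
```

Here one of the two cards does almost nothing, so throughput per card is half of what the busy card actually delivers; that gap is the capacity already paid for.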

This is part of the cost optimization and GPU scheduling we handle in the Cloud & Infrastructure line. If the bill scares you every month, start with a free cost diagnosis.
