When we built the HPC infrastructure for a FAANG company's AI research division, we were managing more than 6,000 GPUs across 20 clusters simultaneously. The scale made one thing very clear: in cloud infrastructure, the difference between a well-managed environment and a poorly managed one isn't measured in percentages — it's measured in orders of magnitude.
The clients I've seen struggle most with cloud costs aren't the ones who chose the wrong provider or the wrong services. They're the ones who provisioned for a theoretical worst case and never revisited it, or who optimized for performance without anyone assigned to watch the bill.
After years of building and managing cloud infrastructure for clients across financial services, real estate, media, and AI research, here's what I've actually found to matter.
The most common source of cloud waste I see is over-provisioning at launch combined with under-optimization over time. A team spins up infrastructure for a new product, sizes it for expected peak load, and then moves on to the next thing. Six months later, the infrastructure is running at 30% utilization and nobody has touched it.
Cloud providers make provisioning frictionless. That's the point. But it means the discipline of right-sizing has to come from the team, not the platform. AWS won't tell you that your RDS instance is twice the size you need. It'll just keep charging you.
The fix is structural: assign someone the explicit responsibility for reviewing resource utilization on a regular cadence — monthly at minimum, weekly for high-spend environments. Not as a one-time audit, but as an ongoing operational function.
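That review function can be as simple as a script that flags anything below a utilization floor. A minimal sketch, assuming the utilization figures have already been pulled from your monitoring tool (e.g., CloudWatch) — the resource names and the 40% threshold here are illustrative, not prescriptive:

```python
# Hypothetical right-sizing review pass. Input is a list of
# (resource name, average utilization over the review window) pairs
# exported from whatever monitoring tool you use.

def flag_underutilized(resources, threshold=0.4):
    """Return the resources whose average utilization is below the threshold."""
    return [(name, util) for name, util in resources if util < threshold]

# Fabricated monthly snapshot for illustration
snapshot = [
    ("prod-api-asg", 0.72),
    ("rds-primary", 0.31),       # candidate for a smaller instance class
    ("analytics-worker", 0.18),  # candidate for consolidation or shutdown
]

for name, util in flag_underutilized(snapshot):
    print(f"{name}: {util:.0%} average utilization - review sizing")
```

The value isn't in the code — it's in running something like it on a fixed cadence and assigning a name to the output.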
Most teams default to on-demand pricing because it's the path of least resistance. On-demand is the right choice for variable or unpredictable workloads — but for anything with a predictable baseline, it's the most expensive option.
Reserved instances on AWS typically deliver 30-60% savings compared to on-demand for the same compute. The trade-off is a one- or three-year commitment. For workloads that will clearly be running for that duration — production databases, core application servers, always-on analytics infrastructure — the math is straightforward.
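The math really is back-of-the-envelope. The hourly rates below are placeholders, not real quotes — substitute the actual on-demand and reserved prices for your region and instance class from the AWS pricing pages:

```python
# Annual cost comparison for an always-on instance.
# Rates are hypothetical; look up real ones for your region/instance class.

HOURS_PER_YEAR = 8760

def annual_cost(hourly_rate, hours=HOURS_PER_YEAR):
    return hourly_rate * hours

on_demand_rate = 0.192   # hypothetical on-demand $/hr
reserved_rate = 0.121    # hypothetical 1-year no-upfront reserved $/hr

od = annual_cost(on_demand_rate)
ri = annual_cost(reserved_rate)
savings = 1 - ri / od

print(f"On-demand: ${od:,.0f}/yr  Reserved: ${ri:,.0f}/yr  Savings: {savings:.0%}")
```

If the workload will run for the full term, the savings percentage is the whole analysis; the risk is only in committing to capacity you later don't need.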
Spot instances take this further: up to 90% off on-demand pricing for workloads that can tolerate interruption. AI training runs, batch data processing, non-critical background jobs — these are natural candidates. On the HPC project, we used spot instances for burst compute capacity during training cycles, with reserved capacity covering the always-on baseline. That combination reduced compute costs significantly without affecting research velocity.
The decision framework is simple: on-demand for unpredictable, reserved for predictable baselines, spot for interruption-tolerant workloads.
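That framework fits in a few lines. This is a deliberately simplified sketch — real decisions also weigh commitment flexibility (e.g., savings plans) and spot capacity risk — but it captures the ordering:

```python
# The pricing-model decision framework, expressed as a helper.
# Simplified: ignores savings plans, capacity reservations, and
# spot availability risk, which matter in practice.

def pricing_model(predictable_baseline: bool, interruption_tolerant: bool) -> str:
    if interruption_tolerant:
        return "spot"            # cheapest option wherever interruption is acceptable
    if predictable_baseline:
        return "reserved"        # commit for the steady-state floor
    return "on-demand"           # pay the premium only for genuine unpredictability

print(pricing_model(predictable_baseline=True, interruption_tolerant=False))
```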
Compute costs get attention because they're visible and variable. Storage costs accumulate quietly and are easy to ignore until they're significant.
The key principle is tiering. Not all data needs to live in high-performance storage. Data that's accessed daily belongs in S3 Standard or equivalent. Data accessed monthly belongs in a lower-cost tier. Data that's retained for compliance but rarely accessed belongs in archival storage such as S3 Glacier. Most organizations pay S3 Standard pricing for data across all three categories because nobody set up lifecycle policies.
On data-intensive projects, we implement lifecycle policies from day one — automatic transitions between storage tiers based on access patterns. It's a one-time configuration that compounds over time. The older a dataset, the more the savings accumulate.
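A minimal sketch of what that day-one configuration looks like, shaped for the S3 lifecycle API. The bucket prefix and day thresholds are illustrative — tune the transition ages to your actual access patterns:

```python
# Hypothetical S3 lifecycle configuration: age data through cheaper tiers
# automatically. Prefix and thresholds are placeholders.

lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-aging-data",
            "Filter": {"Prefix": "datasets/"},  # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 180, "StorageClass": "GLACIER"},     # archival retention
            ],
        }
    ]
}

# Applied once via boto3 (requires AWS credentials and a real bucket):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_config
# )
```

After this runs once, every object under the prefix ages into cheaper storage on its own — no recurring operational work.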
You can't optimize what you can't measure, and most teams can't measure their cloud spend at the level of granularity that makes optimization actionable.
Cost allocation tags — applied to every resource at creation — let you attribute spend to specific teams, projects, products, or clients. Without them, your AWS bill is a single number that tells you what you spent but not where or why.
On every project we deliver, tagging is part of the initial infrastructure setup, not an afterthought. A team that can see that Project A consumed 60% of their compute budget last month, while Project B consumed 15% and was three times the size, has the information they need to make decisions. A team looking at a single total doesn't.
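To make the point concrete, here's what tag-level attribution buys you. The line items below are fabricated; in practice this grouping comes out of AWS Cost Explorer grouped by your cost-allocation tag key:

```python
# Rolling a bill up by a "project" cost-allocation tag.
# Line items are fabricated for illustration.

from collections import defaultdict

line_items = [
    {"service": "EC2", "cost": 4200.0, "tags": {"project": "project-a"}},
    {"service": "RDS", "cost": 1100.0, "tags": {"project": "project-a"}},
    {"service": "EC2", "cost": 950.0,  "tags": {"project": "project-b"}},
    {"service": "S3",  "cost": 300.0,  "tags": {}},  # untagged: unattributable spend
]

def spend_by_tag(items, key="project"):
    """Sum cost per tag value; untagged resources surface explicitly."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(key, "(untagged)")] += item["cost"]
    return dict(totals)

print(spend_by_tag(line_items))
```

Note that the untagged bucket shows up as its own line — shrinking it is usually the first win, because untagged spend is spend nobody owns.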
The organizations that manage cloud costs well tend to have one thing in common: they treat infrastructure spend as a product decision, not just an IT cost. Engineers understand the cost implications of architectural choices. Product managers include infrastructure costs in ROI calculations. Finance has visibility into cloud spend at the project level.
This isn't about penny-pinching — it's about making informed trade-offs. Spending more on managed services to reduce engineering overhead is often the right call. Spending more on reserved instances to reduce per-unit compute cost is often the right call. Making those choices consciously, with the data to support them, is what separates teams that control their cloud spend from teams that react to it.
Is your cloud spend growing faster than your business? Let's talk about what's driving it →