This layer is arguably the only truly physical layer in the tech stack. Infrastructure is the famous cloud: the data centers and the actual hardware itself. It’s the metal (chips, servers, storage, and networking) that makes computation possible. Every AI instruction, from training a model to serving a single chatbot reply, ultimately runs on this layer. Just as your laptop provides the “metal” your programs run on, the infrastructure layer provides the industrial-scale compute backbone that powers the digital economy and this new wave of AI.

The Cloud

We could write forever about the cloud. As a matter of fact, there are a lot of books on it. I remember when I started reading into this and told my friend I wanted to be a Cloud Architect. With no context, that statement sounds like you’re going crazy. For reference, a Cloud Architect is the person who designs the way your systems will be structured on the cloud, but that’s not important right now.

Anyways, the cloud is a service some companies provide that lets you run computational instructions on someone else’s hardware. Before these companies existed, let’s say you wanted to create a website where you would sell your handmade quilts. Well, you had to buy servers: the computers where your programs would run, and where you would store your data. These are literally racks of metal.


It meant you had to wire them all up in a room, maintain them, and keep that room at the right temperature. Even if all you wanted was to specialize in making and selling your quilts, you had to dedicate time and effort to maintaining these servers.

That is, until the cloud came along. In 2006, Amazon realized they needed to expand the capacity of their data centers (those rooms full of servers) for big sales events, like Thanksgiving or Christmas. The rest of the year, however, they didn’t really need all that capacity. So they decided to launch services that let other companies access those servers and run any programs they needed. This was the start of Amazon Web Services (AWS).

Soon after, Microsoft launched its own service, called Azure, and Google followed suit with Google Cloud Platform (GCP). These three are the main cloud providers, typically called hyperscalers. Their cloud products were a revolution in the tech space, since startups no longer needed to spend a ton of money building out their own infrastructure; they could just rent it from someone else. Hence the famous saying: “The cloud is just someone else’s computer.”

The Two Perspectives

When we talk about infrastructure, it helps to think about two main players: the operators (the hyperscalers who build and run the data centers) and the users (the companies who rent that capacity).

That’s really the relationship that matters.

On the operator side, take Microsoft Azure as an example. Microsoft works with vendors to build massive data centers, then runs them day to day: managing energy, cooling, physical security, hardware refreshes, and so on. The whole point is that you, as a customer, don’t need to worry about any of this. If I just want to run my quilt e-commerce site, I don’t need to learn the optimal temperature of a server room. I just pay Azure, and it runs.


On the user side, the value is obvious: why spend billions building and maintaining your own data centers when you can rent capacity as you need it? There are downsides, mind you; vendor lock-in is a huge topic. Some older companies still run what’s called “on-prem” (their own servers, bought before the cloud took off), so today many use a hybrid model: some workloads in their own data centers, some in the cloud. But for most companies, especially startups, renting is the only sane option.
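To make the rent-vs-buy logic concrete, here is a toy back-of-envelope comparison. Every number in it (server prices, upkeep, hourly rates) is a made-up illustration, not a real quote from any provider; the point is only the shape of the math.

```python
# Toy rent-vs-buy comparison. All numbers are hypothetical illustrations,
# not real hardware or cloud prices.

def on_prem_cost(num_servers, server_price, yearly_upkeep, years):
    """Upfront hardware purchase plus ongoing upkeep (power, cooling, staff)."""
    return num_servers * server_price + yearly_upkeep * years

def cloud_cost(hourly_rate, hours_per_year, years):
    """Renting: pay only for the hours your servers actually run."""
    return hourly_rate * hours_per_year * years

# A small quilt shop that needs two modest servers for three years:
owned = on_prem_cost(num_servers=2, server_price=8_000, yearly_upkeep=5_000, years=3)
rented = cloud_cost(hourly_rate=0.10, hours_per_year=8_760, years=3)  # always-on

print(f"On-prem over 3 years: ${owned:,.0f}")   # $31,000
print(f"Cloud over 3 years:  ${rented:,.0f}")   # roughly $2,628
```

The exact crossover point depends entirely on scale and utilization, which is why companies with huge, steady workloads sometimes do build their own data centers while small shops almost never should.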

So the job of hyperscalers isn’t always to build the services themselves, it’s to make sure the infrastructure is ready so customers can run whatever workloads they want. In short, they handle the heavy lifting, so everyone else can just plug in and go.


Cool, what about AI?

AI needs a LOT of compute, and it needs it in two phases: first, to train the models, and second, to run them. When an AI model is already trained and being used (say, answering a ChatGPT question), the process happening in the background is called inference: the actual use of the model.
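The two phases can be sketched with a toy one-parameter model (nothing like a real LLM, but the compute pattern is the same): training means repeatedly adjusting the model’s weights, inference means a single pass through the frozen weights.

```python
# Toy illustration of the two compute phases of AI.
# Training loops over the data many times; inference is one cheap pass.

def train(data, steps=100, lr=0.1):
    """Training: repeatedly nudge the weight to reduce error. Expensive."""
    w = 0.0
    for _ in range(steps):
        for x, y in data:
            pred = w * x
            w -= lr * (pred - y) * x  # gradient step toward the right answer
    return w

def infer(w, x):
    """Inference: one forward pass with the already-trained weight. Cheap."""
    return w * x

data = [(1, 2), (2, 4), (3, 6)]  # the underlying rule is y = 2x
w = train(data)                  # heavy compute, done once up front
print(infer(w, 10))              # ≈ 20.0 — each request is now one multiply
```

The asymmetry matters for infrastructure: training is a rare, enormous burst of compute, while inference is a smaller cost paid on every single request, millions of times a day.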