8 AI Terms Every Data Centre Professional Needs to Know
AI has arrived in the data centre industry not as a distant trend to monitor, but as an immediate and structural force reshaping facility design, power strategy, investment capital, and market geography.
Across the GCC and globally, the question is no longer whether AI will affect digital infrastructure. The question is whether infrastructure professionals have the conceptual vocabulary required to respond to it effectively.
This is the first in a GDCA research series, ‘AI Decoded’, which explores the language, economics, and infrastructure implications of the intelligence economy for the Gulf data centre market.
- Large Language Models – and the Rise of Multimodal AI
What it is
A Large Language Model (LLM) is an AI system trained on vast volumes of data to develop a generalised ability to understand, reason about, and generate content.
Originally, these systems focused primarily on text. Increasingly, however, they can process and generate content across multiple data types, including images, audio, video, and code.
Systems with this capability are described as multimodal AI: they can understand and generate content across more than one type of data.
The leading models in this category include OpenAI’s GPT series, Google’s Gemini, Meta’s Llama, Anthropic’s Claude, and xAI’s Grok.
The GCC is also developing its own foundation models. In the UAE, Jais – developed by G42 and the Mohamed bin Zayed University of Artificial Intelligence – represents one of the region’s most prominent Arabic-language models. Saudi Arabia is pursuing similar ambitions through initiatives linked to SDAIA and the National Centre for AI.
The most advanced systems are now more accurately described as foundation or multimodal models rather than purely language models. A user might submit an image alongside a written question and receive a reasoned written response. A model might read a document, analyse a chart, and summarise findings in natural language within a single interaction.
The term “Large Language Model” remains widely used across the industry, but readers should understand that it now describes a broader and more capable class of system than the name strictly implies.
These models are described as “large” for a technical reason rather than a marketing one. They contain billions and, in frontier cases, hundreds of billions of adjustable numerical parameters that are refined during the training process.
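To give a sense of what parameter counts mean physically, the sketch below estimates the memory needed simply to hold a model's weights. It is a rough illustration rather than a sizing method: the model sizes are hypothetical, the 80 GB of memory per accelerator is an assumed figure, and real deployments need additional memory for activations, caches, and, during training, optimiser state.

```python
import math

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Assumption for illustration: memory available on one accelerator, in GB.
ASSUMED_ACCELERATOR_MEMORY_GB = 80


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Memory (GB, where 1 GB = 1e9 bytes) needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_accelerators(num_params: float, precision: str = "fp16") -> int:
    """Lower bound on accelerators needed to fit the weights alone."""
    return math.ceil(weight_memory_gb(num_params, precision) / ASSUMED_ACCELERATOR_MEMORY_GB)


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show how the numbers scale.
    for label, params in [("7B", 7e9), ("70B", 70e9), ("400B", 400e9)]:
        print(f"{label}: {weight_memory_gb(params):,.0f} GB of weights, "
              f"at least {min_accelerators(params)} accelerator(s)")
```

Even under these simplified assumptions, a model with hundreds of billions of parameters cannot fit on a single processor, which is one reason these systems are deployed across tightly interconnected clusters.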
Why it matters for infrastructure
From an infrastructure perspective, LLMs and multimodal foundation models are exceptionally demanding.
Training a frontier model requires simultaneous calculations across thousands of specialised processors over periods lasting weeks or months. This process consumes enormous amounts of power and places sustained pressure on compute, cooling, and networking systems.
Deploying a trained model at scale creates a different, but equally important, infrastructure challenge. AI services need to be positioned close to end users to minimise latency and maintain acceptable response times.
The key infrastructure implication is one of kind, not simply scale.
Before the rise of LLMs, most enterprise workloads could be accommodated within relatively standard colocation or cloud environments. AI workloads increasingly do not fit that model. They require facilities purpose-built for high-density, high-throughput compute, with power densities, cooling systems, and network architectures that differ materially from those designed for conventional enterprise or web-scale environments.
In the GCC, sovereign and commercial model development is already translating directly into infrastructure demand. When a government or enterprise chooses to develop or host a foundation model, it is also making a significant data centre investment decision.
- Training and Inference
What it is
Training and inference are the two core phases of an AI model’s lifecycle, and they have fundamentally different infrastructure profiles.
Training is the process through which an AI model learns. Large datasets are repeatedly fed through the model while billions of parameters are adjusted to reduce prediction errors.
Training frontier AI models is computationally intensive, time-bounded, and extremely power-hungry. It typically occurs once during the creation of a model, with periodic retraining or updates afterwards.
Inference is what happens once the model is deployed.
When a user submits a prompt and receives a response, the model is performing inference. Although each individual inference request is less computationally intensive than training, inference happens continuously, at scale, and under strict latency constraints.
A user expecting an almost instant response cannot wait for a geographically distant facility to process their request.
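The difference between the two phases can be illustrated with a deliberately tiny model. The sketch below is not how a production LLM is trained; it assumes a single adjustable parameter and a made-up dataset, purely to show that training loops over the data many times while inference is a single pass with the parameter held fixed.

```python
# Toy model: y = w * x, with one adjustable parameter w.
# The dataset, learning rate, and epoch count are illustrative assumptions.

def predict(w: float, x: float) -> float:
    """Inference: one forward pass with the parameter held fixed."""
    return w * x


def train(data, epochs: int = 200, lr: float = 0.01) -> float:
    """Training: pass the dataset through the model repeatedly,
    nudging w each time to reduce the squared prediction error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, x) - y
            w -= lr * 2 * error * x  # gradient of (w*x - y)**2 with respect to w
    return w


if __name__ == "__main__":
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # generated by y = 3x
    w = train(data)              # many passes over the data: the expensive phase
    print(round(w, 4))           # learned parameter, close to 3
    print(round(predict(w, 4.0), 4))  # a single cheap pass per request
```

Training here performs 600 parameter updates to learn one number, while each inference call is a single multiplication. Scale both sides up by many orders of magnitude and the contrast in infrastructure profile follows.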
Why it matters for infrastructure
The distinction between training and inference is arguably the single most commercially important concept for infrastructure professionals engaging with AI demand.
Training clusters are relatively few in number but extremely large in scale.
They require:
- Thousands of co-located GPUs
- Ultra-high-speed interconnects
- Stable, high-volume power supply
- Advanced cooling infrastructure
- Sophisticated networking architecture
These environments are typically built by hyperscalers and a small number of highly capitalised AI companies.
Training clusters are already under development across several GCC markets, including Abu Dhabi, Riyadh, and Doha, often as components of wider sovereign AI strategies.
Inference infrastructure is different.
This is where the longer-term volume opportunity for the regional data centre market likely sits.
As AI services become integrated into enterprise software, government platforms, industrial systems, and consumer applications, inference workloads will need to be distributed across a much broader geographic footprint.
Inference environments are more latency sensitive, closer to end users, and likely to exist across a far wider range of markets than the concentrated training clusters attracting today’s headlines.
For investors and developers, understanding whether a prospective tenant requires training or inference capacity affects almost every aspect of infrastructure design – from rack density and cooling strategy to contract structure and operational profile.
- GPU – Graphics Processing Unit
What it is
The GPU was not originally designed for artificial intelligence. It was designed for video games.
Rendering computer graphics requires millions of mathematical operations to be performed simultaneously. To support this, chip designers developed processors with thousands of small parallel processing cores, rather than the handful of highly powerful cores found in traditional CPUs.
When AI researchers began experimenting with neural network training in the late 2000s and early 2010s, they discovered that these parallel processing architectures were exceptionally well suited to AI workloads.
NVIDIA recognised this opportunity earlier than most.
The company invested heavily in AI-focused software and hardware ecosystems, positioning its H100 and more recently B200 chips at the centre of the current AI infrastructure buildout.
Other chipmakers and hyperscalers are now developing their own AI-specific processors, but NVIDIA’s early lead has given it a level of market influence that continues to shape global AI infrastructure investment.
Why it matters for infrastructure
Modern AI-grade GPUs are not simply computationally powerful. They are also exceptionally heat intensive.
This thermal reality is one of the defining engineering challenges of the current AI infrastructure market.
A single NVIDIA H100 GPU carries a thermal design power of roughly 700 watts. A standard AI server containing eight GPUs can therefore generate roughly 5.6 kilowatts of heat from the GPUs alone, before networking, memory, and supporting systems are even considered.
AI racks operating at 50 to 100 kilowatts are becoming increasingly common.
That is materially different from conventional enterprise environments, where racks have historically operated closer to 10 to 15 kilowatts.
At these densities, traditional air-cooling approaches begin reaching their physical limits.
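The arithmetic behind these figures can be sketched in a few lines. The GPU power and server configuration come from the figures above; the overhead factor for CPUs, memory, networking, and fans is an assumption chosen for illustration, and real servers may draw noticeably more.

```python
GPU_TDP_W = 700        # approximate H100 thermal design power, from the text
GPUS_PER_SERVER = 8    # standard AI server configuration, from the text

# Assumption for illustration: non-GPU components add 30% on top of GPU power.
ASSUMED_OVERHEAD_FACTOR = 1.3


def server_gpu_heat_kw(gpu_tdp_w: float = GPU_TDP_W, gpus: int = GPUS_PER_SERVER) -> float:
    """Heat generated by the GPUs alone in one server, in kW."""
    return gpu_tdp_w * gpus / 1000


def servers_per_rack(rack_budget_kw: float, overhead_factor: float = ASSUMED_OVERHEAD_FACTOR) -> int:
    """How many whole servers fit within a given rack power budget."""
    per_server_kw = server_gpu_heat_kw() * overhead_factor
    return int(rack_budget_kw // per_server_kw)


if __name__ == "__main__":
    print(f"GPU heat per server: {server_gpu_heat_kw():.1f} kW")
    for budget_kw in (15, 50, 100):
        print(f"{budget_kw} kW rack: {servers_per_rack(budget_kw)} server(s)")
```

Under these assumptions, a conventional 15 kW rack holds two such servers, whereas a 100 kW rack holds thirteen. The same floor space ends up carrying several times the thermal load.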
The infrastructure implications extend well beyond cooling.
GPU scarcity has also created a two-tier dynamic in cloud infrastructure markets.
Large hyperscalers such as Microsoft, Google, Amazon, and Meta secured major forward allocations of NVIDIA hardware through their scale and long-standing supplier relationships. Smaller operators and enterprise buyers have often faced lead times lasting many months.
This scarcity helped create an entirely new category of infrastructure provider: the neocloud.
For GCC governments pursuing sovereign AI ambitions, GPU access has also become a geopolitical and policy issue. US export controls on advanced AI chips have made access to high-end compute infrastructure strategically important across multiple Gulf markets.
- Tokens
What it is
A token is the basic unit of text processed by a language model. In commercial practice, it is also the unit through which most AI services are bought and sold.
Tokens do not map perfectly onto words. They are closer to syllables or fragments of language. A rough rule of thumb is that 1,000 tokens equate to approximately 750 words.
When a user submits a prompt to an AI model, the text is converted into tokens. The model processes these tokens and generates new tokens as output.
Most major AI providers – including OpenAI, Google, and Anthropic – price their APIs on this basis. Customers are typically charged separately for input tokens and output tokens.
For enterprise users, token economics therefore become a direct operational cost consideration.
A customer service platform handling millions of daily interactions, a government document processing system, or an AI-assisted legal research platform all generate token demand that scales directly with usage.
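A back-of-envelope cost model makes the scaling concrete. The sketch below uses the 1,000-tokens-to-750-words rule of thumb from above; the prices per million tokens and the interaction profile are hypothetical placeholders, not any provider's actual rates.

```python
TOKENS_PER_WORD = 1000 / 750  # rule of thumb from the text


def words_to_tokens(words: float) -> int:
    """Approximate token count for a given number of words."""
    return round(words * TOKENS_PER_WORD)


def daily_cost_usd(interactions: int, input_words: float, output_words: float,
                   input_price_per_m: float, output_price_per_m: float) -> float:
    """Daily API cost when input and output tokens are priced separately.

    Prices are expressed in USD per million tokens."""
    input_tokens = interactions * words_to_tokens(input_words)
    output_tokens = interactions * words_to_tokens(output_words)
    return (input_tokens / 1e6) * input_price_per_m + (output_tokens / 1e6) * output_price_per_m


if __name__ == "__main__":
    # Hypothetical platform: 1 million interactions a day, 150 words in, 300 words out,
    # at placeholder prices of $1 (input) and $4 (output) per million tokens.
    cost = daily_cost_usd(1_000_000, 150, 300, 1.0, 4.0)
    print(f"Estimated daily cost: ${cost:,.0f}")
```

Because cost is linear in both interaction volume and response length, doubling either one doubles the bill, which is why token budgets tend to appear in enterprise planning alongside more familiar cloud cost lines.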
A model’s context window is also measured in tokens. A larger context window allows a model to process and reason across longer documents, larger codebases, or extended conversations within a single interaction.
Why it matters for infrastructure
Tokens are effectively the bridge between AI user demand and physical compute demand.
Token throughput – the number of tokens a platform can process per second – becomes a direct measure of AI infrastructure capacity.
As AI adoption expands across enterprise and government environments in the GCC, aggregate token demand will become an increasingly meaningful infrastructure planning variable for cloud providers and data centre operators.
The growth in context window size compounds this challenge.
Models capable of processing entire books, large technical documents, or extensive code repositories require materially more GPU memory and higher-performance infrastructure than earlier generations of AI systems.
For infrastructure providers, token growth is ultimately a quantifiable expression of AI-driven compute demand. It connects user behaviour directly to rack density, network throughput, and power consumption.
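One way to picture that connection is to translate a token throughput target into a GPU count and a power figure. The sketch below is illustrative only: the per-GPU throughput and the utilisation target are assumed placeholder values (real figures vary widely by model, hardware, and workload), and the 700 W figure is the approximate H100 thermal design power cited earlier.

```python
import math


def gpus_for_throughput(peak_tokens_per_s: float,
                        tokens_per_s_per_gpu: float,
                        target_utilisation: float = 0.6) -> int:
    """GPUs needed to serve a peak token rate while keeping headroom.

    tokens_per_s_per_gpu and target_utilisation are planning assumptions."""
    usable_rate = tokens_per_s_per_gpu * target_utilisation
    return math.ceil(peak_tokens_per_s / usable_rate)


def gpu_power_kw(gpu_count: int, gpu_tdp_w: float = 700) -> float:
    """Power drawn by the GPUs alone, excluding servers, networking, and cooling."""
    return gpu_count * gpu_tdp_w / 1000


if __name__ == "__main__":
    # Hypothetical platform peaking at 500,000 tokens per second,
    # with an assumed 2,500 tokens per second per GPU.
    gpus = gpus_for_throughput(500_000, 2_500)
    print(f"GPUs required: {gpus}")
    print(f"GPU power: {gpu_power_kw(gpus):.1f} kW")
```

The specific numbers matter less than the structure of the calculation: token demand sets the GPU count, and the GPU count sets the power and cooling envelope the facility has to provide.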
- Accelerated Compute
What it is
Accelerated compute refers to the use of specialised processors to handle workloads that would be impractically slow on conventional CPUs.
In practice, this usually means GPUs, but it also includes custom AI chips such as Google’s Tensor Processing Units (TPUs), Amazon’s Trainium and Inferentia processors, and emerging AI-focused silicon from companies including Cerebras, Graphcore, and Groq.
The term reflects a broader architectural shift in computing. For decades, most data centres were built around general-purpose CPUs designed to handle a wide range of tasks reasonably well. AI is changing that model.
Increasingly, compute environments are being designed around purpose-built hardware optimised for highly parallel AI workloads.
Importantly, this transition is no longer confined to frontier AI training campuses.
Accelerated compute is now moving into enterprise data centres, cloud platforms, and increasingly the edge – wherever AI inference is being deployed at scale and where running these workloads on conventional CPUs no longer makes economic sense.
Why it matters for infrastructure
For the data centre industry, accelerated compute is not simply a change in server specifications.
It represents a structural shift in what facilities are expected to do and how they must be designed.
Traditional CPU-based environments were built around moderate rack densities, predictable thermal loads, and air cooling. Accelerated compute environments operate very differently.
AI infrastructure routinely pushes rack densities beyond what conventional air cooling can manage efficiently, driving liquid cooling from an engineering edge case into a mainstream facility requirement.
This creates major commercial implications.
Accelerated compute infrastructure typically carries a materially higher capital cost per megawatt than conventional data centre deployments. Operators must also develop new operational capabilities around liquid cooling, high-density power delivery, and GPU-focused infrastructure management.
The shift raises strategic questions for operators:
- How should existing general-purpose capacity be repositioned?
- How should GPU-dense space be priced differently from standard colocation?
- What operational expertise will be required to manage increasingly AI-focused environments?
For the GCC, accelerated compute sits at the centre of regional sovereign AI ambitions.
The UAE’s NVIDIA-backed initiatives, Saudi Arabia’s hyperscale AI investments, and broader Gulf sovereign compute strategies are all, fundamentally, investments in accelerated compute capacity.
Understanding the term is therefore essential not only from a technical perspective, but from a commercial and geopolitical one as well.
- Liquid Cooling vs. Air Cooling
What it is
Cooling is not a peripheral issue in AI infrastructure. It is one of the defining engineering challenges of the AI era.
As GPU rack densities have climbed from the 10 to 20 kW typical of conventional enterprise environments toward 50, 80, and in some cases well over 100 kW per rack, the method used to extract heat has become a major design and commercial variable.
For most of the data centre industry’s history, air cooling was the default approach. Conditioned air was circulated through the facility, passed through server racks to absorb heat, and then returned to cooling systems for reconditioning. At conventional rack densities, this model worked effectively. It was well understood, relatively straightforward to operate, and widely supported by equipment manufacturers.
AI infrastructure has now pushed well beyond the point where air cooling can efficiently manage thermal loads.
According to researchers, average rack power density more than doubled between 2022 and 2024, rising from around 8 kW to 17 kW, and is projected to reach roughly 30 kW by 2027. AI training racks are already operating significantly ahead of that average.
The practical ceiling for air cooling sits at approximately 35 to 40 kW per rack. Beyond this point, moving enough conditioned air through standard facility environments becomes increasingly difficult from both a physical and operational perspective.
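The physics behind that ceiling can be sketched with the standard heat transport relation Q = ρ · V · cp · ΔT, where V is the volumetric flow of the cooling fluid. The fluid properties below are approximate textbook values, and the temperature rises (12 K for air, 10 K for water) are assumptions chosen for illustration rather than design figures.

```python
# Approximate fluid properties near room conditions.
AIR = {"density": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"density": 997.0, "cp": 4186.0}  # kg/m^3, J/(kg*K)


def volumetric_flow_m3s(heat_kw: float, fluid: dict, delta_t_k: float) -> float:
    """Fluid flow (m^3/s) needed to carry away heat_kw at a given temperature rise.

    Rearranged from Q = density * flow * cp * delta_T."""
    return heat_kw * 1000 / (fluid["density"] * fluid["cp"] * delta_t_k)


if __name__ == "__main__":
    rack_kw = 40  # upper end of the air cooling range quoted above
    air_flow = volumetric_flow_m3s(rack_kw, AIR, delta_t_k=12)      # assumed 12 K rise
    water_flow = volumetric_flow_m3s(rack_kw, WATER, delta_t_k=10)  # assumed 10 K rise
    print(f"Air:   {air_flow:.2f} m^3/s through one rack")
    print(f"Water: {water_flow * 1000:.2f} litres/s through one rack")
```

Under these assumptions, a 40 kW rack needs close to three cubic metres of air every second but less than one litre of water. Per unit of volume, water carries several thousand times more heat than air, which is the basic reason liquid cooling takes over as densities rise.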
This challenge is accelerating quickly.
NVIDIA’s Blackwell GB200 NVL72 platform already requires more than 120 kW per rack, well beyond the practical limits of conventional air cooling. Schneider Electric expects next-generation AI architectures to push densities even higher over the coming product cycle.
The industry’s response has been rapid.
By 2024, liquid-based cooling already accounted for 46% of the global data centre cooling market, according to Mordor Intelligence. Dell’Oro Group reported that the liquid cooling market nearly doubled during 2025, approaching $3Bn in annual value, and forecasts it could reach $7Bn by 2029.
At the same time, the market is steadily moving away from purely air-cooled environments. A 2025 survey by S&P Global Market Intelligence found that only 45% of data centres now operate entirely on air cooling, while 59% plan to implement liquid cooling within the next five years.
This transition is taking several forms.
Direct-to-chip cooling
Direct-to-chip cooling routes chilled liquid through cold plates mounted directly onto GPUs and CPUs, absorbing heat at the source before it enters the surrounding air.
This is currently the dominant form of liquid cooling in AI infrastructure and accounts for almost half of the liquid cooling market.
Major server manufacturers including Dell, HPE, and Supermicro now support direct-to-chip cooling natively within their latest AI server platforms.
Hyperscalers are also deploying the technology at scale. Microsoft began fleet-wide deployment of direct-to-chip cooling across Azure campuses during 2025, while AWS developed its own internal coolant distribution system and reported a 46% reduction in mechanical cooling energy consumption during peak loads.
Immersion cooling
Immersion cooling takes a more radical approach by submerging servers entirely in a thermally conductive, electrically non-conductive liquid.
Single-phase immersion keeps the coolant in liquid form throughout the process, while two-phase immersion allows it to vaporise and recondense to achieve very high thermal efficiency.
Immersion environments can support rack densities of 100 to 250 kW and beyond, but they require purpose-built tanks and specialised server configurations.
CoreWeave, one of the world’s largest GPU cloud providers, has designed all new facilities from 2025 onwards around liquid cooling. Its current GB200 deployments reportedly operate with approximately 85% liquid cooling and 15% air cooling.
Why it matters for infrastructure
The shift from air cooling to liquid cooling is not an incremental engineering upgrade. It represents a fundamental redesign of the modern data centre environment.
Facilities originally designed around conventional air cooling cannot simply be converted into high-density AI environments without substantial investment, operational disruption, and infrastructure redesign.
JLL estimates that liquid cooling retrofits can be 20-30% cheaper than equivalent high-density air-cooling upgrades. Even so, the scale of reengineering involved remains significant.
This is beginning to create a widening distinction between legacy data centre stock and genuinely AI-capable infrastructure – a divide that is likely to affect:
- Asset valuations
- Tenant retention
- Infrastructure competitiveness
- Long-term relevance
Liquid cooling also introduces entirely new operational disciplines.
Operators must manage:
- Leak detection
- Pipe integrity
- Water treatment
- Coolant management
- New maintenance and operational procedures
The operational challenge is growing quickly.
The Uptime Institute found that average rack density increased by 38% between 2022 and 2024, with the steepest growth occurring in hyperscale and AI deployments. In many cases, infrastructure demand is evolving faster than operators can build the internal expertise required to support it.
The capability gap between operators who can support liquid-cooled AI environments and those who cannot is already widening.
For the GCC, the cooling question carries an additional and highly regional dimension. Liquid cooling systems that rely heavily on evaporative cooling towers consume significant volumes of water – a sensitive issue across much of the Gulf.
This is increasing interest in:
- Closed-loop liquid cooling systems
- Water-efficient thermal management
- Waste heat recovery applications
- More sustainable cooling architectures
In a region defined by high ambient temperatures, cooling strategy is not simply an engineering consideration.
It is increasingly a sustainability, regulatory, operational, and commercial one as well, with direct implications for operating cost, infrastructure efficiency, and long-term social licence to build.
- Neoclouds
What it is
A neocloud is a category of infrastructure company that emerged in response to the GPU shortages accompanying the AI investment surge of 2022 and 2023.
Unlike hyperscalers such as Microsoft Azure, Amazon Web Services, and Google Cloud, which provide broad cloud ecosystems spanning storage, databases, software platforms, and managed services, neoclouds focus more narrowly on GPU compute capacity.
Their proposition is relatively straightforward: access to high-performance AI infrastructure without the broader hyperscale platform wrapped around it.
Leading examples include CoreWeave, Lambda Labs, and Nebius.
In the GCC, regionally focused GPU cloud offerings are also beginning to emerge, particularly through sovereign-backed initiatives seeking to commercialise nationally controlled AI compute capacity.
Neoclouds are not simply smaller hyperscalers. They occupy a structurally different position in the market, operating as asset-heavy businesses built around GPU procurement and AI-ready infrastructure.
Why it matters for infrastructure
Neoclouds matter for the data centre industry in several ways.
First, they are becoming major infrastructure demand drivers in their own right.
CoreWeave, for example, has signed some of the largest data centre leasing agreements ever recorded as it seeks GPU-ready colocation capacity to support customer growth.
As the category expands, neoclouds are becoming an increasingly important tenant profile for operators globally.
Second, the rise of neoclouds validates the existence of a large underserved market for GPU access.
The hyperscalers were not fully meeting this demand, whether because of pricing, procurement complexity, or infrastructure constraints. The resulting market gap attracted substantial venture capital and public market investment into GPU-focused infrastructure providers.
Third, and particularly relevant for the GCC, the neocloud model provides a potential commercial framework for sovereign AI infrastructure.
Rather than building compute environments solely for internal government use, Gulf states can deploy sovereign compute infrastructure commercially – providing GPU capacity to regional and international AI tenants while simultaneously supporting wider national digital economy objectives.
This positions sovereign compute not simply as a cost centre, but as a revenue-generating infrastructure platform.
- Agentic AI
What it is
Agentic AI refers to systems capable of autonomously planning and executing sequences of actions in pursuit of a broader objective.
Traditional AI interactions are generally prompt-and-response based. A user asks a question and receives an answer.
Agentic systems operate differently.
A user may instead provide a broader goal – researching a topic, drafting a report, reviewing a contract, or completing a workflow – and the system independently determines the required steps, executes them using available tools and data sources, and adapts its behaviour if obstacles emerge.
Examples already appearing in the market include:
- AI coding agents
- Autonomous research assistants
- Enterprise workflow automation systems
- AI agents capable of interacting across software platforms such as email, calendars, CRMs, and document systems
Major AI developers, including OpenAI, Anthropic, Google DeepMind, and Microsoft, are all investing heavily in agentic systems.
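The plan, execute, and adapt pattern described above can be sketched as a simple loop. This is a toy illustration, not any vendor's agent framework: the tools are hypothetical stand-ins, the plan is fixed in advance, and one tool is made to fail deliberately. In a real agentic system the model itself would generate and revise the plan, with each step triggering further inference.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools. The first simulates an obstacle by always failing.
def search_primary(query: str) -> str:
    raise TimeoutError("primary source unavailable")


def search_backup(query: str) -> str:
    return f"notes on {query}"


def draft_report(notes: str) -> str:
    return f"REPORT based on: {notes}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_primary": search_primary,
    "search_backup": search_backup,
    "draft_report": draft_report,
}

# Each step lists a preferred tool followed by fallbacks (fixed here for simplicity).
PLAN: List[Tuple[str, ...]] = [("search_primary", "search_backup"), ("draft_report",)]


def run_agent(goal: str) -> Tuple[str, List[str]]:
    """Work through the plan, falling back to another tool when a step fails."""
    state = goal
    log: List[str] = []
    for options in PLAN:
        for tool_name in options:
            try:
                state = TOOLS[tool_name](state)
            except Exception as exc:
                log.append(f"{tool_name}: failed ({exc})")
                continue  # adapt: try the next option for this step
            log.append(f"{tool_name}: ok")
            break
        else:
            raise RuntimeError(f"no tool succeeded for step {options}")
    return state, log


if __name__ == "__main__":
    result, log = run_agent("GCC cooling trends")
    print(result)
    for entry in log:
        print(" -", entry)
```

A single goal here produces three tool calls, one of them a retry. Real agents may make dozens or hundreds of model and tool calls per task, which is where the infrastructure consequences begin.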
Why it matters for infrastructure
Agentic AI is likely to become one of the next major infrastructure challenges following the current wave of LLM deployment.
The first implication is persistence. Conventional inference workloads are relatively stateless. A query is processed and the compute resource is released.
Agentic systems operate across much longer timeframes. Tasks may continue for minutes, hours, or even days, fundamentally changing the utilisation profile of AI infrastructure.
The second implication is connectivity. Agentic systems interact continuously with external tools, APIs, enterprise software platforms, and databases. This increases the importance of low-latency, high-bandwidth connectivity between AI infrastructure and broader digital ecosystems.
The third implication is scale. The future infrastructure challenge is unlikely to involve a single AI agent. It is more likely to involve fleets of continuously operating agents working in parallel across enterprise environments.
For operators and cloud providers, this creates a potentially significant upward revision to long-term inference demand expectations.
For the GCC, agentic AI is particularly relevant because many national digital transformation agendas are built around automation, workflow optimisation, and smart service delivery.
Smart city systems, logistics platforms, government services, and financial applications are all likely candidates for agentic deployment.
Each represents a growing source of persistent inference demand that will require regional infrastructure to support it.
Conclusion
The concepts explored in this article are not abstract technical definitions.
They are the language of a structural market transformation already reshaping facility design, infrastructure investment, capital allocation, and government strategy across the global data centre industry.
For GCC infrastructure professionals, the opportunity is significant. The Gulf is not simply a passive recipient of global AI trends. It is becoming an increasingly active participant, supported by sovereign capital, strategic intent, growing energy availability, and ambitious national digital economy agendas.
