The Condor Galaxy 1 supercomputer is the computational foundation on which Jais and other Arabic foundation models are built. Constructed through a partnership between G42 and Cerebras Systems, this multi-exaFLOP AI training system, based on Cerebras CS-2 wafer-scale engines, provides the raw computational throughput necessary to train models at the 70-billion-parameter scale within commercially viable timeframes.
Wafer-Scale Computing
The Cerebras CS-2 wafer-scale engine at the heart of Condor Galaxy 1 takes a fundamentally different approach to AI computing than conventional GPU clusters. Rather than distributing computation across thousands of individual GPU chips connected by network links, each CS-2 integrates 850,000 AI-optimized compute cores onto a single wafer-scale chip — the largest chip ever built. This integration eliminates the communication bottlenecks that limit GPU cluster efficiency, particularly for the large matrix operations that dominate transformer training.
For Arabic LLM training, the wafer-scale approach delivers specific advantages. Arabic tokenizers typically produce longer sequences than English tokenizers for equivalent semantic content, increasing the computational cost per training example. The CS-2’s on-chip memory bandwidth and inter-core communication speed maintain efficiency at these longer sequence lengths, whereas GPU clusters experience degraded utilization as sequence lengths increase.
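The compute-cost effect of longer Arabic sequences can be made concrete with the standard ~6 × params × tokens FLOPs approximation for transformer training. The fertility figures below (tokens per word) are illustrative assumptions for the sake of the sketch, not measurements of any particular tokenizer:

```python
# Back-of-the-envelope sketch: higher Arabic token "fertility"
# (tokens per word) raises per-example training compute, using the
# standard ~6 FLOPs per parameter per token approximation.
# The fertility values are illustrative assumptions, not measured
# figures for any specific tokenizer.

PARAMS = 70e9  # Jais 2 scale: 70B parameters


def training_flops(params: float, tokens: float) -> float:
    """Approximate transformer training cost: ~6 FLOPs/param/token."""
    return 6 * params * tokens


words_per_doc = 500
fertility = {"english": 1.3, "arabic": 1.8}  # assumed tokens per word

for lang, f in fertility.items():
    tokens = words_per_doc * f
    print(f"{lang}: {tokens:.0f} tokens -> {training_flops(PARAMS, tokens):.2e} FLOPs")
```

Under these assumed fertilities, the same 500-word document costs roughly 1.4× more compute to train on in Arabic than in English, which is the differential the wafer-scale architecture is claimed to absorb more gracefully.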
Training Capacity
Condor Galaxy 1’s multi-exaFLOP performance compresses training runs that would take months on conventional GPU clusters into weeks. The Jais 2 training campaign — 70 billion parameters on 600+ billion Arabic tokens — represents one of the largest single training runs conducted on Cerebras hardware, and the system’s performance characteristics were instrumental in making training at this scale economically feasible.
The training infrastructure’s efficiency is measured not just in peak FLOPS but in sustained utilization during actual training runs. Condor Galaxy 1 achieves high sustained utilization because the wafer-scale architecture eliminates the network communication overhead that reduces effective utilization in GPU clusters during distributed training. This efficiency advantage means that the actual training cost per token is lower than headline FLOPS comparisons might suggest.
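The relationship between sustained utilization and wall-clock training time can be sketched with the same ~6 × params × tokens approximation. The peak-FLOPS and utilization figures here are illustrative assumptions, not published Condor Galaxy or GPU-cluster numbers; only the ratio matters for the argument:

```python
# Sketch: sustained utilization, not peak FLOPS, sets wall-clock
# training time. Uses the standard ~6 * params * tokens FLOPs
# approximation; the peak and utilization values are illustrative
# assumptions, not published figures for any real system.

def training_seconds(params, tokens, peak_flops, utilization):
    total_flops = 6 * params * tokens        # ~6 FLOPs / param / token
    return total_flops / (peak_flops * utilization)


PARAMS, TOKENS = 70e9, 600e9                 # Jais 2 scale from the text
PEAK = 4e18                                  # assumed 4 exaFLOP/s peak

t_high = training_seconds(PARAMS, TOKENS, PEAK, utilization=0.55)
t_low = training_seconds(PARAMS, TOKENS, PEAK, utilization=0.30)

# At equal peak FLOPS, wall-clock time scales inversely with sustained
# utilization, so the higher-utilization system finishes the same run
# 0.55 / 0.30 times sooner.
print(f"speedup from utilization alone: {t_low / t_high:.2f}x")
```

This is why headline FLOPS comparisons understate the cost gap: two systems with identical peak throughput can differ by nearly 2× in effective training cost per token purely through sustained utilization.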
Wafer-Scale Efficiency for Arabic Training
The efficiency advantages of wafer-scale computing become particularly pronounced when training Arabic LLMs. Arabic’s morphological complexity means that expressing equivalent semantic content requires more tokens than English — a single Arabic word can encode subject, verb, object, tense, gender, number, and person through morphological inflection that English distributes across multiple words. The CAMeL Lab at NYU Abu Dhabi has documented over 300,000 possible POS tags for Arabic versus approximately 50 for English, with an average of 12 morphological analyses per word. This complexity translates directly to longer training sequences for equivalent content coverage.
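The disambiguation burden this ambiguity creates can be illustrated with a simple combinatorial sketch: at the CAMeL Lab average of roughly 12 candidate analyses per word cited above, the joint analysis space of a sentence grows exponentially with its length.

```python
# Sketch: Arabic morphological ambiguity inflates the analysis space
# exponentially. With ~12 candidate analyses per word (the average
# cited in the text), a sentence of n words has up to 12**n joint
# analyses before contextual disambiguation prunes them.

ANALYSES_PER_WORD = 12

for n in [1, 3, 5, 10]:
    print(f"{n:>2} words -> up to {ANALYSES_PER_WORD ** n:,} joint analyses")
```

The exponential growth is why morphological disambiguation, and the longer token sequences that encode this information explicitly, dominate the computational profile of Arabic text relative to English.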
The CS-2’s on-chip memory bandwidth and inter-core communication architecture maintain near-peak utilization at these longer sequence lengths. Conventional GPU clusters, connected by network fabric that introduces communication overhead during distributed training, experience utilization degradation as sequence lengths increase. For the Jais training campaigns — where Arabic token sequences systematically exceed English equivalents in length — this efficiency differential compounds across billions of training examples to produce meaningful cost savings.
The training pipeline for Jais 2 processed 600+ billion Arabic tokens across 17 identified regional dialects, supplemented by English training data for bilingual capability. The corpus included Modern Standard Arabic from news and academic sources, dialectal Arabic from social media and transcribed broadcast content, classical Arabic from literary and religious texts, and Arabizi from messaging platforms. Each content category presents different sequence length distributions and computational characteristics. The CS-2’s ability to maintain high utilization across this variance — short social media posts and long academic papers processed with equal efficiency — reflects the architectural advantage of integration over distribution.
Comparison with Competing Infrastructure
Condor Galaxy 1’s position in the Arabic AI infrastructure landscape must be assessed alongside Saudi Arabia’s HUMAIN data center program. HUMAIN’s program — 11 planned data centers across two campuses, targeting 1.9 GW of capacity by 2030 and 6 GW by 2034 at an estimated total cost of $77 billion — represents a fundamentally different approach to AI computing infrastructure. Where Condor Galaxy 1 uses specialized Cerebras wafer-scale hardware optimized for training efficiency, HUMAIN’s infrastructure is designed for general-purpose AI computing — training, serving, and cloud deployment — using conventional GPU hardware from NVIDIA and AMD.
The xAI partnership, establishing a 500 MW data center in Saudi Arabia, and partnerships with AWS for cloud deployment provide HUMAIN with infrastructure breadth that Condor Galaxy 1’s specialized design does not offer. However, for the specific task of Arabic LLM training, the CS-2’s architectural advantages in sequence processing and sustained utilization provide efficiency that general-purpose GPU clusters cannot match.
TII’s computing infrastructure for Falcon model development takes yet another approach. The Abu Dhabi research institute maintains dedicated GPU clusters optimized for research workloads — flexible enough to support the architectural experimentation that produced the hybrid Mamba-Transformer design of Falcon-H1 Arabic. TII’s infrastructure prioritizes research agility over raw training throughput, reflecting the institute’s focus on architectural innovation rather than simply scaling existing designs.
Strategic Implications
The existence of Condor Galaxy 1 in the UAE provides G42 with sovereign AI training capability — the ability to train frontier models without depending on foreign cloud providers or computing infrastructure. For a nation investing billions in AI as a strategic priority, this sovereignty over the training pipeline is as significant as sovereignty over the training data.
The G42-Cerebras partnership also provides commercial advantages. Jais 2 is both trained and served on Cerebras systems, with the partners reporting that ML techniques uniquely enabled by Cerebras hardware achieved state-of-the-art quality using only a fraction of the compute previously required to train similar-sized models. This efficiency claim, if validated at scale, would make Arabic AI training significantly more cost-effective than training on conventional GPU infrastructure.
Microsoft’s $2.3 billion investment in G42 in 2024 provides additional context. The investment connects G42’s sovereign computing capability to Microsoft’s Azure cloud ecosystem, enabling Jais deployment on Azure while maintaining training sovereignty on Condor Galaxy. This arrangement — train on sovereign infrastructure, deploy through global cloud — represents a model that other nations developing AI sovereignty strategies are studying closely.
Future Infrastructure Trajectory
The Condor Galaxy program’s roadmap extends beyond the CG-1 system. Cerebras has announced plans for next-generation wafer-scale chips with increased core counts, higher memory bandwidth, and improved inter-core communication. For Arabic AI training, these improvements would enable larger model sizes, longer context windows, and more complex training procedures — supporting the continued evolution of the Jais model family.
The broader competition among Condor Galaxy, HUMAIN’s data centers, and TII’s research clusters creates a distributed computing ecosystem across the Gulf states that collectively represents the largest AI infrastructure investment outside the United States and China. This investment density ensures that Arabic AI development is not constrained by computing resources — a situation that contrasts sharply with Arabic AI research in less wealthy Arabic-speaking countries where infrastructure limitations remain the primary bottleneck.
Power Efficiency and Operational Economics
Condor Galaxy 1’s operational economics differ fundamentally from those of GPU-based data centers because of the CS-2’s power efficiency profile. Traditional GPU clusters — the infrastructure powering HUMAIN’s planned 11 data centers and the Stargate UAE computing cluster — distribute computation across thousands of discrete chips, each with its own power supply, cooling requirements, and interconnect overhead. The CS-2’s single-wafer integration reduces the total system power per useful FLOP by eliminating network switch power consumption, reducing memory hierarchy power overhead, and consolidating cooling to a single thermal management system per wafer rather than per chip.
For Arabic AI training specifically, where the longer token sequences characteristic of Arabic text increase computation per training example, the power efficiency advantage of wafer-scale computing translates directly to lower training costs per Arabic token processed. This efficiency differential compounds across the hundreds of billions of tokens in Jais 2’s training corpus, producing meaningful cost savings that shape the economics of Arabic LLM development. The G42-Cerebras partnership’s claim that their ML techniques achieved state-of-the-art quality using a fraction of the compute used for similar models reflects this architectural efficiency advantage rather than algorithmic shortcuts.
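One way to see how the power differential compounds is a back-of-the-envelope energy-cost model. The power draws, token throughput, and electricity price below are hypothetical values chosen only to show the mechanism, not figures for Condor Galaxy or any competing system:

```python
# Sketch of power economics: energy cost per trained token as a
# function of system power draw and token throughput. All numbers
# are hypothetical, chosen to illustrate the mechanism only.

def cost_per_token(power_kw, tokens_per_sec, usd_per_kwh=0.08):
    """Electricity cost to train one token, in USD."""
    kwh_per_sec = power_kw / 3600
    return kwh_per_sec * usd_per_kwh / tokens_per_sec


# Two hypothetical systems with equal throughput but different draw:
wafer_scale = cost_per_token(power_kw=1500, tokens_per_sec=50_000)
gpu_cluster = cost_per_token(power_kw=2500, tokens_per_sec=50_000)

print(f"wafer-scale: ${wafer_scale:.2e} per token")
print(f"gpu cluster: ${gpu_cluster:.2e} per token")
print(f"savings over 600e9 tokens: ${(gpu_cluster - wafer_scale) * 600e9:,.0f}")
```

Even a modest per-token energy advantage, multiplied across the 600+ billion tokens of a Jais 2 scale corpus, accumulates into a material line item in the training budget, which is the compounding effect described above.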
The operational economics also influence the pace of model iteration. Lower training costs per run enable more frequent experimental training campaigns — testing new data compositions, evaluating architectural modifications, validating safety framework updates — each increment advancing Arabic AI capability. The rapid progression from Jais-13B through Jais 2 across four generations in approximately two years reflects the economic feasibility that Condor Galaxy’s efficiency enables. Competing development programs relying on conventional GPU clusters face proportionally higher per-experiment costs, potentially constraining iteration speed.
Geopolitical Context of Sovereign Computing
The geopolitical significance of Condor Galaxy 1 extends beyond technical capability to questions of AI supply chain independence. Global GPU supply constraints — driven by unprecedented demand from AI development programs worldwide — have created allocation bottlenecks that could delay model training timelines for organizations dependent on NVIDIA hardware procurement. Condor Galaxy’s Cerebras-based architecture provides G42 with an alternative compute pathway that bypasses GPU supply chains entirely, insulating Arabic AI development from the procurement delays that have affected competing programs.
The US-UAE technology relationship adds further context. Following the October 2024 framework governing technology transfers, the Condor Galaxy program’s continued operation reflects bilateral agreement that sovereign AI computing capability in allied nations serves shared strategic interests. The Microsoft-G42 partnership, with its $2.3 billion investment, further integrates Emirati sovereign computing into the Western technology ecosystem — training on Condor Galaxy, deploying through Azure — creating mutual dependency that stabilizes the technology relationship.
Saudi Arabia’s contrasting approach — building HUMAIN’s data center infrastructure with conventional GPU hardware from NVIDIA and AMD — creates a parallel computing ecosystem in the Gulf. The combined computing capacity of Condor Galaxy, HUMAIN’s data centers (targeting 1.9 GW by 2030), and the Stargate UAE project (1 GW) will provide the Arabic-speaking world with AI computing infrastructure exceeding that of any non-US, non-China technology ecosystem. This concentration of computing power in the Gulf states shapes the geographic distribution of Arabic AI development, with consequences for Arabic-speaking countries in North Africa and the Levant that lack comparable sovereign computing resources.
The computing infrastructure landscape for Arabic AI continues to evolve as investment accelerates. The MENA AI ecosystem’s funding trajectory — $858 million in AI VC during 2025, Saudi Arabia’s $9.1 billion in 2025 AI funding, the UAE AI market projected to reach $4.25 billion by 2033 — ensures that infrastructure constraints will not limit Arabic AI development in the foreseeable future. Whether through wafer-scale computing (Condor Galaxy), conventional GPU clusters (HUMAIN, Stargate), or future quantum-classical hybrid architectures, the Gulf states’ infrastructure investment provides the computational foundation for continued advancement of Arabic LLMs, agentic AI systems, and the broader Arabic AI ecosystem serving 400 million Arabic speakers.
The Condor Galaxy program represents more than a supercomputer — it embodies the UAE’s commitment to sovereign computing capability that ensures Arabic AI development proceeds at the pace of ambition rather than the pace of foreign hardware procurement. The integration of Cerebras wafer-scale engines into a multi-exaFLOP training system provides computational sovereignty that complements the data sovereignty represented by Arabic-first training corpora and the model sovereignty represented by ground-up Arabic LLM architecture design.
The sustained investment in Condor Galaxy and its successor systems ensures that the UAE’s sovereign computing capability will scale alongside the ambitions of the Arabic AI development programs it supports. As model sizes increase, context windows expand, and multimodal capability demands grow, the wafer-scale computing architecture provides an efficiency trajectory that maintains economic feasibility for Arabic AI training campaigns at frontier scale.
Related Coverage
- Jais — Complete Model Profile — Models trained on Condor Galaxy
- Jais 2 Deep Analysis — Latest training campaign details
- G42 Company Profile — Corporate strategy
- HUMAIN Data Center Program — Competing infrastructure investment
- HUMAIN Company Profile — Saudi national AI company
- TII Company Profile — Abu Dhabi research institute
- Falcon-H1 Architecture — Alternative infrastructure approach
- AI Sovereignty Analysis — Computing sovereignty strategy