MBZUAI — Mohamed bin Zayed University of Artificial Intelligence

Profile of MBZUAI, the world’s first graduate AI university: its Jais LLM research, its Institute of Foundation Models, and its contributions to Arabic AI research.

The Mohamed bin Zayed University of Artificial Intelligence, established in Abu Dhabi in 2019, holds the distinction of being the world’s first graduate-level university dedicated entirely to AI research and education. Named after the UAE’s president, MBZUAI operates at the intersection of academic research and sovereign AI development, contributing both trained researchers and technical innovations to the Arabic AI ecosystem.

MBZUAI’s most visible contribution to Arabic AI is its role in the Jais LLM partnership with G42 and Cerebras. The university’s Institute of Foundation Models provides the research expertise that guides Jais’s training methodology, evaluation strategy, and safety framework development. This academic contribution ensures that Jais development follows rigorous research standards while benefiting from G42’s computational resources and Cerebras’s hardware capabilities.

The university’s faculty includes researchers with publication records at top AI venues including NeurIPS, ICML, ACL, and EMNLP. Graduate students contribute to Arabic AI research across natural language processing, computer vision, speech processing, and machine learning theory. The university’s location in Abu Dhabi provides access to the Gulf AI ecosystem’s resources and partnerships while maintaining academic independence.

MBZUAI’s broader impact includes training the human capital that the Gulf AI ecosystem requires. Graduates enter research positions at TII, G42, and other Gulf AI organizations, as well as international positions that extend the university’s network globally.

Jais LLM Research Contributions

MBZUAI’s role in the Jais LLM partnership extends beyond advisory input to fundamental research that shapes model design. The university’s Institute of Foundation Models provides the scientific methodology governing training data curation, tokenizer design, attention mechanism optimization for Arabic syntax, and evaluation framework development. This academic rigor ensures that Jais development follows reproducible research standards — training decisions are documented, ablation studies validate architectural choices, and evaluation uses established benchmarks rather than ad hoc testing.

The Jais development timeline illustrates MBZUAI’s research influence across four model generations. Jais-13B (August 2023) established the foundational Arabic training methodology with 116 billion Arabic tokens. Jais-30B (November 2023) validated scaling laws for Arabic LLMs. The 2024 Family Release (20 models, 590M to 70B parameters) provided empirical data on Arabic model scaling across the full parameter spectrum. Jais 2 (December 2025, 70B parameters, 600B+ Arabic tokens) built on the three earlier generations to produce a flagship model covering 17 dialects and Arabizi (Arabic written in Latin script).

MBZUAI researchers contributed to the training data curation strategy that distinguishes Jais from adapted models. The insistence on native Arabic content — removing machine-translated text through quality classifiers — reflects research demonstrating that translated Arabic training data introduces systematic artifacts that benchmarks often miss but Arabic speakers immediately perceive. The dialect-aware training strategy, explicitly targeting 17 regional varieties, emerged from linguistic research on Arabic dialectal variation conducted at MBZUAI.
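
The filtering step described above can be sketched as a simple threshold over a per-document score. This is only an illustrative sketch, not the Jais pipeline: the `mt_score` callable, the 0.5 threshold, and the toy scores below are all hypothetical stand-ins for a trained quality classifier, whose details the source does not give.

```python
from typing import Callable, Iterable, List


def keep_native(
    docs: Iterable[str],
    mt_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep documents whose machine-translation score is below the threshold.

    mt_score is a hypothetical classifier returning the estimated probability
    (0.0 to 1.0) that a document is machine-translated.
    """
    return [doc for doc in docs if mt_score(doc) < threshold]


# Toy stand-in for a trained classifier: fixed, made-up scores per document.
TOY_SCORES = {
    "native news article": 0.08,
    "machine-translated product page": 0.93,
    "native forum post": 0.21,
}


def toy_mt_score(doc: str) -> float:
    # Unknown documents get a score of 1.0, so they are filtered out.
    return TOY_SCORES.get(doc, 1.0)


kept = keep_native(TOY_SCORES.keys(), toy_mt_score)
print(kept)
```

In a real pipeline the threshold would be tuned on held-out data, since a stricter cutoff discards more translated text but also more native text that the classifier misjudges.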

Arabic NLP Research Programs

Beyond the Jais partnership, MBZUAI conducts Arabic NLP research across multiple fronts. Natural language processing research addresses Arabic-specific challenges including morphological analysis (Arabic averages 12 morphological analyses per word with over 300,000 possible POS tags), dialectal identification (the NADI shared task series evaluates nuanced Arabic dialect identification), diacritization (adding short vowels that disambiguate Arabic text), and named entity recognition (complicated by Arabic’s lack of capitalization and vowel-omitting orthography).
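
Why diacritization matters can be seen with a few lines of standard-library Python. The sketch below strips the short-vowel marks from two fully vocalized words, kataba ("he wrote") and kutubun ("books"), and shows that both collapse to the same undiacritized string, which is how they normally appear in print. The function name `strip_diacritics` is illustrative, and removing every Unicode combining mark is a simplification rather than a full Arabic normalization routine.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop every character with a non-zero Unicode combining class.

    For Arabic this removes the short-vowel marks (fatha, damma, kasra),
    tanwin, shadda, and sukun. Simplification: it removes all combining
    marks, not only Arabic ones.
    """
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


# Written with escapes so the example does not depend on editor rendering.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"   # kataba, "he wrote"
KUTUBUN = "\u0643\u064F\u062A\u064F\u0628\u064C"  # kutubun, "books"
BARE = "\u0643\u062A\u0628"                       # k-t-b with no vowel marks

# Two different vocalized words collapse to one undiacritized spelling.
print(strip_diacritics(KATABA) == strip_diacritics(KUTUBUN))
```

A diacritization model has to solve the reverse problem: given the bare string and its context, choose which vocalized form was intended.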

Computer vision research at MBZUAI includes Arabic OCR (optical character recognition), Arabic scene text detection, and Arabic document understanding — capabilities essential for agent systems that process physical Arabic documents. Speech processing research addresses Arabic ASR challenges including dialectal variation, speaker mismatch across dialects, and hallucination in generative ASR models.

Machine learning theory research at MBZUAI contributes foundational advances in training methodology, optimization algorithms, and architectural design that inform Arabic LLM development across the ecosystem, benefiting not only the Jais team but any Arabic model developer who draws on MBZUAI publications.

Relationship with CAMeL Lab and NYU Abu Dhabi

MBZUAI’s NLP research complements the work of the CAMeL Lab at NYU Abu Dhabi, the other major Arabic NLP research group in the UAE. The CAMeL Lab, established in September 2014 under Dr. Nizar Habash, maintains the most comprehensive suite of Arabic NLP tools including CAMeL Tools (Python NLP suite), MADAMIRA (morphological tagger), CALIMA Star (morphological analyzer), YAMAMA (multi-dialect analyzer running 5x faster than MADAMIRA), and CaMeL Parser (dependency parser).

The CAMeL Lab’s corpus resources provide gold-standard Arabic linguistic data that MBZUAI and other researchers use for evaluation and analysis: GUMAR (100M words of Gulf Arabic), MADAR (25 city dialects), CaMeL Treebank (188K words from pre-Islamic poetry to social media), QALB (2M manually corrected words), and SAMER (26K lemma readability lexicon). MBZUAI’s model-development focus and the CAMeL Lab’s tool and data focus complement each other, giving Abu Dhabi alone a rich Arabic NLP research ecosystem.

Open Arabic LLM Leaderboard Contribution

MBZUAI’s research involvement extends to evaluation infrastructure. The Open Arabic LLM Leaderboard (OALL), launched in May 2024 by 2A2I, TII, and Hugging Face, provides the standardized evaluation framework used by the Arabic AI community. MBZUAI researchers contribute to benchmark development, evaluation methodology refinement, and results analysis. The OALL’s evolution from v1 (which included translated tasks) to v2 (native Arabic only, with ArabicMMLU, ALRAGE, AraTrust, and MadinahQA) reflects the research community’s insistence on evaluation that measures genuine Arabic capability rather than performance on translated benchmarks.

With over 700 models submitted from more than 180 organizations, the OALL serves as the community standard for Arabic LLM comparison. MBZUAI’s academic standing lends credibility to the evaluation framework, ensuring that leaderboard rankings carry weight in both research and commercial contexts.

Talent Pipeline and Industry Impact

MBZUAI’s role as a talent pipeline shapes the Gulf AI ecosystem’s human capital. Graduate students trained in Arabic NLP, foundation model development, and evaluation methodology enter positions at G42 (Jais development), TII (Falcon development), SDAIA/HUMAIN (ALLaM development), and other Gulf AI organizations. This talent distribution creates a shared methodological foundation across competing organizations — researchers trained at MBZUAI bring common standards, evaluation practices, and research culture to their respective employers.

The university’s international faculty recruitment brings global AI expertise to the Gulf. Researchers from leading international institutions, including Stanford, MIT, CMU, Tsinghua, and ETH Zurich, bring diverse research perspectives that enrich the Gulf AI ecosystem. Their publication records at NeurIPS, ICML, ACL, and EMNLP show that MBZUAI’s research is held to international peer-review standards.

SDAIA’s national strategy, which targets 20,000 AI specialists across Saudi Arabia, creates demand for AI talent that MBZUAI helps supply. Although MBZUAI primarily serves the UAE, its graduates and training methodologies influence AI education across the Gulf, contributing to the broader MENA goal of developing homegrown AI capability rather than depending on foreign talent and technology.

MBZUAI’s Academic Programs and Research Output

MBZUAI’s graduate programs — Master of Science and Doctor of Philosophy in Machine Learning, Computer Vision, and Natural Language Processing — are fully funded, covering tuition, living expenses, and research resources. This full-funding model attracts international talent that might otherwise pursue graduate studies at MIT, Stanford, or Cambridge, redirecting AI research talent toward the UAE and Arabic AI research specifically.

The university’s research output includes publications at NeurIPS, ICML, ACL, EMNLP, CVPR, and other top AI conferences — contributing both Arabic-specific research (Arabic LLM training, dialectal NLP, cultural alignment) and general AI methodology (training optimization, architecture design, evaluation frameworks). This dual contribution ensures that MBZUAI’s research is relevant to both the Arabic AI community and the global AI research ecosystem.

The Jais collaboration with G42 and Cerebras represents MBZUAI’s highest-profile research partnership. MBZUAI researchers contribute training methodology, evaluation design, and safety framework development to the Jais project — applying academic rigor to a commercial-scale Arabic LLM development program. The partnership demonstrates that academic-commercial collaboration can produce results that neither sector could achieve independently: G42 provides computing resources and commercial deployment, Cerebras provides hardware innovation, and MBZUAI provides research methodology and evaluation expertise.

MBZUAI’s Role in the Arabic AI Research Ecosystem

Within the Arabic AI research ecosystem, MBZUAI occupies a position complementary to other key institutions. Where NYU Abu Dhabi’s CAMeL Lab focuses on Arabic NLP tools and linguistic resources, MBZUAI focuses on foundation model development and training methodology. Where TII focuses on architectural innovation (hybrid Mamba-Transformer), MBZUAI focuses on training data curation and model evaluation. Where KAUST focuses on cultural alignment methodology (AceGPT’s RLAIF), MBZUAI focuses on scaling laws and efficiency optimization.

These complementary research directions — coordinated through publications, conferences, and informal collaboration rather than centralized planning — create a distributed Arabic AI research program that covers the full research landscape. The result is an Arabic AI research ecosystem that is more diverse and comprehensive than any single institution could produce.

MBZUAI’s Institute of Foundation Models provides focused research infrastructure for Arabic LLM development. The institute maintains computing resources, evaluation benchmarks, and research staff dedicated to foundation model research — creating the institutional continuity needed for multi-year research programs that produce models like Jais 2. The institute’s research agenda addresses Arabic-specific training methodology, cross-lingual transfer learning between Arabic and English, and evaluation framework development for Arabic AI systems.

MBZUAI’s Contribution to Open Arabic AI

MBZUAI’s commitment to open research and open-weight model distribution has shaped the Arabic AI ecosystem’s openness. The Jais models’ open-weight availability — enabling any organization to download, fine-tune, and deploy Arabic LLMs without licensing fees — reflects MBZUAI’s academic ethos applied to commercial-scale AI development. The 700+ model submissions to the OALL from 180+ organizations demonstrate the research community engagement that open-weight availability enables.

The university’s researchers contribute to Arabic AI benchmarks, evaluation tools, and training methodology publications that benefit the entire Arabic AI community. This community contribution — sharing research infrastructure alongside research results — accelerates Arabic AI development across institutions that lack MBZUAI’s resources, extending the impact of Abu Dhabi’s investment in AI education and research to the broader Arabic-speaking world.

The MENA AI ecosystem’s growth — $858 million in AI VC during 2025, UAE AI market projected to reach $4.25 billion by 2033 — creates increasing demand for the AI talent that MBZUAI produces. Graduates enter research positions at G42, TII, and other Gulf AI organizations, commercial roles at MENA AI startups, and academic positions at universities across the Arabic-speaking world. This talent distribution network extends MBZUAI’s institutional impact beyond Abu Dhabi to the entire Arabic AI ecosystem.

MBZUAI’s institutional significance extends beyond its research output to its role as the Arabic AI ecosystem’s talent pipeline. Every graduate trained at MBZUAI carries expertise in Arabic AI development — training methodology, evaluation design, safety frameworks, dialectal NLP — into their subsequent career, whether at G42, TII, HUMAIN, or the growing number of Arabic AI startups. This talent multiplication effect means that MBZUAI’s investment in graduate education generates returns across the entire Arabic AI ecosystem rather than concentrating capability at a single organization.

The university’s position at the intersection of fundamental AI research and Arabic language technology creates research opportunities unavailable at institutions focused on only one of the two. Researchers at MBZUAI can explore how general AI methodology applies to Arabic-specific challenges, and how Arabic-specific insights can advance general AI understanding, combining research depth in Arabic AI with breadth in fundamental AI.

Institute of Foundation Models

MBZUAI’s Institute of Foundation Models focuses specifically on large-scale AI model research, providing the academic research infrastructure that supports Jais development and advances Arabic foundation model methodology. The institute’s research agenda spans pre-training methodology, efficient training techniques, evaluation framework design, and Arabic-specific model optimization. Faculty and students at the institute publish at top-tier AI conferences (NeurIPS, ICML, ACL, EMNLP), ensuring that MBZUAI’s Arabic AI research meets international standards and contributes to the global AI research community.

