Core42 sets new benchmark for Arabic large language models with release of Jais 30B
Core42, a G42 company and the UAE-based national-scale enabler for cloud and generative AI, announced the launch of Jais 30B, the newest and most proficient version of its open-source Arabic Large Language Model (LLM).
Featuring 30 billion parameters, this new iteration of Jais follows the release in August 2023 of the 13 billion parameter model, underscoring Core42's commitment to providing a rich linguistic and culture-focused generative AI experience for the more than 400 million Arabic speakers worldwide.
Jais, born from the collaboration between Inception, now converged into Core42; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); and Cerebras Systems, immediately set a benchmark in the Arabic LLM landscape.
The model was trained on the Condor Galaxy-1 (CG-1) - one of the world's fastest AI supercomputers, with four exaFLOPS of training compute, 54 million cores, and 64 nodes - built by G42 in partnership with Cerebras Systems. Jais 13B went from concept to fine-tuned, leading open-source model in less than four months. Notably, the production training run for Jais 13B was completed in 21 days on CG-1.
The new Jais 30B model was trained on a substantially larger dataset than its predecessor, comprising 126 billion Arabic tokens, 251 billion English tokens, and 50 billion code tokens, and shows improved performance across all key indicators. It produces answers that are 160% longer and more detailed in Arabic, with a corresponding 233% increase in English, reflecting significant improvements in language generation.
The model also performs better in summarisation (53% in Arabic and 85% in English) and formatting (130% in Arabic and 134% in English). Jais 30B's performance is now on par with monolingual English models, outperforming most open-source models in Foundation Model evaluations.
Jais 30B's enhancements have been tested and validated using heuristic, cross-model comparison, and human evaluations, showing that the responses of the model's fine-tuned iterations outperform those of Jais 13B 96% of the time in Arabic and 97% of the time in English.
Reaffirming its dedication to responsible and safe AI practices, the development team has also further enhanced its processes and policies to guard against bias and the production of hateful or harmful content by the model, a process made easier by its open-source release.
Jais's versatility and unique capabilities in the Arabic language domain have already shown promise in applications across various sectors including telecommunications, energy, education, healthcare and innovative solutions for the marketing communications industry.
Dr. Andrew Jackson, EVP, Chief AI Officer of Core42, said,
"The launch of Jais 30B marks another significant milestone for Core42 and represents a giant leap forward for the Arabic-speaking world in harnessing the potential of generative AI. This release underscores the powerful synergy between Core42's technological leadership, our extensive partner ecosystem, and our shared dedication to pushing the boundaries of what's possible in AI. I eagerly anticipate close collaboration with our customers and partners to explore new applications and continually enhance the model's capabilities as we intensify our efforts to create top-quality LLMs for various other languages."
Andrew Feldman, CEO and co-founder of Cerebras Systems, said,
"Less than eight weeks after we introduced Jais 13B to the global Arabic-speaking community, the Core42 and Cerebras teams have delivered a new state-of-the-art LLM that is more than double in size. Jais 30B leverages the incredible, massive compute of Condor Galaxy 1 to set another record in bilingual performance and impressively fast training time."
News Source: Emirates News Agency