Language modeling

Pre-trained models

| Name | License | Size | Date | Training data | Metadata |
| --- | --- | --- | --- | --- | --- |
| AraBERT @ GitHub | License, commercial use ✔️ | BERT-Base architecture | Feb 2020 | ~70M sentences or ~23GB (MSA) | Segmenter |
| Arabic-BERT | MIT | BERT-Base | Mar 2020 | ~8.2 billion words (~95GB) | |
| hULMonA @ GitHub | undefined | ULMFiT architecture | Aug 2019 | 600,559 Wikipedia articles (~108M words) | |

More resources are coming soon, stay tuned! 🤩 You are welcome to contribute to this project! 🙏