Language modeling

Pre-trained models

Name	License	Size	Date	Training data	Metadata
AraBERT @ GitHub	License, commercial use ✔️	BERT-Base architecture	Feb 2020	~70M sentences or ~23GB (MSA)	Segmenter
Arabic-BERT	MIT	BERT-Base architecture	Mar 2020	~8.2 billion words or ~95GB
hULMonA @ GitHub	Not specified	ULMFiT architecture	Aug 2019	600,559 Wikipedia articles or ~108M words

More resources are coming soon, so stay tuned! 🤩 You are welcome to contribute to this project! 🙏