Data Catalog
Sovereign corpora curated for high-impact AI research.
Total Sets: 07
Dataset Name            Dialect         Size       Access
arabic-dialect-corpus   Multi-Dialect   507k rows  Open
egyptian-dialogue       Egyptian        4.3k rows  Open
egyptian-songs          Egyptian        3k rows    Open
eg-legal-reasoning      MSA (Legal)     1k rows    Open
arabic-feedback-corpus  Multi-Dialect   1.5k rows  Open
KSA-Legal-Instruct-v2   Saudi (Najdi)   100k rows  Sovereign
Gulf-Finance-Dialogue   Gulf (Emirati)  300k rows  Commercial