Unlocking Arabic Sentiment: The Arabic Feedback Corpus
Introducing the Arabic Feedback Corpus: A specialized dataset of 1,504 authentic customer reviews for sentiment analysis in the logistics and delivery sector, featuring intensity scores and conflict flags.
Abstract
Arabic sentiment analysis faces a distinctive challenge due to the diglossic nature of the language, where Modern Standard Arabic (MSA) differs significantly from spoken dialects. This "Dialect Gap" is particularly pronounced in customer feedback, which is often rich in colloquialisms and emotional nuance. This paper introduces the Arabic Feedback Corpus, a domain-specific dataset comprising 1,504 authentic customer reviews from the logistics and delivery sector. We detail the collection process, the multi-dimensional annotation scheme (including intensity scores and conflict flags), and the dataset's application in automated quality monitoring and courier performance analysis.
1. Introduction
The dominance of Modern Standard Arabic (MSA) in available datasets has created a blind spot in Arabic Natural Language Processing (NLP). MSA is adequate for news and formal writing, but it fails to capture the raw, emotional, dialect-rich language of dissatisfied customers. In the logistics sector, precise sentiment analysis is critical for customer retention, yet generic models trained on MSA often misclassify dialectal feedback or miss its nuance. We built the Arabic Feedback Corpus to bridge this gap. It provides high-quality, domain-specific data that reflects how users actually write when tracking packages, reporting delays, and evaluating service.
2. Data Collection and Annotation
The dataset consists of 1,504 authentic customer reviews written in a blend of Egyptian Arabic and Modern Standard Arabic. These are not synthetic examples. They are real customers describing their experiences with delivery speed, courier behavior, order fulfillment, and customer service.
2.1 Annotation Methodology
Each entry was labeled along three dimensions to capture the localized sentiment:
- Sentiment Label: Classification as Positive, Negative, or Neutral.
- Intensity Score: A float value quantifying the strength or confidence of the expressed sentiment.
- Conflict Flags: A binary indicator for complex cases where the tone conflicts with the sentiment (e.g., polite phrasing masking a negative review, or aggressive slang used in a positive context).
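The three annotation dimensions can be pictured as one small record per review. The following is a minimal sketch under stated assumptions. It assumes the label is stored as one of the strings "Positive", "Negative", or "Neutral", although the released files may encode it differently (for example, as integer class ids). It makes no assumption about the range of the intensity score. The class name FeedbackRecord and its from_row helper are hypothetical, for illustration only.

```python
from dataclasses import dataclass
import math

# Assumed label vocabulary; the released dataset may encode labels differently.
VALID_LABELS = {"Positive", "Negative", "Neutral"}


@dataclass(frozen=True)
class FeedbackRecord:
    """Hypothetical record: one review with its sentiment label, intensity, and conflict flag."""
    text: str
    label: str
    score: float
    is_conflict: bool
    domain: str = "Logistics"

    def __post_init__(self) -> None:
        if self.label not in VALID_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        # The intensity range is not documented, so only non-finite values are rejected.
        if not math.isfinite(self.score):
            raise ValueError("score must be a finite number")
        if not isinstance(self.is_conflict, bool):
            raise TypeError("is_conflict must be a bool")

    @classmethod
    def from_row(cls, row: dict) -> "FeedbackRecord":
        """Build a record from a dict keyed by the corpus column names."""
        return cls(
            text=row["text"],
            label=row["label"],
            score=float(row["score"]),
            is_conflict=bool(row["is_conflict"]),
            domain=row.get("domain", "Logistics"),
        )
```

Validating at construction time means a malformed row fails loudly before it reaches a training loop, instead of silently skewing label counts.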
3. The Arabic Feedback Corpus
The dataset is hosted on Hugging Face in Parquet format, so it can be loaded directly into standard training pipelines.
3.1 Key Metrics
- Size: 1,504 Rows
- Format: Parquet (~75.2 kB)
- Domain: Logistics & Delivery
- Linguistic Composition: Egyptian Arabic / MSA Mix
3.2 Feature Columns
The dataset includes the following fields:
- text: The raw review content.
- label: The sentiment classification.
- score: The intensity of the sentiment.
- domain: Fixed as 'Logistics'.
- is_conflict: Boolean flag for conflicting sentiment cues.
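A quick way to inspect these columns is to compute the class balance, the share of conflict-flagged rows, and the mean intensity per label. The sketch below uses only the standard library. It assumes each row is a mapping with the column names listed above. The function name summarize is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Mapping


def summarize(rows: Iterable[Mapping]) -> dict:
    """Hypothetical helper: label counts, conflict rate, and mean intensity per label."""
    label_counts: Counter = Counter()
    score_sums: defaultdict = defaultdict(float)
    conflicts = 0
    total = 0
    for row in rows:
        total += 1
        label = row["label"]
        label_counts[label] += 1
        score_sums[label] += float(row["score"])
        if row["is_conflict"]:
            conflicts += 1
    mean_score = {lab: score_sums[lab] / n for lab, n in label_counts.items()}
    return {
        "total": total,
        "label_counts": dict(label_counts),
        "conflict_rate": conflicts / total if total else 0.0,
        "mean_score_by_label": mean_score,
    }
```

Iterating over a split loaded with load_dataset yields one dict per row, so summarize(dataset["train"]) should also work on the real data. The keys in label_counts will follow whatever label encoding the dataset actually uses.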
3.3 Usage
The dataset can be loaded via the datasets library:
```python
from datasets import load_dataset

# Load the Arabic Feedback Corpus from the Hugging Face Hub
dataset = load_dataset("fr3on/arabic-feedback-corpus")

# Example: access the first entry of the train split
print(dataset["train"][0])
```
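With only 1,504 rows, a randomly drawn evaluation set can easily over- or under-represent one sentiment class. The sketch below is a minimal stratified split using only the standard library. It keeps each label's share roughly equal across the train and test partitions. It assumes each row exposes a label field. The function name, the 20% test fraction, and the seed are illustrative choices, not part of the corpus.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Mapping, Tuple


def stratified_split(
    rows: Iterable[Mapping], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Mapping], List[Mapping]]:
    """Hypothetical helper: split rows into (train, test), preserving per-label proportions."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    by_label: defaultdict = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train: List[Mapping] = []
    test: List[Mapping] = []
    for label in sorted(by_label, key=str):  # fixed label order for determinism
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

For example, train_rows, test_rows = stratified_split(dataset["train"]). If the label column is stored as a ClassLabel feature, the datasets library's Dataset.train_test_split with stratify_by_column="label" should achieve the same result natively.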
4. Applications
This domain-specific dataset is designed to empower several key applications:
- Automated Quality Monitoring: Enabling systems to flag negative delivery experiences automatically and in real time, with the aim of higher accuracy than generic MSA-trained models.
- Courier Performance Analysis: Helping identify recurring feedback patterns tied to specific couriers or geographic regions.
- Enhanced Sentiment Models: Serving as fine-tuning data and an evaluation benchmark for large language models on dialectal Arabic, improving their handling of colloquialisms.
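To make the monitoring and courier-analysis applications more concrete, the sketch below does two things. It routes a (predicted) sentiment annotation to a monitoring action, and it computes the share of negative reviews per group. Everything in it is hypothetical. That includes the 0.7 intensity threshold, the action names, and the courier_id field. courier_id is not a column of the corpus and would have to come from an operator's own order records.

```python
from collections import defaultdict
from typing import Iterable, Mapping

# Hypothetical cut-off; the corpus does not specify a scale for the intensity score.
ALERT_THRESHOLD = 0.7


def triage(label: str, score: float, is_conflict: bool) -> str:
    """Map one annotation to a hypothetical monitoring action."""
    if is_conflict:
        # Tone and sentiment disagree, so the text needs a human reader.
        return "human_review"
    if label == "Negative" and score >= ALERT_THRESHOLD:
        return "escalate"
    if label == "Negative":
        return "log_complaint"
    return "no_action"


def negative_rate_by(rows: Iterable[Mapping], key: str = "courier_id") -> dict:
    """Share of Negative reviews per group; 'courier_id' is a hypothetical field."""
    totals: defaultdict = defaultdict(int)
    negatives: defaultdict = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        if row["label"] == "Negative":
            negatives[group] += 1
    return {g: negatives[g] / totals[g] for g in totals}
```

Conflict-flagged cases are checked first because, by the definition in Section 2.1, their surface tone is an unreliable guide to the underlying sentiment.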
5. Conclusion
The Arabic Feedback Corpus enables a deeper understanding of customer sentiment in the logistics domain. By training on real-world, dialect-rich data, we can build models that do not just translate words but understand the user's intent and emotion. We invite the community to explore this resource and advance Arabic NLP.
Citation
@dataset{arabic_feedback_corpus,
title={Arabic Feedback Corpus: Logistics Domain Sentiment Analysis},
author={fr3on},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/datasets/fr3on/arabic-feedback-corpus}
}