
Our classifier is based on Detoxify's unbiased-toxic-roberta model, which was trained on the Jigsaw Unintended Bias dataset.
This model was selected for its high accuracy and reduced false positives on content that mentions marginalized groups. We calibrate our thresholds using guidance from Google Publisher Policies.
Each page receives a composite safety grade determined by the most concerning content on it. Like a movie rating, the grade gives you clear guidance on what's appropriate for your brand.
Note: Pages without completed safety review default to Grade D to protect advertisers.
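The grading scheme above can be sketched as a mapping from per-category classifier scores to letter grades. The category names follow Detoxify's outputs; the threshold values here are illustrative assumptions, not our production calibration:

```python
# Sketch of composite page grading. Threshold values are illustrative
# assumptions, not the production calibration.
GRADE_THRESHOLDS = [   # (maximum category score, grade)
    (0.10, "A"),       # pristine content
    (0.30, "B"),       # mild content
    (0.60, "C"),       # mature content
]

def grade_page(scores):
    """Grade a page from its per-category classifier scores.

    `scores` maps Detoxify category names (e.g. 'toxicity',
    'identity_attack', 'sexual_explicit') to probabilities in [0, 1].
    Pages without a completed safety review default to Grade D.
    """
    if not scores:  # unreviewed page -> most restrictive grade
        return "D"
    worst = max(scores.values())  # grade reflects the most concerning content
    for limit, grade in GRADE_THRESHOLDS:
        if worst <= limit:
            return grade
    return "D"

print(grade_page({"toxicity": 0.05, "identity_attack": 0.02}))  # A
print(grade_page({"toxicity": 0.45}))                           # C
print(grade_page({}))                                           # D (no review)
```

Because the grade tracks the single worst category, one flagged passage is enough to downgrade an otherwise clean page.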
A significant share of content qualifies for standard advertising.
Each page is evaluated independently. When we mark content as advertiser-safe, that specific page meets all standards.
Urban Dictionary applies two distinct moderation layers to every piece of content: one for publication, and another for advertiser safety. This ensures ads never appear next to content that exceeds your risk tolerance.
- ✓ Only Grade A pages
- ✓ Ideal for family-safe or regulated brands
- ✓ Pristine content for maximum safety

- ✓ Grades A & B
- ✓ The sweet spot for reach and safety
- ✓ Suitable for most mainstream advertisers

- ✓ Grades A–C (limited additional inventory)
- ✓ Broadest reach, suitable for mature brands
- ✓ Carefully managed for brand alignment
Custom controls
Set category-level thresholds (e.g. exclude only identity-based attacks or sexual content) to tailor safety enforcement.
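Category-level controls like these can be modeled as per-category threshold overrides. The category names again follow Detoxify's outputs; the default and override values are hypothetical:

```python
# Sketch of per-category threshold overrides. Values are hypothetical.
DEFAULT_THRESHOLD = 0.30

def page_allowed(scores, overrides=None):
    """Return True if every category score is under its threshold.

    `overrides` lets an advertiser tighten (or relax) individual
    categories, e.g. blocking any identity-based attack while
    tolerating mild profanity.
    """
    overrides = overrides or {}
    return all(
        score < overrides.get(category, DEFAULT_THRESHOLD)
        for category, score in scores.items()
    )

# Advertiser who cares only about identity attacks and sexual content
# (a threshold above 1.0 effectively disables a category check):
strict = {"identity_attack": 0.05, "sexual_explicit": 0.05, "obscene": 1.01}
print(page_allowed({"obscene": 0.80, "identity_attack": 0.01}, strict))  # True
print(page_allowed({"identity_attack": 0.20}, strict))                   # False
```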
✓ Open methodology
Model details, thresholds, and scoring code are available for audit.
✓ Bias mitigation
We use fairness-optimized models and conduct regular audits to reduce disproportionate impact on minority content.
✓ Flexible targeting
Choose which safety grades align with your brand's risk tolerance. Target only Grade A for maximum safety, or include B and C for broader reach.
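Grade targeting reduces to a set-membership filter over inventory. The page paths below are made-up examples used only for illustration:

```python
# Sketch: filter ad inventory by an advertiser's accepted safety grades.
# The inventory entries are hypothetical examples.
def eligible_pages(pages, accepted_grades):
    """Return the pages whose safety grade is in the accepted set."""
    return [page for page, grade in pages if grade in accepted_grades]

inventory = [
    ("/define/aesthetic", "A"),
    ("/define/salty", "B"),
    ("/define/edgy-term", "C"),
    ("/define/unreviewed", "D"),
]

print(eligible_pages(inventory, {"A"}))            # maximum safety
print(eligible_pages(inventory, {"A", "B", "C"}))  # broadest reach
```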
Have a question about our brand safety policies?
Get detailed information about our moderation system, thresholds, and implementation.