Real-Time Inappropriate Content Detection on YouTube Using CLIP: A Zero-Shot Vision-Language Approach
The rapid expansion of online video platforms has significantly increased children’s exposure to potentially harmful content, including violent and explicit material. Traditional moderation techniques, such as keyword-based filtering and static blocklists, are insufficient to address the dynamic and multimodal nature of modern digital media. This study proposes a real-time content moderation system that integrates a browser extension with a guardian monitoring platform, enabling continuous supervision of YouTube video consumption.
The system leverages the CLIP (Contrastive Language–Image Pre-training) model to perform zero-shot classification of video frames by aligning visual and textual representations in a shared semantic space: each sampled frame is scored against a set of predefined harmful and safe text labels, and the most similar label determines the frame's class. The methodology involves periodic frame sampling, preprocessing, and similarity-based classification. A dual-pass decision mechanism, combined with temporal consistency filtering across consecutive frames, is employed to improve detection reliability and reduce false positives.
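The similarity-based classification and temporal filtering steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the CLIP image and text embeddings are simulated with random vectors (in practice they would come from a pretrained CLIP encoder), and the label names, window size, and vote threshold are assumed values for demonstration.

```python
import numpy as np

# Illustrative label sets; the paper's actual label prompts are not specified here.
HARMFUL_LABELS = ["violence", "weapons", "explicit content"]
SAFE_LABELS = ["cartoon", "education", "music video"]
LABELS = HARMFUL_LABELS + SAFE_LABELS


def normalize(v):
    """L2-normalize embeddings so dot products equal cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def classify_frame(frame_emb, label_embs):
    """Zero-shot classification in the CLIP style: compare one frame
    embedding against all label text embeddings via cosine similarity
    and return the best-matching label."""
    sims = normalize(frame_emb) @ normalize(label_embs).T
    return LABELS[int(np.argmax(sims))]


def temporally_filtered_verdict(frame_verdicts, window=5, threshold=3):
    """Temporal consistency filter: flag a segment as harmful only if at
    least `threshold` of the last `window` per-frame verdicts fall under
    a harmful label, suppressing one-off false positives."""
    recent = frame_verdicts[-window:]
    harmful_hits = sum(v in HARMFUL_LABELS for v in recent)
    return harmful_hits >= threshold


# Stand-in embeddings (random unit vectors in place of real CLIP outputs).
rng = np.random.default_rng(0)
label_embs = rng.normal(size=(len(LABELS), 512))
frame_embs = rng.normal(size=(8, 512))  # 8 periodically sampled frames

verdicts = [classify_frame(f, label_embs) for f in frame_embs]
print(temporally_filtered_verdict(verdicts))
```

The temporal filter is what converts noisy per-frame predictions into a stable segment-level decision; raising `threshold` trades recall for precision.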
Experimental evaluation on a labeled dataset shows that the proposed system achieves an accuracy of 78%, with high recall for harmful content. The results indicate that the system effectively prioritizes safety by minimizing undetected harmful content while maintaining acceptable precision.
Overall, the proposed approach highlights the practical potential of zero-shot learning for real-time content moderation in dynamic environments. The system provides an effective, scalable, and privacy-aware solution for enhancing child safety in online video platforms.



