HarmGuard AI
🟢 Real-time Analysis | 🧠 Deep Learning Model
Project Overview
HarmGuard AI is a real-time AI-powered content moderation platform designed to detect and classify harmful textual content across various online environments like social media, messaging apps, and community forums. The system uses advanced Natural Language Processing (NLP) and multi-label classification techniques to identify self-harm ideation, aggressive behavior, and references to violence.
Developed to support moderation workflows and early-intervention systems, this tool helps platforms maintain safer digital spaces by flagging and prioritizing potentially risky messages.
Detection Showcase


The system includes a Threat Analysis Panel providing visual, real-time feedback for each message analyzed.
Key Functionalities
- 🔍 Real-Time Text Classification: Processes text inputs instantly with low latency suitable for live environments.
- 🧠 Multi-Label Harm Detection: Identifies one or more risk categories per message, including `self_harm`, `harming_others`, `harmed_by_others`, and `reference_to_harm`.
- 🎯 Confidence Scoring: Provides a confidence score for each prediction, allowing filtering and prioritization (see the example endpoint after this list).
- ⚖️ Responsible AI: Designed with considerations to reduce bias and ensure fairness.
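A minimal sketch of how such an analysis endpoint could be exposed with Flask is shown below. The `/analyze` route, payload shape, and `classify_message` helper are illustrative assumptions, not the project's documented API; the real scoring logic would call the fine-tuned model described further down.

```python
# Illustrative sketch of a real-time analysis endpoint; route name, payload
# shape, and the classify_message helper are assumptions, not the actual API.
from flask import Flask, request, jsonify

app = Flask(__name__)

LABELS = ["self_harm", "harming_others", "harmed_by_others", "reference_to_harm"]

def classify_message(text: str) -> list[float]:
    """Placeholder for the fine-tuned model; see the inference sketch further below."""
    return [0.0] * len(LABELS)

@app.route("/analyze", methods=["POST"])
def analyze():
    text = request.get_json()["text"]
    scores = classify_message(text)
    return jsonify({
        "text": text,
        "scores": dict(zip(LABELS, scores)),
        "flagged": [label for label, score in zip(LABELS, scores) if score >= 0.5],
    })

if __name__ == "__main__":
    app.run()
```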
Technologies Used
- Python
- Flask
- Hugging Face Transformers
- BERT / RoBERTa / Fine-tuned LLM
- Scikit-learn
- Pandas
- NumPy
Model Training Highlights
Model Architecture
Fine-tuned `roberta-large` transformer model adapted for multi-label classification with 4 output nodes.
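As a rough illustration, the multi-label head could be configured as below. The checkpoint name and label set come from this README, while the loading code itself is an assumed sketch rather than the project's actual training script.

```python
# Sketch of the multi-label model setup (assumed, not the project's exact code).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["self_harm", "harming_others", "harmed_by_others", "reference_to_harm"]

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large",
    num_labels=len(LABELS),                     # 4 output nodes
    problem_type="multi_label_classification",  # sigmoid outputs + BCEWithLogitsLoss
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)
```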
Loss Function
`BCEWithLogitsLoss`, optimized via the Hugging Face `Trainer` for effective multi-label learning.
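When `problem_type="multi_label_classification"` is set, the model applies `BCEWithLogitsLoss` internally; the standalone snippet below only illustrates its multi-hot, float-valued targets, and the numbers are made up for the example.

```python
# Standalone illustration of the multi-label loss; values are arbitrary.
import torch
from torch.nn import BCEWithLogitsLoss

logits = torch.tensor([[2.1, -0.4, -3.0, 0.7]])  # raw model outputs for one message
labels = torch.tensor([[1.0, 0.0, 0.0, 1.0]])    # multi-hot targets (floats, not class ids)
loss = BCEWithLogitsLoss()(logits, labels)
print(loss.item())
```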
Evaluation Metrics
Macro F1-score (0.89) and Accuracy (0.91) maintained across all harm categories.
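A plausible `compute_metrics` for these figures is sketched below, assuming sigmoid scores thresholded at 0.5. The README does not state which accuracy variant was used, so subset accuracy here is one reasonable reading, not a confirmed detail.

```python
# Possible compute_metrics for macro F1 and (subset) accuracy; an assumption,
# mirroring the sigmoid + 0.5 threshold described in the inference step.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid over raw logits
    preds = (probs >= 0.5).astype(int)
    return {
        "macro_f1": f1_score(labels, preds, average="macro", zero_division=0),
        "accuracy": accuracy_score(labels, preds),
    }
```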
Data Pipeline
CSV datasets processed with `pandas` and the Hugging Face `datasets` library for efficient batch loading.
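The loading step could look roughly like the sketch below, assuming a `text` column plus one 0/1 column per label; the file name and column layout are assumptions about the dataset, not documented facts.

```python
# Sketch of the CSV -> tokenized dataset step; column names and file name are assumed.
import pandas as pd
from datasets import Dataset
from transformers import AutoTokenizer

LABELS = ["self_harm", "harming_others", "harmed_by_others", "reference_to_harm"]
tokenizer = AutoTokenizer.from_pretrained("roberta-large")

df = pd.read_csv("train.csv")                              # hypothetical file name
df["labels"] = df[LABELS].astype(float).values.tolist()    # multi-hot float vectors

dataset = Dataset.from_pandas(df[["text", "labels"]], preserve_index=False)
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
)
```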
Training Config
Hugging Face `Trainer` with early stopping (patience=3) and per-epoch validation.
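Putting it together, the training setup might resemble the sketch below, which reuses `model`, `dataset`, and `compute_metrics` from the earlier sketches; `eval_dataset` stands in for a held-out split, and the hyperparameter values are placeholders rather than the project's actual settings.

```python
# Possible Trainer configuration: per-epoch validation, early stopping with
# patience=3, best-model restoration. Hyperparameters are placeholders.
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="harmguard-roberta-large",
    eval_strategy="epoch",            # `evaluation_strategy` in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,                      # from the architecture sketch above
    args=args,
    train_dataset=dataset,            # from the data pipeline sketch above
    eval_dataset=eval_dataset,        # hypothetical held-out validation split
    compute_metrics=compute_metrics,  # from the evaluation metrics sketch above
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```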
Inference Process
Logits are passed through a sigmoid activation, and labels are assigned wherever the score meets a configurable threshold (default ≥ 0.5).
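In code, that inference step could look like the following; the checkpoint directory is a hypothetical path, and the threshold is exposed as a parameter to reflect the configurable cutoff.

```python
# Sketch of inference: sigmoid over logits, then threshold for label assignment.
# MODEL_DIR is a hypothetical path to the fine-tuned checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["self_harm", "harming_others", "harmed_by_others", "reference_to_harm"]
MODEL_DIR = "harmguard-roberta-large"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR).eval()

def analyze_text(text: str, threshold: float = 0.5) -> dict[str, float]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits)[0]
    return {label: round(p.item(), 3) for label, p in zip(LABELS, probs) if p >= threshold}

print(analyze_text("example message to analyze"))
```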