A multi-modal hate speech detection system for Hinglish (Hindi-English code-mixed language). This Streamlit-based application allows users to detect hate speech in text, audio, video, and images using deep learning, NLP, OCR, and speech-to-text.
- What is Hate Speech Detection?
- Key Features
- 💻 Prerequisites
- 🚀 Installation
- ⬇️ Download the Model
- ☕ Usage
- 🛠️ Technology Stack
- 🏗️ File Structure
- 🤝 Contributors
- 📝 License
Hate speech detection is the process of identifying and classifying content (text, audio, video, images, etc.) as hate speech or non-hate speech. This project focuses on Hinglish, a code-mixed language, and supports detection across multiple modalities using advanced machine learning and NLP techniques.
- Multi-Modal Detection: Supports text, audio, video, and image hate speech detection in Hinglish.
- Real-Time Inference: Fast, interactive predictions via a modern Streamlit web interface.
- Robust Preprocessing: Advanced text cleaning, OCR for images, and speech-to-text for audio/video.
- Transfer Learning: Uses a fine-tuned BERT model for classification.
- User-Friendly UI: Intuitive navigation and clear results for all input types.
- Modular Codebase: Easily extendable for new modalities or languages.
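As a rough illustration of the text-cleaning step mentioned under Robust Preprocessing, the sketch below normalizes a romanized Hinglish string before classification. The function name `clean_hinglish_text` and the specific rules (dropping URLs and mentions, collapsing elongated letters) are assumptions made for this example, not the project's actual preprocessing.

```python
import re

# All patterns below are illustrative assumptions, not the project's real rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")   # three or more repeats of one character
NON_WORD_RE = re.compile(r"[^\w\s]")   # punctuation and symbols
SPACE_RE = re.compile(r"\s+")


def clean_hinglish_text(text: str) -> str:
    """Normalize romanized Hinglish text (hypothetical helper).

    Assumes romanized input; Devanagari text would need a different
    character filter than the one used here.
    """
    text = text.lower()
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @mentions
    text = text.replace("#", " ")         # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)   # "bahuuuut" -> "bahuut"
    text = NON_WORD_RE.sub(" ", text)     # strip remaining punctuation
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_hinglish_text("@user Yaar tum bahuuuut ACHHE ho!!! #dosti"))
```

Collapsing elongated letters to two characters rather than one is a common compromise for code-mixed social media text, since spellings such as "achhe" legitimately contain doubled letters.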
- Python 3.8+
- pip
- Clone the repository:
git clone https://github.com/rahul-jaiswar-git/Hate-Shield-AI.git
cd Hate-Shield-AI
- Install dependencies:
pip install -r requirements.txt
- Download the model files:
- Download all model files from the Hugging Face repository: Hinglish-based-Hate-Speech-detection-model-v1 on Hugging Face
- Place all downloaded files (e.g., tf_model.h5, config.json, tokenizer_config.json, vocab.txt, special_tokens_map.json, label_encoder.pkl) into the hate_speech_model/ directory.
You can download the pre-trained model and all required files from Hugging Face:
https://huggingface.co/rahuljaiswarofficial/Hinglish-based-Hate-Speech-detection-model-v1
After downloading, place all files in the hate_speech_model/ directory before running the app.
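Before launching the app, it can help to confirm that the files landed in the right place. The following is a minimal sketch, assuming the file names listed in the installation steps; that list is introduced with "e.g.", so it may be incomplete. The helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the installation steps; the list may not be exhaustive.
EXPECTED_FILES = (
    "tf_model.h5",
    "config.json",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
    "label_encoder.pkl",
)


def missing_model_files(model_dir="hate_speech_model", expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All expected model files are present.")
```

Run it from the repository root so that the relative path hate_speech_model/ resolves correctly.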
Run the Streamlit app:
streamlit run app.py
- Use the sidebar to navigate between Home, Model Check, and About Us.
- In "Check Model", select the desired classification task (Text, Audio, Video, Image).
- Upload files or enter text as prompted.
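In the app the task is chosen by hand, but a script that calls the classifiers directly might infer the modality from the file extension instead. The sketch below shows one way to do that. The extension table and the name `detect_modality` are assumptions for illustration; the upload types the app actually accepts are not listed here.

```python
from pathlib import Path

# Hypothetical extension table; adjust it to the formats the app really accepts.
MODALITY_BY_EXTENSION = {
    ".txt": "text",
    ".wav": "audio",
    ".mp3": "audio",
    ".mp4": "video",
    ".avi": "video",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
}


def detect_modality(filename: str) -> str:
    """Map a file name to one of: text, audio, video, image."""
    suffix = Path(filename).suffix.lower()
    try:
        return MODALITY_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


if __name__ == "__main__":
    for name in ("Hate.jpg", "clip.MP4", "speech.wav"):
        print(name, "->", detect_modality(name))
```

Raising an error for unknown extensions, rather than guessing, keeps an unsupported upload from being silently routed to the wrong classifier.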
- User Interface:
- Streamlit (web app framework)
- Pillow (image handling)
- Machine Learning & NLP:
- TensorFlow (deep learning backend)
- Hugging Face Transformers (BERT model)
- joblib (model serialization)
- numpy (numerical operations)
- Audio & Speech Processing:
- SpeechRecognition (speech-to-text)
- pydub, librosa, soundfile (audio file handling)
- Image & Video Processing:
- OpenCV (image/video processing)
- pytesseract (OCR)
- Utilities:
- requests (API calls)
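Several of these libraries are imported under a different name than the one they are listed by, which makes a failed install easy to miss. The sketch below checks whether each one is importable in the current environment. The mapping keys are assumed package names (for example `opencv-python` for OpenCV); requirements.txt remains the authoritative list, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed package name -> module name used in `import` statements.
IMPORT_NAMES = {
    "streamlit": "streamlit",
    "Pillow": "PIL",
    "tensorflow": "tensorflow",
    "transformers": "transformers",
    "joblib": "joblib",
    "numpy": "numpy",
    "SpeechRecognition": "speech_recognition",
    "pydub": "pydub",
    "librosa": "librosa",
    "soundfile": "soundfile",
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "requests": "requests",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the sorted package names whose module cannot be found."""
    return sorted(
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Note that pytesseract is only a wrapper: the Tesseract OCR engine itself has to be installed on the system separately, and this check does not cover it.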
project/
├── app.py # Main Streamlit app
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── styles.css # Custom styles for Streamlit
├── hate_speech_model/ # Model files and label encoder
├── Classifier/ # Classification modules for each modality
│ ├── text_classification.py
│ ├── audio_classification.py
│ ├── video_classification.py
│ └── image_classification.py
├── Frontend/ # Images and GIFs for UI
│ ├── models.gif
│ ├── about us.gif
│ └── Hate.jpg
├── Dataset/ # (Optional) Data for training/testing
├── Eg Data/ # (Optional) Example data for demo
└── ...
Rahul Jaiswar
Special Thanks: The open-source community, Hugging Face, and all referenced libraries.
This project is licensed under the MIT License.
