🔍 AI Content Detector Pro

A Streamlit web application that estimates whether text was written by a human or generated by an AI model. It combines three detection techniques (stylometric analysis, perplexity scoring, and machine learning classification) and reports a probability for each origin.

✨ Features

🧠 Advanced Detection Methods

Combined Analysis: Integrates the three methods below into a single result
Stylometric Analysis: Analyzes writing style patterns and linguistic features
Perplexity Analysis: Measures how predictable the text is to a language model
ML Classification: Machine learning-based detection using trained models

📊 Comprehensive Analysis

Real-time Processing: Instant analysis with detailed results
Confidence Scoring: Probability-based predictions with confidence levels
Feature Breakdown: Detailed analysis of text characteristics
Visual Analytics: Interactive charts and graphs for result visualization

📁 File Support

Text Files (.txt): Direct text analysis
Word Documents (.docx): Extract and analyze content
PDF Files (.pdf): Extract text from PDF documents
Direct Input: Paste text directly for analysis

🎨 Modern Interface

Responsive Design: Works on desktop and mobile devices
Interactive Charts: Plotly-powered visualizations
Real-time Updates: Live analysis with progress indicators
Professional UI: Clean, modern interface with intuitive navigation

🚀 Quick Start

Installation

Clone the repository:

git clone https://github.com/yourusername/ai-content-detector-pro.git
cd ai-content-detector-pro

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Start the application:

streamlit run app.py

Docker (Share with Friends)

Build image
```
docker build -t ai-content-detector .
```

Run container

docker run --rm -p 8501:8501 ai-content-detector

Open app
```
http://localhost:8501
```

The container already includes all Python/NLTK dependencies, so your friends only need Docker installed.

Once the app is running (locally or in Docker), open your browser and navigate to the URL printed in the terminal (typically http://localhost:8501)
Choose your analysis method:
- Combined Analysis (Recommended): Uses all three methods together
- Stylometric Analysis: Focuses on writing style patterns
- Perplexity Analysis: Scores how predictable the text is
- ML Classification: Machine learning-based detection
Input your content:
- Upload a document (.txt, .docx, .pdf)
- Or paste text directly into the input area
Analyze and view results:
- Get probability scores for human vs AI origin
- View detailed feature breakdowns
- Explore interactive visualizations
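
For the upload path, text extraction from the three supported file types might look like the sketch below. It assumes python-docx and a PyPDF2 release that provides `PdfReader` (older releases used `PdfFileReader`). The function name `extract_text` is hypothetical and is not taken from this project's code.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a .txt, .docx, or .pdf file (illustrative helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # Replace undecodable bytes instead of failing on odd encodings.
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".docx":
        from docx import Document  # provided by the python-docx package

        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".pdf":
        from PyPDF2 import PdfReader

        reader = PdfReader(path)
        # extract_text() may yield nothing for image-only pages, hence the `or ""`.
        return "\n".join((page.extract_text() or "") for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

The third-party imports are deferred to the branches that need them, so plain-text input works even when the document libraries are not installed.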

🔬 Detection Methods Explained

1. Stylometric Analysis 📈

Analyzes writing style characteristics including:

Vocabulary Richness: Measures diversity of word usage
Sentence Length Distribution: Analyzes sentence structure patterns
Word Frequency Analysis: Identifies repetitive patterns
Punctuation Usage: Examines punctuation patterns
Capitalization Patterns: Analyzes capitalization frequency
Word Length Variance: Measures variation in word lengths
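
As a rough illustration of the features listed above, the sketch below computes a handful of them with the standard library only. The feature names, the regex-based tokenization, and the function `stylometric_features` are assumptions made for this example; the project itself may use NLTK tokenizers and different definitions.

```python
import re
import statistics


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple style features for a piece of text."""
    # Rough sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    lowered = [w.lower() for w in words]
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    word_lengths = [len(w) for w in words]
    punctuation = re.findall(r"[^\w\s]", text)

    return {
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(lowered)) / len(lowered),
        "mean_sentence_length": statistics.mean(sentence_lengths),
        "sentence_length_stdev": statistics.pstdev(sentence_lengths),
        "punctuation_per_word": len(punctuation) / len(words),
        "capitalized_word_ratio": sum(w[0].isupper() for w in words) / len(words),
        "word_length_variance": statistics.pvariance(word_lengths),
    }
```

Calling `stylometric_features("The cat sat. The cat ran!")` returns a dictionary of six numbers that could then feed a scoring rule or a classifier.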

2. Perplexity Analysis 🧮

Measures how "surprised" a language model is by the text:

Higher Perplexity: Suggests human-written content (more unpredictable)
Lower Perplexity: Suggests AI-generated content (more predictable)
Statistical Modeling: Uses a language model's probability estimates to score how predictable the text is
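
To make the idea concrete, here is a minimal perplexity calculation. This README does not describe the language model the app uses, so the sketch substitutes an add-one smoothed unigram model as a stand-in; `train_unigram` and `perplexity` are hypothetical names.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens: list[str]) -> tuple[Counter, int]:
    """Count tokens; return the counts and the corpus size."""
    return Counter(corpus_tokens), len(corpus_tokens)


def perplexity(tokens: list[str], counts: Counter, total: int) -> float:
    """Perplexity of `tokens` under an add-one smoothed unigram model."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    # One extra vocabulary slot reserves probability mass for unseen words.
    vocab_size = len(counts) + 1
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity is the exponential of the negative mean log-probability.
    return math.exp(-log_prob / len(tokens))


counts, total = train_unigram("the cat sat on the mat".split())
predictable = perplexity(["the", "cat"], counts, total)
surprising = perplexity(["quantum", "zebra"], counts, total)
```

Here `predictable` comes out lower than `surprising`, matching the rule of thumb above: text the model has effectively seen before is less "surprising" than text it has not.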

3. Machine Learning Classification 🤖

Uses trained models to classify content:

TF-IDF Vectorization: Converts text to numerical features
Random Forest Classifier: Ensemble learning for robust predictions
Synthetic Training Data: Generated human and AI text samples
Probability Scoring: Provides confidence levels for predictions
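
A TF-IDF plus Random Forest pipeline of this kind can be sketched with scikit-learn as follows. The four training sentences are toy samples written for this example, not the project's synthetic data, and the hyperparameters and the `classify` helper are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy samples for illustration only; a real model needs far more data.
texts = [
    "honestly i dunno, the movie was kinda weird but i liked it",
    "ugh, missed my bus again this morning, typical monday",
    "In conclusion, it is important to note that there are several key factors.",
    "Overall, this approach offers numerous benefits and considerations.",
]
labels = ["human", "human", "ai", "ai"]

# Vectorize with unigrams and bigrams, then classify with a random forest.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(texts, labels)


def classify(text: str) -> dict[str, float]:
    """Return the predicted probability for each class label."""
    probs = model.predict_proba([text])[0]
    return {str(label): float(p) for label, p in zip(model.classes_, probs)}
```

`classify("some text")` returns a dictionary with one probability per label, which is the kind of output the probability scoring step relies on.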

4. Combined Analysis 🎯

Integrates the three methods above into a single result:

Weighted Combination: Balances the scores from the different analysis methods
Cross-Validation: Checks the methods against one another to reduce false positives and negatives
Robust Detection: Aims to handle more sophisticated AI-generated text
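
One simple way to realize a weighted combination is a weighted average of the per-method AI probabilities. The weights below are placeholders chosen for illustration; this README does not state the weights the app actually uses, and `combine_scores` is a hypothetical name.

```python
def combine_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-method AI probabilities, each in [0, 1]."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalizing by the total weight keeps the result in [0, 1].
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weights and scores, for illustration only.
weights = {"stylometric": 0.3, "perplexity": 0.3, "ml": 0.4}
scores = {"stylometric": 0.6, "perplexity": 0.8, "ml": 0.5}
ai_probability = combine_scores(scores, weights)
```

Because the result is normalized by the weights actually used, the function still returns a valid probability if one method is unavailable and its score is left out.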

📊 Understanding Results

Confidence Levels

High Confidence (>70%): Strong indication of content origin
Medium Confidence (50-70%): Mixed signals, consider additional context
Low Confidence (<50%): Uncertain results, manual review recommended
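
The three bands above translate directly into a small lookup. How the app derives the confidence value itself is not specified here, so the sketch only covers the mapping from a value in [0, 1] to a band; `confidence_label` is a hypothetical name.

```python
def confidence_label(confidence: float) -> str:
    """Map a confidence value in [0, 1] to the High / Medium / Low bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.70:
        return "High"
    if confidence >= 0.50:
        return "Medium"
    return "Low"
```

Note that exactly 70% falls into the Medium band and exactly 50% does too, following the thresholds as written.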

Visual Indicators

Pie Charts: Show probability distribution
Gauge Charts: Display perplexity scores
Bar Charts: Feature breakdown analysis
Color Coding: Green for human, red for AI indicators

Detailed Metrics

Human-Written Probability: Percentage indicating human authorship
AI-Generated Probability: Percentage indicating AI generation
Confidence Level: Overall reliability of the analysis
Feature Scores: Individual characteristic measurements

🛠️ Technical Details

Architecture

Frontend: Streamlit web application
Backend: Python-based analysis engine
ML Pipeline: Scikit-learn for classification
Text Processing: NLTK for natural language processing
Visualization: Plotly for interactive charts

Dependencies

streamlit: Web application framework
numpy: Numerical computing
pandas: Data manipulation
scikit-learn: Machine learning algorithms
plotly: Interactive visualizations
nltk: Natural language processing
python-docx: Word document processing
PyPDF2: PDF text extraction

Model Training

The system automatically trains a machine learning model using:

Synthetic Human Text: Generated samples with natural variations
Synthetic AI Text: Generated samples with AI-like patterns
Feature Engineering: TF-IDF vectorization of text
Model Persistence: Saves trained models for reuse
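
The train-once, reuse-later behavior can be sketched as a small helper. Using `pickle` as the storage format is an assumption, and `load_or_train` is a hypothetical name; only load pickle files that come from a source you trust.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_train(path: str, train: Callable[[], Any]) -> Any:
    """Load a pickled model from `path`, or train and save one if it is absent."""
    model_file = Path(path)
    if model_file.exists():
        # Reuse the saved model instead of retraining.
        with model_file.open("rb") as fh:
            return pickle.load(fh)
    model = train()
    with model_file.open("wb") as fh:
        pickle.dump(model, fh)
    return model
```

On the first call the `train` callback runs and its result is written to disk; later calls with the same path skip training and read the saved file.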

🔧 Configuration

Analysis Settings

Detection Method: Choose analysis approach
Detailed Explanation: Toggle detailed feature breakdown
Feature Breakdown: Show individual feature scores

Model Management

Automatic Training: Models train on first run
Model Persistence: Trained models are saved locally
Model Status: Real-time model availability indicators

📈 Performance

Accuracy

Combined Analysis: Intended to be the most accurate of the four methods
Cross-Validation: Designed to stay robust across different text types
False Positive Reduction: Aims to minimize incorrect AI detections

Speed

Real-time Analysis: Instant results for most text lengths
Optimized Processing: Efficient algorithms for large documents
Caching: Model persistence for faster subsequent runs

🤝 Contributing

We welcome contributions! Here's how you can help:

Development Setup

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Areas for Improvement

Enhanced ML Models: Better training data and algorithms
Additional Features: More detection methods
UI Improvements: Better user experience
Performance Optimization: Faster processing
Documentation: More detailed guides

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Streamlit: For the excellent web application framework
Scikit-learn: For machine learning capabilities
NLTK: For natural language processing tools
Plotly: For interactive visualizations
Open Source Community: For the libraries that make this possible

📞 Support

If you encounter any issues or have questions:

GitHub Issues: Report bugs and feature requests
Documentation: Check this README for usage instructions
Community: Join discussions in the repository

🔍 AI Content Detector Pro - Advanced AI-generated content detection using multiple analysis techniques

ShriramNarkhede/AI-Content-Detection-System
