ShriramNarkhede/AI-Content-Detection-System

๐Ÿ” AI Content Detector Pro

A Streamlit web application that estimates whether text content was written by a human or generated by an AI model. It combines several detection techniques, including stylometric analysis, perplexity scoring, and machine learning classification, and reports the result as human vs. AI probabilities with a confidence level.

✨ Features

🧠 Advanced Detection Methods

  • Combined Analysis: Integrates all methods for highest accuracy
  • Stylometric Analysis: Analyzes writing style patterns and linguistic features
  • Perplexity Analysis: Measures text complexity and predictability
  • ML Classification: Machine learning-based detection using trained models

📊 Comprehensive Analysis

  • Real-time Processing: Instant analysis with detailed results
  • Confidence Scoring: Probability-based predictions with confidence levels
  • Feature Breakdown: Detailed analysis of text characteristics
  • Visual Analytics: Interactive charts and graphs for result visualization

๐Ÿ“ File Support

  • Text Files (.txt): Direct text analysis
  • Word Documents (.docx): Extract and analyze content
  • PDF Files (.pdf): Extract text from PDF documents
  • Direct Input: Paste text directly for analysis
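A minimal sketch of how the file types above could be reduced to plain text before analysis, using the python-docx and PyPDF2 packages listed under Dependencies. The helper name `extract_text` is hypothetical and is not taken from `app.py`; the PDF branch assumes the `PdfReader` API of PyPDF2 2.x/3.x.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a .txt, .docx, or .pdf file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # Plain text: read directly, replacing undecodable bytes instead of failing.
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(path)
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # assumes the PyPDF2 2.x/3.x API

        reader = PdfReader(path)
        # extract_text() may return an empty string for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Pasted text needs no extraction. Streamlit's file uploader returns file-like objects rather than paths; `docx.Document` and `PdfReader` both accept file-like objects as well, so an adapted version could branch on the uploaded file's name instead of `Path(path).suffix`.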

🎨 Modern Interface

  • Responsive Design: Works on desktop and mobile devices
  • Interactive Charts: Plotly-powered visualizations
  • Real-time Updates: Live analysis with progress indicators
  • Professional UI: Clean, modern interface with intuitive navigation

🚀 Quick Start

Installation

  1. Clone the repository:

    git clone https://github.com/ShriramNarkhede/AI-Content-Detection-System.git
    cd AI-Content-Detection-System
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt

Usage

  1. Start the application:

    streamlit run app.py
  2. Open your browser and navigate to the URL Streamlit prints (typically http://localhost:8501).

  3. Choose your analysis method:

    • Combined Analysis (Recommended): Uses all methods for best accuracy
    • Stylometric Analysis: Focuses on writing style patterns
    • Perplexity Analysis: Analyzes text predictability
    • ML Classification: Machine learning-based detection
  4. Input your content:

    • Upload a document (.txt, .docx, .pdf)
    • Or paste text directly into the input area
  5. Analyze and view results:

    • Get probability scores for human vs. AI origin
    • View detailed feature breakdowns
    • Explore interactive visualizations

Docker (Share with Friends)

  1. Build the image:

    docker build -t ai-content-detector .
  2. Run the container:

    docker run --rm -p 8501:8501 ai-content-detector
  3. Open the app at http://localhost:8501 in your browser.

The container already includes all Python/NLTK dependencies, so your friends only need Docker installed.

🔬 Detection Methods Explained

1. Stylometric Analysis 📈

Analyzes writing style characteristics including:

  • Vocabulary Richness: Measures diversity of word usage
  • Sentence Length Distribution: Analyzes sentence structure patterns
  • Word Frequency Analysis: Identifies repetitive patterns
  • Punctuation Usage: Examines punctuation patterns
  • Capitalization Patterns: Analyzes capitalization frequency
  • Word Length Variance: Measures variation in word lengths
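The features above can be approximated with the standard library alone. This is an illustrative sketch: the exact definitions (type-token ratio for vocabulary richness, the share of the most common word for repetition, and so on) and the name `stylometric_features` are assumptions, not the formulas the app uses.

```python
import re
import statistics
import string
from collections import Counter


def stylometric_features(text: str) -> dict[str, float]:
    """Compute simple style features; definitions are illustrative assumptions."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return {}
    lowered = [w.lower() for w in words]
    counts = Counter(lowered)
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    word_lengths = [len(w) for w in words]
    return {
        # Vocabulary richness: unique words divided by total words.
        "type_token_ratio": len(counts) / len(lowered),
        # Sentence length distribution: mean and spread, in words.
        "mean_sentence_length": statistics.mean(sentence_lengths),
        "sentence_length_stdev": statistics.pstdev(sentence_lengths),
        # Word frequency: share of tokens taken by the most common word.
        "top_word_share": counts.most_common(1)[0][1] / len(lowered),
        # Punctuation usage: punctuation characters per word.
        "punctuation_per_word": sum(ch in string.punctuation for ch in text) / len(words),
        # Capitalization: fraction of words starting with an uppercase letter.
        "capitalized_word_ratio": sum(w[0].isupper() for w in words) / len(words),
        # Word length variance, in characters squared.
        "word_length_variance": statistics.pvariance(word_lengths),
    }
```

Each value is a plain number, so the resulting dictionary can feed a bar chart in the feature breakdown or serve as input to a classifier.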

2. Perplexity Analysis 🧮

Measures how "surprised" a language model is by the text:

  • Higher Perplexity: Suggests human-written content (more unpredictable)
  • Lower Perplexity: Suggests AI-generated content (more predictable)
  • Statistical Modeling: Uses probability distributions to assess text complexity
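Perplexity is the exponential of the average negative log-probability a model assigns to each token. The README does not say which language model the app scores against, so the sketch below substitutes a small add-one (Laplace) smoothed bigram model trained on a reference text; `BigramModel` is a hypothetical name.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class BigramModel:
    """Add-one smoothed bigram model (a stand-in; the app's actual model may differ)."""

    def __init__(self, corpus: str) -> None:
        tokens = ["<s>"] + tokenize(corpus)
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # +1 reserves a vocabulary slot for unseen words.
        self.vocab_size = len(self.unigrams) + 1

    def prob(self, prev: str, word: str) -> float:
        # Laplace smoothing: (count(prev, word) + 1) / (count(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def perplexity(self, text: str) -> float:
        tokens = ["<s>"] + tokenize(text)
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            raise ValueError("text must contain at least one token")
        log_prob = sum(math.log(self.prob(p, w)) for p, w in pairs)
        # Perplexity = exp(-(1/N) * sum of log p(w_i | w_{i-1}))
        return math.exp(-log_prob / len(pairs))


model = BigramModel("the cat sat on the mat . the dog sat on the rug .")
print(model.perplexity("the cat sat on the mat ."))
print(model.perplexity("purple ideas sleep furiously tonight"))
```

Text that closely follows the reference corpus gets a lower perplexity than unrelated text, which is the signal the bullets above interpret. With a reference corpus this small, the scores are only meaningful relative to each other.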

3. Machine Learning Classification 🤖

Uses trained models to classify content:

  • TF-IDF Vectorization: Converts text to numerical features
  • Random Forest Classifier: Ensemble learning for robust predictions
  • Synthetic Training Data: Generated human and AI text samples
  • Probability Scoring: Provides confidence levels for predictions
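A sketch of a TF-IDF plus Random Forest pipeline with scikit-learn. The four training sentences and the hyperparameters (`ngram_range`, `n_estimators`, `random_state`) are placeholders, not the project's synthetic data or settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy samples standing in for the project's synthetic training data.
texts = [
    "honestly i kinda rushed this, sorry for typos lol",
    "ugh, the bus was late again so i missed the first half",
    "In conclusion, it is important to note that there are several key factors.",
    "Furthermore, this approach offers numerous benefits and ensures optimal results.",
]
labels = [0, 0, 1, 1]  # 0 = human, 1 = AI

# TF-IDF turns each text into a weighted term vector; the forest classifies those vectors.
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
classifier.fit(texts, labels)

# predict_proba columns follow classifier.classes_, here [0, 1] = [human, AI].
human_prob, ai_prob = classifier.predict_proba(
    ["It is important to note that several factors ensure optimal results."]
)[0]
print(f"human={human_prob:.2f} ai={ai_prob:.2f}")
```

With only four samples the probabilities are not meaningful; the point is the shape of the pipeline. `predict_proba` returns one row per input text, with columns ordered by `classes_`.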

4. Combined Analysis 🎯

Integrates all three methods for maximum accuracy:

  • Weighted Combination: Balances different analysis methods
  • Cross-Validation: Reduces false positives and negatives
  • Robust Detection: Handles sophisticated AI-generated text
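One way to implement the weighted combination, assuming each method reports an AI probability between 0 and 1. The method names and weights in `WEIGHTS`, and the function name `combine`, are hypothetical; the README does not document the weighting the app uses.

```python
# Hypothetical weights: the README does not document the app's actual weighting.
WEIGHTS = {"stylometric": 0.3, "perplexity": 0.3, "ml": 0.4}


def combine(ai_probs: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-method AI probabilities, each in [0, 1]."""
    used = {name: w for name, w in weights.items() if name in ai_probs}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted method scores supplied")
    # Renormalize so a missing method does not drag the score toward 0.
    return sum(ai_probs[name] * w for name, w in used.items()) / total
```

For example, scores of 0.8 (stylometric), 0.6 (perplexity), and 0.9 (ML) combine to 0.3 × 0.8 + 0.3 × 0.6 + 0.4 × 0.9 = 0.78, i.e. a 78% AI-generated probability.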

📊 Understanding Results

Confidence Levels

  • High Confidence (>70%): Strong indication of content origin
  • Medium Confidence (50-70%): Mixed signals, consider additional context
  • Low Confidence (<50%): Uncertain results, manual review recommended
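The bands above translate directly into a threshold check. This sketch assumes confidence is expressed as a percentage and that the boundary values 50% and 70% fall in the Medium band, since the README's "50-70%" range reads as inclusive; `confidence_level` is a hypothetical name.

```python
def confidence_level(confidence: float) -> str:
    """Map a confidence percentage (0-100) to the README's three bands."""
    if confidence > 70:
        return "High"  # strong indication of content origin
    if confidence >= 50:
        return "Medium"  # mixed signals, consider additional context
    return "Low"  # uncertain, manual review recommended
```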

Visual Indicators

  • Pie Charts: Show probability distribution
  • Gauge Charts: Display perplexity scores
  • Bar Charts: Feature breakdown analysis
  • Color Coding: Green for human, red for AI indicators

Detailed Metrics

  • Human-Written Probability: Percentage indicating human authorship
  • AI-Generated Probability: Percentage indicating AI generation
  • Confidence Level: Overall reliability of the analysis
  • Feature Scores: Individual characteristic measurements

๐Ÿ› ๏ธ Technical Details

Architecture

  • Frontend: Streamlit web application
  • Backend: Python-based analysis engine
  • ML Pipeline: Scikit-learn for classification
  • Text Processing: NLTK for natural language processing
  • Visualization: Plotly for interactive charts

Dependencies

  • streamlit: Web application framework
  • numpy: Numerical computing
  • pandas: Data manipulation
  • scikit-learn: Machine learning algorithms
  • plotly: Interactive visualizations
  • nltk: Natural language processing
  • python-docx: Word document processing
  • PyPDF2: PDF text extraction

Model Training

The system automatically trains a machine learning model using:

  • Synthetic Human Text: Generated samples with natural variations
  • Synthetic AI Text: Generated samples with AI-like patterns
  • Feature Engineering: TF-IDF vectorization of text
  • Model Persistence: Saves trained models for reuse

🔧 Configuration

Analysis Settings

  • Detection Method: Choose analysis approach
  • Detailed Explanation: Toggle detailed feature breakdown
  • Feature Breakdown: Show individual feature scores

Model Management

  • Automatic Training: Models train on first run
  • Model Persistence: Trained models are saved locally
  • Model Status: Real-time model availability indicators
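A minimal sketch of the train-on-first-run, reuse-afterwards behavior using the standard-library `pickle` module. `load_or_train` and `MODEL_PATH` are hypothetical names, and the app may store its models elsewhere or in a different format.

```python
import pickle
from pathlib import Path
from typing import Any, Callable

# Hypothetical location; the README does not say where models are stored.
MODEL_PATH = Path("models/detector.pkl")


def load_or_train(train: Callable[[], Any], path: Path = MODEL_PATH) -> Any:
    """Load a saved model if present; otherwise train it once and save it."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    model = train()  # first run: train from scratch
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. Deleting the saved file forces a retrain on the next run.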

📈 Performance

Accuracy

  • Combined Analysis: Highest accuracy across all methods
  • Cross-Validation: Robust against different text types
  • False Positive Reduction: Minimizes incorrect AI detections

Speed

  • Real-time Analysis: Instant results for most text lengths
  • Optimized Processing: Efficient algorithms for large documents
  • Caching: Model persistence for faster subsequent runs

๐Ÿค Contributing

We welcome contributions! Here's how you can help:

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Areas for Improvement

  • Enhanced ML Models: Better training data and algorithms
  • Additional Features: More detection methods
  • UI Improvements: Better user experience
  • Performance Optimization: Faster processing
  • Documentation: More detailed guides

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Streamlit: For the excellent web application framework
  • Scikit-learn: For machine learning capabilities
  • NLTK: For natural language processing tools
  • Plotly: For interactive visualizations
  • Open Source Community: For the libraries that make this possible

📞 Support

If you encounter any issues or have questions:

  • GitHub Issues: Report bugs and feature requests
  • Documentation: Check this README for usage instructions
  • Community: Join discussions in the repository

๐Ÿ” AI Content Detector Pro - Advanced AI-generated content detection using multiple analysis techniques
