Random Forest and Nested Logistic Regression Analysis: Perception Dominance in Educational AI Integration


Overview

This repository contains the analysis materials for a published research study comparing statistical and machine learning approaches to predict AI integration among K-12 educators. The study demonstrates that educators' beliefs about AI's impact on student learning dominate all other predictors—a finding that converges across both parametric (logistic regression) and non-parametric (Random Forest) methods.

Key Finding: Positive beliefs about student effects emerged as the dominant predictor of AI adoption (feature importance = 0.415), outweighing workload perceptions by 2.5× and availability barriers by 15×. Random Forest achieved 97.3% test accuracy with perfect sensitivity (100%).

Publication

Title: Random Forest and Nested Logistic Regression Analysis: Uncovering Perception Dominance in Educational AI Integration

Author: Anfal Rababah

DOI: 10.5281/zenodo.17519163

Citation:

Rababah, A. (2025). Random Forest and Nested Logistic Regression Analysis: 
Uncovering Perception Dominance in Educational AI Integration.
https://doi.org/10.5281/zenodo.17519163

Research Questions

RQ1: Perceptual Differences

  • RQ1.a: Are there significant differences in perceptions by gender and subject discipline?
  • RQ1.b: Do educators facing barriers exhibit different perceptions than those without?

RQ2: Predictive Modeling

  • RQ2.a: What combination of factors best predicts AI integration using logistic regression?
  • RQ2.b: Which features show greatest importance in Random Forest, and does it outperform logistic regression?

Methodology

Sample

  • N = 189 K-12 educators from 32 schools in northern Jordan
  • Same dataset as Paper 1 with additional perception variables
  • High adoption rate: 84.7% (n=160) reported AI use

Analytical Framework

| Analysis | Method | Tool | Purpose |
|---|---|---|---|
| Perceptual differences | Mann-Whitney U | JASP | Non-parametric group comparisons |
| Nested model comparison | Logistic regression | JASP | Hypothesis testing, variance explained |
| Classification | Random Forest | JASP | Predictive accuracy, feature importance |
| Visualization | matplotlib, seaborn | Python | Publication figures |

Model Specifications

Nested Logistic Regression Models:

| Model | Predictors | McFadden R² | AUC |
|---|---|---|---|
| M₀ | Intercept only | 0.000 | 0.500 |
| M₁ | Demographics (3) | 0.035 | 0.620 |
| M₂ | M₁ + Barriers (7) | 0.168 | 0.780 |
| M₃ | M₁ + Perceptions (6) | 0.770 | 0.987 |
| M₄ | Full model (10) | 0.796 | 0.990 |
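
McFadden's pseudo-R², used throughout the table above, compares a fitted model's log-likelihood against an intercept-only baseline: R² = 1 − LL_model / LL_null. The sketch below shows the computation on synthetic data (not the study's dataset); the large `C` approximates an unpenalized fit in scikit-learn.

```python
# Sketch: McFadden's pseudo-R² for a logistic model on synthetic data.
# McFadden R² = 1 - LL_model / LL_null, where LL_null comes from an
# intercept-only model (fitted probability = the outcome base rate).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                         # three illustrative predictors
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # outcome driven by predictor 0

# Large C ≈ no regularization, mimicking a plain maximum-likelihood fit
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
ll_model = -log_loss(y, model.predict_proba(X), normalize=False)

p0 = y.mean()  # intercept-only fitted probability
ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))

mcfadden_r2 = 1 - ll_model / ll_null
print(f"McFadden R² = {mcfadden_r2:.3f}")
```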

Random Forest Configuration:

  • Trees: 67
  • Features per split: 3 (√predictors)
  • Data split: 64% train / 16% validation / 20% test
  • Feature importance: Permutation-based mean decrease in accuracy
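
For readers who want to reproduce the setup outside JASP, the configuration above maps roughly onto scikit-learn as follows. The data here are synthetic stand-ins (variable names are illustrative, not the study's), and split sizes differ by a row or two due to rounding.

```python
# Sketch of the reported Random Forest configuration in scikit-learn
# (synthetic data; JASP's internal implementation may differ in detail).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(189, 10))                             # N = 189, 10 predictors
y = (X[:, 0] + rng.normal(size=189) > -1).astype(int)      # high-base-rate outcome

# 64% train / 16% validation / 20% test (two-stage split)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.20, random_state=1)

rf = RandomForestClassifier(
    n_estimators=67,        # Trees: 67
    max_features="sqrt",    # √predictors ≈ 3 features per split
    random_state=1,
).fit(X_train, y_train)

print(f"validation accuracy: {rf.score(X_val, y_val):.3f}")
print(f"test accuracy:       {rf.score(X_test, y_test):.3f}")
```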

Key Results

Perceptual Differences (RQ1)

By Gender: No significant differences (all p > .39)

By Subject Discipline:

| Perception | Non-Scientific vs Scientific | p | Effect (rᵣᵦ) |
|---|---|---|---|
| AI Efficiency | Higher for non-scientific | .046 | 0.158 |
| Workload Effect | Less negative for non-scientific | .033 | 0.169 |
| Student Effect | No difference | .076 | 0.116 |

By Barrier Type:

| Barrier | Student Effect | Efficiency | Workload |
|---|---|---|---|
| Cost | ns | ns | ns |
| Availability | p = .001** | p < .001*** | p = .005** |
| Skill | p = .031* | p = .026* | ns |
| Language | ns | p = .058† | p = .037* |

Predictive Modeling (RQ2)

Model Comparison: Perceptions added 73.5 percentage points more variance than demographics alone (M₁→M₃: ΔR² = 0.735)

Random Forest Performance:

| Metric | Training | Validation | Test |
|---|---|---|---|
| Accuracy | 96.8% | 97.3% | 97.3% |
| Sensitivity | | | 100% |
| Specificity | | | 85.7% |
| AUC | | | 1.000 |

Feature Importance Rankings:

| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | Positive Student Effect | 0.415 | Dominant predictor |
| 2 | Teacher Workload Effect | 0.168 | 2.5× less than #1 |
| 3 | Availability Barrier | 0.027 | 15× less than #1 |
| 4-10 | All others | <0.01 | Negligible |

Methodological Finding: Quasi-Complete Separation

The near-deterministic relationship between perceptions and adoption caused quasi-complete separation in logistic regression (SE > 4000 for some coefficients). Random Forest circumvented this issue, providing stable importance rankings even with strong predictor-outcome relationships.
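
Quasi-complete separation can be spotted before fitting anything: a zero (or near-zero) cell in a predictor-by-outcome cross-tabulation means one predictor level almost perfectly determines the outcome, which is what inflates logistic-regression standard errors. A minimal sketch with synthetic data (not the study's):

```python
# Sketch: detecting quasi-complete separation via a cross-tabulation.
# A zero cell signals that one predictor level (near-)perfectly
# determines the outcome.
import pandas as pd

df = pd.DataFrame({
    "positive_student_effect": [1, 1, 1, 1, 1, 1, 0, 0, 0, 1],
    "ai_adoption":             [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
})
tab = pd.crosstab(df["positive_student_effect"], df["ai_adoption"])
print(tab)

# Quasi-complete separation shows up as an empty cell:
has_empty_cell = bool((tab == 0).any().any())
print("separation risk:", has_empty_cell)
```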

Repository Structure

```
rf-perception-dominance/
│
├── README.md                    # This file
├── LICENSE                      # MIT License
│
├── data/
│   ├── data_dictionary.md       # Variable definitions
│   └── survey_instrument.md     # Full survey (Arabic + English)
│
├── analysis/
│   ├── jasp/
│   │   ├── mann_whitney_workflow.md         # RQ1 analysis
│   │   ├── logistic_regression_workflow.md  # RQ2a analysis
│   │   └── random_forest_workflow.md        # RQ2b analysis
│   └── results/
│       └── summary_tables.md    # Key statistical outputs
│
├── visualizations/
│   ├── scripts/
│   │   ├── model_comparison.py          # Nested model performance
│   │   ├── feature_importance.py        # RF importance rankings
│   │   └── demographics_perceptions.py  # Descriptive charts
│   └── figures/
│       └── [generated figures]
│
├── paper/
│   └── Rababah-2025-RandomForest-AI-Perceptions.pdf
│
└── requirements.txt             # Python dependencies
```

Technical Implementation

JASP Analysis (v0.19+)

1. Mann-Whitney U Tests (RQ1)

  • Menu: T-Tests → Independent Samples T-Test
  • Test: Mann-Whitney (non-parametric)
  • Effect size: Rank-biserial correlation
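
The same test can be reproduced in Python with SciPy; the rank-biserial effect size can be derived from the U statistic as rᵣᵦ = 1 − 2U / (n₁ n₂) (sign conventions vary across software, so compare against JASP's output). The groups below are synthetic placeholders, not the study's data.

```python
# Sketch: Mann-Whitney U with a rank-biserial effect size,
# r_rb = 1 - 2U / (n1 * n2), on synthetic Likert-style scores.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)
scientific = rng.normal(3.0, 1.0, size=60)       # hypothetical group scores
non_scientific = rng.normal(3.4, 1.0, size=70)

u, p = mannwhitneyu(non_scientific, scientific, alternative="two-sided")
r_rb = 1 - 2 * u / (len(non_scientific) * len(scientific))
print(f"U = {u:.0f}, p = {p:.3f}, r_rb = {r_rb:.3f}")
```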

2. Nested Logistic Regression (RQ2a)

  • Menu: Regression → Logistic Regression
  • Models built incrementally (M₀ → M₄)
  • Metrics: McFadden R², AUC, Δχ², sensitivity/specificity
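
The Δχ² comparison between nested models is a likelihood-ratio test: Δχ² = 2(LL_full − LL_reduced), with degrees of freedom equal to the number of added parameters. A self-contained sketch on synthetic data (large `C` approximates unpenalized fits):

```python
# Sketch: likelihood-ratio test for nested logistic models on synthetic data.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(3)
X = rng.normal(size=(189, 5))
y = (X[:, 3] + 0.5 * rng.normal(size=189) > 0).astype(int)

def fitted_ll(X, y):
    """Log-likelihood of a near-unpenalized logistic fit."""
    m = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
    return -log_loss(y, m.predict_proba(X), normalize=False)

ll_reduced = fitted_ll(X[:, :3], y)   # e.g. demographics only
ll_full = fitted_ll(X, y)             # demographics + added predictors

delta_chi2 = 2 * (ll_full - ll_reduced)
p = chi2.sf(delta_chi2, df=2)         # two predictors added
print(f"Δχ² = {delta_chi2:.2f}, p = {p:.4g}")
```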

3. Random Forest Classification (RQ2b)

  • Menu: Machine Learning → Classification → Random Forest
  • Data split: Training (64%), Validation (16%), Test (20%)
  • Feature importance: Permutation-based (50 iterations)
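
Permutation importance, the measure reported above, scores each feature by how much held-out accuracy drops when that feature's values are shuffled. A hedged sketch with scikit-learn's `permutation_importance` on synthetic data, where feature 0 drives the outcome by construction:

```python
# Sketch: permutation-based feature importance (mean decrease in accuracy)
# with 50 repeats per feature, on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(189, 4))
y = (X[:, 0] > 0).astype(int)          # feature 0 determines the outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=67, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(rf, X_te, y_te, n_repeats=50,
                                scoring="accuracy", random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: {imp:+.3f}")
```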

Python Visualization

```python
# Example: feature-importance horizontal bar chart
import matplotlib.pyplot as plt
import pandas as pd

importance_data = pd.DataFrame({
    'Feature': ['Positive Student Effect', 'Teacher Workload Effect',
                'Availability Barrier', 'Compare Efficiency', 'Gender',
                'Language Barrier', 'Subject', 'Experience',
                'Skill Barrier', 'Cost Barrier'],
    'Importance': [0.415, 0.168, 0.027, 0.009, 0.001,
                   -0.001, -0.005, -0.007, -0.008, -0.013]
})

# Sort ascending so the dominant predictor lands at the top of the chart
importance_data = importance_data.sort_values('Importance')
plt.barh(importance_data['Feature'], importance_data['Importance'])
plt.xlabel('Permutation importance (mean decrease in accuracy)')
plt.title('Random Forest Feature Importance')
plt.tight_layout()
plt.show()
```

Skills Demonstrated

This project showcases:

  • Comparative Modeling: Systematic comparison of parametric vs. non-parametric approaches
  • Machine Learning: Random Forest classification with hyperparameter tuning
  • Statistical Analysis: Nested logistic regression, model selection criteria (AIC, BIC, pseudo-R²)
  • Non-parametric Tests: Mann-Whitney U with rank-biserial effect sizes
  • Handling Separation: Addressing quasi-complete separation in logistic regression
  • Model Evaluation: ROC-AUC, confusion matrices, precision/recall/F1

Practical Implications

The dominance of student-effect beliefs suggests:

| Traditional Focus | Recommended Shift |
|---|---|
| Infrastructure investment | Demonstrate student learning benefits |
| Technical training | Share concrete success stories |
| Barrier removal | Facilitate peer observation |
| Generic workshops | Evidence-based outcome documentation |

Key insight: Professional development should prioritize changing beliefs about AI's impact on students rather than solely addressing access barriers or providing technical training.

Limitations

  • Cross-sectional design (no causal inference)
  • Binary adoption measure (masks implementation quality variation)
  • High base rate (84.7%) may limit generalizability
  • Single-item perception measures
  • Possible reverse causality (adoption → positive perceptions)

Related Work

This repository is part of a research series on educational technology implementation.

Contact

Anfal Rababah
Independent Researcher, Jordan
📧 Anfal0Rababah@email.com
🆔 ORCID: 0009-0003-7450-8907

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • K-12 educators who participated in the survey
  • JASP development team for integrated ML capabilities
  • scikit-learn team for Python ML validation
