Random Forest and Nested Logistic Regression Analysis: Perception Dominance in Educational AI Integration
This repository contains the analysis materials for a published research study comparing statistical and machine learning approaches to predict AI integration among K-12 educators. The study demonstrates that educators' beliefs about AI's impact on student learning dominate all other predictors—a finding that converges across both parametric (logistic regression) and non-parametric (Random Forest) methods.
Key Finding: Positive beliefs about student effects emerged as the dominant predictor of AI adoption (feature importance = 0.415), outweighing workload perceptions by 2.5× and availability barriers by 15×. Random Forest achieved 97.3% test accuracy with perfect sensitivity (100%).
Title: Random Forest and Nested Logistic Regression Analysis: Uncovering Perception Dominance in Educational AI Integration
Author: Anfal Rababah
Citation:
Rababah, A. (2025). Random Forest and Nested Logistic Regression Analysis:
Uncovering Perception Dominance in Educational AI Integration.
https://doi.org/10.5281/zenodo.17519163
RQ1: Perceptual Differences
- RQ1.a: Are there significant differences in perceptions by gender and subject discipline?
- RQ1.b: Do educators facing barriers exhibit different perceptions than those without?
RQ2: Predictive Modeling
- RQ2.a: What combination of factors best predicts AI integration using logistic regression?
- RQ2.b: Which features show greatest importance in Random Forest, and does it outperform logistic regression?
- N = 189 K-12 educators from 32 schools in northern Jordan
- Same dataset as Paper 1 with additional perception variables
- High adoption rate: 84.7% (n=160) reported AI use
| Analysis | Method | Tool | Purpose |
|---|---|---|---|
| Perceptual differences | Mann-Whitney U | JASP | Non-parametric group comparisons |
| Nested model comparison | Logistic regression | JASP | Hypothesis testing, variance explained |
| Classification | Random Forest | JASP | Predictive accuracy, feature importance |
| Visualization | matplotlib, seaborn | Python | Publication figures |
Nested Logistic Regression Models:
| Model | Predictors | McFadden R² | AUC |
|---|---|---|---|
| M₀ | Intercept only | 0.000 | 0.500 |
| M₁ | Demographics (3) | 0.035 | 0.620 |
| M₂ | M₁ + Barriers (7) | 0.168 | 0.780 |
| M₃ | M₁ + Perceptions (6) | 0.770 | 0.987 |
| M₄ | Full model (10) | 0.796 | 0.990 |
Random Forest Configuration:
- Trees: 67
- Features per split: 3 (√predictors)
- Data split: 64% train / 16% validation / 20% test
- Feature importance: Permutation-based mean decrease in accuracy
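The JASP configuration above can be mirrored in scikit-learn for replication outside JASP. This is a sketch with synthetic placeholder data (189 rows, 10 predictors, ~84.7% base rate); the variable names are illustrative, not the study's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the study sample (N=189, 10 predictors)
rng = np.random.default_rng(42)
X = rng.normal(size=(189, 10))
y = (rng.random(189) < 0.847).astype(int)  # ~84.7% adoption base rate

# 64/16/20 split: carve off 20% for test, then 20% of the rest for validation
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.20, random_state=42, stratify=y_tmp)

# 67 trees, sqrt(10) ≈ 3 candidate features per split, matching the JASP setup
rf = RandomForestClassifier(n_estimators=67, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)
print(f"train={len(X_train)}, val={len(X_val)}, test={len(X_test)}")
```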
By Gender: No significant differences (all p > .39)
By Subject Discipline:
| Perception | Non-Scientific vs Scientific | p | Effect (rᵣᵦ) |
|---|---|---|---|
| AI Efficiency | Higher for non-scientific | .046 | 0.158 |
| Workload Effect | Less negative for non-scientific | .033 | 0.169 |
| Student Effect | No difference | .076 | 0.116 |
By Barrier Type:
| Barrier | Student Effect | Efficiency | Workload |
|---|---|---|---|
| Cost | ns | ns | ns |
| Availability | p=.001** | p<.001*** | p=.005** |
| Skill | p=.031* | p=.026* | ns |
| Language | ns | p=.058† | p=.037* |
Model Comparison: Perceptions added 73.5 percentage points more variance than demographics alone (M₁→M₃: ΔR² = 0.735)
Random Forest Performance:
| Metric | Training | Validation | Test |
|---|---|---|---|
| Accuracy | 96.8% | 97.3% | 97.3% |
| Sensitivity | — | — | 100% |
| Specificity | — | — | 85.7% |
| AUC | — | — | 1.000 |
Feature Importance Rankings:
| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | Positive Student Effect | 0.415 | Dominant predictor |
| 2 | Teacher Workload Effect | 0.168 | 2.5× less than #1 |
| 3 | Availability Barrier | 0.027 | 15× less than #1 |
| 4-10 | All others | <0.01 | Negligible |
The near-deterministic relationship between perceptions and adoption caused quasi-complete separation in logistic regression (SE > 4000 for some coefficients). Random Forest circumvented this issue, providing stable importance rankings even with strong predictor-outcome relationships.
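The mechanics behind those inflated standard errors can be shown on toy data (a scikit-learn sketch, not the study's JASP output): when a predictor separates the outcome perfectly, the maximum-likelihood slope has no finite optimum, so it grows without bound as regularization is relaxed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data with complete separation: x > 0 perfectly predicts y = 1
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

# As L2 regularization weakens (larger C), the fitted slope diverges --
# the same pathology that produced SE > 4000 in the unpenalized JASP fit
slopes = [LogisticRegression(C=C).fit(x, y).coef_[0, 0]
          for C in (1.0, 100.0, 10_000.0)]
print([round(s, 2) for s in slopes])  # strictly increasing
```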
```
rf-perception-dominance/
│
├── README.md                              # This file
├── LICENSE                                # MIT License
│
├── data/
│   ├── data_dictionary.md                 # Variable definitions
│   └── survey_instrument.md               # Full survey (Arabic + English)
│
├── analysis/
│   ├── jasp/
│   │   ├── mann_whitney_workflow.md       # RQ1 analysis
│   │   ├── logistic_regression_workflow.md  # RQ2a analysis
│   │   └── random_forest_workflow.md      # RQ2b analysis
│   └── results/
│       └── summary_tables.md              # Key statistical outputs
│
├── visualizations/
│   ├── scripts/
│   │   ├── model_comparison.py            # Nested model performance
│   │   ├── feature_importance.py          # RF importance rankings
│   │   └── demographics_perceptions.py    # Descriptive charts
│   └── figures/
│       └── [generated figures]
│
├── paper/
│   └── Rababah-2025-RandomForest-AI-Perceptions.pdf
│
└── requirements.txt                       # Python dependencies
```
1. Mann-Whitney U Tests (RQ1)
- Menu: T-Tests → Independent Samples T-Test
- Test: Mann-Whitney (non-parametric)
- Effect size: Rank-biserial correlation
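The same test and effect size can be reproduced outside JASP with SciPy. A sketch on hypothetical toy scores (not the study's data), using the rank-biserial correlation r = 2U/(n₁n₂) − 1:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical Likert-style scores for two subject-discipline groups
non_scientific = np.array([4, 5, 4, 3, 5, 4])
scientific = np.array([3, 4, 2, 3, 4, 3])

u, p = mannwhitneyu(non_scientific, scientific, alternative="two-sided")

# Rank-biserial correlation from the U statistic of the first group
n1, n2 = len(non_scientific), len(scientific)
r_rb = 2 * u / (n1 * n2) - 1
print(f"U = {u}, p = {p:.3f}, r_rb = {r_rb:.3f}")
```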
2. Nested Logistic Regression (RQ2a)
- Menu: Regression → Logistic Regression
- Models built incrementally (M₀ → M₄)
- Metrics: McFadden R², AUC, Δχ², sensitivity/specificity
3. Random Forest Classification (RQ2b)
- Menu: Machine Learning → Classification → Random Forest
- Data split: Training (64%), Validation (16%), Test (20%)
- Feature importance: Permutation-based (50 iterations)
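Permutation-based importance (the mean drop in accuracy when one feature's values are shuffled) can be cross-checked with scikit-learn's `permutation_importance`. A sketch on synthetic data where, by construction, only feature 0 is predictive, loosely analogous to the dominant student-effect item:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data: feature 0 drives the outcome, the other 9 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(189, 10))
y = (X[:, 0] + 0.3 * rng.normal(size=189) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=67, random_state=0).fit(X, y)

# Mean decrease in accuracy over 50 shuffles per feature, as in the workflow
result = permutation_importance(rf, X, y, n_repeats=50,
                                scoring="accuracy", random_state=0)
print(result.importances_mean.round(3))  # feature 0 dominates
```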
```python
# Example: Feature importance horizontal bar chart
import matplotlib.pyplot as plt
import pandas as pd

importance_data = pd.DataFrame({
    'Feature': ['Positive Student Effect', 'Teacher Workload Effect',
                'Availability Barrier', 'Compare Efficiency', 'Gender',
                'Language Barrier', 'Subject', 'Experience',
                'Skill Barrier', 'Cost Barrier'],
    'Importance': [0.415, 0.168, 0.027, 0.009, 0.001,
                   -0.001, -0.005, -0.007, -0.008, -0.013]
}).sort_values('Importance')  # plot smallest at bottom, largest at top

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(importance_data['Feature'], importance_data['Importance'])
ax.set_xlabel('Permutation importance (mean decrease in accuracy)')
fig.tight_layout()
fig.savefig('feature_importance.png', dpi=300)
```

This project showcases:
- Comparative Modeling: Systematic comparison of parametric vs. non-parametric approaches
- Machine Learning: Random Forest classification with hyperparameter tuning
- Statistical Analysis: Nested logistic regression, model selection criteria (AIC, BIC, pseudo-R²)
- Non-parametric Tests: Mann-Whitney U with rank-biserial effect sizes
- Handling Separation: Addressing quasi-complete separation in logistic regression
- Model Evaluation: ROC-AUC, confusion matrices, precision/recall/F1
The dominance of student-effect beliefs suggests:
| Traditional Focus | Recommended Shift |
|---|---|
| Infrastructure investment | Demonstrate student learning benefits |
| Technical training | Share concrete success stories |
| Barrier removal | Facilitate peer observation |
| Generic workshops | Evidence-based outcome documentation |
Key insight: Professional development should prioritize changing beliefs about AI's impact on students rather than solely addressing access barriers or providing technical training.
- Cross-sectional design (no causal inference)
- Binary adoption measure (masks implementation quality variation)
- High base rate (84.7%) may limit generalizability
- Single-item perception measures
- Possible reverse causality (adoption → positive perceptions)
This is part of a research series on educational technology implementation:
- Paper 1: AI Implementation Barriers - Cluster Analysis
- Paper 2: This repository (Perception Dominance - Predictive Modeling)
- Paper 3: [Coming soon]
Anfal Rababah
Independent Researcher, Jordan
📧 Anfal0Rababah@email.com
🆔 ORCID: 0009-0003-7450-8907
This project is licensed under the MIT License - see the LICENSE file for details.
- K-12 educators who participated in the survey
- JASP development team for integrated ML capabilities
- scikit-learn team for Python ML validation