  • University of Toronto
  • Toronto, ON
  • Joined Mar 26, 2026

nabeegh-khan/README.md

Hi, I'm Nabeegh Khan 👋

MEng Candidate — Electrical & Computer Engineering, University of Toronto (graduating December 2026)

MEM — Data Analytics & Product Innovation, University of Ottawa

P.Eng | PMP


What I Work On

I build end-to-end machine learning and data analytics pipelines at the intersection of AI, wireless communications, and research methods. In wireless, my portfolio covers LoRA fine-tuning of a foundation model for scenario-adaptive 6G beam prediction, deep reinforcement learning for network optimization, and mmWave beam prediction on real-world V2V measurements. On the systems side, it includes production RAG for technical document retrieval, real-time streaming ML pipelines with full MLOps infrastructure, and cloud-based big data engineering on Apache Spark and Microsoft Azure. On the research-methods side, it includes transformer-based NLP for bibliometric analysis and mixed-methods research that combines survey statistics with large-scale text analysis.

Current focus: parameter-efficient foundation model adaptation for 6G wireless systems, production MLOps pipelines, and LLM-powered standards retrieval — with an emphasis on reproducible, end-to-end systems that bridge academic research and engineering practice.


Technical Skills

Machine Learning & AI

Deep Reinforcement Learning · DQN · CNN · LSTM · LSTM Autoencoder · RNN · SVM · Random Forest · BERTopic · Supervised Learning · Reward Regression · scikit-learn · PyTorch · Stable-Baselines3 · sentence-transformers

Foundation Models & Transfer Learning

LoRA/PEFT · HuggingFace Transformers · Large Wireless Model (LWM) · ONNX Runtime · INT8 Quantization · Weights & Biases (W&B) · DeepMIMOv3 · Transfer Learning · Rank Ablation · Model Compression

MLOps & Production ML

MLflow · Apache Airflow · Evidently AI · Docker · FastAPI · Model Serving · Experiment Tracking · Drift Monitoring · CI/CD · Feature Stores

Streaming & Data Engineering

Apache Kafka · Apache Spark Structured Streaming · DuckDB · dbt · Real-Time Pipelines · Sliding Window Features · Confluent Cloud

NLP & Text Analytics

BERTopic · VADER Sentiment Analysis · spaCy · NLTK · Topic Modeling · Qualitative Coding · UMAP · HDBSCAN

Retrieval-Augmented Generation & LLM Engineering

LangChain · LangChain LCEL · ChromaDB · OpenAI Embeddings · GPT-4o-mini · RAG Pipelines · Vector Databases · FastAPI · Streamlit · RAGAS Evaluation · LangSmith · Prompt Engineering

Data Analytics & Statistics

Bibliometric Analysis · Chi-Square · ANOVA · Ordinal Logistic Regression · Cronbach's Alpha · Mann-Whitney U · OLS Regression · Time Series Forecasting · Feature Engineering · pandas · NumPy · scipy · statsmodels

Big Data & Cloud Engineering

Apache Spark (RDD & DataFrame APIs) · Spark SQL · Scala · Databricks · Microsoft Azure Synapse Analytics · Azure Data Lake · Hadoop Ecosystem · NoSQL Databases (MongoDB · Cassandra)

API Integration & Data Collection

REST API pipelines · OpenAlex API · Semantic Scholar API · Reddit .json endpoints · Pagination & Rate Limiting

Wireless & Communications

mmWave Beam Prediction · Beamforming Codebook Optimization · V2V Communications · Massive MIMO · 6G AI-RAN · Power Allocation · Spectral Efficiency · Gymnasium Environments · DeepSense 6G

Visualization & Reporting

Matplotlib · Seaborn · Plotly (interactive) · Tableau · Power BI · Publication-quality figures (300 DPI)

Tools & Platforms

Python · Scala · SQL · Git · Jupyter · Google Colab · Microsoft Azure · AWS · Databricks


Portfolio Projects

| Project | Description | Stack |
|---|---|---|
| LWM-LoRA: Scenario-Adaptive mmWave Beam Prediction | LoRA fine-tuning of the Large Wireless Model (LWM v1.1, 2.47M-param Transformer) for 64-beam mmWave prediction across 3 DeepMIMO city scenarios (12,658 samples). Custom LoRA injection into 49 attention layers (4.82% trainable params). Rank ablation r∈{2,4,8,16}; r=4 optimal at 76.8% top-1 accuracy (+7.4% vs baseline). Cross-scenario transfer with 20% target data matches full fine-tuning within 0.3%. ONNX INT8 deployment: 5.51× latency reduction, 69.5% size reduction. | PyTorch · HuggingFace · LoRA/PEFT · DeepMIMOv3 · ONNX Runtime · W&B |
| Real-Time Anomaly Detection MLOps Pipeline | End-to-end streaming ML pipeline on the Numenta Anomaly Benchmark (NAB): Kafka ingestion → Spark Structured Streaming → DuckDB feature store → LSTM autoencoder training → FastAPI serving → Airflow orchestration → Evidently AI drift monitoring. 270,723 sliding windows across 38 time series; ROC-AUC 0.64; 0/7 features drifted between train and test distributions. | PyTorch · Kafka · Spark · MLflow · FastAPI · Airflow · Evidently AI · DuckDB |
| 3GPP Specification Assistant: Production RAG System | End-to-end RAG system for querying 3GPP 5G/6G technical specifications in natural language. Indexes 14 Release 18/19 specs (4,493 pages, 18,187 chunks) with OpenAI embeddings and ChromaDB. Evaluated with RAGAS: faithfulness 0.675, context recall 0.750. Served via a FastAPI backend and a Streamlit chat interface with LangSmith tracing. | LangChain · ChromaDB · GPT-4o-mini · FastAPI · Streamlit · RAGAS |
| DeepSense 6G Beam Prediction: CNN, LSTM, RNN, SVM, RF & DQN | End-to-end mmWave beam prediction on 112,189 real-world V2V measurements (Scenarios 36–39, 60 GHz). Random Forest achieved 22.6% Top-1 / 43.9% Top-3 accuracy, 14.2× above the random baseline, outperforming all deep learning models. DQN analysis identified feature compression as the key RL bottleneck for high-cardinality beam selection. | PyTorch · scikit-learn · DeepSense 6G · Gymnasium |
| 6G Massive MIMO Resource Allocation: DQN vs Supervised Learning | DQN vs supervised learning for dynamic power allocation in a 7-cell, 70-user Massive MIMO environment. 4.4× reward improvement over the random baseline; CNN and RNN matched DQN controller performance via reward regression, demonstrating supervised learning as a computationally efficient alternative to RL for 6G AI-RAN. | PyTorch · Stable-Baselines3 · Gymnasium |
| AI-in-Education Bibliometric + NLP Analysis | Dual-API pipeline collecting 4,403 papers via OpenAlex & Semantic Scholar. BERTopic discovered 27 research clusters. Chi-square confirmed a significant post-ChatGPT topic shift (χ²=323.87, p<0.0001). Exponential publication growth modeled at 30.7%/year (R²=0.844). | BERTopic · VADER · OpenAlex API · pandas · scipy |
| AI in the Classroom: Mixed-Methods Survey + Reddit Analysis | Convergent-parallel mixed-methods pipeline integrating two survey datasets (n=625) with 465 Reddit posts. Ordinal regression identified Attitude Toward Use as the dominant predictor of AI adoption (OR=8.32, p<0.001). BERTopic + VADER surfaced a utility paradox: surveys show 7.44/10 utility ratings, while discourse on AI writing tools scored the lowest sentiment (0.155). | BERTopic · VADER · spaCy · statsmodels · Reddit API |
| Ontario Electricity Demand Forecasting | End-to-end ML pipeline on 109,000+ hourly records from 4 integrated data sources (IESO, Environment Canada, NASA POWER). 13 engineered features including temporal lags, degree days, and cyclical encodings. 50.7% RMSE improvement over baseline (R²=0.9928) using a 3-layer neural network in a 5-model comparison. | scikit-learn · XGBoost · PyTorch · pandas |
| Big Data Analytics: Apache Spark & Azure Synapse | Distributed data processing using Apache Spark RDD and DataFrame APIs (Scala/Databricks) and cloud-scale SQL analytics on Microsoft Azure Synapse Analytics. Covers multi-file text corpus processing, retail transaction analysis across daily partitioned CSVs, and bike rental analytics with multi-table joins on a 24 MB real-world dataset. | Apache Spark · Scala · Databricks · Azure Synapse · T-SQL |
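
Several rows above report top-K beam accuracy and a multiple over a random baseline. As a rough illustration of how such a metric can be computed, here is a minimal sketch in plain Python. It is not taken from any of the repositories: the function names `top_k_accuracy` and `random_baseline` and the toy scores are hypothetical, and the baseline assumes a uniform random guess over all candidate beams.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true beam index is among the k highest-scoring beams."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def random_baseline(num_beams: int, k: int = 1) -> float:
    """Expected top-k accuracy of a uniform random guess (assumed baseline)."""
    return min(k, num_beams) / num_beams


# Hypothetical toy data: 3 samples, 4 candidate beams.
scores = [
    [0.10, 0.70, 0.15, 0.05],  # true beam 1: hit at k=1
    [0.40, 0.30, 0.20, 0.10],  # true beam 1: hit only from k=2
    [0.25, 0.20, 0.05, 0.50],  # true beam 2: ranked last, miss for k<=3
]
labels = [1, 1, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
gain = top1 / random_baseline(num_beams=4)
print(f"top-1={top1:.3f} top-3={top3:.3f} gain over random={gain:.2f}x")
```

The gain figure is simply the measured top-1 accuracy divided by the chance level, so it depends on how many beams the codebook actually contains.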

Currently

  • 🔧 Latest build: LoRA-adapted Large Wireless Model for scenario-adaptive 6G beam prediction — custom LoRA injection, rank ablation, cross-scenario transfer, ONNX edge deployment
  • 📡 Publishing reproducible ML pipelines across wireless communications and AI research domains
  • 🔬 Research Analyst at ISTEP, University of Toronto (2025) — co-authored GenAI adoption study (n=124), mixed-methods analysis, UTERC 2025 poster presentation
  • 🎓 MEng Candidate, Electrical & Computer Engineering, University of Toronto — graduating December 2026
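
For readers unfamiliar with the LoRA terms used in the latest build, the sketch below shows the general idea in plain Python: a frozen weight `W` is augmented with a low-rank update `B·A` scaled by `alpha / rank`, and only `A` and `B` are trained. This is a generic illustration of the technique, not the repository's injection code. The layer shapes, the `alpha` value, and the function names are assumptions, so the printed fractions will not reproduce the 4.82% figure quoted for the real model.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B would be trained."""
    rank = len(a)  # A has shape (rank, d_in), B has shape (d_out, rank)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: rank * d_in for A plus d_out * rank for B."""
    return rank * (d_in + d_out)


def trainable_fraction(frozen_params: int, layers: List[Tuple[int, int]], rank: int) -> float:
    """Share of all parameters that are trainable when only the adapters are updated."""
    added = sum(lora_added_params(d_in, d_out, rank) for d_in, d_out in layers)
    return added / (frozen_params + added)


# With B initialised to zero, the adapted layer reproduces the frozen layer exactly.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]        # rank 1
b_zero = [[0.0], [0.0]]
assert lora_forward(w, a, b_zero, [1.0, 2.0], alpha=1.0) == [1.0, 2.0]

# Hypothetical model: 100,000 frozen parameters and four adapted 64x64 projections.
layers = [(64, 64)] * 4
for r in (2, 4, 8, 16):
    print(f"rank={r:2d} trainable fraction={trainable_fraction(100_000, layers, r):.2%}")
```

Because the added parameter count grows linearly with the rank, a rank ablation trades adapter size against accuracy.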

Pinned

  1. 6g-lwm-beam-prediction Public

    6G mmWave beam prediction using LoRA-adapted Large Wireless Model (LWM) foundation model — DeepMIMO channels, HuggingFace, PEFT, W&B, ONNX

    Jupyter Notebook

  2. real-time-anomaly-mlops Public

    End-to-end streaming ML pipeline: Kafka ingestion, Spark feature engineering, PyTorch LSTM autoencoder for anomaly detection, MLflow experiment tracking, FastAPI serving, Airflow orchestration, and…

    Jupyter Notebook

  3. 3gpp-rag Public

    Production RAG system for querying 3GPP 5G/6G technical specifications using LangChain, ChromaDB, and GPT-4o-mini

    Jupyter Notebook

  4. deepsense-6g-beam-prediction Public

    Real-world mmWave beam prediction on DeepSense 6G measurements: CNN, LSTM, RNN, SVM, Random Forest, and DQN compared on top-K beam accuracy and received power gap across 45,000+ outdoor vehicular s…

    Jupyter Notebook

  5. 6g-mimo-resource-allocation Public

    DQN vs Supervised Learning for power allocation in multi-cell Massive MIMO - 6G AI-RAN

    Jupyter Notebook

  6. ai-education-bibliometric-analysis Public

    Bibliometric + NLP analysis of 4,403 AI-in-education papers (2015–2025) via OpenAlex & Semantic Scholar APIs. BERTopic topic modeling, VADER sentiment analysis, chi-square hypothesis testing, and e…

    HTML
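
The portfolio table reports an annual publication growth rate with an R² value for the bibliometric project. One common way to obtain such numbers is a least-squares line through the logarithm of the yearly counts; the sketch below shows that approach in plain Python. Whether the repository fits the curve this way is an assumption, the function name `fit_exponential_growth` is hypothetical, and the counts are synthetic, so no result here corresponds to the real dataset.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_growth(years: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Fit log(count) = intercept + slope * year by ordinary least squares.

    Returns (annual growth rate, R^2 measured in log space).
    """
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two (year, count) pairs of equal length")
    logs = [math.log(c) for c in counts]  # counts must be strictly positive
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r_squared = 1.0 - ss_res / ss_tot
    # A slope of s in log space means counts multiply by exp(s) each year.
    return math.exp(slope) - 1.0, r_squared


# Synthetic counts growing by exactly 25% per year (not real publication data).
years = [2015, 2016, 2017, 2018, 2019, 2020]
counts = [100 * 1.25 ** i for i in range(len(years))]

growth, r2 = fit_exponential_growth(years, counts)
print(f"growth per year={growth:.1%}, R^2={r2:.4f}")
```

Note that an R² computed on log-transformed counts is not directly comparable to one computed on the raw counts, so the fitting space should be stated alongside the figure.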