VS Code extension for viewing and handling large datasets. Opens JSONL, Parquet, and CSV files without crashing, and includes 16 production LLM tokenizers for chat completion data.
Updated May 28, 2026 - TypeScript
Multi-tenant Postgres-compatible database on object storage. 12× cheaper disk than Postgres, native vector search, per-tenant isolation. Built in Rust.
Fine-tuning DistilGPT2 on the EmpatheticDialogues dataset to create an emotionally intelligent chatbot. Features custom attention calibration and a Streamlit-based interface for wellness support.
Production-style PySpark ETL pipeline processing 100K+ e-commerce records with optimized joins, feature engineering, and scalable Parquet outputs.
Production-grade data layer for NIFTY derivatives market data. Parquet warehouse, DuckDB query engine, MLflow tracking, 4 typed access functions, and a validation module that found 5 real anomalies in the vendor feed.