Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app
A Triton implementation of FlashAttention-2 that adds support for custom masks.
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.
Vulkan & GLSL implementation of FlashAttention-2
FlashAttention for sliding window attention in Triton (fwd + bwd pass)
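The sliding-window variant above restricts each query to a fixed band of recent keys. A minimal NumPy sketch of what that mask computes (the actual Triton kernel fuses the banding into its tiled loop instead of materializing a mask; function names here are illustrative, not from the repo):

```python
import numpy as np

def sliding_window_mask(n, window):
    """Causal sliding-window mask: query i may attend to keys j
    with i - window < j <= i (True = attention allowed)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def masked_attention(q, k, v, window):
    """Dense reference: softmax(QK^T / sqrt(d)) V with banned
    positions set to -inf before the softmax."""
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    s = np.where(sliding_window_mask(len(q), window), s, -np.inf)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

Because the band has fixed width, a fused kernel only needs to visit O(n · window) score entries instead of the full O(n²) matrix.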
A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.
Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface to upload images, query models, adjust generation settings, and export results in Markdown or PDF.
Toy Flash Attention implementation in torch
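At the heart of any such toy implementation is the online-softmax tiling trick: K/V are streamed in blocks while a running row-max and normalizer are updated, so the full score matrix is never materialized. A minimal NumPy sketch (not the repo's code; single head, no masking):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: full softmax(QK^T / sqrt(d)) V in one shot."""
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def flash_attention(q, k, v, block=4):
    """Tiled attention with online-softmax rescaling."""
    n, d = q.shape
    o = np.zeros_like(q)
    m = np.full(n, -np.inf)   # running row max
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = q @ kj.T / np.sqrt(d)          # scores for this K/V block only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)          # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        o = o * scale[:, None] + p @ vj
        m = m_new
    return o / l[:, None]
```

The two functions agree to floating-point precision; the tiled version is what makes the memory footprint O(n · d) rather than O(n²).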
Core-OCR is an advanced, experimental Optical Character Recognition (OCR) and document analysis suite designed for highly accurate text extraction, table reconstruction, and complex visual reasoning, built on the Qwen2.5-VL and Qwen2-VL multimodal architectures.
Poplar implementation of FlashAttention for IPU
A Gradio-powered web interface for performing advanced OCR tasks using the DeepSeek-OCR model. This experimental app leverages Hugging Face Transformers to process images for text extraction, document conversion, figure parsing, and object localization.
Systematically trains and benchmarks Mistral, Qwen2.5, and SmolLM2 on essay grading across 39 experiments, combining data analysis and engineering, structured preprocessing, instruction tuning, postprocessing, and leakage-aware evaluation for robust score and rationale generation.
Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers without relying on third-party providers and APIs. Host it yourself or try it out.
This application allows users to perform various OCR tasks such as converting documents to markdown, extracting text, locating specific text within images, and parsing figures, all through a user-friendly interface. This demo leverages the deepseek-ai/DeepSeek-OCR-2 model.
Reference Flash Attention implementation in PyTorch with V1/V2, GQA/MQA, Triton kernels, benchmark and docs.
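GQA/MQA, mentioned in the entry above, shares one K/V head among a group of query heads to shrink the KV cache. A minimal NumPy sketch of the grouping (illustrative names, not the repo's API; MQA is the special case of a single KV head):

```python
import numpy as np

def gqa_attention(q, k, v, n_kv_heads):
    """Grouped-query attention: q has H heads, k/v have n_kv_heads
    heads, and each KV head serves H // n_kv_heads query heads.
    Shapes: q is (H, n, d); k and v are (n_kv_heads, n, d)."""
    H, n, d = q.shape
    group = H // n_kv_heads
    k = np.repeat(k, group, axis=0)  # duplicate each KV head for its group
    v = np.repeat(v, group, axis=0)
    s = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

A fused kernel avoids the `np.repeat` by indexing the shared KV head directly, which is where the cache savings come from.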
Demonstration for the zai-org/GLM-OCR multimodal OCR model. Supports text, formula, and table recognition from uploaded images, with outputs in plain text and markdown formats.