Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app
A Triton implementation of FlashAttention-2 that adds support for custom masks.
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.
Vulkan & GLSL implementation of FlashAttention-2
FlashAttention for sliding window attention in Triton (fwd + bwd pass)
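The sliding-window variant above restricts each query to a fixed band of recent keys. A minimal NumPy sketch of what that mask computes (the actual Triton kernel fuses the banding into its tiled loop instead of materializing a mask; function names here are illustrative, not from the repo):

```python
import numpy as np

def sliding_window_mask(n, window):
    """Causal sliding-window mask: query i may attend to keys j
    with i - window < j <= i (True = attention allowed)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def masked_attention(q, k, v, window):
    """Dense reference: softmax(QK^T / sqrt(d)) V with banned
    positions set to -inf before the softmax."""
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    s = np.where(sliding_window_mask(len(q), window), s, -np.inf)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

Because the band has fixed width, a fused kernel only needs to visit O(n · window) score entries instead of the full O(n²) matrix.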
A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.
Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface to upload images, query models, adjust generation settings, and export results in Markdown or PDF.
Toy Flash Attention implementation in torch
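At the heart of any such toy implementation is the online-softmax tiling trick: K/V are streamed in blocks while a running row-max and normalizer are updated, so the full score matrix is never materialized. A minimal NumPy sketch (not the repo's code; single head, no masking):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: full softmax(QK^T / sqrt(d)) V in one shot."""
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def flash_attention(q, k, v, block=4):
    """Tiled attention with online-softmax rescaling."""
    n, d = q.shape
    o = np.zeros_like(q)
    m = np.full(n, -np.inf)   # running row max
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = q @ kj.T / np.sqrt(d)          # scores for this K/V block only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)          # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        o = o * scale[:, None] + p @ vj
        m = m_new
    return o / l[:, None]
```

The two functions agree to floating-point precision; the tiled version is what makes the memory footprint O(n · d) rather than O(n²).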
Core-OCR is an advanced, experimental Optical Character Recognition (OCR) and document analysis suite designed for highly accurate text extraction, table reconstruction, and complex visual reasoning, built on the Qwen2.5-VL and Qwen2-VL multimodal architectures.
Poplar implementation of FlashAttention for IPU
A Gradio-powered web interface for performing advanced OCR tasks using the DeepSeek-OCR model. This experimental app leverages Hugging Face Transformers to process images for text extraction, document conversion, figure parsing, and object localization.
Systematically trains and benchmarks Mistral, Qwen2.5, and SmolLM2 on essay grading across 39 experiments, combining data analysis and engineering, structured preprocessing, instruction tuning, postprocessing, and leakage-aware evaluation for robust score and rationale generation.
Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers without relying on third-party providers and APIs. Host it yourself or try it out.
This application allows users to perform various OCR tasks such as converting documents to markdown, extracting text, locating specific text within images, and parsing figures, all through a user-friendly interface. This demo leverages the deepseek-ai/DeepSeek-OCR-2 model.
Reference Flash Attention implementation in PyTorch with V1/V2, GQA/MQA, Triton kernels, benchmark and docs.
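GQA/MQA, mentioned in the entry above, shares one K/V head among a group of query heads to shrink the KV cache. A minimal NumPy sketch of the grouping (illustrative names, not the repo's API; MQA is the special case of a single KV head):

```python
import numpy as np

def gqa_attention(q, k, v, n_kv_heads):
    """Grouped-query attention: q has H heads, k/v have n_kv_heads
    heads, and each KV head serves H // n_kv_heads query heads.
    Shapes: q is (H, n, d); k and v are (n_kv_heads, n, d)."""
    H, n, d = q.shape
    group = H // n_kv_heads
    k = np.repeat(k, group, axis=0)  # duplicate each KV head for its group
    v = np.repeat(v, group, axis=0)
    s = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

A fused kernel avoids the `np.repeat` by indexing the shared KV head directly, which is where the cache savings come from.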
Demonstration for the zai-org/GLM-OCR multimodal OCR model. Supports text, formula, and table recognition from uploaded images, with outputs in plain text and markdown formats.