@kubraaksux
Generic LLM benchmark suite for evaluating inference performance across different backends (vLLM, Ollama, OpenAI, MLX).

Features:

- Multiple workload categories: math (GSM8K), reasoning (BoolQ, LogiQA), summarization (XSum, CNN/DM), JSON extraction
- Pluggable backend architecture for different inference engines (see the sketch after this list)
- Performance metrics: latency, throughput, memory usage
- Accuracy evaluation per workload type
- HTML report generation
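
To make the pluggable backend architecture concrete, here is a minimal sketch of how a backend adapter could be shaped. This is an illustrative assumption, not the actual interface in this PR: `InferenceBackend`, `GenerationResult`, and `OpenAIBackend` are hypothetical names, and only the `openai` client calls reflect a real library API.

```python
# Hypothetical sketch of a pluggable backend interface; names are
# illustrative and do not correspond to the classes in this PR.
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GenerationResult:
    text: str
    latency_s: float        # wall-clock time for the request
    tokens_generated: int   # used to derive throughput (tokens/s)


class InferenceBackend(ABC):
    """Common interface each engine (vLLM, Ollama, OpenAI, MLX) would implement."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> GenerationResult:
        ...


class OpenAIBackend(InferenceBackend):
    """Example adapter; assumes the `openai` client library is installed."""

    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI  # deferred import so other backends stay optional
        self._client = OpenAI()
        self._model = model

    def generate(self, prompt: str, max_tokens: int = 256) -> GenerationResult:
        start = time.perf_counter()
        resp = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        latency = time.perf_counter() - start
        return GenerationResult(
            text=resp.choices[0].message.content or "",
            latency_s=latency,
            tokens_generated=resp.usage.completion_tokens if resp.usage else 0,
        )
```

A benchmark runner would then loop over workload prompts, call `generate()` on whichever backend is configured, and derive throughput as `tokens_generated / latency_s`.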

This framework can be used to evaluate SystemDS LLM inference components once they are developed.
