LLM Dashboard

Model Evaluation

Last Updated: Nov 19, 2025, 11:08 AM
Functional Areas
Powered by comprehensive benchmark analysis

Total Models: 52 (across all categories)
Total Benchmarks: 3 (evaluation benchmarks)
Providers: 16 (model providers)

gemini-3-pro
Provider: Google | Rank #1 | Aggregate Score: 9990.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #1, 9990.00 (98.0%ile)

TRAE + Doubao-Seed-Code
Provider: Other | Rank #1 | Aggregate Score: 78.8 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #1, 78.80 (98.0%ile)

openai/gpt-oss-20b
Provider: OpenAI | Rank #1 | Aggregate Score: 1888026.6 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #26, 5660121.00 (48.0%ile)
likes: Rank #26, 3952.00 (48.0%ile)
HuggingFace Trending Models: Rank #26, 6.75 (48.0%ile)
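The Aggregate Score on each card appears to be the unweighted arithmetic mean of that model's raw benchmark values, and each percentile appears to equal 100 * (1 - rank / 50). Neither formula is stated on the dashboard; both are inferred from the displayed numbers. For example, on the openai/gpt-oss-20b card, (5660121 + 3952 + 6.75) / 3 rounds to 1888026.6, and rank #26 maps to 48.0%ile. The following is a minimal Python sketch under those assumptions. The helper names `aggregate_score` and `percentile_from_rank` and the population of 50 are hypothetical.

```python
from statistics import mean


def aggregate_score(values: list[float]) -> float:
    """Unweighted mean of raw benchmark values, rounded to one decimal.

    Assumption: inferred from the displayed cards, not a documented formula.
    """
    return round(mean(values), 1)


def percentile_from_rank(rank: int, population: int = 50) -> float:
    """Percentile implied by a rank, assuming 100 * (1 - rank / population).

    Assumption: population=50 is inferred from the displayed percentiles.
    """
    return round(100 * (1 - rank / population), 1)


# Values copied from the openai/gpt-oss-20b card: downloads, likes, trending score.
gpt_oss_20b = [5660121.00, 3952.00, 6.75]

print(aggregate_score(gpt_oss_20b))  # 1888026.6
print(percentile_from_rank(26))      # 48.0
```

Because the raw values are averaged without normalization, download counts dominate the aggregate on the Hugging Face cards. As a result, aggregate scores can only be compared within a single list, not across the LMSYS Chatbot Arena, SWE-bench Verified, and Hugging Face lists.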

grok-4.1-thinking
Provider: xAI | Rank #2 | Aggregate Score: 9990.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #1, 9990.00 (98.0%ile)

Atlassian Rovo Dev (2025-09-02)
Provider: Other | Rank #2 | Aggregate Score: 76.8 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #2, 76.80 (96.0%ile)

meta-llama/Llama-3.1-8B-Instruct
Provider: Meta | Rank #2 | Aggregate Score: 1707295.6 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #22, 5116917.00 (56.0%ile)
likes: Rank #22, 4963.00 (56.0%ile)
HuggingFace Trending Models: Rank #22, 6.71 (56.0%ile)

grok-4.1
Provider: xAI | Rank #3 | Aggregate Score: 9980.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #2, 9980.00 (96.0%ile)

EPAM AI/Run Developer Agent v20250719 + Claude 4 Sonnet
Provider: Anthropic | Rank #3 | Aggregate Score: 76.8 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #3, 76.80 (94.0%ile)

deepseek-ai/DeepSeek-OCR
Provider: DeepSeek | Rank #3 | Aggregate Score: 1653691.6 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #8, 4958304.00 (84.0%ile)
likes: Rank #8, 2764.00 (84.0%ile)
HuggingFace Trending Models: Rank #8, 6.70 (84.0%ile)

claude-sonnet-4-5-20250929-thinking-32k
Provider: Anthropic | Rank #4 | Aggregate Score: 9970.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #3, 9970.00 (94.0%ile)

ACoder
Provider: Other | Rank #4 | Aggregate Score: 76.4 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #4, 76.40 (92.0%ile)

black-forest-labs/FLUX.1-dev
Provider: Black Forest Labs | Rank #4 | Aggregate Score: 518885.1 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #15, 1544807.00 (70.0%ile)
likes: Rank #15, 11842.00 (70.0%ile)
HuggingFace Trending Models: Rank #15, 6.19 (70.0%ile)

gemini-2.5-pro
Provider: Google | Rank #5 | Aggregate Score: 9970.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #3, 9970.00 (94.0%ile)

Harness AI
Provider: Other | Rank #5 | Aggregate Score: 74.8 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #7, 74.80 (86.0%ile)

MiniMaxAI/MiniMax-M2
Provider: MiniMaxAI | Rank #5 | Aggregate Score: 302786.3 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #12, 907022.00 (76.0%ile)
likes: Rank #12, 1331.00 (76.0%ile)
HuggingFace Trending Models: Rank #12, 5.96 (76.0%ile)

claude-opus-4-1-20250805-thinking-16k
Provider: Anthropic | Rank #6 | Aggregate Score: 9960.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #4, 9960.00 (92.0%ile)

Lingxi-v1.5_claude-4-sonnet-20250514
Provider: Anthropic | Rank #6 | Aggregate Score: 74.6 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #8, 74.60 (84.0%ile)

google/embeddinggemma-300m
Provider: Google | Rank #6 | Aggregate Score: 149518.6 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #13, 447327.00 (74.0%ile)
likes: Rank #13, 1223.00 (74.0%ile)
HuggingFace Trending Models: Rank #13, 5.65 (74.0%ile)

claude-sonnet-4-5-20250929
Provider: Anthropic | Rank #7 | Aggregate Score: 9960.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #4, 9960.00 (92.0%ile)

JoyCode
Provider: Other | Rank #7 | Aggregate Score: 74.6 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #9, 74.60 (82.0%ile)

Qwen/Qwen-Image-Edit-2509
Provider: Qwen | Rank #7 | Aggregate Score: 114505.2 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #19, 342717.00 (62.0%ile)
likes: Rank #19, 793.00 (62.0%ile)
HuggingFace Trending Models: Rank #19, 5.53 (62.0%ile)

gpt-4.5-preview-2025-02-27
Provider: OpenAI | Rank #8 | Aggregate Score: 9960.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #4, 9960.00 (92.0%ile)

🆕 Prometheus-v1.2.1 + GPT-5
Provider: OpenAI | Rank #8 | Aggregate Score: 74.4 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #11, 74.40 (78.0%ile)

moonshotai/Kimi-K2-Thinking
Provider: Moonshot AI | Rank #8 | Aggregate Score: 53385.7 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #2, 158859.00 (96.0%ile)
likes: Rank #2, 1293.00 (96.0%ile)
HuggingFace Trending Models: Rank #2, 5.20 (96.0%ile)

claude-opus-4-1-20250805
Provider: Anthropic | Rank #9 | Aggregate Score: 9940.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #6, 9940.00 (88.0%ile)

Tools + Claude 4 Opus (2025-05-22)
Provider: Anthropic | Rank #9 | Aggregate Score: 73.2 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #12, 73.20 (76.0%ile)

datalab-to/chandra
Provider: Datalab-to | Rank #9 | Aggregate Score: 29347.6 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #25, 87661.00 (50.0%ile)
likes: Rank #25, 377.00 (50.0%ile)
HuggingFace Trending Models: Rank #25, 4.94 (50.0%ile)

chatgpt-4o-latest-20250326
Provider: OpenAI | Rank #10 | Aggregate Score: 9930.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #7, 9930.00 (86.0%ile)

🆕 Salesforce AI Research SAGE (bash-only)
Provider: Other | Rank #10 | Aggregate Score: 73.0 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #13, 73.00 (74.0%ile)

zai-org/GLM-4.6
Provider: Zhipu | Rank #10 | Aggregate Score: 25342.0 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #27, 74952.00 (46.0%ile)
likes: Rank #27, 1069.00 (46.0%ile)
HuggingFace Trending Models: Rank #27, 4.87 (46.0%ile)

gpt-5-high
Provider: OpenAI | Rank #11 | Aggregate Score: 9930.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #7, 9930.00 (86.0%ile)

Tools + Claude 4 Sonnet (2025-05-22)
Provider: Anthropic | Rank #11 | Aggregate Score: 72.4 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #14, 72.40 (72.0%ile)

dx8152/Qwen-Edit-2509-Multiple-angles
Provider: Qwen | Rank #11 | Aggregate Score: 25037.6 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #5, 74418.00 (90.0%ile)
likes: Rank #5, 690.00 (90.0%ile)
HuggingFace Trending Models: Rank #5, 4.87 (90.0%ile)

kimi-k2-thinking
Provider: Moonshot AI | Rank #12 | Aggregate Score: 9930.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #7, 9930.00 (86.0%ile)

✅ OpenHands + GPT-5
Provider: OpenAI | Rank #12 | Aggregate Score: 71.8 | Benchmarks Evaluated: 1
Top Benchmarks:
SWE-bench Verified: Rank #15, 71.80 (70.0%ile)

PaddlePaddle/PaddleOCR-VL
Provider: PaddlePaddle | Rank #12 | Aggregate Score: 13083.5 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #18, 37913.00 (64.0%ile)
likes: Rank #18, 1333.00 (64.0%ile)
HuggingFace Trending Models: Rank #18, 4.58 (64.0%ile)

o3-2025-04-16
Provider: OpenAI | Rank #13 | Aggregate Score: 9920.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #8, 9920.00 (84.0%ile)

maya-research/maya1
Provider: Maya-research | Rank #13 | Aggregate Score: 11381.5 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #3, 33459.00 (94.0%ile)
likes: Rank #3, 681.00 (94.0%ile)
HuggingFace Trending Models: Rank #3, 4.52 (94.0%ile)

qwen3-max-preview
Provider: Qwen | Rank #14 | Aggregate Score: 9920.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #8, 9920.00 (84.0%ile)

vafipas663/Qwen-Edit-2509-Upscale-LoRA
Provider: Qwen | Rank #14 | Aggregate Score: 8413.5 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #24, 25055.00 (52.0%ile)
likes: Rank #24, 181.00 (52.0%ile)
HuggingFace Trending Models: Rank #24, 4.40 (52.0%ile)

glm-4.6
Provider: Zhipu | Rank #15 | Aggregate Score: 9890.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #11, 9890.00 (78.0%ile)

baidu/ERNIE-4.5-VL-28B-A3B-Thinking
Provider: Baidu | Rank #15 | Aggregate Score: 5445.4 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #4, 15849.00 (92.0%ile)
likes: Rank #4, 483.00 (92.0%ile)
HuggingFace Trending Models: Rank #4, 4.20 (92.0%ile)

gpt-5-chat
Provider: OpenAI | Rank #16 | Aggregate Score: 9870.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #13, 9870.00 (74.0%ile)

tarn59/apply_texture_qwen_image_edit_2509
Provider: Qwen | Rank #16 | Aggregate Score: 5420.1 | Benchmarks Evaluated: 3
Top Benchmarks:
downloads: Rank #21, 16199.00 (58.0%ile)
likes: Rank #21, 57.00 (58.0%ile)
HuggingFace Trending Models: Rank #21, 4.21 (58.0%ile)

qwen3-max-2025-09-23
Provider: Qwen | Rank #17 | Aggregate Score: 9870.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #13, 9870.00 (74.0%ile)

gemini-3-pro
Provider: Google | Rank #17 | Aggregate Score: 4995.5 | Benchmarks Evaluated: 4
Top Benchmarks:
overall_rank: Rank #1, 1.00 (98.0%ile)
arena_score: Rank #1, 9990.00 (98.0%ile)
lmsys_rank: Rank #1, 1.00 (98.0%ile)
+1 more benchmark

claude-opus-4-20250514-thinking-16k
Provider: Anthropic | Rank #18 | Aggregate Score: 9860.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #14, 9860.00 (72.0%ile)

grok-4.1-thinking
Provider: xAI | Rank #18 | Aggregate Score: 4995.5 | Benchmarks Evaluated: 4
Top Benchmarks:
overall_rank: Rank #1, 1.00 (98.0%ile)
arena_score: Rank #1, 9990.00 (98.0%ile)
lmsys_rank: Rank #1, 1.00 (98.0%ile)
+1 more benchmark

deepseek-v3.1-terminus
Provider: DeepSeek | Rank #19 | Aggregate Score: 9860.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #14, 9860.00 (72.0%ile)

grok-4.1
Provider: xAI | Rank #19 | Aggregate Score: 4991.0 | Benchmarks Evaluated: 4
Top Benchmarks:
overall_rank: Rank #2, 2.00 (96.0%ile)
arena_score: Rank #2, 9980.00 (96.0%ile)
lmsys_rank: Rank #2, 2.00 (96.0%ile)
+1 more benchmark

deepseek-v3.1-terminus-thinking
Provider: DeepSeek | Rank #20 | Aggregate Score: 9860.0 | Benchmarks Evaluated: 1
Top Benchmarks:
LMSYS Chatbot Arena: Rank #14, 9860.00 (72.0%ile)

claude-sonnet-4-5-20250929-thinking-32k
Provider: Anthropic | Rank #20 | Aggregate Score: 4986.5 | Benchmarks Evaluated: 4
Top Benchmarks:
overall_rank: Rank #3, 3.00 (94.0%ile)
arena_score: Rank #3, 9970.00 (94.0%ile)
lmsys_rank: Rank #3, 3.00 (94.0%ile)
+1 more benchmark

Showing 52 models