LLM Benchmark - Search News

Cognite Launches the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents

AUSTIN, Texas & OSLO, Norway--(BUSINESS WIRE)--Cognite, the global leader in AI for industry, today announced the launch of the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents. The ...

Security

Simbian launches new security benchmark with AI SOC LLM Leaderboard

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark to measure LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range ...

SiliconANGLE

Researchers develop new LiveBench benchmark for measuring AI models’ response accuracy

A group of researchers has developed a new benchmark, dubbed LiveBench, to ease the task of evaluating large language models’ question-answering capabilities. The researchers released the benchmark on ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.


Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

Quesma, Inc. announced the release of OTelBench, the first comprehensive benchmark for evaluating LLMs on OpenTelemetry ...

datanami.com

Anthropic Looks To Fund Advanced AI Benchmark Development

Since the launch of ChatGPT, a succession of new large language models (LLMs) and updates have emerged, each claiming to offer unparalleled performance and capabilities. However, these claims can be ...

VentureBeat

Nvidia, Intel claim new LLM training speed records in new MLPerf 3.1 benchmark

Training AI models is a whole lot faster in 2023, according to the results from the MLPerf Training 3.1 benchmark released today. The pace of innovation in the generative AI space is breathtaking to ...

insideHPC

MLCommons Launches LLM Safety Benchmark

Dec. 4, 2024 — MLCommons today released AILuminate, a safety test for large language models. The v1.0 benchmark – which provides a series of safety grades for the most widely-used LLMs – is the first ...

Business Wire

Cognite Launches Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents

The industry's first report on language model performance ensures reliability, accuracy, and effectiveness in developing trustworthy AI solutions for industry. AUSTIN, Texas and OSLO, ...
