Benchmarking LLM modelů (Benchmarking of LLM models)

Rubáš, Jan

dc.contributor.advisor	Šenkeřík, Roman
dc.contributor.author	Rubáš, Jan
dc.date.accessioned	2025-12-10T23:10:38Z
dc.date.available	2025-12-10T23:10:38Z
dc.date.issued	2024-10-27
dc.identifier	Elektronický archiv Knihovny UTB
dc.identifier.uri	http://hdl.handle.net/10563/58764
dc.description.abstract	The thesis deals with the design, implementation, and evaluation of a custom tool for benchmarking the outputs of large language models and Retrieval-Augmented Generation (RAG) systems. The main goal was to compare the response quality, latency, and output length of different models on a set of expertly formulated prompts. The tool is designed to be modular, supports both manual and automated evaluation, and runs fully locally without sending data to remote servers. The thesis also includes a comparison of the LLaMA, Mistral, and DeepSeek models over multiple iterations, visualization of results, statistical analysis, and evaluation using the Mini Arena. The DeepSeek model was additionally tested in RAG mode using a vector index of documents. The results show differences in output quality and efficiency between the individual models. The thesis delivers a practical tool usable in both academic and corporate settings.
dc.format	75
dc.language.iso	cs
dc.publisher	Univerzita Tomáše Bati ve Zlíně
dc.rights	Unrestricted
dc.subject	velké jazykové modely	cs
dc.subject	benchmarking	cs
dc.subject	evaluace modelů	cs
dc.subject	Retrieval-Augmented Generation	cs
dc.subject	metriky hodnocení	cs
dc.subject	latence odpovědi	cs
dc.subject	kvalita výstupu	cs
dc.subject	Mini Arena	cs
dc.subject	LLM-as-a-Judge	cs
dc.subject	inference	cs
dc.subject	prompt engineering	cs
dc.subject	lokální nasazení	cs
dc.subject	open-source nástroje	cs
dc.subject	vizualizace výsledků	cs
dc.subject	statistická analýza	cs
dc.subject	large language models	en
dc.subject	benchmarking	en
dc.subject	model evaluation	en
dc.subject	Retrieval-Augmented Generation	en
dc.subject	evaluation metrics	en
dc.subject	response latency	en
dc.subject	output quality	en
dc.subject	Mini Arena	en
dc.subject	LLM-as-a-Judge	en
dc.subject	inference	en
dc.subject	prompt engineering	en
dc.subject	local deployment	en
dc.subject	open-source tools	en
dc.subject	result visualization	en
dc.subject	statistical analysis	en
dc.title	Benchmarking LLM modelů
dc.title.alternative	Benchmarking of LLM models
dc.type	diplomová práce	cs
dc.contributor.referee	Pálka, Jiří
dc.date.accepted	2025-06-19
dc.description.abstract-translated	This thesis presents the design, implementation, and evaluation of a custom benchmarking tool for large language models and Retrieval-Augmented Generation (RAG) systems in the Czech language. The main objective was to compare response quality, latency, and output length across several models based on a curated set of technical prompts. The tool is designed as a modular, locally operated system, supporting both manual and automated evaluation without sending any data to external servers. The work includes a comparison of LLaMA, Mistral, and DeepSeek models through multiple iterations, result visualization, statistical analysis, and performance assessment via a Mini Arena. DeepSeek was also evaluated in RAG mode using a document vector index. The results reveal notable differences in performance and efficiency among the tested models. This thesis provides a practical tool applicable in both academic and industry settings.
dc.description.department	Department of Informatics and Artificial Intelligence
dc.thesis.degree-discipline	Softwarové inženýrství	cs
dc.thesis.degree-discipline	Software Engineering	en
dc.thesis.degree-grantor	Univerzita Tomáše Bati ve Zlíně. Fakulta aplikované informatiky	cs
dc.thesis.degree-grantor	Tomas Bata University in Zlín. Faculty of Applied Informatics	en
dc.thesis.degree-name	Ing.
dc.thesis.degree-program	Informační technologie	cs
dc.thesis.degree-program	Information Technologies	en
dc.identifier.stag	71613
dc.date.submitted	2025-06-02