
Running LLMs Locally: A Rigorous Benchmark of Phi-3, Mistral, and Llama 3.2 on Ollama
Abstract

This report presents a comprehensive evaluation of three small language models (SLMs) – Llama 3.2 (3B), Phi-3 mini, and Mistral 7B – running locally via Ollama. A FastAPI-based benchmarking framework was developed to measure inference speed, resource consumption, and the models' ability to produce valid JSON outputs as defined by Pydantic schemas. A retry mechanism with reprompting was implemented to handle malformed responses. The models were tested on a suite of 30 prompts spanning general knowledge, mathematics, coding, reasoning, and creative writing. The results highlight trade-offs between speed, accuracy, and resource usage, providing actionable insights for deploying local AI assistants in production environments.

1. Introduction

Local deployment of small language models offers privacy, low latency, and cost advantages over cloud-based APIs. However, ensuring consistent, structured outputs is essential for integration into applications. This project benchmarks three popular small language models – Llama 3.2 (3B), Phi-3 mini, and Mistral 7B – served locally through Ollama.
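The retry-with-reprompting loop described in the abstract can be sketched as follows. This is a minimal illustration, not the project's actual code: `query_model` is a stand-in for whatever function sends a prompt to the local Ollama server, and the required-key check is a simplified substitute for the Pydantic schema validation the framework actually performs.

```python
import json

REQUIRED_KEYS = {"answer", "confidence"}  # hypothetical schema fields


def validate_output(text: str) -> dict:
    """Parse the model's reply and check it against a minimal 'schema'.

    The report's framework uses Pydantic models for this step; a plain
    key check keeps this sketch self-contained.
    """
    data = json.loads(text)  # raises a ValueError subclass on malformed JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def generate_with_retry(prompt: str, query_model, max_retries: int = 3) -> dict:
    """Ask the model for JSON; on a validation failure, reprompt with the error."""
    current_prompt = prompt
    for _ in range(max_retries):
        raw = query_model(current_prompt)
        try:
            return validate_output(raw)
        except ValueError as err:
            # Reprompt: feed the validation error back to the model so it
            # can correct its previous malformed reply.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({err}). "
                "Respond with valid JSON only."
            )
    raise RuntimeError(f"no valid JSON after {max_retries} attempts")


# Demo with a stand-in for the Ollama call: fails once, then succeeds.
replies = iter(['not json at all', '{"answer": "42", "confidence": 0.9}'])
result = generate_with_retry("What is 6*7? Reply as JSON.", lambda p: next(replies))
print(result["answer"])  # -> 42
```

In practice, `query_model` might POST to Ollama's `/api/chat` endpoint with `"format": "json"`, which biases the model toward emitting parseable JSON and makes the retry path the exception rather than the rule.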


