Logging · 001

Engineering's view on local frontier AI — what actually runs on hardware you control.

A weekly engineering journal benchmarking open-weights models on consumer silicon. Real harnesses, real failures, no vendor decks.

./bench/this-week.md

Qwen3.6-35B-A3B · MoE ● PASS

tok/s · gen 47.5

vram peak 94.0G

ttft 0.18s

Bosgame M5 Strix Halo · 96GB iGPU · Vulkan/RADV · llama.cpp

Recent dispatches

1 What to Buy for Local LLM Inference: Strix Halo, Mac Studio, DGX Spark, or a GPU Rig
A buyer's guide to local LLM hardware, ranked by the one spec that actually decides generation speed: memory bandwidth. Plus where ROCm really stands on Strix Halo.
13 min read · Jun 25

2 AMD Is Selling "First-Class ROCm" on Strix Halo. I've Run the Same Chip for Six Months.
On June 8, AMD opened pre-orders for a $3,999 box built around the exact chip I've run in production since the start of the year, marketed on full ROCm support, with one of its own demos running on the exact model my board can't load under ROCm.
8 min read · Jun 18

3 A BIOS Update Won't Fix #6182 — I Tried the Newest One
The Bosgame M5's ROCm bug is board-specific, not chip-specific — so firmware is the obvious lever. I flashed Bosgame's newest official BIOS hoping to dodge it. It didn't work, and the negative narrows where the fault actually lives.
4 min read · Jun 11

4 Full Context on a Vulkan-Only Strix Halo: The Decode-Drop Reproduces, but the Sweet Spot Moves
kmarble showed ROCm decode collapses 64% at full context on Strix Halo, and ROCm+MTP cures it. My board can't run ROCm. The Vulkan half reproduces the drop — but the MTP sweet spot from last week walks left at depth: by 76k, drafting too deep is slower than no speculation at all.
12 min read · Jun 04

5 MTP Defaults Are a Trap: What 260 Runs Showed About Speculative Decoding on Qwen3.6
Until May 19, the llama.cpp speculative-decoding default was 16. On Qwen3.6's single MTP head, that default cost up to 75% of generation throughput. Here's where the real sweet spots are — and why they're architecture-specific.
11 min read · May 28

6 ROCm 7.x on the Bosgame M5: 14 Configurations, 14 Failures
We promised a ROCm 7.x revisit. We got a comprehensive workaround sweep instead. Both are useful.
9 min read · May 22

7 Vulkan/RADV vs ROCm 6.4 on Strix Halo: What 128 Benchmark Runs Actually Showed
The headline isn't where Vulkan wins. It's where ROCm doesn't run at all.
9 min read · May 14

8 What 96GB of VRAM on Unified-Memory Hardware Actually Gets You for Local LLM Inference
An honest practitioner take from a Bosgame M5 running Strix Halo at full BIOS allocation.
8 min read · May 09