$ day-3 --pm · session recap

Day 3 PM Recap — Local Models, MCP, and the Superpowers Workflow

A long, solo-John afternoon (Anastasia was away — “just me and not Anastasia”) that moved from the economics and mechanics of local and open-weight models through to the two frameworks that make agentic work scale: the Model Context Protocol (MCP) for connecting Claude to outside tools, and the Superpowers plugin for disciplined, reviewable development. Along the way Claude built and published a working MCP server and reviewed its own code.

The organizing idea, returned to repeatedly: knowing the stack is what turns the agent from magic into something you can troubleshoot. “Think of it as a stack. Bash is lower down, and Claude sits on top of it.” Skip to Claude Code Web and you hide that layer — which also removes your ability to change parts of it, feed it gigabytes of local data, or choose which computer does the work. “If you didn’t understand how Claude interacts with a computer, then it’s just magic… AI literacy without computational literacy is weird.”

The model is a swappable engine

Carrying forward the harness/car metaphor, the LLM is the engine — and you can swap it (gas, diesel, electric). Ollama positions itself as the “travel adapter for all the engines,” working across the different agentic harnesses. Anthropic, naturally, would rather you didn’t: “Claude doesn’t want you to do that, because they want you to use Claude” — the desktop app locks you to Claude’s models, while the CLI is what gives you the freedom to swap.

Over lunch John had pointed Claude Code at Minimax M3 (running on Ollama’s cloud) and rebuilt the DHSI repo into a p5.js retro arcade shooter, auto-published to GitHub Pages — in about seven minutes, on a free model. Minimax M3 is open-weight, US-hosted with zero data retention on Ollama’s cloud, has a 1-million-token context window, and scores ~80.5 on SWE-bench (roughly Claude Sonnet 4.6 territory). The economics are the point: about $0.30 / $1.20 per million input/output tokens versus Fable at $10 / $50 — a 30–60× difference. Run out of Claude tokens? Swap to an open model and keep going.

A detour on environmental and ethical cost

A student’s question about the energy cost of all this opened a real debate. The honest answer is it depends on where the compute runs: a student had calculated that AI is “better” in Massachusetts (~20% renewable) than gas-heavy Indiana; Quebec is hydroelectric, Iceland near-ideal (geothermal, natural cooling). The harder argument was that abstaining isn’t automatically the ethical choice — environmental-justice workers, AI mammogram detection, or a programmer’s tool to fight insurance denials may do real good — so “which is the most ethical to use… really depends on your ethics.” Part of the opposition, John argued, is “emotional and non-rational” and seeking a rationalization — and local AI will soon undercut the environmental argument anyway, since it runs on hardware you already own and power you can choose.

How models are made smaller (and why it matters)

A clear tour of the techniques that let big models run on small hardware:

An aside on Anthropic’s guardrails: Claude will downgrade you if you try to use it to build a competing frontier model — both via distillation and via direct help — the same “Mythos”-class concern from the day before.

Hands-on — driving Claude Code with a local model

We launched the agent on a non-Anthropic engine: ollama launch → choose the Claude harness → select Minimax M3. (Repeated warning: “don’t launch OpenClaw, for God’s sake.”) Troubleshooting was real — a German and a Canadian phone number both failed Ollama registration (workaround: sign up with GitHub/Gmail, or just use a local model like Qwen 2.5 Coder), and a shell-path issue needed source ~/.zshrc.

Chatting with a raw model (ollama run) made two things visible: local models always expose their thinking tokens (“they can’t redact it”), and a raw model has no tool use — no web search, everything pulled from its frozen weights (cutoff January 2026). Which led to the key reframe of hallucination as a compression artifact: “these strings… are not actually strings inside that model… it’s just a bunch of values.” Compress terabytes of knowledge and you get artifacts. Grounding — handing the model the facts and asking it to work with them — is what prevents it.

Building an app on the Ollama API (and the CORS lesson)

Because Ollama exposes an API on a port (ollama serve), other programs — even on another computer — can talk to it as a service. We had Claude “create a bare minimum chat web app that uses the Ollama API.” This surfaced a classic beginner wall: a double-clicked file:// page cannot connect to localhost (cross-origin). The fix is to run a local web server — “localhost can connect to localhost, but C drive can’t connect to localhost.” Hence the push to install Python and use its built-in server (python -m http.server): “once you have Python, you can serve anything, anywhere, all the time.” (With extended live cleanup of Conda/Miniforge stomping on PATH — “that’s what Claude’s for… use the better model when you’re dealing with system stuff.”) Google Colab came up again as free research compute — change the runtime to an A100 you’d otherwise pay ~$10k to own, or rent for a few dollars an hour.

A practical context-management note landed here too: /context shows how many tokens you’re using (system prompts + tools alone run ~21K), which is why /clear and /compact matter — /compact “runs the agent on its own conversation and compacts it.”

Day 3 PM proper — “We are on day 3 PM, Superpowers Framework.”

MCP — the USB-C port for agents

MCP (Model Context Protocol) is the protocol that lets Claude “call and interact with other services” and pull a tool into its toolset. Without it, hitting the Ollama API would mean a hand-rolled curl/bash script every time; MCP bundles the interaction so the agent knows what it can do, the input format, and what comes back. The metaphor: if the harness is the computer, MCP is the USB-C port — universal, versus Apple’s proprietary Lightning (“Lightning port sucked… it was absolutely a money grab”). It’s agent-agnostic: Anthropic introduced it, but Codex, Claude, and others all “speak MCP.”

Live build — a Notebook MCP server

To make it concrete, John launched claude --dangerously-skip-permissions, switched to Fable, and asked it to “create a simple MCP server using Python that creates notebooks and reads and writes them with tags and a full search tool.” In about ten minutes it produced — autonomously — a single SQLite database with an FTS5 full-text index kept in sync by triggers (real search, not substring matching), five tools (create notebook, create/read/update note, search), running over MCP’s STDIO transport with a real client connection, hooked up via claude mcp add. It then published the project to GitHub with a getting-started README — no manual web steps. “It’s almost like we write these READMEs now for agents more than we do for humans.” (Students had built their own MCP servers as a class assignment; you can point Claude at a repo URL and say “install the MCP server at this GitHub repository.”)

Skills and the Superpowers framework

Skills are “a library of prompts that tell the agent how to perform certain types of tasks” — the counterpart to MCP’s programs. A skill may bundle a tool (a “docx” skill teaches the agent to reach for Pandoc) — “a cheat sheet for its exam.” Anastasia’s own setup uses skills around retro aesthetics so she can reuse them.

Superpowers is “one particular set of skills I recommend everyone installs right now” — so popular that Anthropic adopted it as a first-party plugin, and it works across harnesses (Codex CLI, OpenCode, …).

We watched two skills run on the afternoon’s projects:

Tools and proof points worth a look

Housekeeping — a correction on billing

A correction to a Day 2 claim: Anthropic is not wholesale switching Claude subscriptions to metered tokens. The real change (the OpenClaw dispute, effective ~June 15): using a different harness against the Claude API on a subsidized subscription gets metered at API rates — but Claude Code in the terminal stays on subscription session limits. Also noted: the Fable model goes away after June 22. “Congratulations on making it through just my day… I hope you all have great plans for Montreal tonight.”


Through-line of the session: the afternoon connected the small to the large. The same moves you make on a laptop — swapping in an open model, grounding it against hallucination, wiring a tool in through MCP, letting Superpowers brainstorm and review the work — are exactly the moves that scale to rented GPUs and twenty parallel agents. And the recurring discipline underneath all of it: constrain and verify. MCP constrains what an agent can touch; Superpowers makes it prove its work; knowing the stack means you can always ask what is it actually doing, and where?