Goodfire's Silico Lets Developers Poke Around Inside LLMs
Goodfire has released Silico, a tool that lets developers inspect and adjust individual neurons inside language models. The goal: debug strange behaviors and steer models toward better outputs during or after training.
What It Actually Does
Silico gives developers neuron-level access to trained models. Zoom in on a single neuron. Adjust the parameters connected to it. Watch what changes.
That's more direct than typical fine-tuning, which treats the model as a black box and hopes gradient descent lands somewhere useful.
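To make "adjust a neuron, watch what changes" concrete, here is a minimal sketch on a tiny hand-written network. Nothing here is Silico's API: the weights, the `forward` function, and the `scale` argument are all hypothetical, chosen only to show how switching off or amplifying one hidden neuron changes the output.

```python
# Toy illustration only: a hand-written 2-3-1 network, not Silico's API.
# All weights and names below are assumptions made for this example.

def relu(x):
    return max(0.0, x)

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output.
W_IN = [
    [1.0, -1.0],   # hidden neuron 0
    [0.5, 0.5],    # hidden neuron 1
    [-1.0, 2.0],   # hidden neuron 2
]
W_OUT = [1.0, -2.0, 0.5]

def forward(inputs, scale=None):
    """Run the network; `scale` maps a hidden-neuron index to a multiplier."""
    scale = scale or {}
    hidden = []
    for i, weights in enumerate(W_IN):
        pre_activation = sum(w * x for w, x in zip(weights, inputs))
        # Intervene on the neuron by rescaling its activation.
        hidden.append(relu(pre_activation) * scale.get(i, 1.0))
    output = sum(w * h for w, h in zip(W_OUT, hidden))
    return hidden, output

x = [1.0, 2.0]
hidden, baseline = forward(x)               # untouched network
_, ablated = forward(x, scale={2: 0.0})     # switch neuron 2 off
_, boosted = forward(x, scale={2: 3.0})     # amplify neuron 2

print(hidden, baseline, ablated, boosted)
```

In this toy case the output changes sign when neuron 2 is boosted, which is the small-scale version of an intervention flipping a model's answer. A real model has far more neurons with overlapping roles, so the effect of a single edit is much harder to predict.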
Developers can also use Silico during training. If certain neurons are picking up unwanted associations, developers can filter the training data so those associations are not reinforced before the behavior gets baked in.
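The filtering step might look roughly like the sketch below. It assumes each training example already has a recorded activation for the neuron in question. The `filter_by_activation` helper, the activation values, and the 0.5 threshold are all invented for illustration and do not come from Goodfire.

```python
# Hypothetical sketch: drop training examples that strongly activate an
# unwanted neuron. The activation values are made up; in practice they would
# be read from the model's hidden state while it processes each example.

def filter_by_activation(examples, activation_fn, threshold):
    """Split examples into (kept, dropped) by a per-example activation."""
    kept, dropped = [], []
    for example in examples:
        if activation_fn(example) > threshold:
            dropped.append(example)
        else:
            kept.append(example)
    return kept, dropped

examples = ["example A", "example B", "example C"]

# Stand-in for real measurements of the neuron on each example.
recorded = {"example A": 0.02, "example B": 0.91, "example C": 0.10}

kept, dropped = filter_by_activation(examples, recorded.get, threshold=0.5)
print(kept)
print(dropped)
```

The threshold is a design choice: set it too low and useful data is discarded, too high and the unwanted association still gets reinforced.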
The Research Behind It
Goodfire's founding insight involves a specific bug. LLMs often say 9.11 is greater than 9.9. The culprit: neurons associated with Bible verse numbering, where chapter.verse formatting makes 9.11 feel "bigger" because verse 11 comes after verse 9. Retraining can then target those neurons directly and suppress them for math tasks.
Goodfire used similar techniques to reduce hallucinations. It also found a neuron inside Qwen 3 linked to the trolley problem. Activating it caused the model to reframe outputs as explicit moral dilemmas.
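The two readings of 9.11 versus 9.9 can be made concrete in a few lines. This is only an illustration of the ambiguity, not a description of what happens inside a model, and both function names are invented for the example.

```python
from decimal import Decimal

def greater_as_decimal(a, b):
    """Compare the strings as decimal numbers: 9.11 vs 9.90."""
    return Decimal(a) > Decimal(b)

def greater_as_chapter_verse(a, b):
    """Compare the strings as chapter.verse pairs: (9, 11) vs (9, 9)."""
    def parse(ref):
        chapter, verse = ref.split(".")
        return int(chapter), int(verse)
    return parse(a) > parse(b)

decimal_result = greater_as_decimal("9.11", "9.9")        # False
verse_result = greater_as_chapter_verse("9.11", "9.9")    # True
print(decimal_result, verse_result)
```

The same string pair yields opposite answers depending on which convention is applied, which is why a model leaning on verse-style associations gets the arithmetic wrong.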
The most striking result: boosting neurons associated with transparency and disclosure flipped a model's answer on an ethics question from no to yes in 9 out of 10 cases.
The implication is that a model's "values" are, at some level, locatable.
Automation and Limits
Interpretability research has historically been slow, manual work. Silico uses AI agents to automate most of it.
One hard limit: Silico requires access to a model's internal parameters, so it works only with open-source models. ChatGPT, Gemini, and other closed models are off the table.
Pricing is determined case-by-case. No public tiers.
Context
MIT Technology Review named mechanistic interpretability one of its 10 Breakthrough Technologies of 2026. Goodfire, Anthropic, OpenAI, and Google DeepMind are all active in the space.
CEO Eric Ho's bet is that interpretability becomes a standard part of the development workflow, not a research curiosity. Silico is the product form of that thesis.
Whether developers adopt it depends on how much they trust neuron-level edits to generalize. Fixing one behavior by adjusting a neuron could theoretically affect something else entirely. That's the thing about black boxes: opening them doesn't always make them less complicated.
Source: MIT Technology Review