OpenAI Unveils Jalapeño, Its First Custom Inference Chip
OpenAI today unveiled its first custom silicon: a chip called Jalapeño, built with Broadcom.
What It Is
Jalapeño is an inference processor. It is not designed for pre-training. The distinction matters: inference is where models actually run in production, handling user requests at scale.
OpenAI designed the full stack around it: chip architecture, kernels, memory systems, networking, scheduling, and deployment. That is a significant scope for a company that did not make hardware two years ago.
The Numbers
OpenAI says early results show "significantly better" performance-per-watt than current state-of-the-art alternatives. That framing is the company's own: it released no specific power-draw or tokens-per-second figures. Real-world production numbers will be more informative.
OpenAI highlighted low operating cost for real-time coding models specifically. That is a pointed claim. Coding inference is high-volume and latency-sensitive. If Jalapeño holds up there, it reduces a real cost line.
The Partnership
Broadcom has been in this deal since October 2025. Today's unveiling is the first public silicon to come out of it. Eight months from announcement to unveiling is a reasonable timeline for a first chip. Whether Jalapeño ships at scale is the next question.
The Nvidia Angle
OpenAI's stated goal is reducing dependence on Nvidia for inference workloads. That is not subtle. Nvidia dominates AI inference hardware and charges accordingly. A credible internal alternative changes OpenAI's negotiating position even if Jalapeño never fully replaces GPU infrastructure.
One detail worth noting: OpenAI used its own AI models to help design the chip. Whether that produced meaningful improvements or was mainly a good press angle is hard to assess from the outside.
What Comes Next
Custom silicon from AI labs is not new. Google has TPUs. Amazon has Trainium. Meta has MTIA. OpenAI is late to this particular game. The interesting question is whether Jalapeño performs well enough in production to matter, or whether it becomes the kind of internal project that quietly disappears after a few quarters.
The performance-per-watt claim is promising. The lack of hard numbers is not.
Source: TechCrunch