Weather Report
What models we're running today, how they're configured, and what role each one plays in the factory.
Why do we publish The Weather Report?
The Weather Report started out as a casual internal summary of how each provider and model was performing on our most important use cases. We update it frequently and have found it essential to our process.
Log
We have formally adopted gpt-5.4 for planning and architectural critique. It replaces gpt-5.2 in our Sprint Planning consensus pair and in Architectural Critique. In some sprint planning cases, Gemini can improve consensus. We're continuing to evaluate gpt-5.4 for implementation tasks, but gpt-5.3-codex remains our default implementation model for now.
No changes to defaults. A note for anyone evaluating Gemini 3.1: gemini-3.1-pro-preview-customtools may significantly outperform gemini-3.1-pro-preview, depending on your harness. We've switched to gpt-realtime-1.5 for our internal use cases but aren't officially defaulting to it yet. We're very happy with Sonnet 4.6; it may overtake Opus for some of our everyday use cases.
We're happy with gpt-5.3-codex-spark. gpt-5.3-codex remains our preferred default implementation model, with critiques and suggestions from Opus. Modified: Sprint Planning. Added: UX Ideation, Agentic dialogues, Voice (interactive).
New models this week. We're very happy with gpt-5.3-codex. No problems with Opus 4.6 so far.
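As a rough illustration, the role assignments described in the most recent log entry could be written down as a small lookup table. This is only a sketch under stated assumptions: the role keys and the `model_for` helper are hypothetical names chosen for the example, not the actual configuration format, and the second member of the Sprint Planning consensus pair is omitted because the log does not name it.

```python
# Hypothetical role keys; the model names are taken from the most recent log entry.
ROLE_MODELS: dict[str, str] = {
    # One member of the consensus pair; the log does not name the other.
    "sprint_planning": "gpt-5.4",
    "architectural_critique": "gpt-5.4",
    # Default for now; gpt-5.4 is still being evaluated for this role.
    "implementation": "gpt-5.3-codex",
}


def model_for(role: str) -> str:
    """Return the model assigned to a role, or raise ValueError if none is set."""
    try:
        return ROLE_MODELS[role]
    except KeyError:
        raise ValueError(f"no model assigned to role {role!r}") from None


if __name__ == "__main__":
    for role in ROLE_MODELS:
        print(f"{role}: {model_for(role)}")
```

Keeping the mapping in one place means a change in defaults, such as swapping the implementation model, touches a single entry rather than every call site.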