The o1 Model Is Not a Chatbot: Understanding Its Purpose

o1: Beyond the Chat Model Misconception

The buzz surrounding the o1 model has led to some confusion, primarily because many users initially approached it as a typical chat model. However, a blog post titled 'o1 isn’t a chat model (and that’s the point)' clarified its intended purpose. This distinction even captured the attention of OpenAI CEO Sam Altman and President Greg Brockman, highlighting the importance of understanding o1's unique function.

Initial Frustrations and a Shift in Understanding

Ben Hylak, a former software engineer at SpaceX and interaction designer for Apple VisionOS, was initially frustrated with o1. Responses took about 5 minutes to arrive, contradicted themselves or made no sense, and came padded with architecture diagrams and pros and cons lists he had not asked for. His initial assessment was that o1 was simply 'garbage.' In one example, he asked for refactoring advice and received a suggestion to merge files, followed by code that did not merge them and a conclusion unrelated to either.

However, not everyone had the same experience. Some users found o1 to be highly effective, leading to further discussions. Through these interactions, Hylak realized his mistake: he was using o1 as a chat model, which was not its intended function. This shift in perspective was welcomed by Altman and Brockman, who pointed out that o1 is a different kind of model requiring a different approach for optimal performance.

o1 as a Report Generator

Instead of a chat model, the article suggests viewing o1 as a 'report generator.' Given sufficient context and clear output requirements, o1 can provide solutions effectively. What matters is how the request is framed: as a complete brief rather than the opening line of a conversation.

From Prompts to Detailed Briefs

Typical chat models allow users to start with a simple question and add context as needed through back-and-forth interaction. o1, however, does not ask for additional context. Users need to supply it upfront: a 'ton' of it, in the article's words, or roughly ten times what a standard prompt would contain. This includes:

All details of attempted solutions
Complete database schema dumps
Explanations of company-specific business, scale, and terminology

It's recommended to treat o1 like a new employee, providing all necessary information from the start.
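One way to picture such a brief is as a single document assembled before anything is sent. The sketch below is only an illustration: the `Brief` class, its field names, and the section headings are hypothetical and do not come from the article or from any o1 API. It simply gathers the three kinds of context listed above and places the goal last.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical container for everything sent to the model in one message."""

    goal: str                                          # what you want back
    attempts: list[str] = field(default_factory=list)  # solutions already tried
    schema_dump: str = ""                              # full schema, pasted verbatim
    background: str = ""                               # business, scale, terminology

    def render(self) -> str:
        """Join the non-empty sections into one plain-text brief."""
        sections = []
        if self.background:
            sections.append("BACKGROUND\n" + self.background.strip())
        if self.schema_dump:
            sections.append("DATABASE SCHEMA\n" + self.schema_dump.strip())
        if self.attempts:
            tried = "\n".join(f"{i}. {a}" for i, a in enumerate(self.attempts, 1))
            sections.append("ALREADY TRIED\n" + tried)
        sections.append("GOAL\n" + self.goal.strip())
        return "\n\n".join(sections)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    brief = Brief(
        goal="Propose an indexing strategy for the orders table.",
        attempts=["Added an index on user_id; the slow query did not improve."],
        schema_dump="CREATE TABLE orders (id INT, user_id INT, created_at TIMESTAMP);",
        background="'Order' here means a confirmed purchase, not a cart.",
    )
    print(brief.render())
```

Keeping the brief as structured data makes it easy to see which section is still empty before sending, which is where the "new employee" framing tends to break down in practice.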

Focusing on the Desired Output

After providing extensive context, users must clearly define the desired output. Unlike other models where users might specify the persona or thought process, with o1, you should focus solely on 'what' you want, not 'how' the model should do it. This allows o1 to independently plan and execute the required steps, leading to faster and more efficient results.
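The difference between a 'what' request and a 'how' request can be made concrete with a rough check. The following is a crude heuristic sketch, not anything the article prescribes: the phrase list and the function name `how_style_phrases` are assumptions chosen for illustration, and a real prompt review would need human judgment.

```python
# Phrases that usually signal persona or process instructions ("how")
# rather than a description of the desired output ("what").
# This list is an illustrative assumption, not an official one.
HOW_PHRASES = (
    "think step by step",
    "you are a",
    "act as a",
)


def how_style_phrases(prompt: str) -> list[str]:
    """Return the 'how'-style phrases found in the prompt, in list order."""
    lowered = prompt.lower()
    return [phrase for phrase in HOW_PHRASES if phrase in lowered]


if __name__ == "__main__":
    what_prompt = (
        "Return a single migration file that adds a nullable archived_at "
        "column to orders, plus a one-paragraph rollback note."
    )
    how_prompt = (
        "You are a senior DBA. Think step by step and then write the migration."
    )
    print(how_style_phrases(what_prompt))
    print(how_style_phrases(how_prompt))
```

The first prompt describes only the deliverable; the second spends its words on persona and process, which is the habit the article recommends dropping for o1.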

Strengths and Weaknesses of o1

o1 excels in several areas:

Processing entire files: It can handle large code blocks and extensive context, often completing entire files with minimal errors.
Reducing hallucinations: o1 is accurate with custom query languages (e.g., those used by ClickHouse and New Relic), where other models may mix up syntax.
Medical diagnosis: o1 can offer surprisingly accurate preliminary diagnoses based on images and descriptions.
Explaining concepts: It is skilled at explaining complex engineering concepts through examples.
Generating architectural plans: o1 can create multiple plans, compare them, and list pros and cons.
Evaluation: It shows promise as an effective tool for evaluating results.

However, o1 also has limitations:

Writing in specific styles: It tends to produce reports in an academic or corporate style and struggles with adapting to specific tones.
Building entire applications: While proficient at generating entire files, it cannot build a full SaaS application through iteration. It can, however, complete entire features, particularly front-end or simple back-end functionality.

The Importance of Delay

The article highlights how delay fundamentally alters our perception of products. Hylak likens o1 to email rather than a chat model, due to the delay in its responses. This delay allows for new types of products that benefit from high-latency, long-running background intelligence. The question then becomes: what tasks are people willing to wait 5 minutes, an hour, a day, or even 3-5 business days for?
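The email comparison suggests a submit-now, read-later interaction pattern. Here is a minimal sketch of that pattern using Python's standard `concurrent.futures`. The `generate_report` function is a stand-in that sleeps instead of calling any model, and all names here are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def generate_report(brief: str, delay_s: float = 0.05) -> str:
    """Stand-in for a slow model call; sleeps rather than calling any API."""
    time.sleep(delay_s)
    return f"REPORT for: {brief}"


# A small pool so several long-running requests can be in flight at once.
executor = ThreadPoolExecutor(max_workers=2)


def submit_report(brief: str) -> Future:
    """Queue the request and return immediately, like sending an email."""
    return executor.submit(generate_report, brief)


if __name__ == "__main__":
    pending = submit_report("Compare two caching strategies for the API layer.")
    # The caller is free to do other work here instead of watching a spinner.
    print("submitted; done yet?", pending.done())
    print(pending.result())  # blocks only when the answer is actually needed
    executor.shutdown()
```

A product built this way would notify the user when the result is ready rather than holding a chat window open, which is the design space the question about waiting times points to.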

As of the article's writing, o1-preview and o1-mini support streaming but not structured generation or system prompts, while o1 supports structured generation and system prompts but not streaming. Developers designing products in 2025 will need to account for these differences.
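The feature split described above can be written down as a small lookup table. This is a sketch that encodes only what the paragraph states. The dictionary layout, key names, and the helper `models_supporting` are illustrative assumptions, and actual model capabilities may have changed since the article, so check current documentation before relying on it.

```python
# Snapshot of the capabilities as described in the text above.
# Key names are illustrative; they are not API parameters.
CAPABILITIES = {
    "o1-preview": {"streaming": True, "structured_output": False, "system_prompt": False},
    "o1-mini": {"streaming": True, "structured_output": False, "system_prompt": False},
    "o1": {"streaming": False, "structured_output": True, "system_prompt": True},
}


def models_supporting(*features: str) -> list[str]:
    """Return the models (in table order) that support every listed feature."""
    return [
        model
        for model, caps in CAPABILITIES.items()
        if all(caps[feature] for feature in features)
    ]


if __name__ == "__main__":
    print(models_supporting("streaming"))
    print(models_supporting("structured_output", "system_prompt"))
    print(models_supporting("streaming", "structured_output"))
```

The last query returns an empty list: under this snapshot, no single model offers both streaming and structured output, so a product needing both has to choose one or combine models.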