OpenAI’s Structured Outputs for API Responses

Mangesh Pise
4 min read · Aug 10, 2024

Source: https://openai.com/index/introducing-structured-outputs-in-the-api

In an announcement published on August 6, 2024, OpenAI introduced additional support for structured outputs, specifically for JSON responses. In this post, I will briefly touch upon the update and offer a point of view on its usefulness and scope for improvement.

Why this matters!

Developers often deal with structured formats such as JSON and YAML. Beyond the format itself, it is essential that the structure is well-defined if it is to be used in application development. When working with Large Language Model (LLM) APIs from providers such as OpenAI, Google (Gemini), and Anthropic, we know that the overall API response is JSON-based. However, the actual LLM output is just a text/string-typed field within that JSON response. For example, observe GPT-4o mini's response to the following API call:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'

Notice the instructions (prompt) within messages[0].content, which request no specific format.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o-mini",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "\n\nThis is a test!"
      },
      "logprobs": null,
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Notice that the actual LLM response is available at choices[0].message.content, and it is a plain text response. It is challenging to use such a response in an actual application because it has no structure (a chat application being the exception, but remember, not all applications are chat-based).
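Extracting that field is easy enough; what you get back, though, is just an unstructured string (a minimal sketch, assuming jq is available and the response above was saved to response.json):

jq -r '.choices[0].message.content' response.json

# Output (a plain string, nothing an application can reliably parse):
#
# This is a test!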

Prompting to the rescue

The way to influence the LLM to generate a structured response with specific structural elements is by supplying explicit formatting instructions in the prompt, as shown below:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say this is a test! \nFormat as JSON using this structure: \n{ \"statement\":\"<statement here>\" }"}],
    "temperature": 0.7
  }'

The LLM now generates a response within the choices[0].message.content field as follows:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1723316911,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "```json\n{ \"statement\": \"This is a test!\" }\n```"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 15,
    "total_tokens": 46
  },
  "system_fingerprint": "fp_xxxxxx"
}

While this seems achievable, it is not guaranteed that the actual response will follow the exact structure. Over time, LLMs have gotten much better at understanding formatting requirements and adhering to them.

[Chart: reliability (%) in adhering to instructed format and structure requirements]

Even when the LLM understands the structure and generates the requested format, the output is still a text/string response, and additional handling is required to convert it into actual JSON. Notice, in the example above, that the model even wrapped the JSON in a Markdown code fence.
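To make that fragility concrete, here is one way to recover the JSON in a shell pipeline (a minimal sketch, assuming jq is available, the response above was saved to response.json, and the model wrapped its output in a ```json fence exactly as shown; any deviation in the wrapper breaks the stripping step):

jq '.choices[0].message.content
  | ltrimstr("```json\n")
  | rtrimstr("\n```")
  | fromjson' response.json

# Output:
# {
#   "statement": "This is a test!"
# }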

OpenAI’s Structured Outputs

With the latest releases, gpt-4o-2024-08-06 and gpt-4o-mini-2024-07-18, structured JSON responses can be obtained. Refer to the published announcement to learn more about the two usage modes: (1) via tool calls, and (2) via a request parameter.

In my humble opinion, tool calls are overrated, and I have my own method for managing them outside of the LLM calls in a structured, economical, and performance-conscious manner. So, I am especially interested in using this feature via the response_format parameter in the API call.

As per the article, the request payload is extended with an additional object called response_format, as shown below:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "test_response",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "statement": {
              "type": "string"
            }
          },
          "required": ["statement"],
          "additionalProperties": false
        }
      }
    },
    "temperature": 0.7
  }'

This generates the following response. Observe that the choices[0].message.content field now contains the stringified JSON structure that I expect:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1723322338,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"statement\":\"This is a test!\"}",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 9,
    "total_tokens": 26
  },
  "system_fingerprint": "fp_xxxxxx"
}
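Because the content is now guaranteed to be valid (if stringified) JSON, a single parsing step yields a usable object, with no fence-stripping or guesswork required (again assuming the response above was saved to response.json):

jq '.choices[0].message.content | fromjson' response.json

# Output:
# {
#   "statement": "This is a test!"
# }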

Scope for Improvement

First, notice the response_format.type field in the API request, which is set to "json_schema". It is immediately followed by the definition of json_schema.schema, which in my opinion is a complex structure composed of type, properties, and required fields. I wish there were a way to simplify this further by passing an example structure instead of a full schema definition. For example, as shown below, one could set the schema value to an example and let the LLM figure out the structure:

...
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "schema": {
      "statement": "<statement here>"
    },
    "required": ["statement"],
    "strict": true
  }
},
...
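OpenAI does not offer this today, but the translation from an example to a strict schema is mechanical. As a thought experiment, here is a rough jq sketch that derives the strict json_schema block from such an example object (infer_schema is my own hypothetical helper, not an OpenAI or jq built-in):

# infer.jq: derive a strict JSON Schema from an example object (hypothetical sketch)
def infer_schema:
  if type == "object" then
    {
      type: "object",
      properties: with_entries(.value |= infer_schema),
      required: keys,
      additionalProperties: false
    }
  elif type == "array" then
    { type: "array", items: (if length > 0 then .[0] | infer_schema else {} end) }
  else
    { type: type }
  end;

{
  type: "json_schema",
  json_schema: { name: "test_response", strict: true, schema: infer_schema }
}

Running echo '{ "statement": "<statement here>" }' | jq -f infer.jq reproduces the json_schema block from the earlier request, which suggests the API could accept the example form directly and expand it server-side.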

Secondly, if the LLM knows I am expecting a JSON response, I wish the choices[0].message.content field would adapt to contain actual JSON instead of stringified JSON.

Lastly, I wish the format-enforcing capability could be invoked directly through prompting itself. I feel that is the most optimal approach from a development standpoint, rather than relying on each LLM provider's API specifications.

Conclusion

Ultimately, I am just happy to see this feature, mainly because it shows that OpenAI wants to make its tools development-friendly. I have high hopes that this capability will soon become a level playing field across all LLM providers.

