In the rapidly evolving landscape of large language models (LLMs), token efficiency is becoming a serious concern. As developers and researchers push more structured data into models, the cost and latency tied to token count only grow. That’s where Token-Oriented Object Notation (TOON) comes in (see the project’s GitHub repository). It’s a serialization format built specifically for LLM prompts, aiming to cut down token usage while keeping the data structured and machine-readable. The authors describe TOON as “a compact, deterministic JSON format for LLM prompts,” and their benchmarks show 30–60 percent fewer tokens on large, uniform arrays of objects compared to formatted JSON.
In this post, we introduce TOON, walk through a simple token-count comparison against JSON, look at where the savings come from, and end with a realistic note: TOON isn’t a one-size-fits-all solution.
TOON is a serialization format for structured data designed with LLM inputs in mind. It is human-readable, uses minimal syntax (leaning on indentation and compact arrays), and aims to remove the repeated overhead of typical JSON when dealing with large uniform arrays of objects.
Key features include:
- Human-readable, indentation-based structure with minimal punctuation
- Tabular encoding for uniform arrays: field names are declared once, and each row carries only values
- Explicit array lengths (e.g. `[2]`), so the model can validate what it reads
- Optional alternative delimiters (tab `\t`, pipe `|`) to further reduce token count when arrays are very large (sketched below)

In short: if you regularly pass large chunks of tabular or array-structured data to an LLM, TOON offers a compelling alternative to JSON (or YAML) by saving tokens and retaining structure.
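If the Python port mirrors the reference library, switching delimiters might look like the sketch below. Note that the `encode` function and its `delimiter` option are assumptions based on the project’s TypeScript API, so verify them against the version you install.

```python
# A minimal sketch; encode() and the delimiter option are assumed here,
# mirroring the reference TypeScript library's API.
from toon_format import encode  # hypothetical import

data = {"users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]}

print(encode(data))                  # default: comma-delimited rows
print(encode(data, delimiter="\t"))  # tab-delimited rows (assumed option)
```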
Let’s walk through a minimal example to illustrate how token count savings can occur with TOON.
Suppose you have the following JSON data:
```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```
In many LLM systems, each token contributes cost or context usage. Because JSON repeats the property names (“id”, “name”, “role”) for each object, there’s overhead.
Using TOON, the same data might be encoded as:
```toon
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```
Notice:

- The array length (“2”) is declared once, right after `users`
- The field names `id`, `name`, `role` are declared once rather than repeated for every object

The intuition behind TOON is simple: avoid repeating the field names, brackets, and other syntax on every row, and pass only the data to the LLM.
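You can verify this on real tokenizers. The snippet below uses OpenAI’s tiktoken library to count tokens for both encodings of the same data; exact numbers depend on the tokenizer and on whitespace, so treat them as indicative.

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The same data, serialized both ways
json_text = json.dumps(
    {"users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"},
    ]},
    indent=2,
)
toon_text = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

print("JSON:", len(enc.encode(json_text)), "tokens")
print("TOON:", len(enc.encode(toon_text)), "tokens")
```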
The Format Tokenization Exploration tool allows you to compare token usage side by side across different serialization formats, including CSV, pretty-printed JSON, minified JSON, YAML, and TOON (Token-Oriented Object Notation). As the playground shows, for the same dataset TOON can use significantly fewer tokens than JSON (roughly 30–60% fewer under the right conditions). The ability to toggle dataset size and complexity makes it immediately clear how format choice directly impacts token cost, which is a powerful visual for anyone working with LLM prompt budgets.
Here’s a quick comparison showing how TOON stacks up against other data formats in terms of structure and token usage:
| Format | Example Structure | Approx. Token Count | Relative Size vs. JSON |
|---|---|---|---|
| Pretty-printed JSON | Human-readable JSON with indentation and repeated keys | 6,360 | 100% (baseline) |
| Minified JSON | Compact JSON, no spaces or line breaks | 5,420 | ≈85% |
| YAML | Whitespace-based structure that still repeats keys | 6,050 | ≈95% |
| CSV | Flat table, minimal structure | 2,360 | ≈37% |
| TOON | Declares fields once, tabular rows below | 2,518 | ≈40% |
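If you want to reproduce this kind of comparison on your own data, the sketch below builds the same dataset in several formats and counts tokens with tiktoken; the TOON string is assembled by hand for this flat, uniform case rather than through the library.

```python
import csv
import io
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
rows = [{"id": i, "name": f"user{i}", "role": "user"} for i in range(1, 101)]

def count(text: str) -> int:
    return len(enc.encode(text))

# Pretty-printed and minified JSON
pretty = json.dumps({"users": rows}, indent=2)
minified = json.dumps({"users": rows}, separators=(",", ":"))

# CSV via the standard library
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "role"])
writer.writeheader()
writer.writerows(rows)

# TOON built by hand: header once, then one value row per record
toon = "users[100]{id,name,role}:\n" + "\n".join(
    f"  {r['id']},{r['name']},{r['role']}" for r in rows
)

for label, text in [("pretty JSON", pretty), ("minified JSON", minified),
                    ("CSV", buf.getvalue()), ("TOON", toon)]:
    print(f"{label:>13}: {count(text)} tokens")
```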
The team behind TOON offers not only the specification but also libraries for a range of languages that handle TOON encoding and decoding. The library also provides a function to estimate the savings of using TOON instead of JSON. The following code fragment shows an example of its use and the result (we are using the Python version here):
```python
from toon_format import estimate_savings

# Your typical prompt data
prompt_data = {
    "context": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this data"}
    ],
    "data": [
        {"id": i, "value": f"Item {i}", "score": i * 10}
        for i in range(1, 101)  # 100 items
    ]
}

result = estimate_savings(prompt_data["data"])

# Compare formats
# GPT-5 pricing (example: $0.01 per 1K tokens)
cost_per_1k = 0.01
json_cost = (result['json_tokens'] / 1000) * cost_per_1k
toon_cost = (result['toon_tokens'] / 1000) * cost_per_1k

print(f"JSON: {result['json_tokens']} tokens")
print(f"TOON: {result['toon_tokens']} tokens")
print(f"JSON cost per request: ${json_cost:.4f}")
print(f"TOON cost per request: ${toon_cost:.4f}")
print(f"Savings: {result['savings_percent']:.1f}%")
print(f"Savings per request: ${json_cost - toon_cost:.4f}")
print(f"Savings per 10,000 requests: ${(json_cost - toon_cost) * 10000:.2f}")
```
The results, with the data in the example, are:
```
JSON: 2703 tokens
TOON: 1009 tokens
JSON cost per request: $0.0270
TOON cost per request: $0.0101
Savings: 62.7%
Savings per request: $0.0169
Savings per 10,000 requests: $169.40
```
The code uses an illustrative price of $0.01 per 1,000 tokens, not a real rate, because you cannot know the final token count or cost with certainty before the payload reaches the model. The estimate_savings function relies on the tiktoken library, the tokenizer used by OpenAI; other LLMs may use a different tokenizer, meaning actual savings can vary.
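To see how much the tokenizer matters, count the same payload with more than one tokenizer. The sketch below compares tiktoken’s cl100k_base encoding against a Hugging Face tokenizer; gpt2 is only an illustrative choice, so substitute the tokenizer of your target model.

```python
import tiktoken
from transformers import AutoTokenizer

toon_text = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

# OpenAI-style tokenizer (what estimate_savings relies on)
openai_enc = tiktoken.get_encoding("cl100k_base")

# A different tokenizer; gpt2 is only an illustrative choice
hf_tok = AutoTokenizer.from_pretrained("gpt2")

print("cl100k_base:", len(openai_enc.encode(toon_text)), "tokens")
print("gpt2       :", len(hf_tok.encode(toon_text)), "tokens")
```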
While TOON offers meaningful benefits in many scenarios, it’s not a silver bullet. Here are some important caveats to keep in mind:
Because TOON is a relatively new format, today’s LLMs are far more familiar with conventional structures like JSON, which appear heavily in their training data. As a result, you can’t assume a model will recognize or produce TOON without guidance.
In practice, you need to teach the format by example: show a small TOON snippet, name the format explicitly, and clearly state that the model should use it in its response. The authors emphasize that demonstration works better than explanation; LLMs quickly infer the pattern once they see the header, the field list, and a few aligned rows. After generation, it’s good practice to decode or parse the output to confirm it matches the expected structure before using it downstream.
This simple loop (show, request, verify) tends to be the most reliable way to incorporate TOON into real workflows. For example, before passing real data with the prompt, you can show the LLM a fragment of TOON and explain how to manipulate it. A fragment like the following both explains the TOON format and teaches the model how to present TOON data (notice the `toon` language tag at the beginning of the fenced block):
```toon
users[3]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
```
Or you can use a brief TOON explanation before using it:
```
Respond using TOON format (Token-Oriented Object Notation):
- Use `key: value` for objects
- Use indentation for nesting
- Use `[N]` to indicate array lengths
- Use tabular format `[N]{fields}:` for uniform arrays

Example:
users[2]{id,name}:
  1,Alice
  2,Bob
```
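For the “verify” step of the loop, a lightweight structural check can catch malformed output before you parse it further. The sketch below is hand-rolled rather than taken from the TOON library: it validates a single tabular block against its declared length and field count.

```python
import re

def check_toon_table(text: str) -> bool:
    """Sanity-check a single tabular TOON block: header plus data rows."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return False
    header = re.match(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$", lines[0].strip())
    if not header:
        return False
    declared_len = int(header.group(2))
    fields = header.group(3).split(",")
    rows = lines[1:]
    # Row count must match the declared array length...
    if len(rows) != declared_len:
        return False
    # ...and every row must carry one value per declared field
    # (naive comma split: quoted values with commas would need real parsing)
    return all(len(row.strip().split(",")) == len(fields) for row in rows)

output = "users[2]{id,name}:\n  1,Alice\n  2,Bob"
print(check_toon_table(output))  # True
```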
As LLM usage grows and context windows expand, every token counts, both in terms of cost and performance. TOON offers a compelling format for those feeding structured, repetitive data into models: by declaring field names once, flattening rows, and cutting syntactic overhead, it achieves meaningful reductions in token usage while preserving structure.
That said, the format has a clear sweet spot: uniform, tabular, high-volume data. If your dataset is deeply nested or irregular, or you’re already using simple CSV, the gains may be minimal or even reversed.
If you’re working with LLM prompts that include large arrays of objects and you’re hitting token-budget constraints, it’s worth exploring TOON: try converting a sample dataset, measure token counts with your target tokenizer, compare accuracy/bandwidth, and decide whether the switch is worth it in your context.
