Token Economy in Cursor
Leandro Simões
Preface
This is a notes page, not an article. The content may change over time: it's not static, it's a living document.
Keep in mind that I might be wrong. I know that's a terrible way to start a text, but since we don't have access to all of Cursor's internals, some items here may not be 100% accurate. These notes are the result of my research across various sources, with a bit of guesswork.
To be fully transparent and respectful of your time: I used an LLM to review the grammar and some information after my research. Whenever a piece of information originates from an LLM, it will be marked with (1).
Tokens and Consumption
The unit of measurement for LLMs is the token. You can think of each token as a "piece of text": just as words are the pieces of a sentence, tokens are the pieces a model reads and writes.
LLMs process a lot of tokens. The measurement standard we'll use here is one million tokens (1M). To make it more concrete, 1M tokens correspond to (1):
- Roughly 750,000 words in Portuguese (my native language).
- Almost the entire Harry Potter book series (again, in Portuguese).
- 30,000 – 50,000 lines of code.
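The estimates above follow common rules of thumb: roughly 4 characters (or ~0.75 words) per token in English-like text. A minimal sketch of that heuristic (this is not Cursor's actual tokenizer, just the usual approximation, which will be off for code and non-English text):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token heuristic.
    Real tokenizers (e.g. OpenAI's tiktoken) will produce different counts."""
    return max(1, len(text) // 4)

# At ~0.75 words per token, 1M tokens is on the order of 750,000 words,
# which is where the figure above comes from.
print(estimate_tokens("Tokens are pieces of text."))
```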
Each LLM has its own pricing per token, but what I find interesting to highlight here is another type of cost (1)(2):
- Energy: an average query consumes between 0.3 Wh and 1 Wh (about 10x a Google search). For 1M tokens, the energy cost reaches 40 kWh.
- Water: about 0.26 ml per query (for cooling). For 1M tokens, that amounts to approximately 0.84 liters.
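These per-query figures scale to per-1M-token figures only through an assumed tokens-per-query value, so the totals are very sensitive to that assumption. A sketch of the scaling (the ~310 tokens/query below is my own assumption, chosen so the water figure lands near the one above):

```python
def scale_to_1m_tokens(per_query_value: float, tokens_per_query: float) -> float:
    """Scale a per-query cost to a per-1M-token cost."""
    queries_per_million = 1_000_000 / tokens_per_query
    return per_query_value * queries_per_million

# Water: 0.26 ml per query, assuming ~310 tokens per query (my assumption).
water_ml = scale_to_1m_tokens(0.26, 310)
print(f"{water_ml / 1000:.2f} L")  # close to the ~0.84 L figure above
```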
How to Save Tokens
- Inline edits are cheaper than Chat mode. In Chat mode, Cursor tends to be more verbose (output): it explains results, adds context, and may also send your entire codebase (input) to the model.
- In 2026 we all know it's better to break tasks down using Plan mode. The Instant Apply feature (suggested code) from that mode also costs fewer tokens than a Chat edit. When you use Instant Apply, Cursor uses a model specialized in diffs, changing only the lines that need to change (output).
- Improve your context: even in Chat mode, you can save tokens by using the @ key to specify which files Cursor should consider. When you don't specify context, Cursor may send many indexed files (input) to the model, increasing cost.
- Clear your context: keep an eye on the circle next to the image icon; it shows the percentage of context used. When it's too high, consider summarizing what has been done so far and using that summary as the input to a new chat. The reason is that Cursor may resend the entire conversation on every new iteration.
- Use .cursorignore: this file lets you list directories that Cursor should not index. Indexing also consumes input tokens and may be triggered every time you ask a question without specifying context (see the context tip above).
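For reference, `.cursorignore` follows .gitignore-style syntax. A minimal example (the directories below are just common candidates, not a recommendation for every project):

```text
# Dependencies and build output
node_modules/
dist/
build/

# Generated or bulky files
coverage/
*.log
*.min.js
```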
These are the ways I've found to save tokens so far. Some rules aren't clearly documented by Cursor, so experiment with these and other strategies and measure the results.
1: I used Gemini to produce these measurements.
2: Based on a prompt of 500 to 1,000 tokens with a "median prompt" as described in this article: https://cloud.google.com/blog/products/infrastructure/measuring-the-environmental-impact-of-ai-inference/.
