How to track token usage

Prerequisites

This guide assumes familiarity with chat models.

This guide goes over how to track your token usage for specific calls.

Using AIMessage.usage_metadata

A number of model providers return token usage information as part of the chat generation response. When available, this information will be included on the AIMessage objects produced by the corresponding model.

LangChain AIMessage objects include a usage_metadata attribute for supported providers. When populated, this attribute will be an object with standard keys (e.g., "input_tokens" and "output_tokens").
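
For example, here is a minimal sketch of reading this attribute defensively (the logUsage helper is illustrative, not part of LangChain):

import type { AIMessage } from "@langchain/core/messages";

// Hypothetical helper: usage_metadata is only populated for supported providers,
// so guard the access before reading the standard keys.
function logUsage(message: AIMessage) {
  const usage = message.usage_metadata;
  if (usage) {
    console.log(
      `input: ${usage.input_tokens}, output: ${usage.output_tokens}, total: ${usage.total_tokens}`
    );
  }
}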

OpenAI

npm install @langchain/openai @langchain/core

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
model: "gpt-3.5-turbo-0125",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.usage_metadata);

/*
{ input_tokens: 12, output_tokens: 17, total_tokens: 29 }
*/

Anthropic

npm install @langchain/anthropic @langchain/core

import { ChatAnthropic } from "@langchain/anthropic";

const chatModel = new ChatAnthropic({
model: "claude-3-haiku-20240307",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.usage_metadata);

/*
{ input_tokens: 12, output_tokens: 98, total_tokens: 110 }
*/

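Because usage_metadata uses the same keys across supported providers, you can also accumulate totals over several calls. The running total below is a minimal sketch of that pattern (the totalTokens counter is illustrative, not a LangChain feature):

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
  model: "gpt-3.5-turbo-0125",
});

// Illustrative running total across multiple calls.
let totalTokens = 0;

for (const prompt of ["Tell me a joke.", "Tell me another joke."]) {
  const res = await chatModel.invoke(prompt);
  // usage_metadata may be undefined for unsupported providers, so default to 0.
  totalTokens += res.usage_metadata?.total_tokens ?? 0;
}

console.log(totalTokens);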

Using AIMessage.response_metadata

A number of model providers return token usage information as part of the chat generation response. When available, this is included in the AIMessage.response_metadata field.

OpenAI

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
model: "gpt-4-turbo",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.response_metadata);

/*
{
  tokenUsage: { completionTokens: 15, promptTokens: 12, totalTokens: 27 },
  finish_reason: 'stop'
}
*/

Anthropic

import { ChatAnthropic } from "@langchain/anthropic";

const chatModel = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
});

const res = await chatModel.invoke("Tell me a joke.");

console.log(res.response_metadata);

/*
{
  id: 'msg_017Mgz6HdgNbi3cwL1LNB9Dw',
  model: 'claude-3-sonnet-20240229',
  stop_sequence: null,
  usage: { input_tokens: 12, output_tokens: 30 },
  stop_reason: 'end_turn'
}
*/

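Note that the shape of response_metadata varies by provider: OpenAI reports counts under a camelCase tokenUsage object, while Anthropic reports them under usage with snake_case keys. If you need a single code path, a small helper along the lines of the sketch below (extractTokenCounts is hypothetical, not a LangChain export) can normalize the two shapes shown above:

// Hypothetical helper that normalizes the two response_metadata shapes shown above.
function extractTokenCounts(metadata: Record<string, any>) {
  if (metadata.tokenUsage) {
    // OpenAI-style: { completionTokens, promptTokens, totalTokens }
    return {
      inputTokens: metadata.tokenUsage.promptTokens,
      outputTokens: metadata.tokenUsage.completionTokens,
    };
  }
  if (metadata.usage) {
    // Anthropic-style: { input_tokens, output_tokens }
    return {
      inputTokens: metadata.usage.input_tokens,
      outputTokens: metadata.usage.output_tokens,
    };
  }
  return undefined;
}

// Using the res from the Anthropic example above:
console.log(extractTokenCounts(res.response_metadata));

/*
{ inputTokens: 12, outputTokens: 30 }
*/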

Streaming

Some providers support token count metadata in a streaming context.

OpenAI

For example, OpenAI will return a message chunk at the end of a stream with token usage information. This behavior is supported by @langchain/openai >= 0.1.0 and can be enabled by passing a stream_options parameter when making your call.

info

By default, the last message chunk in a stream will include a finish_reason in the message's response_metadata attribute. If token usage is included in streaming mode, an additional chunk containing usage metadata will be appended to the end of the stream, so that finish_reason appears on the second-to-last message chunk.

import type { AIMessageChunk } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { concat } from "@langchain/core/utils/stream";

// Instantiate the model
const model = new ChatOpenAI();

const response = await model.stream("Hello, how are you?", {
  // Pass the stream options
  stream_options: {
    include_usage: true,
  },
});

// Iterate over the response, aggregating the chunks into a single final message
let finalResult: AIMessageChunk | undefined;
for await (const chunk of response) {
  if (finalResult) {
    finalResult = concat(finalResult, chunk);
  } else {
    finalResult = chunk;
  }
}

console.log(finalResult?.usage_metadata);

/*
{ input_tokens: 13, output_tokens: 30, total_tokens: 43 }
*/

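If you also want to surface the streamed content as it arrives, the same pattern works. The sketch below (assuming a Node.js environment for process.stdout) prints each chunk's content while still capturing the trailing usage chunk:

import type { AIMessageChunk } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { concat } from "@langchain/core/utils/stream";

const model = new ChatOpenAI();

const stream = await model.stream("Hello, how are you?", {
  stream_options: {
    include_usage: true,
  },
});

let finalResult: AIMessageChunk | undefined;
for await (const chunk of stream) {
  // Print text content as it arrives; the final usage-only chunk has empty content.
  if (typeof chunk.content === "string") {
    process.stdout.write(chunk.content);
  }
  finalResult = finalResult ? concat(finalResult, chunk) : chunk;
}

console.log("\n", finalResult?.usage_metadata);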

Using callbacks

You can also use the handleLLMEnd callback to get the full output from the LLM, including token usage for supported models. Here's an example of how you could do that:

import { ChatOpenAI } from "@langchain/openai";

const chatModel = new ChatOpenAI({
  model: "gpt-4-turbo",
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(JSON.stringify(output, null, 2));
      },
    },
  ],
});

await chatModel.invoke("Tell me a joke.");

/*
{
  "generations": [
    [
      {
        "text": "Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "Why did the scarecrow win an award?\n\nBecause he was outstanding in his field!",
            "tool_calls": [],
            "invalid_tool_calls": [],
            "additional_kwargs": {},
            "response_metadata": {
              "tokenUsage": {
                "completionTokens": 17,
                "promptTokens": 12,
                "totalTokens": 29
              },
              "finish_reason": "stop"
            }
          }
        },
        "generationInfo": {
          "finish_reason": "stop"
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {
      "completionTokens": 17,
      "promptTokens": 12,
      "totalTokens": 29
    }
  }
}
*/

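To keep a running count across calls rather than just logging each one, the callback can write into shared state. The totals object below is an illustrative sketch, not a built-in LangChain counter; it reads the same llmOutput.tokenUsage fields shown in the output above:

import { ChatOpenAI } from "@langchain/openai";

// Illustrative shared state updated by the callback after every completed call.
const totals = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };

const chatModel = new ChatOpenAI({
  model: "gpt-4-turbo",
  callbacks: [
    {
      handleLLMEnd(output) {
        const tokenUsage = output.llmOutput?.tokenUsage;
        if (tokenUsage) {
          totals.promptTokens += tokenUsage.promptTokens;
          totals.completionTokens += tokenUsage.completionTokens;
          totals.totalTokens += tokenUsage.totalTokens;
        }
      },
    },
  ],
});

await chatModel.invoke("Tell me a joke.");
await chatModel.invoke("Tell me another joke.");

console.log(totals);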

Next steps

You've now seen a few examples of how to track chat model token usage for supported providers.

Next, check out the other how-to guides on chat models in this section, like how to get a model to return structured output or how to add caching to your chat models.

