Overview

The update() method modifies an existing inference result. Use it to add or change the output, latency, token usage, or cost information after the result has been created.

Method Signature

galtea.inference_results.update(
    inference_result_id: str,
    actual_output: Optional[str] = None,
    actual_input: Optional[str] = None,
    retrieval_context: Optional[str] = None,
    latency: Optional[float] = None,
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    cache_read_input_tokens: Optional[int] = None,
    tokens: Optional[int] = None,
    cost: Optional[float] = None,
    cost_per_input_token: Optional[float] = None,
    cost_per_output_token: Optional[float] = None,
    cost_per_cache_read_input_token: Optional[float] = None
) -> InferenceResult

Parameters

inference_result_id
string
required
The ID of the inference result to update.

Input and Output Fields

actual_output
string
The generated output or response from the AI model.
actual_input
string
The input text or prompt for the inference result.
retrieval_context
string
The context retrieved by a RAG system, if applicable.

Performance Fields

latency
float
The time in milliseconds from request to response.
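
Because latency is expressed in milliseconds, a measured duration in seconds has to be converted before it is passed to update(). The following is a minimal sketch of one way to do that; timed_call is a hypothetical helper, not part of the Galtea SDK, and the lambda stands in for a real model call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return (result, elapsed time in milliseconds).

    Hypothetical helper for illustration; not part of the Galtea SDK.
    """
    # perf_counter() is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in for a real model call
response, latency_ms = timed_call(lambda: "stub model response")
```

The resulting latency_ms value can then be passed as the latency argument.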

Usage Fields

input_tokens
int
Number of input tokens sent to the model.
output_tokens
int
Number of output tokens generated by the model.
cache_read_input_tokens
int
Number of input tokens read from the cache.
tokens
int
Total tokens used in the model call.

Cost Fields

cost
float
The total cost associated with the model call.
cost_per_input_token
float
Cost per input token sent to the model.
cost_per_output_token
float
Cost per output token generated by the model.
cost_per_cache_read_input_token
float
Cost per input token read from the cache.
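
This page does not state how cost relates to the per-token rates. As an illustration only, the sketch below assumes the total is the sum of each token count multiplied by its rate, and that cache-read tokens are counted separately from regular input tokens. estimate_cost is a hypothetical helper, not part of the Galtea SDK.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cache_read_input_tokens: int = 0,
    *,
    cost_per_input_token: float,
    cost_per_output_token: float,
    cost_per_cache_read_input_token: float = 0.0,
) -> float:
    """Estimate a total cost from token counts and per-token rates.

    Assumption: the three token categories are disjoint and billed linearly.
    Hypothetical helper for illustration; not part of the Galtea SDK.
    """
    return (
        input_tokens * cost_per_input_token
        + output_tokens * cost_per_output_token
        + cache_read_input_tokens * cost_per_cache_read_input_token
    )


# Token counts and rates reused from the examples on this page
total = estimate_cost(
    150,
    75,
    cost_per_input_token=0.00001,
    cost_per_output_token=0.00003,
)
```

If your provider bills differently (for example, tiered pricing), compute the total from the billing data instead and pass it directly as cost.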

Returns

Returns the updated InferenceResult object.

Example

from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")

# Update an inference result with output and metrics
updated_result = galtea.inference_results.update(
    inference_result_id="inf_abc123",
    actual_output="Here is the response from the model.",
    latency=245.5,
    input_tokens=150,
    output_tokens=75,
    cost=0.002
)

print(f"Updated: {updated_result.id}")
print(f"Output: {updated_result.actual_output}")
print(f"Latency: {updated_result.latency}ms")

Use Cases

Deferred Output Update

Create an inference result first, then update it after processing completes:
import time

# Assume 'session' and 'my_model' are defined
# session = galtea.sessions.create(...)

# Create inference result with just input
user_input = "What is the weather today?"
inference_result = galtea.inference_results.create(
    session_id=session.id,
    input=user_input
)

# Process with your model; perf_counter() is a monotonic clock suited to timing
start_time = time.perf_counter()
response = my_model.generate(user_input)
latency_ms = (time.perf_counter() - start_time) * 1000

# Update with output and metrics
galtea.inference_results.update(
    inference_result_id=inference_result.id,
    actual_output=response,
    latency=latency_ms
)

Adding Cost Information

Update an inference result with cost data after receiving billing info:
galtea.inference_results.update(
    inference_result_id=inference_result.id,
    cost=0.0025,
    cost_per_input_token=0.00001,
    cost_per_output_token=0.00003
)

Notes

  • Only include the fields you want to update. Fields not specified remain unchanged.
  • Pass None explicitly to clear a field’s value.
  • The creditsUsed field cannot be modified through this method.
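
When the set of fields to update is decided at runtime, it can help to distinguish "leave unchanged" from "clear with None" before making the call. The sketch below is one possible approach under that assumption; UNCHANGED and build_update_kwargs are hypothetical names local to this example, not part of the Galtea SDK.

```python
from typing import Any, Dict

# Hypothetical sentinel meaning "do not send this field at all"
UNCHANGED = object()


def build_update_kwargs(**fields: Any) -> Dict[str, Any]:
    """Keep only the fields that should be sent to update().

    Fields set to UNCHANGED are dropped; fields set to None are kept so
    that they are passed explicitly.
    """
    return {name: value for name, value in fields.items() if value is not UNCHANGED}


kwargs = build_update_kwargs(
    actual_output="Here is the response from the model.",
    latency=UNCHANGED,  # omitted from the call
    cost=None,          # passed explicitly as None
)

# The filtered fields would then be unpacked into the call, for example:
# galtea.inference_results.update(inference_result_id=inference_result.id, **kwargs)
```

Only actual_output and cost end up in kwargs here; latency is never sent.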