Generate Inference Result

Overview

The generate() method is the recommended way to execute your agent when you want automatic trace collection, timing, and inference result creation. It handles the complete lifecycle:

Initializes trace collection
Executes your agent with timing measurement
Creates/updates the inference result with all metadata
Saves collected traces to the platform
Cleans up trace context (prevents memory leaks)

This is the recommended method for production use as it handles all trace lifecycle management automatically.

Method Signature

galtea.inference_results.generate(
    agent: Agent,
    session: Session,
    user_input: str,
    inference_result_id: Optional[str] = None
) -> InferenceResult

Parameters

agent (Agent, required)
An instance of a class that extends galtea.Agent. Must implement the call() method.

session (Session, required)
The session object to associate the inference result with. Obtain via galtea.sessions.create() or galtea.sessions.get().

user_input (string, required)
The user’s input message to send to the agent.

inference_result_id (string, optional)
ID of an existing inference result to update. If not provided, a new inference result is created.

Returns

Returns an InferenceResult object populated with the captured data, such as the agent’s output (actual_output) and the measured latency (latency).

Example

from galtea import Galtea, Agent, AgentInput, AgentResponse, trace, NodeType

galtea = Galtea(api_key="YOUR_API_KEY")

# Define your agent with traced operations
class MyAgent(Agent):
    @trace(name="search", node_type=NodeType.RETRIEVER)
    def search(self, query: str) -> list:
        return [{"doc": "relevant content"}]
    
    @trace(name="generate", node_type=NodeType.LLM)
    def generate_response(self, context: list, query: str) -> str:
        return f"Based on the context: {context}"
    
    @trace(name="main", node_type=NodeType.CHAIN)
    def call(self, input: AgentInput) -> AgentResponse:
        query = input.last_user_message_str()
        context = self.search(query)
        response = self.generate_response(context, query)
        return AgentResponse(
            content=response,
            retrieval_context=str(context)
        )

# Create resources
product = galtea.products.get_by_name("My Product")
version = galtea.versions.create(name="v1.0", product_id=product.id)
session = galtea.sessions.create(version_id=version.id)

# Create agent and run with automatic trace collection
agent = MyAgent()

inference_result = galtea.inference_results.generate(
    agent=agent,
    session=session,
    user_input="What's the product pricing?"
)

print(f"Response: {inference_result.actual_output}")
print(f"Latency: {inference_result.latency}ms")
# Traces are automatically saved to the platform!

What Gets Captured

The generate() method automatically captures and saves:

Data	Source	Description
Input	`user_input` parameter	The user’s message
Output	`AgentResponse.content`	The agent’s response
Retrieval Context	`AgentResponse.retrieval_context`	Context used (for RAG)
Latency	Measured	End-to-end execution time in ms
Usage Info	`AgentResponse.usage_info`	Token counts (if provided)
Cost Info	`AgentResponse.cost_info`	Cost data (if provided)
Traces	`@trace` decorators	All traced operations

Providing Usage and Cost Information

Your agent can return usage and cost information in the AgentResponse:

from galtea import Agent, AgentInput, AgentResponse, trace

class MyAgent(Agent):
    @trace(name="main")
    def call(self, input: AgentInput) -> AgentResponse:
        # Your agent logic...
        
        return AgentResponse(
            content="Response content",
            usage_info={
                "input_tokens": 150,
                "output_tokens": 75,
                "cache_read_input_tokens": 50
            },
            cost_info={
                "cost_per_input_token": 0.00001,
                "cost_per_output_token": 0.00003,
                "cost_per_cache_read_input_token": 0.000001
            }
        )
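
As a rough illustration of how the two dictionaries relate, the sketch below multiplies each token count by its per-token price and sums the results. The helper name `estimate_cost` is hypothetical, and the sketch assumes the three token counts are billed separately (cached tokens are not also included in `input_tokens`). How the platform itself combines these fields is not stated here.

```python
def estimate_cost(usage_info: dict, cost_info: dict) -> float:
    """Hypothetical helper: sum of (token count x per-token price) per token type."""
    pairs = [
        ("input_tokens", "cost_per_input_token"),
        ("output_tokens", "cost_per_output_token"),
        ("cache_read_input_tokens", "cost_per_cache_read_input_token"),
    ]
    # Missing keys count as zero so partially filled dictionaries still work.
    return sum(usage_info.get(tokens, 0) * cost_info.get(price, 0.0)
               for tokens, price in pairs)


usage_info = {
    "input_tokens": 150,
    "output_tokens": 75,
    "cache_read_input_tokens": 50,
}
cost_info = {
    "cost_per_input_token": 0.00001,
    "cost_per_output_token": 0.00003,
    "cost_per_cache_read_input_token": 0.000001,
}

# 150 * 0.00001 + 75 * 0.00003 + 50 * 0.000001
total = estimate_cost(usage_info, cost_info)
print(f"Estimated cost: {total:.5f}")  # Estimated cost: 0.00380
```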

Comparison with Manual Approach

Feature	`generate()`	Manual
Trace initialization	Automatic	`traces.start_collection_context()`
Agent execution	Automatic	Manual call
Timing measurement	Automatic	Manual
Inference result	Auto-created	`inference_results.create()`
Trace saving	Automatic	`traces.save_context()`
Memory cleanup	Automatic	`traces.clear_context()`
Error handling	Built-in	Manual try/finally

Use generate() for production workloads. Use manual trace collection when you need fine-grained control or are debugging.
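
To make the manual column of the table more concrete, here is a minimal sketch of the try/finally pattern it implies. `StubTraces` and `run_manually` are stand-ins written for this example. Only the method names come from the table, while their argument lists, and the dictionary used in place of a real inference result, are assumptions.

```python
import time


class StubTraces:
    """Stand-in for the trace service; records calls instead of contacting a platform."""

    def __init__(self):
        self.calls = []

    def start_collection_context(self):
        self.calls.append("start_collection_context")

    def save_context(self):
        self.calls.append("save_context")

    def clear_context(self):
        self.calls.append("clear_context")


def run_manually(traces, agent_fn, user_input):
    """Run agent_fn while performing the manual steps by hand."""
    traces.start_collection_context()
    try:
        started = time.perf_counter()
        output = agent_fn(user_input)
        latency_ms = (time.perf_counter() - started) * 1000
        # A real integration would call inference_results.create() here;
        # a plain dictionary stands in for the inference result.
        result = {"input": user_input, "output": output, "latency_ms": latency_ms}
        traces.save_context()
        return result
    finally:
        # Runs on success and on failure, so the trace context never leaks.
        traces.clear_context()


traces = StubTraces()
result = run_manually(traces, lambda text: text.upper(), "hello")
print(result["output"])  # HELLO
print(traces.calls)      # ['start_collection_context', 'save_context', 'clear_context']
```

If the agent raises, `save_context` is skipped in this sketch but `clear_context` still runs, which is the bookkeeping generate() is described as doing for you.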

Error Handling

If the agent raises an exception, generate() cleans up the trace context before the exception reaches your code, so you only need to handle the error itself:

try:
    result = galtea.inference_results.generate(
        agent=agent,
        session=session,
        user_input="test"
    )
except Exception as e:
    # Trace context is automatically cleaned up
    print(f"Agent failed: {e}")
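
When running many inputs, one possible pattern is to collect failures and keep going instead of stopping at the first error. The sketch below assumes a hypothetical `run_all` helper and uses a fake `flaky_generate` function in place of the real call, so it runs without the SDK.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_all(
    generate_fn: Callable[[str], Any],
    user_inputs: Iterable[str],
) -> Tuple[List[Any], List[Tuple[str, Exception]]]:
    """Call generate_fn for each input; return (results, failures)."""
    results: List[Any] = []
    failures: List[Tuple[str, Exception]] = []
    for text in user_inputs:
        try:
            results.append(generate_fn(text))
        except Exception as exc:
            # Record the failing input and continue with the rest.
            failures.append((text, exc))
    return results, failures


def flaky_generate(text: str) -> str:
    """Fake stand-in for the real call; fails on empty input."""
    if not text:
        raise ValueError("empty input")
    return f"answer to {text!r}"


results, failures = run_all(flaky_generate, ["pricing?", "", "refunds?"])
print(len(results), len(failures))  # 2 1
```

With the SDK, `generate_fn` could be a small lambda that forwards `text` as `user_input` to galtea.inference_results.generate() along with your agent and session.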

Related Methods

Create Inference Result - Manual creation
Trace Service - Manual trace management
Simulating Conversations - Multi-turn simulations
