Returns

Returns an InferenceResult object.

Example

# `galtea` is an initialized Galtea client instance
inference_result = galtea.inference_results.create(
    session_id="YOUR_SESSION_ID",
    input="What is the capital of France?",
    output="Paris is the capital of France."
)

Parameters

session_id
string
required
The session ID to log the inference result to.
input
string
required
The input text/prompt.
output
string
required
The generated output/response.
retrieval_context
string
Context retrieved for RAG systems.
latency
float
Latency of the model call, in milliseconds.
usage_info
dict[str, int]
Information about token usage during the model call. Possible keys include:
  • input_tokens: Number of input tokens sent to the model.
  • output_tokens: Number of output tokens generated by the model.
  • cache_read_input_tokens: Number of input tokens read from the cache.
cost_info
dict[str, float]
Information about the cost per token during the model call. Possible keys include:
  • cost_per_input_token: Cost per input token sent to the model.
  • cost_per_output_token: Cost per output token generated by the model.
  • cost_per_cache_read_input_token: Cost per input token read from the cache.
conversation_simulator_version
string
The version of Galtea’s conversation simulator used to generate the user message (input). This should only be provided when logging a conversation that was generated using the simulator.
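The keys of `usage_info` and `cost_info` pair up: each token count has a matching per-token cost. The sketch below builds both dictionaries and computes a total cost client-side for illustration; the token counts and per-token prices are placeholder values, and the total-cost calculation is not part of the SDK.

```python
# Token counts for a single model call, matching the usage_info keys above.
usage_info = {
    "input_tokens": 120,
    "output_tokens": 48,
    "cache_read_input_tokens": 0,
}

# Per-token prices, matching the cost_info keys above (placeholder values).
cost_info = {
    "cost_per_input_token": 0.000002,
    "cost_per_output_token": 0.000006,
    "cost_per_cache_read_input_token": 0.000001,
}

# Client-side illustration of how the two dictionaries relate:
# total cost = sum of (token count x matching per-token price).
total_cost = (
    usage_info["input_tokens"] * cost_info["cost_per_input_token"]
    + usage_info["output_tokens"] * cost_info["cost_per_output_token"]
    + usage_info["cache_read_input_tokens"]
    * cost_info["cost_per_cache_read_input_token"]
)
```

These dictionaries are what you would pass as the `usage_info` and `cost_info` arguments to `galtea.inference_results.create(...)`.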