framework.models.get_chat_completion#

framework.models.get_chat_completion(message, max_tokens=1024, model_config=None, provider=None, model_id=None, budget_tokens=None, enable_thinking=False, output_model=None, base_url=None)[source]#

Execute direct chat completion requests across multiple AI providers.

This function provides immediate access to LLM inference with support for advanced features, including extended thinking, structured outputs, and automatic TypedDict conversion. It transparently handles provider-specific API differences, credential management, and HTTP proxy configuration.

The function supports multiple interaction patterns:

  • Simple text-to-text completion for basic use cases

  • Structured output generation with Pydantic models or TypedDict

  • Extended thinking workflows for complex reasoning tasks

  • Enterprise proxy and timeout configuration

Provider-specific features:

  • Anthropic: Extended thinking with budget_tokens, content block responses

  • Google: Thinking configuration for enhanced reasoning

  • OpenAI: Structured outputs with the beta chat completions API

  • Ollama: Local model inference with JSON schema validation

  • CBORG: OpenAI-compatible API with custom endpoints (LBNL-provided service)

Parameters:
  • message (str) – Input prompt or message for the LLM

  • max_tokens (int) – Maximum tokens to generate in the response

  • model_config (dict, optional) – Configuration dictionary with provider and model settings

  • provider (str, optional) – AI provider name ('anthropic', 'google', 'openai', 'ollama', 'cborg')

  • model_id (str, optional) – Specific model identifier recognized by the provider

  • budget_tokens (int, optional) – Thinking budget for Anthropic/Google extended reasoning

  • enable_thinking (bool) – Enable extended thinking capabilities where supported

  • output_model (Type[BaseModel], optional) – Pydantic model or TypedDict for structured output validation

  • base_url (str, optional) – Custom API endpoint, required for Ollama and CBORG providers

Raises:
  • ValueError – If a required provider, model_id, api_key, or base_url is missing

  • ValueError – If budget_tokens >= max_tokens, or for other invalid parameter combinations

  • pydantic.ValidationError – If output_model validation fails for structured outputs

  • anthropic.APIError – For Anthropic API-specific errors

  • openai.APIError – For OpenAI API-specific errors

  • ollama.ResponseError – For Ollama API-specific errors

Returns:

Model response in a format determined by the provider and output_model settings

Return type:

Union[str, BaseModel, list]

Note

Extended thinking is currently supported by Anthropic (with budget_tokens) and Google (with thinking_config). Other providers will log warnings if thinking parameters are provided.

Warning

When using structured outputs, ensure your prompt guides the model toward generating the expected structure. Not all models handle schema constraints equally well.

Examples

Simple text completion:

>>> from framework.models import get_chat_completion
>>> response = get_chat_completion(
...     message="Explain quantum computing in simple terms",
...     provider="anthropic",
...     model_id="claude-3-sonnet-20240229",
...     max_tokens=500
... )
>>> print(response)

Extended thinking with Anthropic:

>>> response = get_chat_completion(
...     message="Solve this complex reasoning problem...",
...     provider="anthropic",
...     model_id="claude-3-sonnet-20240229",
...     enable_thinking=True,
...     budget_tokens=1000,
...     max_tokens=2000
... )
>>> # Response includes thinking process and final answer
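
Extended thinking with Google follows the same pattern (the model_id below is illustrative; substitute a thinking-capable Gemini model available to your account):

>>> response = get_chat_completion(
...     message="Work through this multi-step reasoning problem...",
...     provider="google",
...     model_id="gemini-2.5-flash",
...     enable_thinking=True,
...     budget_tokens=1024,
...     max_tokens=2048
... )
>>> # budget_tokens must be less than max_tokens, or ValueError is raised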

Structured output with Pydantic model:

>>> from pydantic import BaseModel
>>> class AnalysisResult(BaseModel):
...     summary: str
...     confidence: float
...     recommendations: list[str]
>>>
>>> result = get_chat_completion(
...     message="Analyze this data and provide structured results",
...     provider="openai",
...     model_id="gpt-4",
...     output_model=AnalysisResult
... )
>>> print(f"Confidence: {result.confidence}")

Using configuration dictionary:

>>> config = {
...     "provider": "ollama",
...     "model_id": "llama3.1:8b",
...     "max_tokens": 1000
... }
>>> response = get_chat_completion(
...     message="Hello, how are you?",
...     model_config=config,
...     base_url="http://localhost:11434"
... )
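
Custom endpoint with CBORG (the base_url and model_id are illustrative; use the endpoint and model names issued with your LBNL CBORG credentials):

>>> response = get_chat_completion(
...     message="Summarize the latest run log",
...     provider="cborg",
...     model_id="lbl/cborg-chat:latest",
...     base_url="https://api.cborg.lbl.gov",
...     max_tokens=500
... )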

See also

get_model() : Create model instances for PydanticAI agents

configs.config.get_provider_config() : Provider configuration loading

pydantic.BaseModel : Base class for structured output models

Convention over Configuration: Configuration-Driven Registry Patterns : Complete model configuration and usage guide