Agent Playground

Test and compare multiple AI models side-by-side to optimize your agent's performance.

The Agent Playground lets you test your agent configuration across multiple AI models simultaneously. Compare response quality, speed, and costs in real-time to find the best model for your use case.

Getting Started

When you open the Agent Playground, you'll see the main testing interface with a single emulator window ready to use.

Agent Playground Overview

The left sidebar contains your Agent Settings where you can configure:

  • Agent Persona - Define your AI's role and behavior
  • Restrictions - Set boundaries for your agent
  • Previous Messages Limit - Control conversation context
  • Temperature - Adjust response creativity (0-1 scale)
  • Knowledge Base Nodes - Connect relevant knowledge sources
  • Tools - Enable API functions and integrations
  • Presets - Save and quickly load common configurations

Changes to these settings apply to all active emulators, ensuring consistent testing conditions across models.
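
If it helps to see the shared configuration at a glance, the sketch below shows the kind of settings object these options correspond to. The TypeScript interface, field names, and values are illustrative assumptions, not the Playground's actual schema.

```typescript
// Hypothetical shape of the shared Agent Settings panel; the interface and
// field names are illustrative, not the Playground's actual schema.
interface AgentSettings {
  persona: string;                // Agent Persona: the AI's role and behavior
  restrictions: string[];         // Boundaries the agent must respect
  previousMessagesLimit: number;  // How many prior turns to keep as context
  temperature: number;            // 0-1 scale: higher values = more creative output
  knowledgeBaseNodes: string[];   // Connected knowledge sources
  tools: string[];                // Enabled API functions and integrations
}

const settings: AgentSettings = {
  persona: "You are a concise customer-support assistant.",
  restrictions: ["Do not quote internal pricing.", "Escalate legal questions."],
  previousMessagesLimit: 10,
  temperature: 0.3,
  knowledgeBaseNodes: ["kb-returns-policy", "kb-shipping"],
  tools: ["lookup_order_status"],
};
```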

Selecting AI Models

Click on the model name in any emulator window to open the model selection dropdown.

Model Selection

Available models are organized by tiers and include:

  • Claude (Haiku, Sonnet, Opus)
  • GPT-4 and variants
  • Gemini (Flash, Pro)
  • Mistral and Meta Llama models
  • Grok and other providers

Each model displays its tier and consumption rate, helping you balance performance with cost. Models marked "New" or "Recommended" are optimized choices for most use cases.

Testing Multiple Models

Add up to 5 model emulators to test simultaneously. Click Add Emulator to create additional windows.

Multiple Emulators

All emulators share a single input field at the bottom. When you type a message, it is sent to all active models at once, ensuring perfectly consistent A/B testing conditions.
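
Conceptually, the shared input behaves like a fan-out: one prompt, dispatched to every active model in parallel. Below is a minimal sketch of that idea; `callModel` is a hypothetical stand-in for each provider's real API, and the model names are illustrative.

```typescript
// Conceptual sketch of the shared input field: the identical prompt is
// dispatched to every active emulator concurrently. `callModel` is a
// hypothetical stand-in for each provider's real API.
async function callModel(model: string, prompt: string) {
  const start = Date.now();
  const text = `(${model}) response to: ${prompt}`; // placeholder for a real API call
  return { model, ms: Date.now() - start, text };
}

async function broadcast(prompt: string, models: string[]) {
  // Every model sees the same prompt at the same moment, so the comparison
  // runs under identical conditions.
  return Promise.all(models.map((m) => callModel(m, prompt)));
}

broadcast("Summarize our refund policy in two sentences.", [
  "claude-sonnet",
  "gpt-4o",
  "gemini-pro",
]).then((results) => results.forEach((r) => console.log(r.model, `${r.ms} ms`)));
```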

Comparing Responses

Watch how different models respond to identical prompts in real-time.

Active Conversations

Each response displays:

  • Response time in milliseconds
  • Message content with the model's unique style and tone
  • Edit and copy options for further analysis

Compare models across:

  • Response Quality - Accuracy, helpfulness, and relevance
  • Response Speed - Time to first token and total completion time (see the timing sketch after this list)
  • Response Style - Tone, formatting, and conversational approach
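
If you want to reason about the two speed numbers separately, the sketch below shows how time to first token and total completion time can be measured against a streamed reply. The `stream` parameter is an assumed stand-in for whatever streaming interface a given model exposes.

```typescript
// Timing a single streamed reply: time to first token measures how quickly
// the response starts, total completion time how long until it finishes.
// `stream` is a stand-in for whatever streaming interface a model exposes.
async function timeResponse(stream: AsyncIterable<string>) {
  const start = Date.now();
  let firstTokenMs: number | null = null;
  let text = "";
  for await (const token of stream) {
    if (firstTokenMs === null) firstTokenMs = Date.now() - start; // first chunk arrived
    text += token;
  }
  return { firstTokenMs, totalMs: Date.now() - start, text };
}
```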

Configuring Tools

Enable tools and functions to test your agent's capabilities with API integrations.

Configure Tools Modal

Choose from:

  • API Functions - Custom functions your agent can call
  • System Functions - Built-in capabilities
  • Code Functions - Code execution tools
  • MCP Servers - Model Context Protocol integrations
  • Intents (Studio) - Pre-configured agent intents

This allows you to test how different models handle function calling and tool usage before deploying to production.
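
Most providers describe a callable function with a name, a description, and a JSON-schema parameter block the model fills in when it decides to use the tool. Here is a hedged sketch of such a definition; the exact format the Playground expects may differ, and `lookup_order_status` is an invented example.

```typescript
// Hypothetical API Function definition in the JSON-schema style most model
// providers use for function calling; the exact format your platform expects
// may differ, and lookup_order_status is an invented example.
const lookupOrderStatus = {
  name: "lookup_order_status",
  description: "Fetch the current status of a customer order by its ID.",
  parameters: {
    type: "object",
    properties: {
      orderId: { type: "string", description: "The customer's order ID" },
    },
    required: ["orderId"],
  },
};
```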

Pro Tip

To test Studio workflows, use the built-in emulator within the Studio or publish the workflow and test it via the shareable link.

Viewing Comparison Metrics

After testing multiple models, click Compare Agents in the metrics section to analyze performance side-by-side.

Model Comparison Metrics

The comparison view shows:

  • Performance table with messages, average response times, min/max times, and credit usage
  • Response Time Comparison chart tracking speed trends across messages
  • Credits Comparison chart showing cost patterns

This helps you identify the optimal balance between speed, quality, and cost for your specific use case.

Viewing Individual Model Metrics

Click View Metrics at the bottom of any emulator window to see detailed analytics for that specific model.

View Metrics Button

The individual metrics modal displays:

Individual Model Metrics

  • Total Messages - Number of interactions processed
  • Avg Response Time - Mean time across all responses
  • Min/Max Response Time - Performance range
  • Total Credits - Cumulative consumption
  • Avg Credits/Msg - Cost efficiency per message
  • Performance charts - Visual trends over time

Use these insights to understand each model's behavior patterns and consistency.
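
For reference, the figures in the metrics modal can all be derived from a simple per-model log of response times and credit usage. A minimal sketch, assuming each message's `responseMs` and `credits` are recorded:

```typescript
// Deriving the modal's figures from a per-model message log; `responseMs`
// and `credits` per message are assumed to be recorded by the Playground.
interface LoggedMessage {
  responseMs: number; // time taken to answer this message
  credits: number;    // credits consumed by this message
}

function summarize(log: LoggedMessage[]) {
  const times = log.map((m) => m.responseMs);
  const totalCredits = log.reduce((sum, m) => sum + m.credits, 0);
  return {
    totalMessages: log.length,
    avgResponseMs: times.reduce((a, b) => a + b, 0) / log.length,
    minResponseMs: Math.min(...times),
    maxResponseMs: Math.max(...times),
    totalCredits,
    avgCreditsPerMsg: totalCredits / log.length,
  };
}
```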

Common Use Cases

  • A/B Testing - Compare how different models handle your specific prompts and scenarios
  • Configuration Tuning - Test temperature, persona, and tool combinations to find optimal settings
  • Cost Optimization - Identify models that deliver quality within your budget constraints
  • Performance Benchmarking - Measure response times and consistency before production deployment

Pro Tips

  • Save successful configurations as Presets for quick reuse across testing sessions
  • Monitor average response times to set realistic expectations for production performance
  • Compare credits per message to optimize costs without sacrificing quality
  • Use Clear Chat to reset conversations and test fresh scenarios with different contexts

Testing your agent's performance is a critical step toward ensuring it delivers high-quality responses. By comparing multiple AI models side-by-side in the Agent Playground, you can weigh response quality, speed, and cost, and make an informed decision about which model to deploy to production.
