Best Practices

Following best practices when training your AI agent helps it provide accurate, consistent, and helpful responses. High-quality training data is the foundation of a reliable AI agent.

Practice	Description
Consistency	Use a consistent format for all training data.
Clarity	Ensure questions and answers are clear and to the point.
Comprehensiveness	Cover as many potential user queries as possible in your training data. The broader the coverage, the wider the range of queries the AI agent can handle.
Accuracy	Ensure that the training data is accurate and up to date.
Periodic Updates	Periodically update the training data to reflect new information.
Regular Testing	Routinely test the AI agent to verify that it provides accurate responses.
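
One way to make regular testing repeatable is a small regression check. The sketch below is only an illustration: `run_regression`, `fake_agent`, and the sample questions are hypothetical names, and `ask` stands in for whatever function you use to send a question to your agent and get its reply as text.

```python
def run_regression(ask, cases):
    """Return the cases whose answers are missing required keywords.

    ask   -- hypothetical callable: question (str) -> answer (str)
    cases -- list of (question, [required keywords]) pairs
    """
    failures = []
    for question, required in cases:
        answer = ask(question).lower()
        missing = [kw for kw in required if kw.lower() not in answer]
        if missing:
            failures.append((question, missing))
    return failures


def fake_agent(question):
    # Stand-in for a real agent call, used only to make the sketch runnable.
    canned = {
        "What is your refund window?": "Refunds are accepted within 30 days.",
    }
    return canned.get(question, "I'm not sure.")


cases = [
    ("What is your refund window?", ["30 days"]),
    ("Do you ship abroad?", ["international"]),
]
failures = run_regression(fake_agent, cases)
for question, missing in failures:
    print(f"FAILED: {question!r} is missing {missing}")
```

Rerunning a check like this after each training update makes it easier to notice when an answer that used to be correct has changed.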

Chunking

Large documents are split into smaller chunks. The default chunk size is 1024 characters, with an overlap of 200 characters between consecutive chunks. Use Advanced Training to customize these settings. When creating documents, add the separator \n~~~\n roughly every 1000 characters to control where chunks break; otherwise, other common separators such as \n\n\n, \n\n, and \n are used.
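
To make the chunk size, overlap, and separators concrete, here is a minimal sketch of character-based chunking. It is not the platform's actual implementation: the function name `chunk_text`, the separator priority order, and the rule for choosing a break point are all assumptions made for illustration.

```python
# Separators in assumed priority order: the explicit marker first,
# then progressively weaker line breaks.
SEPARATORS = ("\n~~~\n", "\n\n\n", "\n\n", "\n")


def chunk_text(text, chunk_size=1024, overlap=200, separators=SEPARATORS):
    """Split text into chunks of at most chunk_size characters.

    Each chunk after the first starts `overlap` characters before the
    end of the previous chunk. A chunk ends just after the last
    separator found inside its window, if there is one.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            for sep in separators:
                # Search past the overlap region so every chunk moves forward.
                idx = text.rfind(sep, start + overlap + 1, end)
                if idx != -1:
                    end = idx + len(sep)
                    break
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks


document = "x" * 900 + "\n~~~\n" + "y" * 900
chunks = chunk_text(document)
print([len(c) for c in chunks])
```

In this sketch the first chunk ends at the \n~~~\n marker rather than at the 1024-character limit, which is the effect the explicit separator is meant to have: related content stays together instead of being cut mid-sentence.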

What is Match Score?

The match score measures how closely a piece of training data aligns with the user's query. It helps identify the most relevant knowledge base node: a higher match score means the node is more relevant to the query.

The score is calculated by comparing the user's query with the training data, considering factors such as:

Semantic understanding: The score considers the meaning and context of the query, not just exact keyword matches
Vector proximity: Technically, the score often represents the cosine similarity between query and document vectors in the embedding space
Contextual relevance: How well the entire query aligns with the document's overall topic and focus
Relative ranking: The absolute score matters less than how documents rank compared to each other

The match score ranges from 0 to 1 and is generated by embedding models. Different models use different scoring mechanisms, so some may produce consistently higher scores, while others may yield lower scores.
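
As an illustration of vector proximity and relative ranking, the sketch below computes cosine similarity between a query vector and a few document vectors, then sorts the documents by score. The vectors and node names are made-up toy values, not output from a real embedding model, and `cosine_similarity` and `rank_nodes` are hypothetical helper names. Raw cosine similarity can be negative in general; how a given model or platform maps it onto the 0 to 1 range is not covered here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_nodes(query_vec, nodes):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in nodes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
query = [1.0, 0.0]
nodes = {
    "shipping times": [0.2, 0.8],
    "refund policy": [0.9, 0.1],
    "careers": [0.0, 1.0],
}

ranking = rank_nodes(query, nodes)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The order of the results is the useful signal here. A different embedding model could shift every score up or down while leaving the ranking unchanged.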

Training View Source

To review your AI's training data, see the Training View Source docs. The view lets you inspect the match score for each knowledge base node used to train your AI.

Additional Training Tips

Be Specific

Focus training on common scenarios.

Include Variations

Vary phrasing for similar questions.

Keep It Clear

Write responses exactly as you want the agent to deliver them.

Stay Consistent

Ensure consistent tone and policy.

Review Regularly

Analyze chat logs to refine training.

Training Data Anomaly Detection

As your AI knowledge base grows, keeping information consistent can be a challenge. Conflicting data from different sources can lead to confusing answers. The Training Data Anomaly Detection feature scans your training data to catch these inconsistencies, ensuring your AI agent always has a single, accurate source of truth.

How It Works

The system works in the background to ensure your AI's knowledge remains reliable.

Spotting Conflicts: It continuously monitors your websites, documents, and text for any contradictory information.
Clear Reporting: When an issue is found, you get a clean summary showing exactly which sources are disagreeing.
Side-by-Side Comparison: Review the conflicting segments next to each other to easily spot the difference.
Quick Resolution: Select the correct source or edit the content instantly to fix the discrepancy.

Benefits

Trustworthy Answers: Prevent your AI agent from giving mixed messages to your users.
Effortless Maintenance: No need to manually hunt for outdated info; the system flags it for you when you add new data that overrides the old.
Confidence at Scale: Add as much training data as you need without worrying about silent contradictions creeping in.

You will see Anomaly Detection alerts at the top of your agent's dashboard whenever a conflict needs your attention.