Best Practices
Learn the best practices for training your chatbot
Following best practices for training your AI agent ensures it provides accurate, consistent, and helpful responses. High-quality training data is the foundation of a reliable AI agent.
| Practice | Description |
|---|---|
| Consistency | Use a consistent format for all training data. |
| Clarity | Ensure questions and answers are clear and to the point. |
| Comprehensiveness | Cover as many potential user queries as possible in your training data. The more scenarios the chatbot is trained on, the wider the range of queries it can handle. |
| Accuracy | Ensure that the training data is accurate and up to date. |
| Periodic Updates | Periodically update the training data to reflect new information. |
| Regular Testing | Routinely test the chatbot to verify it's providing accurate responses. |
Chunking
Large documents are split into smaller chunks. The default chunk size is 1024 characters, with an overlap of 200 characters. Use Advanced Training to customize these settings. To control where a document is split, add the separator \n~~~\n after every 1000 characters when creating it. Otherwise, other common separators such as \n\n\n, \n\n, and \n are used.
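The chunking behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the function names `split_point` and `chunk_text`, the separator priority order, and the rule of cutting at the last separator inside each window are all assumptions made for the example. Only the defaults (1024 characters, 200-character overlap) and the separator strings come from the text.

```python
# Separators in assumed priority order: the custom marker first, then common fallbacks.
SEPARATORS = ["\n~~~\n", "\n\n\n", "\n\n", "\n"]


def split_point(text, floor, limit):
    """Pick where the current chunk ends: after the last separator found
    between `floor` and `limit`, or at `limit` if none is found."""
    if limit >= len(text):
        return len(text)
    for sep in SEPARATORS:
        pos = text.rfind(sep, floor, limit)
        if pos != -1:
            return pos + len(sep)
    return limit  # no separator in the window: hard cut


def chunk_text(text, chunk_size=1024, overlap=200):
    """Split `text` into chunks of at most `chunk_size` characters,
    each starting `overlap` characters before the previous chunk ended."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, prev_end = [], 0, 0
    while prev_end < len(text):
        # Only look for separators in the part not yet covered by earlier chunks.
        end = split_point(text, prev_end, start + chunk_size)
        chunks.append(text[start:end])
        prev_end = end
        start = max(end - overlap, 0)
    return chunks


# Hypothetical document: two 900-character sections joined by the custom separator.
document = "A" * 900 + "\n~~~\n" + "B" * 900
for i, chunk in enumerate(chunk_text(document)):
    print(i, len(chunk))
```

In this sketch the first chunk ends right after the \n~~~\n marker, which is why placing the marker inside each 1024-character window gives you control over where a split lands.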
What is Match Score?
The match score measures how closely the training data aligns with the user's query. It helps identify the most relevant knowledge base node. A higher match score indicates that the knowledge base node is more relevant to the query.
The score is calculated by comparing the user's query with the training data, considering factors such as:
- Semantic understanding: The score considers the meaning and context of the query, not just exact keyword matches
- Vector proximity: Technically, the score often represents the cosine similarity between query and document vectors in the embedding space
- Contextual relevance: How well the entire query aligns with the document's overall topic and focus
- Relative ranking: The absolute score matters less than how documents rank compared to each other
The match score ranges from 0 to 1 and is generated by embedding models. Different models use different scoring mechanisms, so some may produce consistently higher scores, while others may yield lower scores.
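To make the vector proximity and relative ranking points concrete, here is a small sketch that scores a query against a few knowledge base nodes using cosine similarity. The three-dimensional vectors and node names are invented toy values, and `cosine_similarity` and `rank_nodes` are hypothetical helpers written for this example. Real embedding models produce much longer vectors, and how a given model maps its scores into the 0 to 1 range is not covered here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_nodes(query_vec, nodes):
    """Return (node name, score) pairs sorted from best to worst match."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in nodes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, for illustration only).
query = [1.0, 0.0, 1.0]
nodes = {
    "refund policy": [1.0, 0.0, 1.0],
    "shipping times": [1.0, 1.0, 0.0],
    "careers page": [0.0, 1.0, 0.0],
}

for name, score in rank_nodes(query, nodes):
    print(f"{name}: {score:.2f}")
```

The ordering of the nodes is the useful signal here. The same documents scored by a different embedding model could produce different absolute numbers while keeping the same ranking.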
Training View Source
To review your AI's training data, see the Training View Source docs. Training View Source lets you inspect the match score for the knowledge base nodes used to train your AI.
Additional Training Tips
Be Specific
Focus training on common scenarios.
Include Variations
Vary phrasing for similar questions.
Keep It Clear
Craft responses precisely as desired.
Stay Consistent
Ensure consistent tone and policy.
Review Regularly
Analyze chat logs to refine training.
Training Data Anomaly Detection
As your AI knowledge base grows, keeping information consistent can be a challenge. Conflicting data from different sources can lead to confusing answers. The Training Data Anomaly Detection feature scans your training data to catch these inconsistencies, ensuring your AI agent always has a single, accurate source of truth.
How It Works
The system works in the background to ensure your AI's knowledge remains reliable.
- Spotting Conflicts: It continuously monitors your websites, documents, and text for any contradictory information.
- Clear Reporting: When an issue is found, you get a clean summary showing exactly which sources are disagreeing.
- Side-by-Side Comparison: Review the conflicting segments next to each other to easily spot the difference.
- Quick Resolution: Select the correct source or edit the content instantly to fix the discrepancy.
Benefits
- Trustworthy Answers: Prevent your AI agent from giving mixed messages to your users.
- Effortless Maintenance: No need to manually hunt for outdated info; when you add new data that overrides the old, the system flags the outdated content for you.
- Confidence at Scale: Add as much training data as you need without worrying about silent contradictions creeping in.
Anomaly Detection alerts appear at the top of the dashboard section of your agent whenever a conflict needs your attention.