
How to Train with Websites

Step-by-step guide to indexing website content to train your chatbot

Train your chatbot by crawling and indexing your website content. The AI learns from your pages to answer questions about your products, services, and information.

Prerequisites

  • ChatFlow Account (Enterprise plan)
  • Existing Chatbot - Already created
  • Website URL - The site you want to index

Required Plan: Enterprise

What Website Training Does

When you train with a website, ChatFlow:

  • Crawls your website pages
  • Extracts the text content from each page
  • Indexes that content for the AI to reference
  • Lets your chatbot answer questions about your site

Steps

Step 1: Open the Training Tab

  1. Go to Chatbots in the sidebar
  2. Click on your chatbot
  3. Click the Training tab

Step 2: Click Train with Website

Click the Train Chatbot with Website button.

Step 3: Enter Your Website URL

  1. Enter your website's base URL
    • Example: https://www.yourcompany.com
  2. The system will detect subpages automatically

Step 4: Configure Crawl Settings

Set which pages to include or exclude:

Allow URLs

Pages to include in training:

  • /products/* - All product pages
  • /services/* - All service pages
  • /help/* - Help/support pages

Disallow URLs

Pages to exclude:

  • /admin/* - Admin pages
  • /login/* - Login pages
  • /cart/* - Shopping cart
  • /checkout/* - Checkout pages
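
To reason about which pages a set of allow and disallow patterns would cover, a small filter can help. The sketch below is an assumption about how such rules might combine, not ChatFlow's actual matching logic: it treats `*` as a shell-style wildcard, lets disallow rules win over allow rules, and uses a hypothetical function name, `should_crawl`.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Patterns copied from the examples in this guide.
ALLOW = ["/products/*", "/services/*", "/help/*"]
DISALLOW = ["/admin/*", "/login/*", "/cart/*", "/checkout/*"]


def should_crawl(url, allow=ALLOW, disallow=DISALLOW):
    """Return True if the URL path matches an allow pattern and no disallow pattern.

    Assumption: disallow rules take precedence over allow rules.
    """
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in disallow):
        return False
    return any(fnmatchcase(path, pattern) for pattern in allow)


if __name__ == "__main__":
    for candidate in (
        "https://www.yourcompany.com/products/widget",
        "https://www.yourcompany.com/cart/items",
        "https://www.yourcompany.com/blog/post",
    ):
        print(candidate, should_crawl(candidate))
```

Under these assumptions, a page that matches neither list (such as `/blog/post`) is skipped. If you want such pages indexed, add a matching allow pattern.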

Step 5: Analyze the Website

  1. Click Analyze
  2. ChatFlow scans your website structure
  3. It shows the estimated number of pages
  4. It lists the pages to be crawled

Step 6: Start Training

  1. Review the analysis results
  2. Click Start Training (or confirm)
  3. Wait for processing to complete

Processing time depends on website size:

  • Small site (< 50 pages): 5-10 minutes
  • Medium site (50-200 pages): 15-30 minutes
  • Large site (more than 200 pages): 30-60 minutes
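
For planning purposes, the size bands above can be turned into a rough lookup. This is only a sketch of the published ranges, not a ChatFlow API: the function name `estimate_processing_minutes` is hypothetical, and treating a site with exactly 200 pages as medium is an assumption.

```python
def estimate_processing_minutes(page_count):
    """Return a (min, max) range in minutes based on the size bands in this guide."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count < 50:
        return (5, 10)    # small site
    if page_count <= 200:
        return (15, 30)   # medium site (assumption: exactly 200 counts as medium)
    return (30, 60)       # large site


if __name__ == "__main__":
    low, high = estimate_processing_minutes(120)
    print(f"Expect roughly {low}-{high} minutes")
```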

Step 7: Monitor Progress

The Training tab shows:

  • Pages being processed
  • Status of each page (Pending, Processing, Completed, Failed)
  • Overall progress percentage
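
If you track page statuses yourself (for example, by copying them from the Training tab), a summary like the one below shows how a progress percentage could be derived. How ChatFlow computes its own percentage is not documented here. This sketch assumes a page counts as finished once it is Completed or Failed, and the `summarize_progress` name and input format are hypothetical.

```python
from collections import Counter

# Status values as listed in this guide.
STATUSES = ("Pending", "Processing", "Completed", "Failed")


def summarize_progress(page_statuses):
    """Count pages per status and compute the percentage of finished pages.

    page_statuses maps a page path to one of STATUSES.
    Assumption: Completed and Failed pages both count as finished.
    """
    counts = Counter(page_statuses.values())
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    total = len(page_statuses)
    finished = counts["Completed"] + counts["Failed"]
    percent = round(100 * finished / total) if total else 0
    return {"counts": {s: counts[s] for s in STATUSES}, "percent": percent}


if __name__ == "__main__":
    pages = {
        "/products/a": "Completed",
        "/products/b": "Failed",
        "/help/faq": "Processing",
        "/services/x": "Pending",
    }
    print(summarize_progress(pages))
```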

Managing Trained Content

View Trained Pages

  1. Go to the Training tab
  2. Review the list of all indexed pages
  3. Click a page to see its extracted content

Remove Pages

  1. Find the page in the list
  2. Click Remove (trash icon)
  3. The page's content is removed from the AI's knowledge

Re-crawl Website

To update content after website changes:

  1. Click Re-crawl or Refresh
  2. ChatFlow checks for new and updated pages
  3. Updates the AI's knowledge

Verify It's Working

Test in Playground

  1. Go to the Playground tab
  2. Ask questions about your website content
  3. Verify that the responses are accurate

Example questions:

  • "What services do you offer?"
  • "How much does X cost?"
  • "Where are you located?"
  • "What are your return policies?"

Good Response Indicators

  • Accurate information from your site
  • Specific details (prices, hours, etc.)
  • Relevant page references

Best Practices

What to Include

  • Product/service pages
  • FAQ pages
  • About/company info
  • Pricing information
  • Contact details
  • Help documentation

What to Exclude

  • Admin/internal pages
  • Login/authentication pages
  • Cart/checkout pages
  • User-specific content
  • Duplicate content

Content Quality

  • Ensure your website has clear, helpful text
  • Remove outdated information
  • Use descriptive page titles
  • Structure content with headings

Regular Updates

  • Re-crawl after major changes
  • Remove outdated pages
  • Monitor for crawl errors

Troubleshooting

Pages Not Crawling

  • Check that the URL is accessible
  • Verify that robots.txt allows crawling
  • Check whether the page requires a login
  • JavaScript-heavy pages may fail to crawl
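
One way to check the robots.txt point above is to evaluate your rules locally with Python's standard `urllib.robotparser` module. This is a minimal sketch: the sample robots.txt content is made up for illustration, and because this guide does not state which user agent ChatFlow's crawler sends, the `"*"` default is an assumption.

```python
from urllib.robotparser import RobotFileParser


def is_allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example robots.txt content (made up for illustration).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

if __name__ == "__main__":
    for candidate in (
        "https://www.yourcompany.com/products/widget",
        "https://www.yourcompany.com/admin/settings",
    ):
        print(candidate, is_allowed_by_robots(SAMPLE_ROBOTS, candidate))
```

To test your real site, paste the contents of your own robots.txt file in place of `SAMPLE_ROBOTS`.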

Content Not Found

  • The page may have minimal text
  • Content inside images is not extracted
  • Check that the page actually has useful content
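
To get a feel for how much extractable text a page really has, you can count the words outside scripts, styles, and images. The sketch below uses Python's standard `html.parser` module and is only an approximation: ChatFlow's own extraction rules are not documented here, and the `VisibleTextParser` and `visible_word_count` names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Image content and alt attributes never reach this method,
        # so text that exists only inside images is not counted.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html):
    """Return the number of words of visible text in an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


if __name__ == "__main__":
    sample = (
        "<html><body><h1>Pricing</h1>"
        '<img src="plans.png" alt="Plan table">'
        "<script>var x = 1;</script>"
        "<p>Contact us for details.</p></body></html>"
    )
    print(visible_word_count(sample))
```

A page whose count is very low, such as one made up mostly of images, is unlikely to give the chatbot much to work with.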

Crawl Errors

Error            Solution
404 Not Found    Page doesn't exist
403 Forbidden    Check access permissions
Timeout          Page too slow, try again
Too Large        Page exceeds size limit

Wrong Information in Responses

  • The indexed content may be outdated
  • Re-crawl the website
  • Check that the source page itself is correct
  • Add FAQs for specific corrections

Plan Limits

Plan          Pages Limit
Enterprise    500 pages

Combining Training Sources

For best results, combine:

  • Website training - Broad coverage
  • Documents - Detailed guides, manuals
  • FAQs - Specific Q&As
  • Connectors - External data sources
