
SEO Configuration - Sitemap & Robots.txt

πŸ“ Files Location​

robots.txt

  • Location: static/robots.txt
  • Production URL: https://xerago.ai/robots.txt
  • Purpose: Controls search engine crawling behavior

sitemap.xml

  • Generated automatically by Docusaurus during build
  • Production URL: https://xerago.ai/sitemap.xml
  • Location after build: build/sitemap.xml

🤖 robots.txt Configuration

The robots.txt file is configured to:

  • ✅ Allow all search engines to crawl all pages
  • ✅ Reference the sitemap location
  • ✅ Include placeholders for blocking specific bots (if needed)
  • ✅ Include a crawl-delay option (commented out by default)

Current Configuration:

```
User-agent: *
Allow: /
Sitemap: https://xerago.ai/sitemap.xml
```

To Block Specific Pages:

Uncomment and modify these lines in static/robots.txt:

```
Disallow: /admin/
Disallow: /private/
```

To Block Specific Bots:

Add these lines to static/robots.txt, replacing BadBotName with the crawler's user-agent string:

```
User-agent: BadBotName
Disallow: /
```

πŸ—ΊοΈ Sitemap Configuration​

Configured in docusaurus.config.js:

```js
sitemap: {
  changefreq: 'weekly',         // How often pages change
  priority: 0.5,                // Default priority (0.0 to 1.0)
  ignorePatterns: ['/tags/**'], // Patterns to exclude
  filename: 'sitemap.xml',      // Output filename
}
```

Sitemap Settings Explained:

  1. changefreq: 'weekly'

    • Tells search engines how often to check for updates
    • Options: always, hourly, daily, weekly, monthly, yearly, never
  2. priority: 0.5

    • Default priority for all pages (0.0 = lowest, 1.0 = highest)
    • Homepage typically gets 1.0, other pages 0.5-0.8
  3. ignorePatterns

    • Excludes specific URL patterns from sitemap
    • Currently excludes tag pages: /tags/**
  4. filename: 'sitemap.xml'

    • Standard filename for sitemaps
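
For reference, these options are passed to the sitemap plugin through the preset configuration. The sketch below shows where the sitemap block typically sits, assuming the site uses @docusaurus/preset-classic; the title and other fields are illustrative placeholders, not the actual config:

```js
// docusaurus.config.js — abbreviated sketch (assumes @docusaurus/preset-classic;
// title, baseUrl, and unrelated options are placeholders or omitted).
module.exports = {
  title: 'Xerago',          // placeholder
  url: 'https://xerago.ai', // production URL used when generating sitemap entries
  baseUrl: '/',
  presets: [
    [
      'classic',
      {
        sitemap: {
          changefreq: 'weekly',
          priority: 0.5,
          ignorePatterns: ['/tags/**'],
          filename: 'sitemap.xml',
        },
      },
    ],
  ],
};
```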

πŸ” How to Verify​

After Building for Production:​

  1. Build the site:

    npm run build
  2. Check robots.txt:

    # File should exist at:
    build/robots.txt
  3. Check sitemap.xml:

    # File should exist at:
    build/sitemap.xml
  4. Serve locally to test:

    npm run serve

    Then visit http://localhost:3000/robots.txt and http://localhost:3000/sitemap.xml (port 3000 is the default for npm run serve).


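Optionally, these checks can be scripted. The sketch below is a hypothetical Node helper (e.g. scripts/check-seo-files.js, not part of the current setup) that confirms both files exist after a build and that robots.txt references the sitemap:

```js
// scripts/check-seo-files.js (hypothetical helper)
// Run with `node scripts/check-seo-files.js` after `npm run build`.
const fs = require('fs');
const path = require('path');

const buildDir = path.join(__dirname, '..', 'build'); // adjust if the script lives elsewhere
const robotsPath = path.join(buildDir, 'robots.txt');
const sitemapPath = path.join(buildDir, 'sitemap.xml');

let ok = true;

// Both files should be present at the root of the build output.
for (const file of [robotsPath, sitemapPath]) {
  if (fs.existsSync(file)) {
    console.log(`Found: ${file}`);
  } else {
    console.error(`Missing: ${file}`);
    ok = false;
  }
}

// robots.txt should contain a Sitemap: line pointing at the sitemap URL.
if (fs.existsSync(robotsPath)) {
  const robots = fs.readFileSync(robotsPath, 'utf8');
  if (!/^Sitemap:\s*\S+/m.test(robots)) {
    console.error('robots.txt does not contain a Sitemap: line');
    ok = false;
  }
}

process.exit(ok ? 0 : 1);
```
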
🚀 Production Deployment

After deploying to production, verify:

  1. robots.txt is accessible:

    • Visit: https://xerago.ai/robots.txt
    • Should display the robots.txt content
  2. sitemap.xml is accessible:

    • Visit: https://xerago.ai/sitemap.xml
    • Should display XML sitemap with all pages
  3. Submit to Search Engines:

    • Google Search Console: add and verify the xerago.ai property, then submit https://xerago.ai/sitemap.xml under the Sitemaps section
    • Bing Webmaster Tools: add and verify the site, then submit the same sitemap URL under Sitemaps
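
Optionally, the production URLs can be checked from the command line as well. The sketch below assumes Node 18+ for the built-in fetch; the filename is hypothetical:

```js
// scripts/check-seo-urls.js (hypothetical helper) — requires Node 18+ for global fetch.
// Run with `node scripts/check-seo-urls.js` after deployment.
const urls = [
  'https://xerago.ai/robots.txt',
  'https://xerago.ai/sitemap.xml',
];

async function main() {
  let ok = true;
  for (const url of urls) {
    const res = await fetch(url);        // both should return HTTP 200 in production
    console.log(`${url} -> HTTP ${res.status}`);
    if (!res.ok) ok = false;
  }
  process.exit(ok ? 0 : 1);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```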


📊 Sitemap Contents

The sitemap will automatically include:

✅ All MDX pages in src/pages/

  • / (home)
  • /about-us
  • /blog/*
  • /customer-stories/*
  • /solutions/*
  • etc.

✅ All documentation pages in docs/

❌ Excluded patterns:

  • /tags/** (tag pages)
  • Any patterns added to ignorePatterns
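
To spot-check what actually ended up in the sitemap, you can list its URLs after a build. The sketch below is a hypothetical helper that uses a simple regex rather than a full XML parser:

```js
// scripts/list-sitemap-urls.js (hypothetical helper)
// Prints every <loc> entry in build/sitemap.xml and flags any /tags/ URLs,
// which should have been excluded by ignorePatterns.
const fs = require('fs');
const path = require('path');

const sitemapPath = path.join(__dirname, '..', 'build', 'sitemap.xml');
const xml = fs.readFileSync(sitemapPath, 'utf8');

const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);

console.log(`${urls.length} URLs in sitemap:`);
for (const url of urls) {
  console.log(`  ${url}`);
}

const tagUrls = urls.filter((url) => url.includes('/tags/'));
if (tagUrls.length > 0) {
  console.warn(`Warning: ${tagUrls.length} tag URL(s) were not excluded`);
}
```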

πŸ› οΈ Customization Options​

Change Update Frequency for Specific Pages:

You can set a different change frequency and priority in page frontmatter:

```md
---
title: About Us
description: Learn about Xerago
# SEO customization
sitemap:
  changefreq: daily
  priority: 0.9
---
```

Exclude Specific Pages:

Add to docusaurus.config.js:

```js
sitemap: {
  ignorePatterns: [
    '/tags/**',
    '/admin/**',
    '/private/**',
  ],
}
```

✅ Checklist

  • robots.txt created in static/ folder
  • Sitemap configuration added to docusaurus.config.js
  • Sitemap references correct production URL
  • Test robots.txt after build (build/robots.txt)
  • Test sitemap.xml after build (build/sitemap.xml)
  • Submit sitemap to Google Search Console (after deployment)
  • Submit sitemap to Bing Webmaster Tools (after deployment)
  • Monitor crawl errors in search console

πŸ“ Notes​

  • Docusaurus automatically generates sitemap.xml during the build process
  • The sitemap is regenerated on every build with updated content
  • robots.txt is a static file and won't change unless you edit it
  • Both files will be available at the root of your production site