Domains
Domains allow an AI agent to consume data from a domain (or website).
Each domain can be associated with one or more AI agents. A domain can be either a top-level domain or a subpath of a domain.
Each webpage scraped counts towards the total number of documents available under each plan.
The scraped pages will be available in the Knowledge Hub, inside the Domain pages folder.
Menu location
Domains can be created or deleted from the following menu:
Settings > Domains
More about Domains
The required details for creating a domain are:
- URL (The domain to scrape content from)
  - The URL must be a valid domain name without the https:// or http:// prefix.
  - When the URL is a subpath, the scraping process will be limited to that subpath and its children only.
By defining a topic, the associated AI Agents will have knowledge of the content scraped from the domain.
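The URL rules above (no https:// or http:// prefix, and subpath-limited scraping) can be illustrated with a minimal Python sketch. The function names `is_valid_domain_url` and `in_scope` are hypothetical and are not part of ToothFairyAI; the validation pattern is a simplified assumption about what counts as a valid domain name.

```python
import re

# Hypothetical helpers: the names and the exact validation pattern are
# assumptions for illustration, not ToothFairyAI's own implementation.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_domain_url(url: str) -> bool:
    """Return True if url looks like `host` or `host/subpath` with no scheme."""
    if "://" in url:
        return False  # the https:// or http:// prefix must be omitted
    host = url.split("/", 1)[0].lower()
    labels = host.split(".")
    # Require at least two dot-separated labels, each made of letters,
    # digits, and hyphens, with no leading or trailing hyphen.
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def in_scope(page: str, domain_url: str) -> bool:
    """Return True if page is domain_url itself or one of its children."""
    root = domain_url.rstrip("/")
    page = page.rstrip("/")
    return page == root or page.startswith(root + "/")
```

For example, with the placeholder domain `example.com/docs`, the page `example.com/docs/api` is in scope while `example.com/blog` and `example.com/docs-old` are not.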
For security reasons, non-enterprise subscriptions are limited to domains with a publicly exposed sitemap.xml file. If the domain does not have a sitemap.xml file, it will not be scraped and the status will change to No sitemap.
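Before adding a domain, it can be useful to check whether it exposes a sitemap.xml file. The sketch below is one way to do that from Python. It assumes the file sits at the root of the host and is served over HTTPS, which the text above does not specify; `sitemap_url` and `has_public_sitemap` are hypothetical names.

```python
from typing import Callable
from urllib.error import URLError
from urllib.request import urlopen


def sitemap_url(domain: str) -> str:
    """Build the assumed sitemap location for a scheme-less domain URL."""
    host = domain.split("/", 1)[0]  # drop any subpath
    return f"https://{host}/sitemap.xml"


def _http_status(url: str) -> int:
    """Fetch url and return its HTTP status code."""
    with urlopen(url, timeout=10) as response:
        return response.status


def has_public_sitemap(
    domain: str, fetch_status: Callable[[str], int] = _http_status
) -> bool:
    """Return True if the assumed sitemap URL answers with HTTP 200."""
    try:
        return fetch_status(sitemap_url(domain)) == 200
    except (URLError, OSError):
        # Covers HTTP errors such as 404 as well as network failures.
        return False
```

The `fetch_status` parameter exists only so the check can be exercised without network access; calling `has_public_sitemap("example.com")` uses a real HTTP request.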
Domain scraping
The scraping process will start once the domain is verified, and the status will change to syncing.
Depending on the size of the domain, the scraping process may take a few minutes to a few hours to complete.
ToothFairyAI will scrape the domain and its children recursively to create a site map of the domain. This keeps incremental scraping operations efficient, because they do not need to re-scrape the entire domain each time.
Once the scraping process is complete, the status will change to completed and the domain will be available for use by the associated AI Agents.
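The general idea behind incremental scraping with a site map can be sketched as a comparison between two snapshots. This illustrates the concept only and is not a description of how ToothFairyAI stores or compares its site map: the function name `pages_to_scrape` and the choice of a per-page change marker (for example a last-modified date) are assumptions.

```python
def pages_to_scrape(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return URLs that are new, or whose change marker differs from the last sync.

    Both arguments map a page URL to a change marker such as a last-modified
    date. The marker format is an assumption made for this sketch.
    """
    return sorted(url for url, marker in current.items() if previous.get(url) != marker)


# Snapshot from the last sync versus the snapshot seen now (placeholder data).
last_sync = {
    "example.com/docs": "2024-01-01",
    "example.com/docs/api": "2024-01-01",
}
now = {
    "example.com/docs": "2024-01-01",       # unchanged, skipped
    "example.com/docs/api": "2024-02-15",   # changed, re-scraped
    "example.com/docs/guide": "2024-02-10", # new, scraped
}
changed = pages_to_scrape(last_sync, now)
```

Only the changed and new pages end up in `changed`, which is why a stored site map avoids re-scraping unchanged pages.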
Images extraction
If the user wants to extract images from the domain, the Extract images option should be enabled.
Upon enabling this option, the Images retrieval instruction field will appear and the user will be required to provide the instructions for the AI Agent to extract the images that are relevant to the domain.
By default this option is disabled.
Sync cycle
The sync cycle is the frequency at which the domain will be scraped for new content. The sync cycle can be set to:
- Manual (default)
- 24 hours (daily)
- 72 hours (3 days)
- Every week (7 days)
By default, the sync cycle is set to manual and the user will need to manually trigger the sync process by clicking on the Sync button.
Any change to the configuration of the domain will take effect only after the next sync cycle.
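The sync cycle options can be read as intervals between scrapes. The sketch below computes when the next automatic sync would be due. The setting keys (`"manual"`, `"24h"`, `"72h"`, `"weekly"`) and the function `next_sync` are hypothetical labels chosen for this example, not values used by ToothFairyAI.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical keys for the four sync cycle options listed above.
SYNC_INTERVALS = {
    "manual": None,                # no automatic sync; use the Sync button
    "24h": timedelta(hours=24),    # daily
    "72h": timedelta(hours=72),    # every 3 days
    "weekly": timedelta(days=7),   # every 7 days
}


def next_sync(last_sync: datetime, cycle: str = "manual") -> Optional[datetime]:
    """Return when the next automatic sync is due, or None for manual."""
    interval = SYNC_INTERVALS[cycle]
    return None if interval is None else last_sync + interval
```

Because configuration changes take effect only after the next sync cycle, a change made shortly after a weekly sync could wait up to seven days unless a sync is triggered manually.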
Domain verification (only needed for domains with over 500 pages)
To sync websites with over 500 pages, administrators must add a DNS record to the domain to verify ownership. This is a security measure to ensure that only the domain owner can scrape content from the domain. Once the ownership validation process is complete, the website scraping will begin for the full domain.
Once the records provided by ToothFairyAI have been added to the domain's DNS, ToothFairyAI will verify the domain by checking the registered records.
During the verification process, the status will move through the following values:
- verifying
- approved
- syncing
- completed
If the verification fails, the status will change to failed, or to noDomain if the DNS record is not found.
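The status values above can be modelled as a small state machine, which may help when reasoning about which status can follow which. This is a sketch built only from the statuses named in this section. It assumes the listed statuses occur in the order shown and that failed and noDomain are reached from verifying. It does not model the No sitemap status or what happens on later sync cycles. `TRANSITIONS` and `can_transition` are hypothetical names.

```python
# Allowed next statuses, as read from the verification flow described above.
TRANSITIONS = {
    "verifying": {"approved", "failed", "noDomain"},
    "approved": {"syncing"},
    "syncing": {"completed"},
    "completed": set(),  # later re-syncs are not modelled in this sketch
    "failed": set(),
    "noDomain": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if target is an allowed next status after current."""
    return target in TRANSITIONS.get(current, set())
```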