Automated and self-cleaning WebCrawler
Fireflies
To enhance our AI agent's capabilities, we need an automated and self-cleaning WebCrawler. It should run on a configurable schedule, automatically adding new URLs and removing those that are no longer present in the sitemap. This would ensure our AI agent always has the most up-to-date information, improving its efficiency and accuracy.
Milou Wolsing
Merged in a post:
Automatically crawl URLs
Freek Vermolen
I would like the Web Crawler to automatically crawl my URLs.
Wouter Rosekrans
It would be nice if it were possible to set the frequency with which this update runs. For our situation, it would be ideal if the web crawler could compare the sitemap from the most recent crawl with the current sitemap and then apply only the changes: remove pages from the knowledge base that have disappeared from the sitemap and add pages that are new.
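A minimal sketch of how such a sitemap diff could work, assuming a standard XML sitemap and hypothetical add_url/remove_url calls into the knowledge base (neither is part of the original request):

```python
# Sketch of the sitemap-diff approach described above.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def fetch_sitemap_urls(sitemap_url: str) -> set[str]:
    """Download the sitemap and return the set of <loc> URLs it lists."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}

def sync_knowledge_base(previous_urls: set[str], sitemap_url: str) -> set[str]:
    """Compare the current sitemap with the last crawl and apply only the diff."""
    current_urls = fetch_sitemap_urls(sitemap_url)
    for url in current_urls - previous_urls:   # new pages -> add to the knowledge base
        add_url(url)                           # hypothetical knowledge-base call
    for url in previous_urls - current_urls:   # removed pages -> clean up
        remove_url(url)                        # hypothetical knowledge-base call
    return current_urls  # persist for the next scheduled run (e.g. daily)
```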
Fleur Nouwens
Hey Fireflies, thanks for your feedback! Following up on this:
- What specific frequency do you envision for the WebCrawler to run (e.g., daily, weekly)?
- Are there any specific types of URLs or content that should be prioritized or excluded by the WebCrawler?
- How should the WebCrawler handle URLs that are temporarily unavailable or return errors?
Freek Vermolen
I think once a day, and maybe a maximum of 5 URLs.
Fleur Nouwens
Hey Freek Vermolen, thanks for your feedback! Following up on this:
- What specific types of URLs do you want the crawler to target?
- How frequently would you like the URLs to be crawled?
- Are there any specific data points or information you want to extract from the crawled URLs?