What if a URL is blocked by robots.txt?

Robots.txt is a plain-text file that tells search engines what they should and shouldn't crawl. When you publish a new page or post on your website, search engine bots crawl that content in order to index it in search results. However, if there are parts of your website that you don't want indexed, you can tell search bots to skip them so that they don't appear on the results page.
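For illustration, here is a minimal robots.txt sketch; the /private/ and /admin/ directories and the example.com domain are hypothetical placeholders, not paths from any real site:

# Applies to all crawlers
User-agent: *
# Hypothetical directories we don't want crawled
Disallow: /private/
Disallow: /admin/

# Optionally point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap.xml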

Here are a few things to check when troubleshooting a "Sitemap contains URLs which are blocked by robots.txt" error:

- Check for any Disallow rules within your robots.txt file (a quick programmatic check is sketched below).
- ...
- Once reloaded, click Request Indexing > Crawl only this URL.
- Clear your website's cache.
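To verify the Disallow rules, Python's standard-library urllib.robotparser can test whether a given URL is blocked for a given user agent. This is a minimal sketch; the domain and page URL are hypothetical:

from urllib.robotparser import RobotFileParser

# Hypothetical site; substitute your own domain and page
robots_url = "https://www.example.com/robots.txt"
page_url = "https://www.example.com/private/page.html"

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # fetches and parses the live robots.txt

# can_fetch() returns False when the rules block this user agent
if parser.can_fetch("*", page_url):
    print("Allowed: safe to include in the sitemap")
else:
    print("Blocked by robots.txt: remove it from the sitemap or fix the rule")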
 
Keep in mind that honoring robots.txt is completely voluntary: well-behaved search bots obey it, but a bot can crawl your site regardless of what the file says.
So if you do not want a directory crawled and its contents exposed, change the folder permissions on the server.
Server-side access control is the only way to fully safeguard a directory from being crawled (one sketch of this is below).
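As one sketch of that approach, assuming a Unix host where the directory is owned by an account other than the one the web server runs as (the path is hypothetical):

# Owner keeps full access; group and others, including the web
# server user when it is not the owner, lose read access, so
# requests for files in this directory fail instead of being served.
chmod 700 /var/www/html/private

Password protection or a server-level deny rule (for example, Apache's Require all denied directive) achieves the same result without touching filesystem permissions.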
 