Your robots.txt file is one of the most powerful — and dangerous — files on your website. A single incorrect line can accidentally block Google from crawling your entire site, wiping your rankings overnight. Yet most site owners set it up once and never check it again. This guide explains exactly how robots.txt works, what to block, what to never block, and the critical mistakes that can destroy your SEO.
What Is a robots.txt File?
A robots.txt file is a plain text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that tells search engine crawlers which pages or directories they are and are not allowed to crawl. It follows a standard called the Robots Exclusion Protocol.
Googlebot — Google's crawler — fetches your robots.txt file before crawling any other page on your site. If your robots.txt is misconfigured or missing, Googlebot still crawls your site but may waste crawl budget on pages you would prefer it to skip.
Google's own robots.txt documentation describes it as the way to tell search engine crawlers which URLs they can access on your site, which makes it one of the most fundamental technical SEO files.
The Basic robots.txt Syntax
A robots.txt file is made up of simple directives in plain text:
- `User-agent:` specifies which crawler the rule applies to. `User-agent: *` means all crawlers; `User-agent: Googlebot` means Google specifically.
- `Disallow:` tells the crawler not to visit a specific path. `Disallow: /admin/` blocks the admin directory.
- `Allow:` explicitly permits access to a path — useful when a parent directory is blocked but one subdirectory should be crawlable.
- `Sitemap:` tells all crawlers the URL of your XML sitemap.
Example of a basic robots.txt:
```
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /storage/
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
What You Should Block in robots.txt
Block pages that offer no value to search engine users and would waste Googlebot's crawl budget:
- `/wp-admin/` or `/admin/` — admin dashboards are not for public users
- `/login/` — login and authentication pages
- `/storage/` or `/private/` — private server directories
- `/cart/` and `/checkout/` — e-commerce transaction pages
- `/search?` — internal search result pages (these create near-infinite duplicate URLs)
- `/staging/` or `/dev/` — development or staging environments
- `/*.pdf$` — PDF files (if you do not want them crawled)
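Put together, a robots.txt that implements these blocks might look like the sketch below. The paths are illustrative; match them to the directories your site actually uses:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /login/
Disallow: /storage/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Disallow: /staging/
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```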
What You Should Never Block
These are the most common and damaging robots.txt mistakes:
- CSS and JavaScript files — Google needs to render these to understand your page layout. Blocking them can cause Google to misinterpret your content.
- Your homepage — never block `/`
- Your blog posts and tool pages — any page you want indexed must be crawlable
- Your sitemap — never block the sitemap URL
- Image directories — unless you deliberately do not want images indexed
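A pattern that respects these rules is to block an admin directory while explicitly re-allowing the one file inside it that front-end pages depend on. The WordPress example below is a widely used illustration of the `Allow:` override; adapt the paths to your own platform:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```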
The Critical Mistake That Destroys Rankings
The single most catastrophic robots.txt error is this one line:
```
Disallow: /
```
This tells every search engine crawler not to crawl any page on your entire site. It takes effect quickly: Google generally caches robots.txt for no more than 24 hours, so within a day all crawling stops, snippets drop out of search results, and rankings collapse. This single mistake has wiped rankings from websites overnight.
It typically happens when a developer sets `Disallow: /` on a staging environment to prevent it from being indexed, then accidentally pushes that robots.txt to production.
Always check your live robots.txt after any deployment. Visit https://yourdomain.com/robots.txt directly in your browser and confirm it does not contain `Disallow: /` for all user agents.
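You can also automate this check in your deployment pipeline. The sketch below uses Python's standard-library urllib.robotparser; the domain and paths are placeholders to replace with your own:

```python
# check_robots.py: fail the deployment if the live robots.txt
# blocks critical pages for Googlebot.
import sys
from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"  # placeholder: your production domain

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetches and parses the live file

# Paths every deployment must leave crawlable (adjust to your site).
for path in ("/", "/blog/", "/sitemap.xml"):
    if not parser.can_fetch("Googlebot", f"{SITE}{path}"):
        print(f"BLOCKED by robots.txt: {path}")
        sys.exit(1)  # non-zero exit fails the CI step

print("robots.txt allows all critical paths")
```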
robots.txt vs noindex: The Critical Difference
These two tools are often confused but they do fundamentally different things:
| Feature | robots.txt Disallow | noindex Meta Tag |
|---|---|---|
| What it does | Prevents crawling | Prevents indexing |
| Can page still rank? | Yes (if linked to externally) | No |
| Does Google read the page? | No | Yes (to see noindex) |
| Best for | Admin pages, private directories | Duplicate content, thank-you pages |
A page blocked by robots.txt can still appear in Google search results if external sites link to it — Google sees the link but cannot read the page. To truly remove a page from search results, you must use the noindex meta tag, not robots.txt.
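For reference, the noindex instruction lives in the page itself, either as a meta tag in the HTML head or as an HTTP response header (the header form is useful for non-HTML files such as PDFs):

```html
<!-- In the <head> of the page you want removed from search results -->
<meta name="robots" content="noindex">
```

The equivalent HTTP header is X-Robots-Tag: noindex. Either way, the page must remain crawlable so Google can read the instruction.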
robots.txt and Crawl Budget
Crawl budget is the number of pages Googlebot will crawl on your site within a given timeframe. For large sites with thousands of pages, using robots.txt to block low-value pages (admin pages, internal search results, duplicate URL parameters) preserves crawl budget for your important content pages.
For most small to medium sites (under 1,000 pages), crawl budget is rarely a limiting factor. However, if you notice important pages taking weeks to be indexed, reviewing your robots.txt for unnecessary crawl budget waste is a useful starting point.
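For example, parameter-driven duplicate URLs can be blocked with wildcard patterns, which Google supports in robots.txt. The parameter names below are illustrative; substitute the ones your site actually generates:

```
User-agent: *
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?sessionid=
```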
How to Test Your robots.txt
Google Search Console includes a robots.txt report under Settings → robots.txt, which shows the robots.txt file Google has fetched for your site and flags any errors it found. To check whether a specific URL is blocked under your current rules, run it through the URL Inspection tool.
You can also build and validate your robots.txt with our free Robots.txt Generator, which checks syntax and produces a correctly formatted file.
robots.txt Examples by Site Type
Standard blog or content site:
```
User-agent: *
Disallow: /admin/
Disallow: /login/
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
Laravel/PHP application:
```
User-agent: *
Disallow: /admin/
Disallow: /storage/
Disallow: /vendor/
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
How Google Treats robots.txt Directives
It is important to understand that robots.txt governs crawling, not indexing, so its effects are narrower than many site owners expect. In particular:
- Google will respect `Disallow` rules and will not crawl those URLs
- However, Google may still index a blocked URL if external sites link to it — robots.txt prevents crawling, not indexing
- Google may continue to show a blocked page in search results as a URL-only result (without a description) if it was previously indexed or has external links
- To fully remove a page from Google, use a `noindex` meta tag AND allow crawling — Google must be able to crawl the page to read the noindex instruction
robots.txt for Performance and Security
Beyond SEO, robots.txt serves two additional purposes for many sites:
Performance: Blocking aggressive bots that crawl your site unnecessarily can reduce server load. While Google and Bing are well-behaved, some scrapers and AI training bots crawl sites at high frequency. You can block specific user agents:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
```
Privacy: For sites with user-generated content or private sections, robots.txt keeps well-behaved crawlers away from logged-in areas, user profile URLs, and internal API endpoints. Combine it with noindex meta tags for belt-and-suspenders protection on truly private content.
Remember: your robots.txt file is publicly accessible at yourdomain.com/robots.txt and can be read by anyone. Do not put sensitive paths in it — this effectively creates a roadmap to private areas for malicious actors. Use proper authentication and server-level access controls for genuinely sensitive content.
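If you want noindex coverage enforced at the server level (including for non-HTML files), the header variant can be set in your web server configuration. A minimal Apache sketch, assuming mod_headers is enabled and /private/ is a hypothetical directory you want kept out of search results:

```
# Send a noindex header for everything under /private/
<LocationMatch "^/private/">
    Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>
```

Note that Google only sees this header if the URL is crawlable, so do not also Disallow these paths in robots.txt if you rely on the header.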
Testing and Validating Your robots.txt File
Before deploying a robots.txt change to a live site, always test it. A single misplaced disallow directive can accidentally block your entire site from Google — a mistake that can take weeks to recover from once discovered.
Use Google Search Console's robots.txt report: Go to Settings → robots.txt in Google Search Console to confirm that Google has fetched your current file without errors. To test whether a specific URL is allowed or blocked for Googlebot, run it through the URL Inspection tool before and after your change goes live.
Check after every change: Any time you update robots.txt, verify that your most important pages are still crawlable. It takes less than two minutes and prevents potentially serious indexing mistakes.
Monitor in Google Search Console: After publishing changes to robots.txt, watch the Page indexing report (formerly Coverage) in GSC. If you see a sudden spike in "Blocked by robots.txt" statuses, you have likely accidentally blocked pages you intended to allow. Revert the change and test again before republishing.
Common robots.txt testing checklist:
- Homepage (/) is allowed for all user agents
- Key content pages (/blog/, /tools/) are allowed
- Admin paths (/wp-admin/, /admin/) are blocked
- The Sitemap directive at the bottom points to the correct URL
- No wildcard disallows that accidentally block important content
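If you keep robots.txt in version control, this checklist can run as an automated test before every deployment. A minimal pytest-style sketch using Python's standard-library urllib.robotparser; the file path, domain, and URL paths are placeholders to adapt:

```python
# test_robots.py: validate robots.txt rules before they go live.
from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"  # placeholder domain


def load_rules(path="robots.txt"):
    # Parse the local robots.txt file that is about to be deployed.
    parser = RobotFileParser()
    with open(path) as f:
        parser.parse(f.read().splitlines())
    return parser


def test_critical_paths_allowed():
    rules = load_rules()
    for path in ("/", "/blog/", "/tools/", "/sitemap.xml"):
        assert rules.can_fetch("*", f"{SITE}{path}"), f"{path} is blocked"


def test_admin_paths_blocked():
    rules = load_rules()
    for path in ("/wp-admin/", "/admin/"):
        assert not rules.can_fetch("*", f"{SITE}{path}"), f"{path} is crawlable"
```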
Frequently Asked Questions
Does blocking a page in robots.txt remove it from Google?
No. Blocking a page in robots.txt prevents Googlebot from crawling it, but the page can still appear in search results if external sites link to it. To fully remove a page from Google search, use a noindex meta tag instead.
What happens if I have no robots.txt file?
Googlebot will crawl your entire site without restriction. This is not necessarily harmful for small sites, but it means admin pages and private directories may be crawled. A basic robots.txt blocking admin directories is always recommended.
Can robots.txt hurt my SEO?
Yes — if misconfigured. Accidentally blocking important pages or your entire site can devastate rankings. Always test your robots.txt after any changes and verify live pages are accessible to Googlebot.
How do I generate a robots.txt file?
Use our free Robots.txt Generator to create a correctly formatted robots.txt file. Enter your site URL, select the pages to block, and copy the output directly to your server.