Processing locally — files never leave your device

robots.txt Generator

Generate a properly formatted robots.txt. Add user-agent rules, allow/disallow paths, link your sitemap, and copy the result.

How to use the robots.txt Generator

  1. Add one block per user-agent — start with User-agent: * for rules that apply to all crawlers.
  2. List your Allow and Disallow paths, one directive per line, each beginning with a / from the site root.
  3. Add a Sitemap: line with the full absolute URL of your sitemap so crawlers can discover it.
  4. Review the generated file for typos — a single wrong Disallow line can hide an entire site.
  5. Copy the result and upload it as /robots.txt at the root of your domain.

robots.txt: tell crawlers where they may go

robots.txt is a plain-text file at the root of your domain that tells search-engine crawlers which parts of your site they may and may not request. It is the first file most crawlers fetch. This generator builds a correctly formatted file with user-agent groups, allow and disallow rules, and a sitemap reference — so you control your crawl budget without accidentally hiding pages you want indexed.

What a file looks like

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /admin/public/

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml

Each group starts with one or more User-agent lines followed by its rules. The Sitemap directive is independent of any group and uses an absolute URL.
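
The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not this tool's actual implementation: the function name build_robots_txt and the dictionary layout for groups are assumptions made for the example.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from groups and sitemap URLs.

    Each group is a dict with "user_agents" (list of names) and "rules"
    (list of (directive, path) pairs). This layout is hypothetical.
    """
    blocks = []
    for group in groups:
        lines = [f"User-agent: {agent}" for agent in group["user_agents"]]
        for directive, path in group["rules"]:
            if directive not in ("Allow", "Disallow"):
                raise ValueError(f"unsupported directive: {directive}")
            # Paths are matched from the site root, so they start with "/".
            # An empty value is tolerated (an empty Disallow blocks nothing).
            if path and not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path}")
            lines.append(f"{directive}: {path}")
        blocks.append("\n".join(lines))
    # Sitemap lines are independent of any group, so they go last.
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # Groups are separated by one blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


robots = build_robots_txt(
    groups=[
        {"user_agents": ["*"],
         "rules": [("Disallow", "/admin/"), ("Disallow", "/cart/"),
                   ("Allow", "/admin/public/")]},
        {"user_agents": ["Googlebot"],
         "rules": [("Disallow", "/no-google/")]},
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots)
```

Called with these arguments, the sketch reproduces the example file shown above. Rejecting paths that do not start with / catches one of the most common typos before the file is ever uploaded.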

Crawling is not indexing

This is the most important and most misunderstood point. Disallow controls whether a crawler fetches a URL — it does not control whether that URL appears in search results. A disallowed page that other sites link to can still be indexed as a bare URL with no snippet. If your goal is to keep something out of the index, do the opposite of what most people expect: allow crawling and add a noindex directive, because the crawler has to read the page to see the noindex.
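
One way to send a noindex signal is the X-Robots-Tag response header (the in-page equivalent is a robots meta tag with content noindex). The sketch below shows the idea as a small WSGI wrapper in Python. It is only an illustration under assumptions: the names noindex_middleware, demo_app, and response_headers, and the /internal/ prefix, are all hypothetical.

```python
def noindex_middleware(app, private_prefixes=("/internal/",)):
    """Wrap a WSGI app so matching paths get an X-Robots-Tag: noindex header."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start_with_header(status, headers, exc_info=None):
            if path.startswith(tuple(private_prefixes)):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns the same page for every path."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def response_headers(app, path):
    """Call a WSGI app for one path and return the headers it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    body = app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    list(body)  # consume the body so the app runs to completion
    return captured["headers"]


site = noindex_middleware(demo_app)
print(response_headers(site, "/internal/report"))
print(response_headers(site, "/blog/post"))
```

For the header to have any effect, the same paths must not be disallowed in robots.txt, otherwise the crawler never requests the page and never sees the header.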

What robots.txt is good for

  • Keeping crawlers out of internal areas (admin, carts, search-result pages)
  • Conserving crawl budget on large sites by blocking low-value parameter URLs
  • Advertising your sitemap location to every crawler at once
  • Setting different rules for different bots via named user-agent groups

What it should never be used for

Do not use robots.txt as a security mechanism: the file is public, so listing a secret path in Disallow actually advertises it. Do not block CSS or JavaScript that Google needs to render the page. And do not rely on it to remove indexed pages; for that, use noindex. The Search Console Removals tool can hide a URL quickly, but only temporarily.

Testing before you deploy

A misplaced Disallow: / blocks your entire site, so test carefully. Google Search Console includes a robots.txt report that shows the robots.txt files Google has fetched for your site, when each was last crawled, and any warnings or errors. To check whether a specific URL is allowed or blocked, use the URL Inspection tool. Validate after every change, because the cost of a mistake here is your whole site disappearing from search.
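
A simple pre-deploy check can catch the blanket block before the file goes live. The following Python sketch is a lint under stated assumptions, not a full robots.txt parser: the function name find_blanket_blocks is hypothetical, and it only looks for a bare Disallow: / without evaluating any Allow rules that might carve out exceptions.

```python
def find_blanket_blocks(robots_txt):
    """Return the user-agents whose group contains a bare 'Disallow: /'."""
    blocked = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has started its rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked


risky = """User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /no-google/
"""
print(find_blanket_blocks(risky))  # ['*']
```

Running a check like this in a deploy script, and failing the deploy when the result is non-empty, turns a silent catastrophe into a visible error. A site that blocks everything on purpose (a staging host, for example) would skip the check.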

Frequently asked questions

Where do I put robots.txt?
At the very root of each host: https://example.com/robots.txt. Crawlers only look there, so a file in a subdirectory is ignored. A robots.txt file applies only to the host it is served from: each subdomain (and each protocol and port) needs its own file.
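As a rough illustration, the robots.txt URL that governs any page can be derived from the page URL by keeping the scheme, host, and port and replacing everything else. The helper name robots_url below is hypothetical, and the sketch uses only Python's standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Scheme, host, and port identify the origin; path and query are discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=7"))  # https://example.com/robots.txt
print(robots_url("https://shop.example.com/cart/"))      # https://shop.example.com/robots.txt
print(robots_url("http://example.com:8080/a/b"))         # http://example.com:8080/robots.txt
```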
Does Disallow guarantee a page will not be indexed?
No. Disallow only stops crawling. A blocked URL can still appear in search results (often with no description) if other pages link to it. To truly keep a page out of the index, allow crawling and use a noindex meta tag or X-Robots-Tag header instead.
Is Crawl-delay respected by Google?
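No. Google ignores Crawl-delay entirely. If Googlebot is straining your server, Google's documented remedy is to temporarily return 500, 503, or 429 responses, which slows its crawling. Bing and some other crawlers do honour Crawl-delay, but support varies, so check each crawler's documentation.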
Should I block CSS and JavaScript?
No. Google renders pages like a browser and needs your CSS and JS to understand layout and content. Blocking them can cause Google to misjudge mobile-friendliness and page quality. Leave resource directories crawlable.
What does User-agent: * mean?
The asterisk is a wildcard matching any crawler that does not have its own named block. A crawler uses the most specific matching User-agent group and ignores the rest, so a named Googlebot block overrides the * block for Googlebot.
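Group selection can be tried out with Python's standard-library urllib.robotparser. This is a sketch for illustration: the bot name OtherBot and the file contents are made up, and the standard-library parser is an approximation of how real crawlers behave, not Google's own parser.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/

User-agent: Googlebot
Disallow: /no-google/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler with no named group falls back to the * group.
print(parser.can_fetch("OtherBot", "https://example.com/admin/"))            # False
print(parser.can_fetch("OtherBot", "https://example.com/no-google/page"))    # True

# Googlebot uses only its own group, so the * rules do not apply to it.
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))           # True
print(parser.can_fetch("Googlebot", "https://example.com/no-google/page"))   # False
```

Note the third result: because the Googlebot group does not repeat Disallow: /admin/, Googlebot may crawl /admin/. Rules are not inherited from the * group, so a named group has to restate every rule that should still apply to that crawler.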
Can robots.txt block bad bots or scrapers?
Not reliably. robots.txt is an honour-system convention — well-behaved crawlers obey it, but malicious scrapers simply ignore it. To actually block abusive traffic you need server-side measures like firewall rules, rate limiting, or authentication.
How do Allow and Disallow interact?
For Google, the most specific rule (the longest matching path) wins, regardless of order. So Disallow: /folder/ with Allow: /folder/public/ blocks the folder but keeps the public subpath crawlable. Use this to carve exceptions out of a broad block.
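The longest-match rule can be sketched in Python as follows. This is a simplified illustration, not any crawler's real implementation: the function name is_allowed is hypothetical, and the sketch handles plain path prefixes only, with no support for the * and $ wildcards.

```python
def is_allowed(path, rules):
    """Apply longest-match precedence to (directive, prefix) rules.

    Simplified: plain prefix matching only, no * or $ wildcards.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are ignored
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/folder/"), ("Allow", "/folder/public/")]
print(is_allowed("/folder/private.html", rules))      # False
print(is_allowed("/folder/public/page.html", rules))  # True
print(is_allowed("/blog/", rules))                    # True
```

Because the decision depends on prefix length rather than position, reversing the two rules in the list gives the same results. When an Allow and a Disallow of equal length both match, the sketch picks Allow, following the convention that the least restrictive rule wins a tie.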
Do I even need a robots.txt file?
It is optional. If you have nothing to disallow, you can omit it and crawlers will assume everything is allowed. But a minimal file pointing to your sitemap is good practice, and an empty or all-allow file is far safer than a misconfigured one.

Built by Muhammad Tahir