Robots.txt File Structure
Robots.txt is a plain text file, placed in the root of a website, that tells search engine bots which pages or resources (such as PDF, DOC, or image files) they should not crawl. Strictly speaking it controls crawling rather than indexing: a blocked URL can still show up in search results if other pages link to it.
Major search engines (Google, Yahoo, Bing) read a website's robots.txt file to check which pages they may crawl and which they may not.
Why Use a Robots.txt File?
As explained above, robots.txt is an important file used to keep search engine bots away from particular pages or resources of a website.
There are three main reasons to use a robots.txt file:
Block Non-Public Pages: Sometimes we need to block pages that are not meant for random visitors, such as admin pages, the login page, profile pages, or site settings. We don't want search engines to crawl these pages, so we use the robots.txt file to disallow them.
Block Resources: Sometimes we need to block resources such as PDF files, DOC files, images, JS, or CSS. If we don't want Google to crawl these resources, we use the robots.txt file to disallow them. Be careful with CSS and JS, though: Google recommends not blocking files it needs in order to render a page.
Block Particular Characters in URLs: If a website is infected with malware, it may start generating random URLs that end up in search engines, which hurts ranking. To limit this kind of problem, we use the robots.txt file to disallow URLs that contain those characters. This is mostly used on dynamic websites, for example those built with PHP frameworks such as CodeIgniter or Laravel.
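The text above does not show what such a rule looks like. As a rough sketch, assuming a hypothetical rule like Disallow: /*? (block any URL that has a query string), the Python function below illustrates how the two special characters that major crawlers support in paths, * (any sequence of characters) and $ (end of the URL), could be matched against a URL path. The function name rule_matches and the sample paths are made up for illustration; real crawlers use their own matching code.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Illustrative only: supports '*' (any run of characters) and a
    trailing '$' (end of URL). Matching always starts at the beginning
    of the path, so a plain pattern behaves like a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# Hypothetical rule: block every URL that contains a '?'.
print(rule_matches("/*?", "/products?id=1"))   # True  (would be blocked)
print(rule_matches("/*?", "/products"))        # False (still crawlable)
# Hypothetical rule: block URLs that end in .pdf.
print(rule_matches("/*.pdf$", "/files/guide.pdf"))             # True
print(rule_matches("/*.pdf$", "/files/guide.pdf?download=1"))  # False
```

Which characters to block depends on what the unwanted URLs actually look like, so check a few real examples against the pattern before adding it to robots.txt.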
Robots.txt File Structure for CodeIgniter/Laravel PHP Frameworks
User-agent: *
Disallow: /index.php/
Disallow: /admin/
Disallow: /media/
Sitemap: your-website-url/sitemap.xml
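In this structure, we disallow the admin pages of our website, the media directory (which holds files such as CSS, JS, and images), and every URL whose path starts with /index.php/. Finally, the Sitemap line tells crawlers where to find our sitemap.xml file; replace your-website-url with the full URL of the site, including https://.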
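One way to check rules like these before uploading them is Python's standard urllib.robotparser module. The sketch below parses the rules from a string instead of fetching them from a server. The domain example.com and the sample paths are placeholders, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above; example.com is a placeholder domain.
RULES = """\
User-agent: *
Disallow: /index.php/
Disallow: /admin/
Disallow: /media/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path: str) -> bool:
    """Return True if a generic bot ('*') may crawl the given path."""
    return parser.can_fetch("*", "https://example.com" + path)


# Sample paths are made up for illustration.
for path in ("/admin/login", "/media/logo.png",
             "/index.php/welcome", "/blog/first-post"):
    print(path, "allowed" if is_allowed(path) else "blocked")

# The Sitemap line is reported separately from the allow/disallow rules.
print(parser.site_maps())
```

Only /blog/first-post comes back as allowed here; the other three paths fall under one of the Disallow lines.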
Robots.txt File Structure for WordPress Websites
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/uploads
Sitemap: your-website-url/sitemap.xml
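In this structure, we disallow the admin pages of our website and the uploads directory, where WordPress stores uploaded media such as images and documents. Finally, the Sitemap line tells crawlers where to find our sitemap.xml file; replace your-website-url with the full URL of the site, including https://.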
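Disallow rules are matched as path prefixes, so a trailing slash changes what a rule covers. The sketch below, again using Python's standard urllib.robotparser with the placeholder domain example.com and made-up sample paths, shows the difference between /wp-admin/ (with a slash) and /wp-content/uploads (without one).

```python
from urllib.robotparser import RobotFileParser

# The WordPress rules from the example above (Sitemap line left out,
# since it does not affect which paths are blocked).
WORDPRESS_RULES = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /wp-content/uploads",
]

wp_parser = RobotFileParser()
wp_parser.parse(WORDPRESS_RULES)


def crawlable(path: str) -> bool:
    """Return True if a generic bot ('*') may crawl the given path."""
    return wp_parser.can_fetch("*", "https://example.com" + path)


print(crawlable("/wp-admin/options.php"))               # False: under /wp-admin/
print(crawlable("/wp-content/uploads/2024/photo.jpg"))  # False: under uploads
# No trailing slash on the rule, so any path with this prefix is blocked too.
print(crawlable("/wp-content/uploads-old/file.zip"))    # False
# Theme files are not covered by either rule.
print(crawlable("/wp-content/themes/style.css"))        # True
```

If only the uploads directory itself should be blocked, writing the rule as /wp-content/uploads/ avoids catching sibling paths that merely share the prefix.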