
What is the robots.txt file? (2018)


in Academy on September 4th, 2018

Updated June 2018.

The name robots.txt sounds a little out there, especially when you’re new to SEO. Luckily, it sounds way weirder than it actually is. Website owners like you use the robots.txt file to give web robots instructions about their site. More specifically, it tells them which parts of your site you don’t want to be accessed by search engine creepy crawlers. The first thing a search engine spider looks at when it visits your site is the robots.txt file.
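To give you an idea of what those instructions look like, here is a small, generic example of a robots.txt file; the folder name and sitemap URL are just placeholders, not something your own site needs:

  User-agent: *
  Disallow: /example-private-folder/

  Sitemap: https://www.yourdomain.com/sitemap.xml

The User-agent line says which robots the rules apply to (the asterisk means all of them), each Disallow line lists a path they should stay out of, and the optional Sitemap line points them to your sitemap.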

 

[Animation: a robots.txt robot. Ignore the crazy eyes, this robot is all good! Animation by Matt Barnes]

 

Why is the robots.txt file important?

 

The robots.txt file is usually used to block search engines like Google from ‘seeing’ certain pages on your website, either because you don’t want your server to be overwhelmed by Google’s crawling, or because you don’t want it crawling unimportant or duplicated pages on your site.

You might be thinking that it is also a good way to hide pages or information you’d rather keep confidential and out of Google. This is not what the robots.txt file is for: a page you block this way can still end up appearing, for instance if another page on your site links back to the page you don’t want to show.

While it is important to have this file, your site will still function without it and will usually still be crawled and indexed. An important reason it’s relevant to your site’s SEO is that improper usage can affect your site’s ranking.

What’s improper usage?

  • An empty robots.txt file
  • Using the wrong syntax
  • A robots.txt file that contradicts your sitemap.xml (if a URL is in your sitemap, it should not be blocked by your robots file)
  • Using it to block private or sensitive pages instead of password protecting them
  • Accidentally disallowing everything (see the example after this list)
  • A robots.txt file over Google’s 500 KB size limit
  • Not saving your robots.txt file in the root directory
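Two of these mistakes are easy to make because the syntax is so terse: roughly speaking, an empty Disallow line allows everything, while a single slash blocks your whole site.

  # Allows robots to crawl the entire site
  User-agent: *
  Disallow:

  # Blocks robots from the entire site (probably not what you want)
  User-agent: *
  Disallow: /

Likewise, if one of your Disallow rules covers URLs that are listed in your sitemap.xml, you have the sitemap conflict mentioned above.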

 

[Illustration: a robots.txt file. Illustration by Justas Galaburda]

 

What does the task look like on MarketGoo?


In MarketGoo, this task falls under the “Review Your Site” category. The task is simple: if we detect a robots.txt file on your site, we just make sure you know what it’s for and that it should be properly set up.

 

robots.txt on Weebly

 

If you’re on Weebly, your website automatically includes a robots.txt file that you can use to control search engine indexing for specific pages or for your entire site. You can view your robots.txt file by going to www.yourdomain.com/robots.txt or yourdomain.weebly.com/robots.txt (using your website name instead of ‘yourdomain’).

The default setting is to allow search engines to index your entire site. If you want to prevent your entire website from being indexed by search engines, do the following:

  1. Go to the Settings tab in the editor and click on the SEO section
  2. Scroll down to the “Hide site from search engines” toggle
  3. Switch it to the On position
  4. Re-publish your site
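Hiding an entire site this way typically corresponds to a blanket ‘block everything’ instruction in the robots.txt file, along these lines (check yourdomain.weebly.com/robots.txt to see exactly what Weebly generates for your site):

  User-agent: *
  Disallow: /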

If you only want to protect some of your pages from being indexed, do the following:

  1. Go to the SEO Settings menu
  2. Check that the “Hide site from search engines” toggle is set to Off.
  3. Go to the Pages tab and click on the page you want to hide
  4. Click on the SEO Settings button
  5. Click the checkbox to hide the page from search engines
  6. Click on the back arrow at the top to save your changes

You can change this as many times as you want, but remember that search engines take a while to figure it out and reflect it in their results.

There are some things Weebly blocks that you can’t change, like the directory where uploaded files for Digital Products are stored. These will not have any negative effect on your site or its search engine ranking.

Note: Google Search Console may give you a warning about ‘severe health issues’ regarding your Weebly site’s robots file. This is related to the blocked files described above, so don’t worry.

 

robots.txt on Wix

 

If you are on Wix, you should know that Wix automatically generates a robots.txt file for every site created on its platform. You can view this file by adding ‘/robots.txt’ to your root domain (www.domain.com/robots.txt, replacing domain.com with your actual domain name). If you look at what’s in your robots.txt file, you will notice that Wix has blocked links related to the structure of its sites, such as ‘noflashhtml’ and ‘backhtml’. Since they do not contribute to your site’s SEO, they do not need to be crawled.
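As a rough illustration based on the links mentioned above, the Wix-generated file contains rules along these lines (your own /robots.txt may list more or slightly different paths):

  User-agent: *
  Disallow: /noflashhtml
  Disallow: /backhtml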

  • It is not possible to edit the robots.txt file of your Wix site. However, you may add a ‘noindex’ tag (so it doesn’t appear in search results) to an individual page of your Wix site.

If you don’t want a specific page of your site to appear in search engine results, you can hide it in the Page SEO section:

  1. Click the Pages Menu from the top bar of the Editor
  2. Click the page you want to hide
  3. Click the Show More icon
  4. Click Page SEO
  5. Click the toggle next to Hide this page from search results. This means that people cannot find your page when searching keywords and phrases in search engines.
  6. Click Done

If you choose to password protect a page, this too prevents search engines from crawling and indexing that page. This means that password protected pages do not appear in search results.

 

robots.txt on Squarespace

 

This is yet another platform that automatically generates a robots.txt file for every site. Squarespace uses the robots.txt file to tell search engines that certain parts of a site are restricted, either because those pages are only for internal use, or because they are URLs that show duplicate content (which can negatively affect your SEO). If you use a tool like Google Search Console, it will show you an alert about these restrictions that Squarespace has set in the file.

As an example, Squarespace asks Google not to crawl URLs like /config/, which is your admin login page, or /api/, which handles the Analytics tracking cookie. This makes sense.

Additionally, if you see the following in your robots.txt file, this is also normal: Squarespace blocks these URL patterns to prevent duplicate content (which can appear on these pages):

  • /*?author=*
  • /*&author=*
  • /*?category=*
  • /*&category=*
  • /*?month=*
  • /*&month=*
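Inside the robots.txt file itself, these restrictions appear as Disallow rules. A rough sketch (not the complete file Squarespace generates) looks like this:

  User-agent: *
  Disallow: /config/
  Disallow: /api/
  Disallow: /*?author=*
  Disallow: /*&author=*
  Disallow: /*?category=*
  Disallow: /*&category=*

The asterisks are wildcards, so a rule like /*?category=* matches any URL that contains ‘?category=’ somewhere in it.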

robots.txt on WordPress

 

If you’re on WordPress, your robots.txt file is usually in your site’s root folder. To view it, connect to your site using an FTP client or use the cPanel file manager. You can open it with a plain text editor like Notepad.

If you do not have a robots.txt file in your site’s root directory, then you can create one:

  1. Create a new text file on your computer and save it as robots.txt
  2. Upload it to your site’s root folder

 

To modify this file, you can install a plugin:

  1. From your WordPress dashboard, select ‘Plugins’ from the left menu
  2. Hover over Plugins and click on ‘Add New’
  3. Search for ‘robots’
  4. Look for WP Robots Txt, then install and activate it
  5. Hover over ‘Settings’ and click on ‘Reading’
  6. Scroll down and you will see the robots.txt content. This is where you can modify your robots.txt file.

 

Best Practices

  • If you want to keep private content on your website away from crawlers, you have to password protect the area where it is stored. Robots.txt is a guide for web robots, so they are technically not under any obligation to follow your guidelines.
  • Another way to prevent certain URLs from showing up in Google is to use a ‘noindex’ robots meta tag on the page while still giving crawlers access to it. Don’t use your robots file to hide private pages from appearing in search results!
  • Google Search Console offers a free Robots Tester, which scans and analyzes your file. You can test your file there to make sure it’s well set up. Log in, and under “Crawl” click on “robots.txt tester.” You can then enter the URL and you’ll see a green Allowed if everything looks good.  
  • You can use robots.txt to block files such as unimportant image or style files. But if the absence of these makes your page harder for search engine crawlers to understand, don’t block them; otherwise Google won’t fully understand your site the way you want it to.
  • All bloggers, site owners and webmasters should be careful while editing the robots.txt file; if you’re not sure, err on the side of caution!

I just want to know whether my site has a robots.txt or not!

Just go to your browser and add “/robots.txt” to the end of your domain name! So, if your site is myapparelsite.com, what you type into the browser will be www.myapparelsite.com/robots.txt, and you’ll see something that looks like this (this example is for a WordPress site):

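Exact contents vary from site to site, but a typical WordPress robots.txt looks roughly like this (the Sitemap line is optional and the URL here is just a placeholder):

  User-agent: *
  Disallow: /wp-admin/
  Allow: /wp-admin/admin-ajax.php

  Sitemap: https://www.myapparelsite.com/sitemap.xml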

 

If you’re a MarketGoo user, MarketGoo will tell you automatically whether it detects one or not.

Want to get started optimizing your site? Log in now! (or sign up for MarketGoo to try it!)

[Illustration by Zach Roszczewski]

Get your Free SEO report - our FREE tool that detects your site’s issues and gives you an SEO plan to follow.

Use it to start improving your traffic and grow your business online!

GET YOUR FREE REPORT!
