Custom 404 pages and Google
Everyone wants to display friendly error pages to visitors to their website. This is easy to do in a variety of ways, but are the common methods always the best choice? There is an abundance of pages that discuss how to make custom 404 error pages. While this practice is an excellent way to direct misguided visitors to the correct area of your website, it can have some unwanted side effects, particularly for search engine indexing.
It's simple enough to use the Apache ErrorDocument directive to send people a friendlier page than the standard "File Not Found" message, but this can send the wrong message to search spiders and to any other client that isn't a person reading the page.
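Using the ErrorDocument directive to send 404s to a page on your site is great for the user. However, if the directive points to a full URL such as http://www.example.com/404.html, Apache answers with a "302 Found" redirect instead of the 404 status. Browsers follow the redirect silently, so visitors see the friendly page, but spiders never receive a 404.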
We noticed that Google still had outdated URLs from our site in its listings and were curious why. When Google requests a URL that should return a 404 but is handled by the ErrorDocument directive alone, the server sends a 302 response, which leads Google to believe that the URL is still valid. RFC 2616 defines the 302 response this way: "The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests." So, according to the RFC, Google should keep the bad URL in its index, because the redirect is not permanent.
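To see what your own server does, a small PHP script can request a URL that no longer exists and print the status code of the very first response, before any redirect is followed. This is a minimal sketch that assumes PHP 7.1 or later, which is when get_headers() gained its stream context parameter. The file name check-status.php and the example URL are placeholders.

```php
<?php
// check-status.php (hypothetical name): print the status code of the FIRST
// response for a URL, without following redirects, so you can tell a real
// 404 apart from a 302 that hides it. Assumes PHP 7.1+.

// Extract the numeric code from a status line such as "HTTP/1.1 302 Found".
function status_code_from_line(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not a status line
}

// Request $url once and report the first status code seen.
function first_response_code(string $url): ?int
{
    $context = stream_context_create([
        'http' => [
            'follow_location' => 0,   // stop at the first response, do not chase Location
            'ignore_errors'   => true, // still return headers for 4xx/5xx responses
        ],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false || !isset($headers[0])) {
        return null; // network error or no response at all
    }
    return status_code_from_line($headers[0]);
}

// Command-line usage: php check-status.php http://www.example.com/no-such-page
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $code = first_response_code($argv[1]);
    echo $argv[1], ' -> ', $code ?? 'no response', PHP_EOL;
}
```

If a page that should not exist prints 302 here, spiders are being redirected rather than told the page is gone.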
My theory is that having these old or invalid URLs in the index may *decrease* your website's potential rank. Google may treat a large number of 302s as a sign of a link farm, and the more outdated links there are that should lead to 404s, the larger this negative effect may be.
Solution
Luckily, a little PHP makes handling this correctly very simple.
Step 1
Create a page to display to visitors when they get a 404 error (if you don't already have one). For best usability, the page should look like the rest of your site and include the following elements:
- A simple statement that the requested URL was not found on the site
- A list of common URL mistakes for your site (look at your logs for tips)
- A search box so that users can query your site's search engine
Name the file "404content.php" and upload it to the server.
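The following is a minimal sketch of what 404content.php might contain, covering the three elements listed above. It builds only the page body, so your site's usual header, navigation and styles would still need to wrap it. The entries in the common-mistakes list, the /search action and the q parameter are hypothetical placeholders. The sketch also assumes that $_SERVER['REQUEST_URI'] still holds the URL the visitor originally asked for, which is usually the case for Apache's internal error redirects.

```php
<?php
// 404content.php: a sketch of the page body, not a drop-in template.
// The mistake list and the /search URL are hypothetical; use your own.

function render_404_content(string $requestedPath, array $commonMistakes): string
{
    // The path comes from the visitor, so escape it before echoing it back.
    $safePath = htmlspecialchars($requestedPath, ENT_QUOTES, 'UTF-8');

    // 1. Simple statement that the URL was not found.
    $html  = "<h1>Page not found</h1>\n";
    $html .= "<p>Sorry, <code>{$safePath}</code> was not found on this site.</p>\n";

    // 2. Common URL mistakes (wrong path => correct path), e.g. taken from your logs.
    $html .= "<p>Were you looking for one of these?</p>\n<ul>\n";
    foreach ($commonMistakes as $wrong => $right) {
        $html .= sprintf(
            "<li>%s &rarr; <a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($wrong, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($right, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($right, ENT_QUOTES, 'UTF-8')
        );
    }
    $html .= "</ul>\n";

    // 3. Search box that submits to the site's (hypothetical) search page.
    $html .= "<form action=\"/search\" method=\"get\">\n"
           . "<input type=\"text\" name=\"q\">\n"
           . "<input type=\"submit\" value=\"Search\">\n"
           . "</form>\n";

    return $html;
}

echo render_404_content(
    $_SERVER['REQUEST_URI'] ?? '/',
    ['/blog/' => '/articles/', '/contact.htm' => '/contact.php']
);
```

Escaping the requested path matters because a 404 page that echoes the raw URL back is an easy target for cross-site scripting.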
Step 2
Next, we need to create a file to accept and handle the 404 requests. For simplicity, we'll name this file 404handler.php and put it in the website's root directory. Its contents should be as follows:
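<?php
/* Simple 404 handler */
// Send the 404 status line first; header() only works before any output is sent.
header("HTTP/1.0 404 Not Found");
// Then output the friendly page (a filesystem path, not a URL).
include("/path/to/your/404content.php");
?>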
How it Works:
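The header() call sends the 404 status, and include then pulls in the content of the page you want displayed and sends it to the browser. Because the status line must go out before the page body, header() has to be called before anything is printed.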
Step 3
Finally, add the Apache ErrorDocument directive either to the server's configuration file (usually httpd.conf) or, more commonly, to a file called .htaccess in the root directory of the website. It should be as follows:
ErrorDocument 404 /404handler.php
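Note: Give ErrorDocument a local path like the one above, not a full http:// URL, because a full URL makes Apache send a redirect instead of the error status. The same approach works for other error codes: replace 404 with the code you want in both the directive and the header() call.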
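If you want to handle several error codes, one option is a single handler that checks which error Apache is reporting. This sketch assumes that Apache passes its REDIRECT_STATUS variable through to PHP for local ErrorDocument targets. That is typical under mod_php, but you should verify it on your server. The file name errorhandler.php, the 403/410 entries and the per-code content files are hypothetical.

```php
<?php
// errorhandler.php (hypothetical): one handler for several error codes.
// Assumed .htaccess lines:
//   ErrorDocument 404 /errorhandler.php
//   ErrorDocument 410 /errorhandler.php
// Assumes Apache's REDIRECT_STATUS reaches $_SERVER; verify on your host.

// Codes this handler knows how to answer, with their reason phrases.
const HANDLED_ERRORS = [
    403 => 'Forbidden',
    404 => 'Not Found',
    410 => 'Gone',
];

// Pick the status to send, falling back to 404 for anything unexpected.
function resolve_error_status(array $server): int
{
    $code = (int) ($server['REDIRECT_STATUS'] ?? 404);
    return array_key_exists($code, HANDLED_ERRORS) ? $code : 404;
}

$status = resolve_error_status($_SERVER);

// Only emit headers and page content when running under the web server.
if (PHP_SAPI !== 'cli') {
    // Send the real error status before any page content is output.
    header("HTTP/1.0 {$status} " . HANDLED_ERRORS[$status]);
    // Hypothetical per-code content files, e.g. 404content.php, 410content.php.
    include __DIR__ . "/{$status}content.php";
}
```

Falling back to 404 keeps the handler from ever answering an unexpected request with a success or redirect status.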
Once you have done all of the above, watch your access and error logs for any issues that may arise. Also watch how Googlebot moves around your site.
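One way to watch Googlebot is to tally the status codes your server returned to it, which shows whether dead URLs now answer 404 rather than 302. This sketch assumes Apache's "combined" log format. It identifies Googlebot by its User-Agent string alone, which can be spoofed, so treat the counts as approximate. The script name and log path are placeholders.

```php
<?php
// googlebot-status.php (hypothetical): count status codes served to Googlebot.
// Assumes the Apache "combined" log format; Googlebot is matched by
// User-Agent only, which other clients can fake.

function googlebot_status_counts(iterable $lines): array
{
    // host ident user [time] "request" status bytes "referer" "user-agent"
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    $counts = [];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m) !== 1) {
            continue; // not a combined-format line
        }
        if (stripos($m[2], 'Googlebot') === false) {
            continue; // some other client
        }
        $status = (int) $m[1];
        $counts[$status] = ($counts[$status] ?? 0) + 1;
    }
    ksort($counts); // lowest status code first
    return $counts;
}

// Command-line usage: php googlebot-status.php /path/to/access_log
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $lines = file($argv[1], FILE_IGNORE_NEW_LINES) ?: [];
    foreach (googlebot_status_counts($lines) as $status => $n) {
        echo "{$status}: {$n}", PHP_EOL;
    }
}
```

Comparing the 302 and 404 counts before and after the change is a rough way to see whether Googlebot is now being told that old URLs are gone.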
This should help the search engines prune outdated links from their indexes and may improve the way they rank your website.