Thursday, August 18, 2005 Custom 404 in PHP tweako

Managing a web server can be a daunting task, especially when you are serving multiple types of dynamic pages. For the City of Tucson, our servers provide standard HTML and XHTML, and on top of that we run PHP and Perl CGI scripts. As an added bonus we're running ColdFusion on an integrated JRun server, which also serves up JSP pages. And lastly, we have Macromedia's outstanding Flex application server running in a Tomcat server (which also serves up JSP pages).

No matter how well you set up your server and design your site and applications, someone, somewhere, at some time will follow a link to a file that isn't there. So one of the many standard things a web server needs to do is serve up 404 pages, or Page Not Found error pages.

The default 404 error that comes with Apache is not very useful; it basically tells you that, indeed, the page is not there. Most users who encounter a default 404 Page Not Found error simply click the Back button in their web browser and return from whence they came. If you design a good site, you should have no dead links of your own, which means the dead link was on someone else's site and the Back button is taking the user AWAY from yours.

A good 404 page keeps the user on your site, and lets you, the webmaster, know what happened so you can prevent it from happening again. One good way to keep the user is to provide a site map or site navigation links on the 404 page, so the user can simply click a link within your site and maybe find the page they were looking for. Another good thing to have, especially on larger sites, is a Search field so they can search your site directly.
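As a rough illustration of the navigation links and Search field described above, here is a minimal sketch that could sit in the body of a 404 page. The section paths, the labels and the /search.php script are all made-up placeholders, not part of the original page; swap in whatever your own site provides.

```php
<?php
// Hypothetical site sections (path => label); replace with your own.
$site_links = array(
    "/" => "Home",
    "/departments/" => "Departments",
    "/contact/" => "Contact Us",
);

// Build an unordered list of links, escaping both the URL and the label.
function render_site_links(array $links) {
    $html = "<ul>\n";
    foreach ($links as $href => $label) {
        $html .= '   <li><a href="' . htmlspecialchars($href) . '">'
               . htmlspecialchars($label) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

echo render_site_links($site_links);

// /search.php is an assumed search script; point the form at your own.
echo '<form action="/search.php" method="get">' . "\n"
   . '   <p><input type="text" name="q" /> <input type="submit" value="Search" /></p>' . "\n"
   . "</form>\n";
```

Keeping the links in an array means the list can be shared with the rest of the site's navigation instead of being maintained by hand on the error page.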

These things are easy to do, but while they help the user, how do you find out for yourself when a 404 occurs so you can correct it? The simplest way is an email notice. For this email to be useful, it needs to include parts of the environment, such as which page was requested, when the error occurred, and maybe even the browser used. This tutorial covers how to do this with PHP pages. If you have a problem using it, let me know. If you prefer to use JSP, I have that tutorial here: Custom JSP 404 page.

The Page

The first thing we need to do is set up our skeleton page, which is easy enough:

404.php
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
   <title>404: Page Not Found</title>
   <meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
</head>
<body>
<h1>404: Page Not Found</h1>
</body>
</html>

You can copy and paste this code; it will run as-is and validates as strict XHTML 1.1.

Next we want to set up our addresses, subject and message, then send the email off. This page assumes a mail server is running on your web server. Be sure to adjust the addresses to match your needs.

404.php
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
   <title>404: Page Not Found</title>
   <meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />
</head>
<body>
<h1>404: Page Not Found</h1>
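<?php
   // Addresses and subject for the notification; adjust these to match your needs.
   $from_header = "From: web_server@myhost.com\r\n";
   $to = "my_address@myhost.com";
   $subject = "404 Error";
   // When the error occurred.
   $today = date("D M j Y g:i:s a T");
   // Who asked for what: the client IP address plus the full URL that was requested.
   $ip = getenv("REMOTE_ADDR");
   $requri = getenv("REQUEST_URI");
   $servname = getenv("SERVER_NAME");
   $pageload = $ip . " tried to load http://" . $servname . $requri;
   // The browser used and the referring page, if the client sent one.
   $httpagent = getenv("HTTP_USER_AGENT");
   $httpref = getenv("HTTP_REFERER");
   $message = "$today \n\n$pageload \n\nUser Agent = $httpagent \n\n$httpref ";
   // Send the notice through the mail server PHP is configured to use.
   mail($to, $subject, $message, $from_header);
?>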
</body>
</html>

The resulting email identifies the date and time of the request, the IP address that made it, the page that was requested, the client that was used and any referrer. In this example there was no referrer, so that line is empty. The email will look something like this:

Thu Aug 18 2005 3:34:18 pm MST

127.0.0.1 tried to load http://127.0.0.1/404.php

User Agent = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
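If you want to check the message format without actually sending mail, one option is to pull the assembly into a function. The sketch below is only a suggestion: the function name build_404_message, the "(none)" placeholder and the "Referrer =" label are my own additions, and it assumes PHP 5.3 or later for the anonymous function.

```php
<?php
// Assemble the notification body from an array shaped like $_SERVER.
// Missing or empty values (for example, no referrer) become "(none)".
function build_404_message(array $server, $today) {
    $get = function ($key) use ($server) {
        return (isset($server[$key]) && $server[$key] !== "") ? $server[$key] : "(none)";
    };
    $pageload = $get("REMOTE_ADDR") . " tried to load http://"
              . $get("SERVER_NAME") . $get("REQUEST_URI");
    return $today . "\n\n" . $pageload
         . "\n\nUser Agent = " . $get("HTTP_USER_AGENT")
         . "\n\nReferrer = " . $get("HTTP_REFERER");
}

// Sample request data for a dry run; no referrer is supplied on purpose.
$sample = array(
    "REMOTE_ADDR" => "127.0.0.1",
    "SERVER_NAME" => "127.0.0.1",
    "REQUEST_URI" => "/404.php",
    "HTTP_USER_AGENT" => "Mozilla/5.0",
);
echo build_404_message($sample, "Thu Aug 18 2005 3:34:18 pm MST");
```

On the live page you would pass $_SERVER and the output of date() instead of the sample values, then hand the result to mail() as before. mail() returns false when the message is not accepted for delivery, so it is worth checking that return value and logging a failure.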

The last thing you need to do to use your custom page in Apache is to modify your httpd.conf, found by default on Red Hat Enterprise Linux in /etc/httpd/conf/. After the commented-out ErrorDocument entries, add:

ErrorDocument 404 /404.php
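For context, here is a sketch of how that part of the file might look once edited. The commented-out line stands in for whatever examples your copy of the file already contains, so treat it as an assumption; only the final directive comes from this tutorial.

```apache
# Stock examples shipped in the file stay commented out (yours may differ):
#ErrorDocument 404 /missing.html

# A local path, relative to the DocumentRoot, so Apache serves the page
# itself and the client still receives a 404 status.
ErrorDocument 404 /404.php
```

Apache reads this file when it starts, so the change takes effect after a restart or reload. Use a local path like the one above rather than a full http:// URL; with a full URL Apache sends the client a redirect instead of the 404 status.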