Next time you fire up your FTP client or log in to your Web server, take a moment and dig around for your log files. On most Web servers, you will find a directory (usually in your root directory, or in the parent directory just above it) named "logs" or "stats". Inside, you will most likely see a file with a .log, .web, or .clf extension. Since Web logs are essentially text files, some will even have a .txt file extension. Download the log file, save it to a local drive, and have a look.
Most servers generate CLF (Common Log Format) files, but they also come in other flavors, like ELF (Extended Log Format) and the Combined Log Format. Some servers produce files with different extensions in different formats, but most of the log file types out there are formatted much like CLF files. For this reason, we'll use the structure of a CLF file for our example.
In Common Log Format files, each line represents one request. So if a user
comes to your site and is served a page with three images, it shows up as four
lines of text in your CLF file: one request each for the three images and
one request for the HTML file itself.
CLF files are standardized, so they almost always look the same. A normal
CLF file logs the data in this format:
user's computer ident userID [date and time] "requested file" status filesize
The fields are separated by spaces. Some fields, such as the date and
request information, are delimited with punctuation. If a field has no value
for the request being logged, the server puts a hyphen in its place. Let's
look at these fields one by one.
- The remote host information shows the IP address and, in some cases, the
domain name of the client computer requesting the file.
- The ident information is logged if your server is running IdentityCheck,
an antiquated directive that was once used for thorough server logging. It
was phased out of general use because it required the identification process
to run every time a file is served. Because this process can sometimes take
5 or 10 seconds, most sites turn IdentityCheck off so that their pages load
more quickly.
- If your site requires a password upon login, the userID that the user
entered is logged in this field. If you don't have any user login features
on your site, this field is no big deal.
- The date field is straightforward: the date and time of the request are
logged here.
- The request field logs the type of request made by the user, as well as
the path and name of the requested file.
- The status field contains a three-digit code that tells you if the file
was transferred successfully or not. These codes are standard HTTP codes.
- The filesize field is also straightforward: it lists the number of
bytes transferred when the requested file was served.
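If you'd rather let a script do the squinting, the seven fields above can be pulled apart with a regular expression. This is only a rough sketch in Python: the pattern, the parse_clf_line name, and the sample line (with its made-up 192.0.2.10 address) are my own illustrations, not anything your server ships with, and real logs may contain odd lines that this pattern skips.

```python
import re

# One named group per CLF field:
# host ident userID [date and time] "request" status filesize
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of the seven CLF fields, or None if the line doesn't match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A hyphen means the server had nothing to log for that field.
    for key in ("ident", "user"):
        if fields[key] == "-":
            fields[key] = None
    fields["status"] = int(fields["status"])
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields

# Hypothetical log line, invented for this sketch.
sample = '192.0.2.10 - - [09/May/2001:13:42:07 -0700] "GET /index.html HTTP/1.0" 200 2048'

for name, value in parse_clf_line(sample).items():
    print(name, "=", value)
```

Hyphens in the ident and userID slots come back as None, so the rest of a script can tell "nothing logged" apart from a real value.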
For the following example, I've extracted one line from a log file that
records the activity on my own personal website, snackfight.com. My hosting
company serves my site using Apache, and they've tweaked a few options to
provide me with more comprehensive data. (Apache's mod_log_config module allows you to customize the string
that's fed into the logs.) I've divided this logged request into its separate
parts for clarity; normally, all of this data would be dumped onto one
single line in the log file.
adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700]
"GET /about.htm HTTP/1.1" 200 3741
"http://www.e-angelica.com"
"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
The first part of the request shows the host name of the user's computer. I
can see that this is a DSL subscriber on the BellSouth network. The two
hyphens that follow are where the IdentityCheck and userID information would
normally show up, but since my site does not use either of these processes, I
get nothing but hyphens. Next, in brackets, is the date, then the time (in
24-hour format), followed by the time zone, which is written as the server's
offset from UTC.
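The bracketed timestamp is easy to turn into a real date object. Here's a small sketch using Python's standard datetime module on the timestamp from the line above; the variable names are mine. Note that %b only matches English month abbreviations like "May" when the script runs in an English (or default C) locale, so treat this as an assumption about your setup.

```python
from datetime import datetime, timezone

# Timestamp copied from the sample log line, minus the square brackets.
raw = "09/May/2001:13:42:07 -0700"

# %d/%b/%Y is day/month-abbreviation/year; %z reads the trailing UTC offset.
stamp = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

print(stamp.isoformat())                           # 2001-05-09T13:42:07-07:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2001-05-09T20:42:07+00:00
```

Converting everything to UTC like this makes it easier to compare logs from servers in different time zones.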
The request field, displayed within quotes, shows that the user asked the
server to GET a page. Other request types are POST, DELETE, and HEAD, though you don't see those nearly as often. Following the request type is the path and
name of the file. In this case, the user was requesting the "about.htm" file
in the root directory of snackfight.com. Also, you can see that the protocol
used here was the good old Hypertext Transfer Protocol, version 1.1.
The status field shows a status code of 200, meaning that everything went
through just peachy. A status code of 404, as you may know, means that the
file was not found on the server. Immediately following the status code is
the file size of "about.htm". It's 3,741 bytes. Hey, not bad! I'll bet it
loaded nice and quick.
Referers
The next two fields are especially interesting. These are custom fields
that my hosting company has added to its logging so that I can get a better
idea of who's visiting my site. The first field, in quotes, is the referer
field. This is the page where my user clicked on a link in order to arrive at
the page he was just served. I can see that this particular user is a fan of
the e-angelica site, because that's where he came from to arrive at my site.
In some cases,
referers are logged in their own log file. These referer logs usually use
the same format and can also be viewed or run through an analyzer. For the
full skinny on referer logs, check out Jeff's article.
The last field, also in quotes, shows some information about the user's
browser and platform, in this case, Internet Explorer 5.0 on a
Windows 98 machine. Oh, how original!
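To see how the two extra fields fit alongside the standard ones, here's a sketch that parses the whole sample line from above, referer and browser string included. The pattern and the names in it are my own guesses at a workable layout, assuming the two custom fields are simply appended in quotes as they are in my log; a host with a different mod_log_config setup would need a different pattern.

```python
import re

# The seven CLF fields, with the request split into method, path, and
# protocol, followed by the two quoted custom fields.
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# The sample request from the article, joined back onto one line.
line = (
    'adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] '
    '"GET /about.htm HTTP/1.1" 200 3741 '
    '"http://www.e-angelica.com" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"'
)

match = COMBINED_PATTERN.match(line)
entry = match.groupdict() if match else {}

print(entry.get("method"), entry.get("path"), entry.get("protocol"))
print(entry.get("referer"))
print(entry.get("agent"))
```

Lines that don't fit the pattern leave entry empty rather than crashing, which is handy when a log contains the occasional malformed request.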
And that's about it! It's a lot of information, I know, and your log file may store even more goodies. (An in-depth explanation of the syntax used in log files can be found in
the massive spec for HTTP 1.1, which is also useful as a
reference when looking up header fields and server status codes.)
All this data is a little overwhelming, no? Especially in its raw state.
If you're not exactly thrilled about the idea of picking through thousands of lines of text and status
codes to determine whether or not your users are being served in the most
efficient manner, there are several software packages
on the market that you can use to generate reports without getting your
hands dirty (and without opening your text editor). But which one is right for you? Well, that all depends on what you're looking for.
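To give a feel for what those packages do under the hood, here's a toy sketch that tallies status codes and total bytes served. It is nowhere near a real analyzer: the summarize function and the three sample lines are invented for illustration, and it assumes the status and filesize come right after the quoted request, as they do in CLF.

```python
import re
from collections import Counter

# Status and filesize sit right after the closing quote of the request field.
STATUS_AND_SIZE = re.compile(r'" (\d{3}) (\d+|-)')

def summarize(lines):
    """Count requests per status code and add up the bytes served."""
    status_counts = Counter()
    total_bytes = 0
    for line in lines:
        found = STATUS_AND_SIZE.search(line)
        if found is None:
            continue  # skip anything that doesn't look like a log entry
        status_counts[found.group(1)] += 1
        if found.group(2) != "-":
            total_bytes += int(found.group(2))
    return status_counts, total_bytes

# Hypothetical log lines, invented for this sketch.
sample_lines = [
    '192.0.2.10 - - [09/May/2001:13:42:07 -0700] "GET /about.htm HTTP/1.1" 200 3741',
    '192.0.2.10 - - [09/May/2001:13:42:08 -0700] "GET /logo.gif HTTP/1.1" 200 1024',
    '192.0.2.11 - - [09/May/2001:13:45:30 -0700] "GET /missing.htm HTTP/1.1" 404 -',
]

counts, served = summarize(sample_lines)
print(dict(counts))  # {'200': 2, '404': 1}
print(served)        # 4765
```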