Who's Linking to You?
by Jeff Burchell 22 Nov 1996

Jeff Burchell is HotWired's resident Unix guy. It's rarely his fault.


Q:  Is there any way to tell what other sites are linking to my pages?
- Curious in Castro Valley

Search engines

To see which pages in the HotBot database link to www.eff.org, just type link:www.eff.org into the search box. It's that easy.

Referer logs

A referer log keeps track of what page a user was reading immediately before coming to your site. Usually, this means that there's a link to your site from that page. Most Web servers keep referer logs, though the log's exact syntax varies from server to server. (Editor's note: Apparently, the engineer who coined the phrase "referer log" didn't know how to spell it.)

I'm going to explain the referer log generated by the default logging module of Apache, the server software we use at HotWired.

A referer log looks like this:

http://www.blah.com/index.html -> /story/index.html

http://www.svelt.com/burn/ -> /icns/wow.gif

http://www.mom.com/ippy/ -> /index.html

http://www.meep.com/trash/ -> /so/cool.html

That's nice, but what does it mean?

The syntax of a referer log reads like this:

    <pointing page> -> <page pointed to>

So, http://www.svelt.com/burn/ -> /icns/wow.gif means there's a link on the page http://www.svelt.com/burn/ that points to /icns/wow.gif.
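To make the format concrete, here's a minimal shell sketch that splits each log line into its two halves. The file name sample_ref_log, its contents, and the function name split_ref_log are made up for illustration, and the sketch assumes every line has exactly the shape shown above, with a single space on each side of the arrow.

```shell
# Build a tiny sample log (hypothetical data, same shape as the lines above).
cat > sample_ref_log <<'EOF'
http://www.blah.com/index.html -> /story/index.html
http://www.svelt.com/burn/ -> /icns/wow.gif
EOF

# Split each line on the " -> " separator and label both halves.
split_ref_log() {
    awk -F' -> ' '{ print "from: " $1 "  to: " $2 }' "$1"
}

split_ref_log sample_ref_log
```

Splitting on the whole " -> " separator, rather than on a single character, leaves URLs that happen to contain a hyphen or a ">" untouched.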

So, what are some neat tricks I can do?

Well, if you have access to the referer log, then you probably have access to a Unix box, with its array of text utilities. Here are a couple of common referer-log munging techniques. Each of these is a command (or several commands piped together) that should be typed on a Unix command line.

For the purposes of this demonstration, assume the referer log's filename is ref_log.

What pages link to me?

The command: cut -d' ' -f 1 ref_log | sort | uniq will return a list of every page that appears as a referrer in your log.

What it does:

    cut -d' ' -f 1 ref_log
    Keeps only the text before the first space on each line, dropping the -> and everything after it, so you just get a list of who is linking to you, and not the pages they're linking to. (Splitting on the space rather than on the - keeps URLs that contain hyphens intact.)

    sort
    Alphabetizes the list (needed for uniq).

    uniq
    Unique. Deletes duplicate lines from an already sorted list.
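If you'd rather see which pages send you the most traffic, the same tools can count the duplicates instead of throwing them away. This is only a sketch: the sample data and the function name top_referers are invented for the example, and it assumes the referring URL never contains a space.

```shell
# Hypothetical sample log for the demonstration.
cat > sample_ref_log <<'EOF'
http://www.meep.com/trash/ -> /so/cool.html
http://www.blah.com/index.html -> /story/index.html
http://www.meep.com/trash/ -> /index.html
EOF

# Keep only the referring page, group identical lines, count each group,
# then sort numerically with the biggest count first.
top_referers() {
    cut -d' ' -f 1 "$1" | sort | uniq -c | sort -rn
}

top_referers sample_ref_log
```

Each output line is a count followed by the referring page, with the busiest referrer at the top.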

How many times has someone been referred from a particular site?

The command: grep 'www\.meep\.net' ref_log | wc -l will return the number of lines in your referer log that mention the site www.meep.net.

What it does:

    grep 'www\.meep\.net' ref_log
    Picks out lines in the file that contain "www.meep.net". In grep, an unescaped "." matches any character, so put a backslash in front of each one, and wrap the pattern in single quotes so the shell passes the backslashes through to grep.

    wc -l
    Counts the number of lines (one hit = one line).
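If escaping dots gets tedious, grep can also treat the pattern as a plain string and do the counting itself. Here's a small sketch of that approach. The sample log and the function name count_site are hypothetical; -F and -c are standard grep options.

```shell
# Hypothetical sample log. The second line is a near miss that an
# unescaped regular expression "www.meep.net" would also match.
cat > sample_ref_log <<'EOF'
http://www.meep.net/trash/ -> /so/cool.html
http://wwwxmeep.net/ -> /index.html
http://www.meep.net/ -> /index.html
EOF

count_site() {
    # -F: treat the pattern as a literal string, not a regular expression.
    # -c: print the number of matching lines instead of the lines themselves.
    grep -F -c "$1" "$2"
}

count_site www.meep.net sample_ref_log   # prints 2
```

Like the original command, this counts any line that contains the string, whether it shows up on the left or the right of the arrow.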

Who is linking to a page other than /index.html, and where are they linking to?

The command: grep -v ' /index\.html$' ref_log | sort | uniq | less

What it does:

    grep -v ' /index\.html$' ref_log
    Gets lines whose target is not /index.html. The -v means get lines that don't match. The $ anchors the pattern to the end of the line, so you're only looking at the ends of lines, and the leading space keeps pages such as /story/index.html in the results.

    sort | uniq
    Put the list in order, and throw away duplicates.

    less
    Look at the list one page at a time.
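The same filter can feed a quick tally of which of your inner pages attract the most links. Treat this as a sketch under assumptions: the sample data and the function name popular_targets are invented, and every line is assumed to use the " -> " separator shown earlier.

```shell
# Hypothetical sample log for the demonstration.
cat > sample_ref_log <<'EOF'
http://www.blah.com/index.html -> /story/index.html
http://www.mom.com/ippy/ -> /index.html
http://www.meep.com/trash/ -> /story/index.html
EOF

popular_targets() {
    # Drop links to /index.html, keep only the target half of each line,
    # then count identical targets and list the most-linked first.
    grep -v ' /index\.html$' "$1" |
        awk -F' -> ' '{ print $2 }' |
        sort | uniq -c | sort -rn
}

popular_targets sample_ref_log
```

With this sample data the only page left after filtering is /story/index.html, which is listed with a count of 2.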

That's just the beginning of what you can do to manipulate your referer log. Combinations of these commands can be used to produce almost any kind of output.

