I have never been enthusiastic about using SiteMeter to monitor my blog. It provides some real time statistics, but not very accurate ones. I will keep metering with it nonetheless because it is free and monitoring my blog is one of my favorite ways to waste time.
Every hit to this blog is recorded in the web server log. I thought it might be interesting to expose relevant page requests in my web server log in real time as a “Recent Visitors” application. The content of the actual web server log is, unfortunately, nothing but a lot of plain text, and ugly text at that. It is not easy to turn it into anything meaningful to an end user. Fortunately, there are many programs out there that will slice and dice web server logs. One of them (Awstats) comes free from my web host. If you have a web site, you probably have access to a similar program. Unfortunately, the public cannot generally access the statistics these programs create. Moreover, these programs are typically run only once a day. This makes it hard to see recent page requests in real time.
I noticed that SiteMeter can discern with some reasonable accuracy the geographical location of most people hitting my blog. It uses geo-location technology owned by MaxMind to translate your Internet Protocol (IP) address to a geographical location. To me this software is magical stuff. With some time on my hands this weekend, I put together a little geeky “Recent Visitors” application for this blog. It provides some real time visibility into what visitors are reading, where they are reading it from, and when they read it.
While I cannot afford the commercial version of MaxMind’s geo-location technology, there is also a “good enough” version called GeoLite City that is free for the download. MaxMind also publishes a variety of Application Programming Interfaces (APIs) that let programmers query their city database using the programming language of their choice. Because the database has to live on my web server, I needed a programming language that runs there. PHP is the easiest for me to program on the server side, so I chose their PHP API.
My blog gets thousands of hits a day, most of which are not meaningful. I want to know what recent blog entries were read successfully. To do this I had to translate the URL requested into the corresponding blog entry title. This is not an easy thing to do. To translate something like “/2004/06/life_in_the_cou.html” to a human readable blog entry name took some work. It required parsing out the relevant portion of the resource name (“life_in_the_cou”) then querying the MySQL database hosting my blog. I searched for this file name in the mt_entries table and returned the entry title, which was “Life in the Courtyard”. From this I could both show what was being read and link to it.
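For the curious, here is a minimal sketch of that lookup. It is not my actual PHP code. It is written in Python, and it uses an in-memory SQLite table as a stand-in for the MySQL database so that it runs anywhere. The column names entry_basename and entry_title, the URL pattern, and the helper names are assumptions made for illustration; check them against your own MovableType schema.

```python
import re
import sqlite3

# Assumed URL shape: /YYYY/MM/<file name>.html, e.g. /2004/06/life_in_the_cou.html
ENTRY_PATH = re.compile(r"^/\d{4}/\d{2}/([A-Za-z0-9_-]+)\.html$")


def basename_from_path(path):
    """Return the file name portion of an entry URL, or None if it is not an entry page."""
    match = ENTRY_PATH.match(path)
    return match.group(1) if match else None


def title_for_path(conn, path):
    """Look up the entry title for a requested path, or None if nothing matches."""
    basename = basename_from_path(path)
    if basename is None:
        return None
    # Hypothetical column names; a parameterized query avoids SQL injection
    # from whatever happens to be in the log.
    row = conn.execute(
        "SELECT entry_title FROM mt_entries WHERE entry_basename = ?",
        (basename,),
    ).fetchone()
    return row[0] if row else None


# Stand-in database with a single entry, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mt_entries (entry_basename TEXT, entry_title TEXT)")
conn.execute(
    "INSERT INTO mt_entries VALUES (?, ?)",
    ("life_in_the_cou", "Life in the Courtyard"),
)

print(title_for_path(conn, "/2004/06/life_in_the_cou.html"))
```

Once the title and the original path are both in hand, building the link to the entry is simple string formatting.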
To show only relevant content, I had to filter out the obvious noise in the web server log such as robots, crawlers, requests for images and dynamic pages, “file not found” errors and related web server error codes. After a lot of playing around, it worked!
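The filtering can be sketched roughly as follows. Again, this is Python rather than my PHP, and it assumes the log uses Apache's "combined" format. The list of robot hints, the rule that only successful GET requests for .html files count, and the function name are all illustrative assumptions, not a complete or authoritative filter.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of substrings that suggest a robot.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def relevant_hit(line):
    """Return a dict describing the hit if it looks like a real page view, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # line is not in the expected format
    if match["method"] != "GET" or match["status"] != "200":
        return None  # drops "file not found" and other error codes
    if not match["path"].endswith(".html"):
        return None  # drops images and dynamic pages
    agent = match["agent"].lower()
    if any(hint in agent for hint in BOT_HINTS):
        return None  # drops obvious robots and crawlers
    return {"ip": match["ip"], "time": match["time"], "path": match["path"]}


human = ('203.0.113.7 - - [12/Jun/2004:10:15:32 -0400] '
         '"GET /2004/06/life_in_the_cou.html HTTP/1.1" 200 5120 '
         '"-" "Mozilla/4.0 (compatible; MSIE 6.0)"')
robot = ('203.0.113.8 - - [12/Jun/2004:10:16:01 -0400] '
         '"GET /2004/06/life_in_the_cou.html HTTP/1.1" 200 5120 '
         '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"')

print(relevant_hit(human))
print(relevant_hit(robot))
```

The IP address that survives the filter is what gets handed to the geo-location lookup, and the path is what gets translated into an entry title.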
Any geeks out there can look at my PHP source code (dead link removed). It is currently ugly code and it will be cleaned up in time. It assumes your Apache web log is readable by the script and that its format matches mine. It also assumes you have a MovableType weblog that stores content in a MySQL database. However, this application demonstrates that even a MovableType weblog can expose its visitor information in real time, albeit in a somewhat jury-rigged manner.
Right now, I am exposing the Recent Visitor’s Log (link removed) on a separate page. Eventually I intend to integrate the information into my Main Index page. To do this I will have to embed a window inside the Main Index web page. It appears that MovableType does not allow embedded CGI applications inside a page. Perhaps they will support this in a future version.
No, I am not offering any support for this code if you choose to borrow it. If you use it, you will likely have to do quite a bit of tweaking, so you best know PHP pretty darn well. However, as long as I am maintaining it I will publish the latest source code at this URL (link removed) for those who are interested.