Web Server Log Analysis Using XML Technology

Author: Tayeb L.
E-mail: Tayeb.Lemlouma@inrialpes.fr
July 2002.

 


In this report we introduce an approach to writing and analyzing web server logs using XML technology. The approach consists of transforming the server log into an XML structure that we have defined, and then analyzing that XML log. The analysis is done with XSLT and gives a clear view of the server log in the form of a valid HTML page generated from the XML log file.

How does it work?

The principle is very simple; it works as follows:

- First, a Java program transforms the textual server log into an XML file. The supported textual format is the one produced by Apache 1.3.20 (the sample lines below follow Apache's combined log format). This format is simple: the log is a sequence of text lines, each corresponding to one visitor hit. Each line contains the visitor's IP address, two identity fields (both "-" in the example and not kept in the XML), the date and time of the visit, the client request, the status code of the server reply, the number of bytes sent in the response ("-" when no content is sent, as for the 304 reply), the referrer URL, and the user agent. The following is an example of an Apache 1.3.20 server log:

193.105.113.102 - - [01/Jul/2002:17:31:17 +0200] "GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1" 206 1024 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)"
193.105.113.102 - - [01/Jul/2002:17:31:18 +0200] "GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1" 206 2395 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)"
193.105.113.102 - - [01/Jul/2002:17:31:19 +0200] "GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1" 206 66568 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)"
208.13.106.20 - - [01/Jul/2002:17:40:56 +0200] "GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0" 200 3332 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request"
80.15.59.139 - - [01/Jul/2002:17:41:07 +0200] "GET /people/Tayeb.Lemlouma/Papers/AdHoc_Presentation.pdf HTTP/1.1" 200 18535 "http://www.google.fr/search?q=%22applications+militaires%22+fr%C3%A9quence&hl=fr&lr=&ie=UTF-8&oe=UTF8&start=20&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
64.51.19.178 - - [01/Jul/2002:18:09:15 +0200] "GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0" 200 3332 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request"
64.51.19.178 - - [01/Jul/2002:18:19:08 +0200] "GET /people/Tayeb.Lemlouma/NegotiationSchema/index.htm HTTP/1.0" 304 - "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request"
 
Figure 1. An example of a server log file
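To make the parsing step more concrete, here is a minimal Java sketch of how one line of Figure 1 could be split into the fields listed above. It is not the original Log2XML code: the class LogLineParser and its regular expression are hypothetical, and the sketch assumes the combined log format (host, identity, user, [date], "request", status, bytes, "referrer", "user agent") with no escaped quotes inside the quoted fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not the original Log2XML code: splits one line of an
 * Apache combined-format access log into the fields kept in the XML log.
 */
final class LogLineParser {
    // host ident user [date] "request" status bytes "referrer" "user agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    private LogLineParser() {}

    /**
     * Returns the fields keyed by the attribute names used in Figure 2,
     * or null when the line does not match the expected format.
     */
    static Map<String, String> parse(String line) {
        Matcher m = COMBINED.matcher(line.trim());
        if (!m.matches()) {
            return null; // malformed line or another log format
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("IP", m.group(1));
        fields.put("accessDate", m.group(2));
        fields.put("request", m.group(3));
        fields.put("statusCode", m.group(4));
        fields.put("fileSize", m.group(5)); // "-" when no body was sent, e.g. a 304 reply
        fields.put("referrer", m.group(6));
        fields.put("userAgent", m.group(7));
        return fields;
    }
}
```

For the last line of Figure 1, parse(line).get("statusCode") gives "304" and parse(line).get("fileSize") gives "-". Returning null for a non-matching line lets a converter skip or report it instead of emitting a broken element.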

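To keep the generated XML small, we chose a simple format that contains only the required information: inside a single ServerLog root element, each log line becomes one empty Visitor element whose attributes hold the fields of that line. Characters that are special in XML, such as the & separators in the Google referrer URL, are escaped as &amp;. The following example shows the XML generated from the preceding log file: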

<?xml version="1.0"?>
<ServerLog>
<Visitor IP="193.105.113.102" accessDate="01/Jul/2002:17:31:17 +0200" request="GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1" statusCode="206" fileSize="1024" referrer="-" userAgent="Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)" />
<Visitor IP="193.105.113.102" accessDate="01/Jul/2002:17:31:18 +0200" request="GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1" statusCode="206" fileSize="2395" referrer="-" userAgent="Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)" />
<Visitor IP="193.105.113.102" accessDate="01/Jul/2002:17:31:19 +0200" request="GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1" statusCode="206" fileSize="66568" referrer="-" userAgent="Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)" />
<Visitor IP="208.13.106.20" accessDate="01/Jul/2002:17:40:56 +0200" request="GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0" statusCode="200" fileSize="3332" referrer="-" userAgent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request" />
<Visitor IP="80.15.59.139" accessDate="01/Jul/2002:17:41:07 +0200" request="GET /people/Tayeb.Lemlouma/Papers/AdHoc_Presentation.pdf HTTP/1.1" statusCode="200" fileSize="18535" referrer="http://www.google.fr/search?q=%22applications+militaires%22+fr%C3%A9quence&amp;hl=fr&amp;lr=&amp;ie=UTF-8&amp;oe=UTF8&amp;start=20&amp;sa=N" userAgent="Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)" />
<Visitor IP="64.51.19.178" accessDate="01/Jul/2002:18:09:15 +0200" request="GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0" statusCode="200" fileSize="3332" referrer="-" userAgent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request" />
<Visitor IP="64.51.19.178" accessDate="01/Jul/2002:18:19:08 +0200" request="GET /people/Tayeb.Lemlouma/NegotiationSchema/index.htm HTTP/1.0" statusCode="304" fileSize="-" referrer="-" userAgent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request" />
</ServerLog>
 
Figure 2. The XML log file generated from the server log of Figure 1
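One way to produce a document like Figure 2 from already parsed fields is sketched below with the DOM and JAXP APIs bundled with the JDK. The class VisitorXmlWriter is hypothetical, not the original converter; it assumes each hit is given as a map from attribute name (IP, accessDate, request, ...) to value. Letting the serializer escape the values avoids malformed output when a referrer contains & or quotes. The serializer may write the attributes in a different order than Figure 2, which carries no meaning in XML.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * Hypothetical sketch, not the original Log2XML code: builds the ServerLog
 * document of Figure 2 with one empty Visitor element per hit.
 */
final class VisitorXmlWriter {
    private VisitorXmlWriter() {}

    /** Each map holds the attributes of one Visitor element (IP, accessDate, request, ...). */
    static String toXml(List<Map<String, String>> hits) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("ServerLog");
            doc.appendChild(root);
            for (Map<String, String> hit : hits) {
                Element visitor = doc.createElement("Visitor");
                for (Map.Entry<String, String> field : hit.entrySet()) {
                    // No manual escaping: the serializer writes "&" as "&amp;", etc.
                    visitor.setAttribute(field.getKey(), field.getValue());
                }
                root.appendChild(visitor);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            // Ask the serializer to put line breaks between elements.
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter text = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(text));
            return text.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML generation failed", e);
        }
    }
}
```

Building a tree first is convenient for a small log; for very large logs a streaming writer would avoid holding every hit in memory.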

- After transforming the server log file into XML, we use the XSLT language to analyze its content. One of the proposed analyses groups the log information by visitor: for each visitor IP address, it gives the number of hits, the date of the first access, the first requested server resource, and the referrer URL of that first access. It also reports the total number of hits and the number of hits from one IP address hard-coded in the style sheet (0 here, since that address does not appear in the log). The following XML file is an example of the generated output:

<?xml version="1.0" encoding="UTF-8"?>
<analysResult>
<accessNumber>0</accessNumber>
<totalAccessNumber>7</totalAccessNumber>
<VisitorIP>193.105.113.102</VisitorIP>
<VisitorAccessNumber>3</VisitorAccessNumber>
<firstAccessDate>01/Jul/2002:17:31:17 +0200</firstAccessDate>
<firstVisitorRequest>GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1</firstVisitorRequest>
<referrer>-</referrer>
<VisitorIP>208.13.106.20</VisitorIP>
<VisitorAccessNumber>1</VisitorAccessNumber>
<firstAccessDate>01/Jul/2002:17:40:56 +0200</firstAccessDate>
<firstVisitorRequest>GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0</firstVisitorRequest>
<referrer>-</referrer>
<VisitorIP>80.15.59.139</VisitorIP>
<VisitorAccessNumber>1</VisitorAccessNumber>
<firstAccessDate>01/Jul/2002:17:41:07 +0200</firstAccessDate>
<firstVisitorRequest>GET /people/Tayeb.Lemlouma/Papers/AdHoc_Presentation.pdf HTTP/1.1</firstVisitorRequest>
<referrer>http://www.google.fr/search?q=%22applications+militaires%22+fr%C3%A9quence&amp;hl=fr&amp;lr=&amp;ie=UTF-8&amp;oe=UTF8&amp;start=20&amp;sa=N</referrer>
<VisitorIP>64.51.19.178</VisitorIP>
<VisitorAccessNumber>2</VisitorAccessNumber>
<firstAccessDate>01/Jul/2002:18:09:15 +0200</firstAccessDate>
<firstVisitorRequest>GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0</firstVisitorRequest>
<referrer>-</referrer>
</analysResult>
 
Figure 3. An XSLT analysis of the server log file

The above XML output is generated from the XML log file (Figure 2) using the following XSLT style sheet:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<analysResult>
<xsl:text>&#xA;&#xA;</xsl:text>
<!-- accessNumber: hits from one specific IP address, hard-coded here -->
<accessNumber><xsl:value-of select="count(ServerLog/Visitor[@IP='4.33.55.30'])" /></accessNumber>
<xsl:text>&#xA;</xsl:text>
<totalAccessNumber><xsl:value-of select="count(ServerLog/Visitor)" /></totalAccessNumber>
<xsl:text>&#xA;</xsl:text>
<!-- one entry per distinct IP: keep a Visitor only if no earlier Visitor has the same IP -->
<xsl:for-each select="ServerLog/Visitor">
<xsl:variable name="value" select="@IP"/>
<xsl:if test="count(preceding::Visitor[@IP=$value]) = 0">
<xsl:text>&#xA;</xsl:text>
<VisitorIP><xsl:value-of select="@IP"/></VisitorIP>
<xsl:text>&#xA;</xsl:text>
<VisitorAccessNumber><xsl:value-of select="count(/ServerLog/Visitor[@IP=$value])"/></VisitorAccessNumber>
<xsl:text>&#xA;</xsl:text>
<firstAccessDate><xsl:value-of select="@accessDate"/></firstAccessDate>
<xsl:text>&#xA;</xsl:text>
<firstVisitorRequest><xsl:value-of select="@request"/></firstVisitorRequest>
<xsl:text>&#xA;</xsl:text>
<referrer><xsl:value-of select="@referrer"/></referrer>
<xsl:text>&#xA;</xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>&#xA;</xsl:text>
</analysResult>
</xsl:template></xsl:stylesheet>
 
Figure 4. The XSLT style sheet used to analyze the server log
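The style sheet keeps a Visitor element only if no preceding Visitor has the same IP address, and then counts all hits of that address. The same grouping is sketched below in Java to make the logic explicit; the class VisitorSummary and its field names are hypothetical and are not part of the downloadable resources. A single pass over a LinkedHashMap preserves the order of first appearance while touching each hit once, whereas the preceding:: test compares every hit with all earlier ones, so its cost grows quadratically with the size of the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the per-visitor grouping of Figure 4 written in Java.
 * Visitors appear in order of their first hit, as in the style sheet output.
 */
final class VisitorSummary {
    final String ip;
    final String firstAccessDate;
    final String firstRequest;
    final String referrer;
    int hits;

    private VisitorSummary(String ip, String firstAccessDate, String firstRequest, String referrer) {
        this.ip = ip;
        this.firstAccessDate = firstAccessDate;
        this.firstRequest = firstRequest;
        this.referrer = referrer;
    }

    /** Each hit is {IP, accessDate, request, referrer}, listed in log order. */
    static List<VisitorSummary> summarize(List<String[]> hits) {
        Map<String, VisitorSummary> byIp = new LinkedHashMap<>();
        for (String[] hit : hits) {
            // The first hit of an address fixes its "first access" fields;
            // later hits of the same address only increase the counter.
            VisitorSummary summary = byIp.computeIfAbsent(
                hit[0], ip -> new VisitorSummary(ip, hit[1], hit[2], hit[3]));
            summary.hits++;
        }
        return new ArrayList<>(byIp.values());
    }
}
```

Within XSLT 1.0 itself, the usual way to avoid the quadratic preceding:: test is Muenchian grouping, which indexes the Visitor elements by IP with xsl:key and selects the first element of each group with generate-id().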

- Another possible transformation of the XML server log is to analyze it and output the result as an HTML page that can easily be viewed in a browser. The following figure shows such an analysis of the XML log file in the form of an HTML page:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Web Analysis Result</title>
</head>
<body text="#000000" bgcolor="#FFFFFF">
<h1 align="left">
<b>Web Server Analysis</b>
</h1>
<table border="0" width="97%">
<tr bgcolor="#FFFF00">
<td width="9%">
<div align="left">
<b><font face="Arial">Visitor IP address</font></b>
</div>
</td><td width="12%"><b><font face="Arial">Number of Hits</font></b></td><td width="30%"><b><font face="Arial">Date of the First Access</font></b></td><td width="6%"><b><font face="Arial">First Request</font></b></td><td width="8%"><b><font face="Arial">Referrer</font></b></td>
</tr>
<tr bgcolor="#EFEFEF">
<td bgcolor="#EFEFEF" width="9%"><b><font size="2" face="Arial" color="#990000">193.105.113.102</font></b></td><td width="12%"><font size="2" face="Arial">3</font></td><td width="30%"><font size="2" face="Arial">01/Jul/2002:17:31:17 +0200</font></td><td width="6%"><font size="2" face="Arial">GET /people/Tayeb.Lemlouma/Papers/Programmation%20logique%20avec%20contraintes.pdf HTTP/1.1</font></td><td width="8%"><font size="2" face="Arial">-</font></td>
</tr>
<tr bgcolor="#EFEFEF">
<td bgcolor="#EFEFEF" width="9%"><b><font size="2" face="Arial" color="#990000">208.13.106.20</font></b></td><td width="12%"><font size="2" face="Arial">1</font></td><td width="30%"><font size="2" face="Arial">01/Jul/2002:17:40:56 +0200</font></td><td width="6%"><font size="2" face="Arial">GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0</font></td><td width="8%"><font size="2" face="Arial">-</font></td>
</tr>
<tr bgcolor="#EFEFEF">
<td bgcolor="#EFEFEF" width="9%"><b><font size="2" face="Arial" color="#990000">80.15.59.139</font></b></td><td width="12%"><font size="2" face="Arial">1</font></td><td width="30%"><font size="2" face="Arial">01/Jul/2002:17:41:07 +0200</font></td><td width="6%"><font size="2" face="Arial">GET /people/Tayeb.Lemlouma/Papers/AdHoc_Presentation.pdf HTTP/1.1</font></td><td width="8%"><font size="2" face="Arial">http://www.google.fr/search?q=%22applications+militaires%22+fr%C3%A9quence&amp;hl=fr&amp;lr=&amp;ie=UTF-8&amp;oe=UTF8&amp;start=20&amp;sa=N</font></td>
</tr>
<tr bgcolor="#EFEFEF">
<td bgcolor="#EFEFEF" width="9%"><b><font size="2" face="Arial" color="#990000">64.51.19.178</font></b></td><td width="12%"><font size="2" face="Arial">2</font></td><td width="30%"><font size="2" face="Arial">01/Jul/2002:18:09:15 +0200</font></td><td width="6%"><font size="2" face="Arial">GET /people/Tayeb.Lemlouma/MULTIMEDIA/CCPP/UPS-Package/UPSProfiles.html HTTP/1.0</font></td><td width="8%"><font size="2" face="Arial">-</font></td>
</tr>
</table>
<p>
<font size="2">Analysis done using the log2XML utility.<br>Author: Tayeb Lemlouma,<br>
July 2002.</font>
</p>
</body>
</html>
 
Figure 5. An XSLT analysis of the server log rendered as an HTML page

How to run the application

1- Download the resources: Log2XML.class, loganalyzer.xsl and loganalyzer2.xsl.
2- To transform the server log file access_log.1 (which must follow the Apache 1.3.20 log format), run: java Log2XML access_log.1
3- The generated file 'output.xml' is the XML server log. It can then be processed to perform various analyses, for example:
4- To transform the XML server log into an XML file like the one in Figure 3, apply the XSLT style sheet loganalyzer.xsl.
5- To transform the XML server log into an HTML page like the one in Figure 5, apply the XSLT style sheet loganalyzer2.xsl.
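Steps 4 and 5 do not name a particular XSLT processor; since both style sheets are plain XSLT 1.0, any XSLT 1.0 processor should be able to apply them. As one possibility, the sketch below applies a style sheet with the JAXP transformation API (javax.xml.transform) bundled with the JDK. The class ApplyStyleSheet is a hypothetical helper, not one of the downloadable resources, and it works on in-memory strings to stay self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Hypothetical helper, not part of the original download: applies an XSLT 1.0
 * style sheet (e.g. loganalyzer.xsl or loganalyzer2.xsl) to an XML server log.
 */
final class ApplyStyleSheet {
    private ApplyStyleSheet() {}

    static String transform(String styleSheet, String xmlLog) {
        try {
            // Compile the style sheet once; the Transformer can then process the log.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xmlLog)), new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

To work on the actual files instead of strings, the same calls accept new StreamSource(new java.io.File("loganalyzer2.xsl")) for the style sheet, new StreamSource(new java.io.File("output.xml")) for the log, and a StreamResult wrapping the output file.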

Download

first: Log2XML.java, the source code that transforms a server log (compatible with the Apache 1.3.20 log format) into XML
second: Log2XML.class, the compiled class file
third: loganalyzer.xsl, the XSLT style sheet that transforms the XML log file into the generic XML form of Figure 3
fourth: loganalyzer2.xsl, the XSLT style sheet that transforms the XML log file into an HTML page
