How do you keep track of the sites your employees access? How do you know they're not visiting objectionable sites on the Net and wasting your precious bandwidth? Do you know at what time of day your bandwidth utilization peaks, and when it is lowest?
You could use a bandwidth-management tool to tell you all this, but did you know that the log files on your servers capture all this information, and more? For instance, they can even tell you which browser clients are used most to access your website, how many unique users visited it in the last 10 hours, and from which countries. All this sounds exciting, but the problem is that log files are nothing but long, unending, cryptic lines of text, making them very difficult to read.
Moreover, every application creates its own log files. Most administrators, therefore, prefer a commercial package that gives them all the information they need. Here, we'll solve your problem of reading logs in two easy steps. First, we'll talk about how to set up your own log server so that all the logs are in one place. Then, we'll go a step further and tell you how to create reports in whatever format you like (PDF, HTML, etc) to interpret these logs. We'll do all this using some very efficient Linux-based open-source software, selected not just because it is free but because of its tremendous capabilities. So let's get into some action.
Getting started
To begin with, decide which servers you want to fetch logs from into your centralized log server. You may think that getting all the logs to a centralized server means a lot of configuration on every server, such as setting up rsync or syslogd. But all you have to do is create a folder structure on your log server, with a sub-directory named after each server you want to poll logs from, and below that another level of sub-folders for the different types of logs. For example, if you want to poll proxy and iptables logs from your gateway server, the directory structure would be '/logdump/gateway/proxy' and '/logdump/gateway/iptables'. Next, automate the copying process so that all the log files get copied to your server from everywhere at a fixed interval. For this, go to the following folder and create a shell script for each server
#cd /etc/cron.hourly
And create a shell script like this:
#!/bin/bash
scp -B -i <identity_file> <user>@gateway:<remote_log_path> /logdump/gateway/proxy
And save the file with a name like 'gateway_proxy_log.sh'. There is one more thing to do here: for the '-B' (batch mode) switch of scp to work without prompting for a password, you have to create an RSA key pair for a user and replicate its public key to all the servers, under that user's home directory in /home/, appending it to ~/.ssh/authorized_keys. Generate the key pair with
#ssh-keygen -t rsa
Now create such a shell script for each server you want to fetch logs from. Once you are done, hourly-updated logs from all the servers will be deposited centrally in one place.
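The directory tree described at the start of this section can be created in one shot. Here's a minimal sketch; it defaults the root to /tmp/logdump so you can try it without root privileges, but on the real log server you would set LOGROOT to /logdump:

```shell
#!/bin/bash
# Create the per-server, per-log-type tree for the central log dump.
# Defaults to /tmp/logdump so it can be tried as a normal user;
# on the real log server, run with LOGROOT=/logdump.
LOGROOT="${LOGROOT:-/tmp/logdump}"
mkdir -p "$LOGROOT"/gateway/{proxy,iptables}
ls -R "$LOGROOT"
```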
You can use rsync or any other polling mechanism for this job. The method mentioned above is the one we applied in our office, and it is working well for us. To understand how to use rsync for this, you can visit https://www.pcquest.com/content/search/showarticle.asp?artid=47999.
While working on this story, we figured out that the unique selling proposition of a log server is security. But if you configure rsync or syslogd on top of each server, then should someone compromise one of those servers, he can very easily find out where your log server is and then attack it too. The best log server is therefore one that leaves minimal trace of itself on the servers it fetches logs from; we call it a stealth log server. So instead of the standard rsync and syslogd techniques, we used scp (secure copy) to pull all the logs to the centralized server.
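The key setup needed for batch-mode scp can be sketched end to end as below. The key path is illustrative, and in practice you would copy the public key onto each remote server you poll:

```shell
#!/bin/bash
# Generate a passwordless RSA key pair for the log-polling user.
# The key path is illustrative; the private key is what scp's -i switch points to.
ssh-keygen -t rsa -N "" -f /tmp/logpoll_rsa
# On each remote server, the public key must be appended to the polling
# user's ~/.ssh/authorized_keys, for example with:
#   ssh-copy-id -i /tmp/logpoll_rsa.pub root@gateway
cat /tmp/logpoll_rsa.pub
```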
Start analyzing
Now comes the important part: analyzing and presenting the logs in whatever format and style you want. The first thing you need is to understand what type of log you are going to analyze; then select the software you want to analyze it with. We will look at three different log-analyzing packages, namely Webalizer, Awstats and Lire, and see which one is best suited for which kind of environment.
Webalizer
The easiest one to install and configure is Webalizer. In fact, you may not even need to install it: most distributions today install it by default, and if you are running a Web server on a recent distribution, all you have to do is visit http://localhost/usage and you will see the Web server's usage report.
But this software has a limitation: it works only with Web, FTP and proxy logs, so only Apache and Squid apply here. By default, it is configured to read Apache logs from /var/log/httpd. If you want it to analyze your proxy (Squid) logs from some other location, first open the file '/etc/webalizer.conf', search for the keyword 'LogFile' and change its value to the path where your logs reside. In our example, the log path is /logdump/gateway/squid/access.log. Then search for the keyword 'LogType' and change its value from 'clf' to 'squid'. Webalizer will then start analyzing the Squid logs.
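Putting the two changes together, the relevant lines in the configuration file would look like this (the log path is the one from our example):

```
# /etc/webalizer.conf (excerpt)
LogFile   /logdump/gateway/squid/access.log
LogType   squid
```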
Awstats
This is another Web-based analyzing tool, meant for watching Web and mail logs. Installing it is also very easy. All you have to do is get awstats-6.5.tar.gz from this month's PCQEssential CD and install it by issuing the following commands.
#tar -zxvf awstats-6.5.tar.gz
#cd awstats-6.5
#./configure
#make && make install
To configure it, create a configuration file containing the details of the location and type of your logs. For that, go to '/etc/awstats' and copy the template file awstats.model.conf to a new file with a unique name for the server whose logs you want to analyze (say, awstats.squid1.conf). Open this file, edit the 'LogFile' variable and specify the location of the log you want to monitor. To change the log type, set the 'LogType' variable to one of the values given below.
W --- Web log file
S --- Streaming log file
M --- Mail log file
F --- FTP log file
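For instance, to analyze the Squid access log we copied earlier as a Web-type log, the edited lines in the configuration file would look something like this:

```
# /etc/awstats/awstats.squid1.conf (excerpt)
LogFile="/logdump/gateway/squid/access.log"
LogType=W
```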
Now to access the report, open a browser and enter the address 'http://localhost/awstats/awstats.pl?config=filename', replacing filename with the middle part of the configuration file's name. For instance, if the configuration file is awstats.squid1.conf, the link will be 'http://localhost/awstats/awstats.pl?config=squid1'. The best thing about this software is that you can create as many configuration files as you want, one per server log you want to analyze.
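Keep in mind that Awstats has to build its statistics with an update run before the page has anything to show. A cron entry like the following keeps them fresh; the path to awstats.pl is an assumption and depends on where your install placed it:

```
# /etc/crontab fragment (illustrative): rebuild the squid1 statistics every hour.
# Adjust the awstats.pl path to match your installation.
0 * * * * root perl /var/www/awstats/awstats.pl -config=squid1 -update
```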
Lire
Lire gives comprehensive reports for 39 different log types and can produce output in eight different formats. Installing it is not very difficult if you have the how-to instructions for it, so let's proceed with the Lire installation. Get all the necessary components from this month's PCQEssential CD. Lire depends on the DBI and DBD::SQLite2 Perl modules, Curses and Curses-UI, libntk-perl and ploticus; you have to install all of these before you install Lire. So install, one by one, all the .tar.gz files inside the Lire folder on the PCQEssential CD: unpack each package and run the usual make command set to compile it. While installing, remember to install Curses before Curses-UI; the other packages can be installed in any order. Now install the last two RPM files available, Ploticus and the main application, Lire. Remember that if you install Lire before Ploticus, you won't get any graphs in the Lire reports. So first install ploticus-2.31-1.i686.rpm and then install lire-2.0.1-1.noarch.rpm.
Using Lire
Those who want to monitor different logs and produce reports in different formats can issue the following command:
#/usr/local/bin/lr_log2report --output <format> <log-type> <log-file> <report-file>
For example, to turn the Squid access log we collected earlier into a PDF report:
#/usr/local/bin/lr_log2report --output pdf squid_access /logdump/gateway/squid/access.log /logreports/squid1
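If you poll many servers, you can generate the whole batch of report commands from the directory tree itself. This is a dry-run sketch under our assumptions (the dump root, server names and report path are illustrative); it only prints the jobs, and you would run the resulting file once Lire is installed:

```shell
#!/bin/bash
# Build (but don't yet run) one lr_log2report job per server directory
# under the log dump, so the whole batch can be reviewed first.
LOGROOT="${LOGROOT:-/tmp/logdump}"
mkdir -p "$LOGROOT/gateway/squid"   # stands in for the tree built earlier
: > /tmp/lr_jobs.sh
for dir in "$LOGROOT"/*/squid; do
    server=$(basename "$(dirname "$dir")")
    echo "/usr/local/bin/lr_log2report --output pdf squid_access $dir/access.log /logreports/$server" \
        >> /tmp/lr_jobs.sh
done
cat /tmp/lr_jobs.sh
```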
Anindya Roy