Parse logfiles for entries from IP lists

I sometimes have to parse log files for different IP addresses and then group them by network owner. This becomes tedious if the list of IP addresses is long. The script below helps automate this manual task.

It reads a log file and looks for a match based on keys in an iplist. Afterwards the result is summarized and grouped by a specified field. For example, say you have the log file

192.168.1.1 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"
192.168.1.3 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"
192.168.1.1 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"
192.168.1.2 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"
192.168.1.3 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"
192.168.1.2 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"
192.168.1.3 - - [1/Apr/2010:1:1:39 +0200] "GET /favicon.ico HTTP/1.1"

and you would like to have all the entries for IPs 192.168.1.2 and 192.168.1.3. Instead of grepping the log for every IP by hand, you can use the script below. Put all the IPs in an iplist similar to this:

1234 | 192.168.1.1 | MyNet
4567 | 192.168.1.2 | MyNet
8901 | 192.168.1.3 | MyNet
2345 | 192.168.1.4 | MyNet
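
To make the grouping concrete, here is a minimal sketch of how rows like the ones above can be split and grouped. The helper name group_iplist is hypothetical and not part of the script below; it assumes a non-empty separator and uses field position 2 (the network name) as the group-by field instead of the script's default of 0.

```php
<?php
// Hypothetical helper (not in the original script): group the IPs found
// in iplist rows by the field at position $groupby.
function group_iplist(array $rows, string $separator, int $ippos, int $groupby): array
{
    $groups = array();
    foreach ($rows as $row) {
        // split the row and strip the padding around each field
        $fields = array_map('trim', explode($separator, $row));
        // skip blank or short rows that lack the requested fields
        if (!isset($fields[$ippos], $fields[$groupby]) || $fields[$groupby] === '') {
            continue;
        }
        $groups[$fields[$groupby]][] = $fields[$ippos];
    }
    return $groups;
}

$rows = array(
    "1234 | 192.168.1.1 | MyNet",
    "4567 | 192.168.1.2 | MyNet",
    "8901 | 192.168.1.3 | MyNet",
    "2345 | 192.168.1.4 | MyNet",
);

// group by network name (field 2); the IP sits in field 1
print_r(group_iplist($rows, "|", 1, 2));
```

With the sample rows, all four addresses end up under the single key MyNet. Grouping on field 0 instead would give one group per number in the first column.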

<?php
/**
 * 
 * Parse a log file and group by entries from another file
 *
 * This script reads a log file and then groups the entries
 * according to keys found in an iplist
 * There's no input validation, so make sure neither the
 * log file nor the iplist contains malicious content
 *
 * This script is useful if you want to group log file entries
 * based on AS number or network name.
 *
 * 		Koen Van Impe				cudeso.be
 *		20100525
 *
 **/

// Configuration array
$config = array(	// file containing the IPs
					"iplist" => "BE.txt",
					// logfile with the individual entries
					"logfile" => "Log_BE.txt",
					// what field to use as a separator in iplist
					"separator" => "|",
					// position of the IP (0-based)
					"ippos" => 1,
					// position of the groupby field (0-based)
					"groupby" => 0,
					// print an extra newline after each matching log line
					"newline" => false
				);
				
// Array for the resultset
$result = array();
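$matchcount = 0;
// incremented in the matching loop; initialized to avoid an undefined-variable notice
$misscount = 0;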

// walk through the IP list
if (file_exists($config["iplist"])) {
	$file_handle = fopen($config["iplist"], "r");
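	// fgets() returns false at end of file, which ends the loop
	while (($row = fgets($file_handle)) !== false) {
		// split on the configured separator instead of a hard-coded "|"
		$fields = explode($config["separator"], $row);
		// skip blank or malformed rows that lack the configured fields
		if (!isset($fields[$config["groupby"]], $fields[$config["ippos"]])) continue;
		$key = trim($fields[$config["groupby"]]);
		$data = trim($fields[$config["ippos"]]);
		// an empty IP would match every log line, so ignore it
		if (strlen($key) > 0 && strlen($data) > 0) {
			$result[$key][] = $data;
		}
	}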
	fclose($file_handle);
	
	// read the log file
	if ((file_exists($config["logfile"])) && count($result) > 0) {
		$logfile = file($config["logfile"]);

		echo "Parsing ".$config["logfile"]."\n".
				"for matches in ".$config["iplist"]."\n".
				"on field pos #".$config["ippos"]."\n".
				"group by field pos #".$config["groupby"]."\n\n\n";
		// walk through the resultset; scan the
		// log file for every entry
		// three nested foreach loops; there is room for optimization
		foreach ($result as $key => $value) {
			echo "\n******************\n$key\n******************\n";
			foreach ($logfile as $line) {
				foreach ($value as $match) {
					// strpos() returns false when there is no match, so
					// compare strictly; note this is a substring match, so
					// 192.168.1.1 also matches 192.168.1.10
					if (strpos($line, $match) !== false) {
						// we have a match
						echo $line;
						if ($config["newline"]) echo "\n";
						$matchcount++;
					}
					else $misscount++;
				}
			}
			echo "\n\n\n\n";
		}
		
		echo "\n\n$matchcount relevant entries found in ".$config["logfile"];
	}
}


?>
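
The script compares every log line against every IP with a substring search, which is slow for long lists and lets 192.168.1.1 also match 192.168.1.10. One possible alternative, sketched below, is to build a lookup table from IP to group once and then check only the first field of each log line. The function name match_log_lines is hypothetical. The sketch assumes the client IP is the first space-separated field of each line (as in the sample log above) and that each IP belongs to a single group.

```php
<?php
// Hypothetical alternative to the nested loops (not the original method):
// map each IP to its group once, then look up every log line in that map.
function match_log_lines(array $groups, array $loglines): array
{
    // invert group => [ips] into ip => group
    $owner = array();
    foreach ($groups as $group => $ips) {
        foreach ($ips as $ip) {
            $owner[$ip] = $group;
        }
    }

    $matches = array();
    foreach ($loglines as $line) {
        // assumption: the client IP is the first space-separated field
        $ip = explode(" ", ltrim($line), 2)[0];
        if (isset($owner[$ip])) {
            $matches[$owner[$ip]][] = rtrim($line, "\r\n");
        }
    }
    return $matches;
}

$groups = array("MyNet" => array("192.168.1.2", "192.168.1.3"));
$log = array(
    "192.168.1.1 - - [1/Apr/2010:1:1:39 +0200] \"GET /favicon.ico HTTP/1.1\"\n",
    "192.168.1.3 - - [1/Apr/2010:1:1:39 +0200] \"GET /favicon.ico HTTP/1.1\"\n",
);
print_r(match_log_lines($groups, $log));
```

Because the lookup is an exact match on the whole address, each log line costs a single array access no matter how long the iplist is. The trade-off is that an IP appearing elsewhere in the line is no longer found, which the original substring search would catch.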

Phishing notice from Deutsche Bank

A couple of days back I received an e-mail from Deutsche Bank. I’m not a customer of DB. About a year ago I applied for some information and I guess my e-mail address ended up on their mailing list.

The mailing warns customers that there is a phishing attack ongoing. According to the mail, once infected, a virus on your computer lures you to a fake page where you are asked to enter your details.

So far so good. It seems like a good practice that banks try to warn their customers.

The mail contains a couple of links that should point you to sites where you can check whether you are infected. Unfortunately the links point to another website, one that seems to have nothing to do with DB. It is a website for a “relationship marketing suite”. It is understandable that DB uses an external company to handle their mailings, but I don’t get it … The message to their customers is “be on your guard”, and then they ask you to click on a link that has nothing to do with DB?