Malware scanning of web directories with OWASP WebMalwareScanner

Scanning website directories

One of the recent incidents I had to handle involved a compromised webhost. This allowed me to do some exploring of webshells on a WordPress site. In the aftermath of the investigation I searched for tools that could have eased my task of evaluating which files might have been compromised.

One of the approaches I had in mind was to take a hash of every file and then verify that hash with VirusTotal. This would have worked in theory, but in practice most of the malicious web code that gets installed is tuned a little to the attacker's liking. A minor change, but enough to alter the resulting hash and make verification with VirusTotal impossible. Another approach would have been to upload every single file to VirusTotal for scanning and await the results. The dataset I had to verify contained thousands of files, so this approach was not really feasible.
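The hashing step of that first approach is easy to sketch. A minimal version could look like the following (hash_tree is a name I made up, and SHA-256 stands in for whichever hash you'd verify against):

```python
import hashlib
from pathlib import Path

def hash_tree(root):
    """Return a {path: sha256-hex} map for every regular file under root."""
    hashes = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            hashes[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes
```

As noted above, this only works for exact copies: a single changed byte in a tuned webshell produces a completely different hash.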

But why stop here? Most malicious files contain some “strings” that can be identified by signatures. This is similar to the way virus scanners work on end-user desktops. As it turns out, there’s an OWASP project that does just that.
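String-based signature matching can be sketched in a few lines. The signatures below are illustrative examples of suspicious PHP constructs; they are not the actual database used by any scanner:

```python
# Example signatures for suspicious PHP constructs; purely illustrative,
# not a real signature database.
SIGNATURES = [b"eval(base64_decode(", b"shell_exec(", b"passthru("]

def scan_bytes(data):
    """Return the list of signatures found in a blob of file content."""
    return [sig for sig in SIGNATURES if sig in data]
```

Unlike a hash comparison, a string match still triggers when the attacker has lightly modified the file around the malicious construct.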


The WebMalwareScanner is a Python script that scans a set of files for known signatures (including Yara rulesets) and returns a report of its findings.

Install WebMalwareScanner

The installation of WebMalwareScanner requires a number of Python packages. Note that I’m not going to use the GUI included with WebMalwareScanner; I use the command line interface and output. This post is also tuned for installing on Ubuntu 14.

First we have to install wxPython

sudo apt-get install python-wxgtk2.8

Then we need CEF Python. The download instructions basically require you to download the .deb file (I’m using Ubuntu) from a Dropbox share.

mv python-cefpython3_31.2-1_amd64.deb\?dl\=0 python-cefpython3_31.2-1_amd64.deb
sudo dpkg -i python-cefpython3_31.2-1_amd64.deb
sudo apt-get -f install
sudo dpkg -i python-cefpython3_31.2-1_amd64.deb

Notice the apt-get -f install step; it installs all the missing dependencies for CEF Python.

Because the scanner relies on signatures and some of these signatures are Yara rules we also have to install Yara.

sudo apt-get install yara

Once this is done we have to get the code for WebMalwareScanner from Github.

git clone

This is all that’s necessary to get the scanner installed on Ubuntu. Note that this is without the GUI.

Scan a web directory for malicious files

Of course, the first thing you’d like to do is scan a directory’s contents for possible malicious files. This is done by invoking the Python script with the directory to scan and an output location for the report.

python /mnt/hgfs/htdocs /data/reports/htdocs

Depending on the size of the directory the scan might take a while, but the output should look something like this.

>> Starting OWASP Web Malware Scanner version 1.0...
>> Loading signature database... (100%)
>> Loaded 577813 malware hash signatures.
>> Loaded 426 YARA ruleset databases.
>> Scanning /mnt/hgfs/htdocs for malwares... (100%)
>> Scanning /mnt/hgfs/htdocs for insecure permissions... (100%)

The scan results in a text file containing entries for files that require manual verification. A sample of that output:

[2016-09-06 23:10:12] Starting OWASP Web Malware Scanner version 1.0...
[2016-09-06 23:10:17] Loaded 577813 malware hash signatures.
[2016-09-06 23:10:17] Loaded 426 YARA ruleset databases.
[2016-09-06 23:17:50] Scan result for file /mnt/hgfs/htdocs/administrator/components/com_admin/models/help.php : misc shells

[2016-09-06 23:17:50] Scan result for file /mnt/hgfs/htdocs/libraries/vendor/leafo/lessphp/lessify : PM Email Sent By PHP Script

[2016-09-06 23:17:50] Scan result for file /mnt/hgfs/htdocs/templates/t3_blank/less/themes/dark/variables-custom.less : CRDF.Malware-Generic.1592130909

Lots of hits

When I used the OWASP Web Malware Scanner I received a lot of hits on the scanned files.

A majority of these hits were false positives. The directories that I scanned included, for example, phpMyAdmin (a well-known MySQL web administration tool). Although it’s normal that some phpMyAdmin features set off alarms, the amount of alarms generated by the scanner was high, to the point of becoming useless. Of course this isn’t because of the scanner itself, but rather because of the signature rules it relied on.

One of the changes I made was tweaking the ruleset. First off, I’m scanning web directories primarily used by popular CMS systems, so I don’t need any Android rules to trigger. Starting from the WebMalwareScanner root directory you can remove the Android rules.

rm signatures/rules/Android*

Another rule that was generating a lot of noise was Sanesecurity_Spam_5892. I removed it by deleting the rule.

vi rules/scam.yar

  rule Sanesecurity_Spam_5892
  {
      strings:
          $a0 = { 20736f66747761726520 }
      condition:
          $a0
  }


Removing these rules gave me a much saner set of hits and eased further manual processing. Of course there’s always a risk: removing a rule can make you miss just that one file. Be cautious about this.
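An alternative to deleting rules outright is to filter the report after the fact, so the rules stay available for other scans. A rough sketch, assuming the log format shown in the sample output above; the rule names in NOISY_RULES are examples from my own scans, not a recommended list:

```python
# Rule names that generated noise in my environment; adjust to your own scans.
NOISY_RULES = {"Sanesecurity_Spam_5892", "PM Email Sent By PHP Script"}

def filter_report(lines):
    """Drop 'Scan result' lines whose matched rule is known to be noisy."""
    kept = []
    for line in lines:
        if "Scan result for file" in line:
            # The rule name is everything after the last " : " separator.
            rule = line.rsplit(" : ", 1)[-1].strip()
            if rule in NOISY_RULES:
                continue
        kept.append(line)
    return kept
```

This keeps the full signature set active while still giving you a manageable report to work through.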


The WebMalwareScanner project from OWASP is promising in feature set, but it also fails where some virus scanners fail: wrong (or more appropriately, non-relevant) signatures.

Personally I don’t think you’ll get a lot of useful (in the sense of actionable) results by using the scanner with the default set of signatures. The scanner becomes very useful though if you give it your own set of rules. If you write your own set of Yara rules (or get them from a threat intelligence feed) and then scan the directories you’re interested in, you will get very usable results.

One of the most interesting sources of hashes (and code) for PHP shells (the type of malware that typically gets left behind on your system after a break-in) is a Github repository of backdoors. Ideally you tune your Yara rules based on the scripts found in that repository.
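Once you have hashes for known PHP shells, flagging exact copies in a web directory is straightforward. A sketch under stated assumptions: find_known_shells is my own name, and in practice known_bad would be filled by hashing the scripts from such a repository:

```python
import hashlib
from pathlib import Path

def find_known_shells(root, known_bad):
    """Yield paths under root whose SHA-256 appears in the known-bad set."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in known_bad:
                yield str(path)
```

As discussed earlier, this only catches unmodified copies; Yara rules are what catch the tweaked variants.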

2 thoughts on “Malware scanning of web directories with OWASP WebMalwareScanner”

  1. Hey there,

    I’m glad to have found this article, it provides me with some nice feedback about the project.

    You are right about how the rules are yielding too many false positives, I will try to clean up the yara rules in the next few days.

    I will also try to generate new yara rules based on the bartblaze repository of backdoors you linked, this should get some better results out of the scanner.



  2. Being able to create baselines of the web presences one manages, in order to compare them whenever a problem occurs, could be very helpful.
