Do Tor exit nodes alter your content? (or is Tor safer than Vodafone?)

The short answer: no, Tor exit nodes do not alter your content.

A recent post by @adrellias got my attention.


The link in the post refers to an article in which a user spots a case of content (JavaScript) injection by Vodafone. The details can be found in the blog post Am I hacked? Oh, it’s just Vodafone. Needless to say, this is very bad behavior by Vodafone.

To inject content, Vodafone has to inspect your traffic, which makes this a privacy issue. The method also opens a dangerous attack vector: if someone finds a way to alter the injected code, it could be used for mass distribution of unwanted content, or even malware.

Remember TalkTalk? ISPs are also targets of attackers.

The author of that post concludes with “In a little while we’ll all be on Tor.”

I wanted to check whether something similar happens on the Tor network, that is, whether Tor exit nodes alter the HTML content.

Some remarks if you consider using Tor for daily, frequent Internet use:

  • You do not control, nor can you easily identify, who manages the end-point. It would be a very bad idea to transmit credentials through the Tor network;
  • You have no control over the chosen end-point;
  • Similarly, you do not know who looks at your traffic. This can also be a privacy issue. See the Tor documentation. You can also refer to the Tor Legal FAQ: “Do not examine anyone’s communications without first talking to a lawyer.”;
  • Even if you use encryption, be prudent about using Tor to access your bank account; I personally don’t use Tor for banking transactions. The transmission might be encrypted, but you have to make sure that your requests go to the intended resource;
  • My test covers only a subset of the available Tor exit nodes.

Interested in the graphs? Scroll down to Mapping the Tor exit nodes.

The Tor network

Tor protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: it prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location (from https://www.torproject.org/).

I posted on the use of Tor before.

The setup

Goal

The basic goal was to

  1. Set up a web page on a web server.
  2. Retrieve that web page through a Tor proxy connection.
  3. Obtain a new IP through Tor (basically getting a new identity).
  4. Retrieve the same HTML file again.
  5. Compare the output.

Proxy and Tor setup

For this test I used my previously described setup with Privoxy and Tor on an Ubuntu system.

Test web page

The test web page was stored on a cloud-hosted machine. I included some content that would make it more enticing for an intruder to look at. The page contained

  • metadata from Bank of America, Banco Bradesco and BNP Paribas Fortis;
  • a login form from Bank of America and Banco Bradesco;
  • some keywords referring to adult content.


Retrieve the web page

I then wrote a short bash script that restarts the Tor service to get a new IP (there are better ways to accomplish this, but it worked), sleeps a while (to make sure the Tor tunnel is up) and then retrieves the page. I used a fake user agent for more cloaking. Once the web page was retrieved, I used curl to obtain the IP address of the Tor node.

This last step basically gives away that this is “unusual” Internet behavior (normally you do not use curl to get a web page). But because the HTML content had already been downloaded, I did not really care about this.

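#!/bin/bash

# Send wget and curl through the local Privoxy instance, which forwards to Tor
export http_proxy="http://127.0.0.1:8118"

UA="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0"

while true
do
   # Restarting Tor builds new circuits, which usually means a new exit node
   sudo /etc/init.d/tor restart
   sleep 15
   # Random 32-character hex string used as the output file name
   FNAME=$(hexdump -n 16 -v -e '/1 "%02X"' /dev/urandom)

   wget -a torget.log --inet4-only --no-cache --user-agent="$UA" -O "$FNAME" http://REDACTED/mytrpt.html
   # Record the exit node IP as reported by an external service
   curl -s http://ifcfg.me | cut -d " " -f 5 > "$FNAME.ip"
   sleep 3
done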

Every retrieval produced two files: one containing the HTML and one (ending in .ip) containing the IP address.

I let this script run for a couple of hours and then had a look at the results.
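
As mentioned, restarting the service is a blunt way to get a new IP. Tor can also be asked for fresh circuits by sending SIGNAL NEWNYM to its control port. The following is only a sketch under assumptions that were not part of my setup: ControlPort 9051 is enabled in torrc, password authentication is configured, and nc is installed. The function names and the TOR_CONTROL_PASSWORD variable are made up for illustration.

```shell
#!/bin/bash

# Build the control-port conversation: authenticate, ask for new circuits, quit.
# Lines in the Tor control protocol end in CRLF.
newnym_payload() {
  password="$1"
  printf 'AUTHENTICATE "%s"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' "$password"
}

# Send the conversation to the control port (assumed to listen on 127.0.0.1:9051).
request_new_identity() {
  newnym_payload "$1" | nc 127.0.0.1 9051
}

# Usage (not executed here):
#   request_new_identity "$TOR_CONTROL_PASSWORD"
```

Tor rate-limits NEWNYM requests, so a short sleep between iterations would still be needed, and a new circuit does not guarantee a different exit node.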

Analyze the results

Scripting the hash

I wrote a second bash script to analyze the retrieved files. It removed all empty files (size 0, which can result from, for example, an unavailable exit node), calculated the MD5 hash of each remaining file and then counted how many unique MD5 hashes occurred.

#!/bin/bash

echo "Delete HTML files that return empty"
# Test the file size directly instead of parsing the output of ls
for f in *
do
 case "$f" in *.ip|*.sh|*.log) continue ;; esac
 if [ -f "$f" ] && [ ! -s "$f" ]; then
  rm -f "$f" "$f.ip"
 fi
done

echo "Delete HTML files that have no IP (curl-error?)"
for f in *.ip
do
 if [ -f "$f" ] && [ ! -s "$f" ]; then
  rm -f "${f%.ip}" "$f"
 fi
done

# Hash only the HTML files (skip the .ip, .sh and .log files)
: > md5sum.log

echo "Running MD5"
for f in *
do
 case "$f" in *.ip|*.sh|*.log) continue ;; esac
 [ -f "$f" ] || continue
 md5sum "$f" | awk '{print $1}' >> md5sum.log
done

echo "Counting elements"
grep -c . md5sum.log

echo "Unique elements"
sort -u md5sum.log | grep -c .
echo "->"
sort -u md5sum.log

Results

In total the HTML page was returned completely (meaning not empty) 1568 times. The resulting md5sum.log file had 1 unique entry.

Counting elements
1568
Unique elements
1
->
bda6944a104b0854b7c15a1906d7fdd5

Because all of the returned HTML content was identical, the conclusion is that, at least for a basic HTML page, the Tor exit nodes do not alter the returned content.

This does not mean that it never happens. I only looked at a subset of exit nodes. The Tor Metrics page gives an overview of running Tor nodes. You can download the IP list via http://torstatus.blutmagie.de/. My test also consisted of retrieving a single HTML page on a test domain. Pages coming from popular sources might yield different results.
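
Had the hashes differed, the next step would have been to find out which downloads changed and what was injected. I did not need this, so the following is only a sketch. It assumes a local reference copy of the test page and the file naming used by the retrieval script; the function name and the paths in the usage note are hypothetical.

```shell
#!/bin/bash

# Print every downloaded file in DIR whose MD5 hash differs from REFERENCE.
# The .ip, .sh and .log files in the same directory are skipped.
list_altered() {
  reference="$1"
  dir="$2"
  ref_hash=$(md5sum "$reference" | awk '{print $1}')
  for f in "$dir"/*
  do
    case "$f" in *.ip|*.sh|*.log) continue ;; esac
    [ -f "$f" ] || continue
    if [ "$(md5sum "$f" | awk '{print $1}')" != "$ref_hash" ]; then
      echo "$f"
    fi
  done
}

# Usage (paths are placeholders):
#   list_altered mytrpt.html ./downloads | while read -r f; do
#     echo "$f via exit $(cat "$f.ip")"; diff mytrpt.html "$f"
#   done
```

Pairing each altered file with its .ip file would point directly at the exit node responsible.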

Mapping the Tor exit nodes

Because the test already gave me a list of Tor exit nodes, I decided I might as well graph these results. I used a third script to run through the *.ip files, extract the IPs and enrich them with info from Team Cymru. The request for the IP (via curl) failed on a couple of occasions, which left HTML in the *.ip files. I detected these via the head tag and removed these files from the results.

#!/bin/bash

echo "Remove IPs with Proxy errors"
# A failed lookup leaves an HTML error page in the .ip file
grep -lF "<head>" *.ip | while read -r f
do
 rm -f "$f"
done

echo "Extract IPs"
# Collect all IPs, dropping empty lines
cat *.ip | grep . > ip.log

# Number of unique exit node IPs
sort -u ip.log | wc -l

# Team Cymru bulk format: "begin", options, one IP per line, "end"
{
 echo "begin"
 echo "verbose"
 sort -u ip.log
 echo "end"
} > ip.uniq.log

echo "Enrich with Cymru"
nc whois.cymru.com 43 < ip.uniq.log > ip.uniq.whois.log

The above gives a file with unique IP addresses (ip.uniq.log) and a file with Geo-information (ip.uniq.whois.log). From the latter I can extract the countries and their occurrences with

cat ip.uniq.whois.log | awk '{print $7;}' | sort | uniq -c | sort -nr
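
The one-liner above splits on whitespace, so the banner line that the bulk interface prints first, and any error lines, may end up in the counts as stray entries. A slightly more defensive variant is sketched below. It assumes the verbose output keeps its pipe-separated layout with the country code in the fourth field; the function name is made up for illustration.

```shell
#!/bin/bash

# Count country codes in Team Cymru verbose bulk output read from stdin.
count_countries() {
  awk -F'|' '
    NF >= 7 {                        # skip the banner and any error lines
      cc = $4
      gsub(/[[:space:]]/, "", cc)    # strip the padding around the field
      if (cc != "" && cc != "CC") print cc
    }' | sort | uniq -c | sort -nr
}

# Usage:
#   count_countries < ip.uniq.whois.log
```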

Geographic location of the Tor exit nodes

In total there were 369 unique IPs.

Just over half of the Tor exit nodes were located in four countries: the US with 79 (21%), France with 40 (11%), Germany with 39 (11%) and the Netherlands with 36 (10%).

No exit nodes located in Belgium showed up during this test.
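
The percentages are simply each country's count divided by the 369 unique IPs, for example 79 / 369 ≈ 21%. A small sketch that does this for the output of uniq -c; the function name is hypothetical and the total has to be passed in by hand.

```shell
#!/bin/bash

# Read "count country" lines from stdin and print each country's share of TOTAL.
country_share() {
  total="$1"
  awk -v total="$total" '{ printf "%s %d %.0f%%\n", $2, $1, 100 * $1 / total }'
}

# Usage:
#   awk '{print $7}' ip.uniq.whois.log | sort | uniq -c | sort -nr | country_share 369
```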

In a previous post I visualised IP data with CartoDB. I did the same with the list of unique exit node IPs. The resulting map of the Tor exit nodes is published at CartoDB.

Geo location differences

A word on the difference in geo location. In the data enriched via Team Cymru I had 11 IPs located in Great Britain. The CartoDB representation showed fewer IPs in Great Britain. Querying the IPs in CartoDB showed that the “GB” IPs were located in the US, Mexico, Serbia and other countries. I do not have an explanation for the difference. Personally, I put more trust in the accuracy of the data coming from Team Cymru.

Conclusion

Based on this short test it seems that Tor exit nodes do not alter the returned HTML content. It is not possible to deduce whether any eavesdropping took place at the exit node.

Also, this test retrieved a web page from a test domain. Pages coming from popular domains (Google, Facebook, etc.) might give different results.

Is Tor safer than Vodafone? When it concerns unencrypted traffic, you should make no distinction between how much you trust your “normal” ISP and how much you trust a Tor exit node maintainer. Neither of them should look at your content data, but this does not mean they will refrain from doing so. Additionally, some Tor exit nodes are blocked (or throttled) by content delivery networks. This might negatively influence your Internet experience.
