Top 1000 Websites Blocking VPN & TOR Users

One of the tips that security professionals love to give is to use a VPN on public wifi networks. This is great advice (I personally like PrivateInternetAccess and NordVPN). Recently I noticed that nike.com blocks traffic from TOR and VPN providers:
[Screenshot of nike.com, 2016-07-06 6:36 AM]
That got me wondering what other websites were blocking traffic from these sources, so I decided to test the Alexa Top 1000 websites.
First I needed to get a list of the Top 1000 websites. To do this I used this line of command line kung fu, which grabs a CSV of the top 1 million websites and puts the top 1000 in a urls.txt file:
curl -s -O s3.amazonaws.com/alexa-static/top-1m.csv.zip ; unzip -q -o top-1m.csv.zip top-1m.csv ; head -1000 top-1m.csv | cut -d, -f2 | cut -d/ -f1 > urls.txt
Here is the output from this command.
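For anyone who would rather script that step, here is a rough Python equivalent of the unzip, head and cut part of the one-liner. It is only a sketch: the function name top_domains and its parameters are made up for illustration, and it assumes the zip has already been downloaded and contains rank,domain rows in top-1m.csv, as the command above implies.

```python
import csv
import io
import zipfile


def top_domains(zip_source, count=1000, member="top-1m.csv"):
    """Return the first `count` domains from a rank,domain CSV inside a zip."""
    domains = []
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if len(domains) >= count:
                    break  # same job as `head -1000`
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                # Same effect as `cut -d/ -f1`: drop any path after the host.
                domains.append(row[1].split("/")[0])
    return domains
```

Calling top_domains("top-1m.csv.zip") and writing the result out one domain per line should give a file in the same shape as urls.txt.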
I now needed to automatically take a screenshot of 1000 websites. I had started to write my own terrible Python script using Selenium until Chris Truncer pointed me to his amazing project called EyeWitness.
The command I used was:
./EyeWitness.py --web -f urls.txt
[Screenshot, 2016-07-06 8:45 AM]
During my first test, using PrivateInternetAccess, I found that 11 of 1000* blocked access with a 401/404:
hilton.com
nike.com
craigslist.org
ticketmaster.com
tradeadexchange.com
blog-newstime.com
brightonclick.com
adnetworkperformance.com
kissanime.to
neobux.com
loading-delivery2.com
Of those, craigslist.org, nike.com, ticketmaster.com and hilton.com are the most impactful websites on that list.
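The 401/404 responses can also be flagged without looking at screenshots. What follows is a minimal sketch and not the method used for the numbers in this post: it requests each domain with Python's urllib and marks the status codes mentioned above as blocked. The function names, the User-Agent string and the timeout are illustrative placeholders, and a CAPTCHA page that answers with a 200 would slip straight past it.

```python
import urllib.error
import urllib.request

# Status codes treated as a block. 401 and 404 are the ones seen in this
# test; add others (403 is a common choice) if your targets use them.
BLOCK_CODES = {401, 404}


def classify(status):
    """Label a status code as 'blocked', 'ok', or 'unreachable' (None)."""
    if status is None:
        return "unreachable"
    return "blocked" if status in BLOCK_CODES else "ok"


def fetch_status(domain, timeout=10):
    """Return the HTTP status for http://<domain>/, or None on network failure."""
    request = urllib.request.Request(
        "http://" + domain + "/",
        headers={"User-Agent": "Mozilla/5.0"},  # placeholder browser-like UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, timeout, connection reset, etc.


def blocked_domains(domains, get_status=fetch_status):
    """Return the domains whose status code looks like a block."""
    return [d for d in domains if classify(get_status(d)) == "blocked"]
```

Run it once on a normal connection and once through the VPN, then compare the two results; a site that returns a 404 both ways is probably just broken rather than blocking you.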

I then ran the test again through TOR (using the tor container I built) and found that 40 of 1000* blocked access with a 401/404:
adnetworkperformance.com
nordstrom.com
overstock.com
asos.com
prjcq.com
avito.ru
quikr.com
bestbuy.com
retailmenot.com
blog-newstime.com
secureserver.net
brightonclick.com
shopclues.com
craigslist.org
ticketmaster.com
expedia.com
tradeadexchange.com
foxnews.com
trulia.com
garmin.com
tube8.com
groupon.com
usbank.com
irs.gov
usps.com
justdial.com
walmart.com
kohls.com
wayfair.com
lowes.com
hilton.com
whitepages.com
macys.com
xbox.com
newegg.com
zara.com
nike.com
zhihu.com
Many more asked for a CAPTCHA before granting access:
[Screenshot: amazon.com]
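To see how the two runs overlap, the block lists can be compared as sets. This is just a sketch using the domains listed in this post; the variable names are arbitrary.

```python
# Domains copied from the two lists in this post (ticketmaster.com spelled
# as in the summary sentence; sets ignore any repeated entries).
VPN_BLOCKED = set("""
hilton.com nike.com craigslist.org ticketmaster.com tradeadexchange.com
blog-newstime.com brightonclick.com adnetworkperformance.com kissanime.to
neobux.com loading-delivery2.com
""".split())

TOR_BLOCKED = set("""
adnetworkperformance.com nordstrom.com overstock.com asos.com prjcq.com
avito.ru quikr.com bestbuy.com retailmenot.com blog-newstime.com
secureserver.net brightonclick.com shopclues.com craigslist.org
ticketmaster.com expedia.com tradeadexchange.com foxnews.com trulia.com
garmin.com tube8.com groupon.com usbank.com irs.gov usps.com justdial.com
walmart.com kohls.com wayfair.com lowes.com hilton.com whitepages.com
macys.com xbox.com newegg.com zara.com nike.com zhihu.com
""".split())

blocked_by_both = sorted(VPN_BLOCKED & TOR_BLOCKED)  # set intersection
vpn_only = sorted(VPN_BLOCKED - TOR_BLOCKED)         # set difference
tor_only = sorted(TOR_BLOCKED - VPN_BLOCKED)

print("Blocked on both:", ", ".join(blocked_by_both))
print("Blocked on VPN only:", ", ".join(vpn_only))
print("Blocked on TOR only:", ", ".join(tor_only))
```

Sites in the first group are the ones that treat a commercial VPN and a TOR exit node the same way.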
Epilogue: I play defense in my day job. I understand the need to stop malicious traffic from reaching your website. This isn’t an indictment, just an academic exercise, although if more and more websites take this approach, tools like TOR and commercial VPNs will become less useful.
Final Notes: 
I was surprised at how many porn websites are in the top 1000 overall websites.
It takes 1.8 gigs of storage to screenshot the top 1000 websites.
*Your results will vary on what is blocked based on exit node, VPN, time you test and what color shirt you have on.
