Blog Posts

Anti-Vaxxers

In the last couple of years, the anti-vaccination crowd in the United States has started to make inroads, with more and more people deciding that the perceived risk of the vaccine outweighs the known risk of the disease.

When you ask them why they don't vaccinate, they always have anecdotal evidence of how the vaccination could hurt them, how they know of someone else who got a vaccination five years ago and it made them *really sick*, or how they have an amazing supplement that they take that works much better than the vaccination would.


I am not talking about parents who put their children at risk of getting measles; I am talking about IT shops that put their companies, customers, and data at risk by not taking proven preventative measures to secure their systems.

After 15 years in security, I have heard all the excuses for not vaccinating systems:

It *might* break something.
We have a $500,000 Next-Generation ██████ Box (Unconfigured).
We have not had a *serious* outbreak yet.

The problem is that when you bring proven and tested solutions like the CIS Critical Security Controls and the anti-vaxxers bring an anecdote, you are going to lose. My favorite mentor told me a long time ago that “you can’t debate an anecdote and win.”

This is normally where I like to end my blog post with a great solution we can all use. The problem is there isn’t a good solution to make people vaccinate their children, and there isn’t one to make people vaccinate their systems.

Until then, I am just happy I don’t have to deal with polio or WannaCry.


Finding and Mapping Domains With R

As I continue to learn R, I am trying to build tools that other people might find useful. Tonight, with the help of Bob Rudis, I built a script that finds domains containing a keyword on DomainPunch, does a GeoIP lookup, and maps each one that is online.

Since it is time to start thinking about DEF CON this summer, I decided to use “defcon” as my keyword for the demo.

Here are all 544 live IPs for domains with “defcon” in them, mapped:

Link to the full screen map.
Here is a CSV of the data.

Here is the source code:
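The script itself was embedded in the original post and is not reproduced here. What follows is only a rough sketch of the workflow described above, not the original code: it assumes the DomainPunch results for the keyword have already been exported to a domains.csv file with a single domain column, uses curl::nslookup() to test whether a name is online, and uses a local GeoLite2-City.mmdb database with rgeolocate for the GeoIP lookup.

# Sketch only: domains.csv (one "domain" column) and GeoLite2-City.mmdb are assumptions.
library(readr)
library(dplyr)
library(curl)        # nslookup()
library(rgeolocate)  # maxmind()
library(leaflet)

domains <- read_csv("domains.csv")

# Resolve each domain; NA means it did not resolve (not online).
domains$ip <- sapply(domains$domain, function(d) {
  ip <- tryCatch(nslookup(d), error = function(e) NA_character_)
  ip[1]
})
live <- filter(domains, !is.na(ip))

# GeoIP lookup against a local MaxMind GeoLite2 City database (field names assumed).
geo  <- maxmind(live$ip, "GeoLite2-City.mmdb", fields = c("longitude", "latitude"))
live <- bind_cols(live, geo)

# Map the live hosts.
leaflet(live) %>%
  addTiles() %>%
  addCircleMarkers(~longitude, ~latitude, popup = ~domain, radius = 4)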

As a reminder, if you want to play along at home, there is an RStudio Docker container, so all you need to do is:

docker run -d -p 8787:8787 -e USER=<username> -e PASSWORD=<password> rocker/rstudio

Learning R is turning out to be more fun than I thought it would be, so expect some more blog posts! Here is a picture semi-related to this blog post to make it look pretty when I share it on social media.


Finding Additions To The Umbrella DNS Popularity List

Since I started looking at the Umbrella DNS Popularity List, I have been interested in seeing how much the data changes day to day. I fired up RStudio and wrote some terrible code, but finally got it to work with some help.

Yesterday there were 80,937 new DNS names on the list that were not on the list the day before.
(Update: Here is a CSV of the 169,366 domains that were not on the April 1st list but were on the May 1st list.)

Here are the new additions on a map:

Link to the full screen map.

Here is a CSV of the data with GeoIP information added.

Here is the code I ended up with if you want to build your own:
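The code itself was embedded in the original post; as a rough sketch of the comparison step, assuming today's and yesterday's lists are already loaded as data frames with rank and domain columns (see the loader snippet in the next post), an anti-join does the work:

# Sketch only: today and yesterday are assumed to be data frames with a domain column.
library(dplyr)
library(readr)

new_domains <- anti_join(today, yesterday, by = "domain")  # rows in today with no match yesterday

nrow(new_domains)                          # how many names are new
write_csv(new_domains, "new-domains.csv")  # save the additions for mapping or VirusTotal checks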

Up next is to run these domains through VirusTotal to see if any of them are bad.

Here is a picture semi-related to this blog post to make it look pretty when I share it on social media.


Big Data’ing The Umbrella DNS Popularity List

Recently I started looking at the Umbrella DNS Popularity List and did a blog post about it here. The data seemed valuable and lacking at the same time, so I spent my *limited* free time this week learning about R and RStudio.

Protip: If you want to play along at home, there is an RStudio Docker container, so all you need to do is:

docker run -d -p 8787:8787 -e USER=<username> -e PASSWORD=<password> rocker/rstudio

Getting today’s list loaded into R is as simple as:

# Get today's list
library(readr)

fn <- "top-1m.csv"
if (file.exists(fn)) file.remove(fn)  # remove any stale copy from a previous run
temp <- tempfile()
download.file("http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip", temp)
unzip(temp, fn)
today <- read_csv(fn, col_names = FALSE)  # columns come in as X1 (rank) and X2 (domain)
unlink(temp)

Now you have the Top 1 million DNS requests from Umbrella ready to be “big data’ed”.
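One small optional step (my addition, not part of the original post): read_csv() with col_names = FALSE names the columns X1 and X2, so renaming them makes everything that follows easier to read.

library(dplyr)

today <- rename(today, rank = X1, domain = X2)  # X1 is the rank, X2 the DNS name
head(today, 10)                                 # quick peek at the top of the list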

At the start of this project I wanted to do the following:
Search the DNS names for keywords. (Done).
Map all the DNS records on a map. (Done, Kinda).
Compare today’s and yesterday’s records for new DNS records.
Check all the DNS records against Censys and record open ports, and software.
Check all the DNS records against VirusTotal and see if any of them are known bad.
Check all the DNS records against SSLLabs and record SSL grade.
Take a nap.

My limited results so far follow, with hopefully more to come.

Search The DNS Names

I wanted to be able to search the list for a keyword and build a table and a map of the data. This was fairly easy, and with the help of leaflet and datatables, here is the output of searching today's data for “cisco”.

Here is the map:

Here is a link to the data. 

Here is the R code I wrote:
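The embedded code is not reproduced here; a minimal sketch of the keyword-search-and-table part, assuming the list is loaded and renamed as above (the map itself follows the same GeoIP-and-leaflet pattern sketched in the next section), might look like this:

# Sketch only: today is assumed to have rank and domain columns.
library(dplyr)
library(stringr)
library(DT)

keyword <- "cisco"
hits <- filter(today, str_detect(domain, fixed(keyword)))  # every domain containing the keyword

datatable(hits)  # interactive, sortable table of the matches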

Map All The DNS Records On A Map.

I got started on this and quickly realized that looking up the GeoIP information for and mapping a million DNS records was going to take a week, so I decided to do the top 25,000 as a proof of concept and come back and do all 1,000,000 later (maybe).

Here is the 25,000 Map:

Here is the R code I wrote:
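Again, the embedded code is not shown here. A sketch of the GeoIP-and-map step for the top 25,000, assuming curl::nslookup() for resolution and a local GeoLite2-City.mmdb database for rgeolocate (both assumptions on my part), with marker clustering so that many points stay usable in the browser:

# Sketch only: today has rank and domain columns; resolving 25,000 names serially is slow.
library(dplyr)
library(curl)
library(rgeolocate)
library(leaflet)

top25k <- head(today, 25000)

top25k$ip <- sapply(top25k$domain, function(d) {
  ip <- tryCatch(nslookup(d), error = function(e) NA_character_)
  ip[1]
})
top25k <- filter(top25k, !is.na(ip))

geo    <- maxmind(top25k$ip, "GeoLite2-City.mmdb", fields = c("longitude", "latitude"))
top25k <- bind_cols(top25k, geo)

leaflet(top25k) %>%
  addTiles() %>%
  addCircleMarkers(~longitude, ~latitude, popup = ~domain,
                   clusterOptions = markerClusterOptions())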

I also built a map with the top 100K on it, but it is huge (load at your own risk).

…More to come.

I will be spending some more time on this over the next couple of weeks, but I can't thank @EngelhardtCR and @hrbrmstr enough for all the help they have been over the last week. They are true data scientists and I am just a hacker with a blog. :)

If you have any questions or suggestions, please let me know on Twitter at @jgamblin.

Here is a picture semi-related to this blog post to make it look pretty when I share it on social media.


Exploring Cisco’s Top 1 Million Domains Data

Cisco offers a daily list of the million most-queried domain names from Umbrella (OpenDNS) users. I had some time this weekend, so I decided to play around with the data to see what I could find. I spun up a Lightsail server and got to work.

Grabbing the file is as simple as:
wget http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip

You can retrieve a specific date like this:
wget http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-yyyy-mm-dd.csv.zip
(Looks like 2017-01-20 is the earliest they have online).

Once you get that downloaded and unzipped (unzip top-1m.csv.zip), you can start exploring.

You can pull out the top 10 domains with this command:
head -n 10 top-1m.csv

1,google.com
2,www.google.com
3,microsoft.com
4,facebook.com
5,doubleclick.net
6,g.doubleclick.net
7,clients4.google.com
8,googleads.g.doubleclick.net
9,apple.com
10,fbcdn.net

(Full Output)

You can search for keywords with this command:
cat top-1m.csv | grep "opendns"

437,opendns.com
719,hydra.opendns.com
720,sync.hydra.opendns.com
1314,disthost.opendns.com
2756,api.opendns.com
4565,cacerts.opendns.com
5569,ipf.opendns.com
5699,block.opendns.com
7024,updates.opendns.com
8482,bpb.opendns.com

(Full Output)

To count the domain levels, use this command:
awk -F, '{count=split($2,a,"."); print count}' top-1m.csv | sort | uniq -c | awk '{print $2,$1}' | sort -k1,1n

1 1086
2 263509
3 469756
4 193802
5 54281
6 13698
7 2952
8 689
9 172
10 16
11 26
12 2
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
23 1

(Full Output)
Notice anything strange here? Hint: A domain name requires at least two levels to be valid.

To find the broken DNS names in this list, this command works:
cat top-1m.csv | awk -F, 'BEGIN {file="top-1m.csv" ; while ((getline line < file) > 0) {if (line ~ /#/) continue; tld[tolower(line)] = 1}} {foo=split($2,a,"."); if (foo == 1) {if (!(a[1] in tld)) {print $0}}}'  

1200,home
1490,local
2082,za
3916,lan
6350,url
10173,belkin
10869,uop
11187,localdomain
12887,localhost

(Full Output)

Find domains added to the list today.
I wrote a script to download the last two days of files and compare them for new domains:
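The script was embedded in the original post and is not reproduced here. A sketch of the same idea, written in R to match the other posts on this page and using the dated URL pattern shown above (the file handling and column names are my assumptions):

# Sketch only: downloads two dated lists and writes out the domains that are new in the later one.
library(readr)
library(dplyr)

get_list <- function(date) {
  url  <- sprintf("http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-%s.csv.zip", date)
  zipf <- tempfile(fileext = ".zip")
  download.file(url, zipf)
  csv  <- unzip(zipf, exdir = tempfile())[1]     # extract whatever CSV is inside the archive
  read_csv(csv, col_names = c("rank", "domain"))
}

today_list     <- get_list(Sys.Date())
yesterday_list <- get_list(Sys.Date() - 1)

new_domains <- anti_join(today_list, yesterday_list, by = "domain")
write_csv(new_domains, "new-domains.csv")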

You can find the output for April 24, 2017 here.

Overall, I am really impressed with this data and will be using it to do more research and to track trends across the internet. They have some more work to do, but it is an amazingly valuable free tool.

Also, I have recently fallen in love with sprunge for pushing data to an ad-free “pastebin” from the command line:

cat file.txt | curl -F 'sprunge=<-' http://sprunge.us

