Big Data’ing The Umbrella DNS Popularity List

Recently I started looking at the Umbrella DNS Popularity List and did a blog post about it here. The data seemed valuable and incomplete at the same time, so I spent my *limited* free time this week learning R and RStudio.
Protip: If you want to play along at home, there is an RStudio Docker container, so all you need to do is:

docker run -d -p 8787:8787 -e USER=<username> -e PASSWORD=<password> rocker/rstudio

Getting today’s list loaded into R is as simple as:

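# Get today's list
library(readr)                        # provides read_csv()
fn <- "top-1m.csv"                    # name of the CSV inside the zip
if (file.exists(fn)) file.remove(fn)  # remove any stale copy first
temp <- tempfile()
download.file("http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip", temp)
unzip(temp, fn)
today <- read_csv(fn, col_names = FALSE)
unlink(temp)                          # delete the downloaded zip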

Now you have the Top 1 million DNS requests from Umbrella ready to be “big data’ed”.
At the start of this project I wanted to do the following:
  • Search the DNS names for keywords. (Done).
  • Map all the DNS records on a map. (Done, Kinda).
  • Compare today’s and yesterday’s records for new DNS records.
  • Check all the DNS records against Censys and record open ports and software.
  • Check all the DNS records against VirusTotal and see if any of them are known bad.
  • Check all the DNS records against SSLLabs and record SSL grade.
  • Take a nap.
My limited results so far follow with hopefully more to come.

Search The DNS Names

I wanted to be able to search the list for a keyword and build a table and map of the matching data. This was fairly easy, and with the help of leaflet and datatables here is the output of searching today’s data for cisco.
Here is the map:

Here is a link to the data. 
Here is the R code I wrote:
https://gist.github.com/jgamblin/7615b81cedd10e44d4f2220347b69cb0

Map All The DNS Records On A Map.

I got started on this and quickly realized that looking up the GeoIP information and mapping a million DNS records was going to take a week, so I decided to do the Top 25,000 as a POC and come back and do all 1,000,000 later (maybe).
Here is the 25,000 Map:
Here is the R code I wrote:
https://gist.github.com/jgamblin/ccf3390bc5d2ce922cd5df38a40617b4
I also built a map with the Top 100K on it but it is huge (Load at your own risk).

…More to come.

I will be spending some more time on this over the next couple of weeks but can't thank @EngelhardtCR and @hrbrmstr enough for all the help they have given me over the last week. They are true data scientists and I am just a hacker with a blog. : )
If you have any questions or suggestions please let me know on Twitter at @jgamblin.
Here is a picture semi related to this blog post to make it look pretty when I share it on social media. 

Exploring Cisco’s Top 1 Million Domains Data

Cisco offers a daily list of the million most queried domain names from Umbrella (OpenDNS) users. I had some time this weekend, so I spun up a Lightsail server and started playing around with the data to see what I could find.
Grabbing the file is as simple as:
wget http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip
You can retrieve a specific date like this:
wget http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-yyyy-mm-dd.csv.zip
(Looks like 2017-01-20 is the earliest they have online).
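If you want to script the two download forms above, a small helper can build the URL for you. This is only a sketch: the function name umbrella_url is made up for illustration, and it assumes the dated files keep following the top-1m-yyyy-mm-dd.csv.zip pattern shown above.

```shell
#!/bin/sh
# Base URL taken from the wget commands above.
BASE_URL="http://s3-us-west-1.amazonaws.com/umbrella-static"

# umbrella_url [yyyy-mm-dd]
# Print the download URL for the given date, or for the current list
# when no date is passed. (Hypothetical helper, not Umbrella tooling.)
umbrella_url() {
    day="${1:-}"
    if [ -z "$day" ]; then
        echo "$BASE_URL/top-1m.csv.zip"
        return 0
    fi
    # Accept only dates shaped like 2017-01-20.
    case "$day" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
            echo "$BASE_URL/top-1m-$day.csv.zip" ;;
        *)
            echo "expected a date as yyyy-mm-dd, got: $day" >&2
            return 1 ;;
    esac
}

# Example: wget "$(umbrella_url 2017-01-20)"
```

The shape check only rejects obviously malformed input; it does not know whether a file actually exists for that date.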
Once you get that downloaded and unzipped (unzip top-1m.csv.zip) you can start exploring.
You can pull out the top 10 domains with this command:
head -n 10 top-1m.csv

1,google.com
2,www.google.com
3,microsoft.com
4,facebook.com
5,doubleclick.net
6,g.doubleclick.net
7,clients4.google.com
8,googleads.g.doubleclick.net
9,apple.com
10,fbcdn.net

(Full Output)

You can search for keywords with this command:
grep "opendns" top-1m.csv

437,opendns.com
719,hydra.opendns.com
720,sync.hydra.opendns.com
1314,disthost.opendns.com
2756,api.opendns.com
4565,cacerts.opendns.com
5569,ipf.opendns.com
5699,block.opendns.com
7024,updates.opendns.com
8482,bpb.opendns.com

(Full Output)

To count the domain levels use this command:
awk -F, '{count=split($2,a,"."); print count}' top-1m.csv | sort | uniq -c | awk '{print $2,$1}' | sort -k1,1n

1 1086
2 263509
3 469756
4 193802
5 54281
6 13698
7 2952
8 689
9 172
10 16
11 26
12 2
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
23 1

(Full Output)
Notice anything strange here? Hint: A domain name requires at least two levels to be valid.

To find the broken DNS names in this list this command works:
cat top-1m.csv | awk -F, 'BEGIN {file="top-1m.csv" ; while ((getline line < file) > 0) {if (line ~ /#/) continue; tld[tolower(line)] = 1}} {foo=split($2,a,"."); if (foo == 1) {if (!(a[1] in tld)) {print $0}}}'  

1200,home
1490,local
2082,za
3916,lan
6350,url
10173,belkin
10869,uop
11187,localdomain
12887,localhost

(Full Output)
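The same idea can be sketched with the known-TLD list kept in a separate file. This is a hedged variant, not the exact command above: it assumes a TLD file with one TLD per line and '#' comment lines (the layout IANA uses for its tlds-alpha-by-domain.txt), and the function name and the tlds.txt file name are made up for illustration.

```shell
#!/bin/sh
# find_non_tld_single_labels LIST.csv TLDS.txt
# LIST.csv: rank,domain rows (the top-1m.csv layout).
# TLDS.txt: one TLD per line, '#' comment lines allowed (assumed format).
find_non_tld_single_labels() {
    awk -F, '
        # First file (the TLD list): remember every TLD in lower case.
        NR == FNR {
            if ($0 !~ /^#/ && $0 != "") tld[tolower($0)] = 1
            next
        }
        # Second file (the domain list): print single-label names
        # that are not known TLDs.
        {
            n = split($2, parts, ".")
            if (n == 1 && !(tolower(parts[1]) in tld)) print $0
        }
    ' "$2" "$1"
}

# Example: find_non_tld_single_labels top-1m.csv tlds.txt
```

Note that the NR == FNR trick assumes the TLD file is not empty; with an empty TLD file the domain list would be treated as the first file.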

Find domains added to the list for today.
I wrote a script to download the last two days of files and compare them for new domains:
https://gist.github.com/jgamblin/184590e2ba64371730e435ab2977e4cf

You can find the output for April 24, 2017 here.
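The comparison step on its own can be sketched like this. It is not the script from the gist, just a minimal illustration that assumes you already have two unzipped lists on disk; the function name new_domains is made up.

```shell
#!/bin/sh
# new_domains YESTERDAY.csv TODAY.csv
# Print domains present in TODAY.csv but absent from YESTERDAY.csv.
# Both files use the rank,domain layout of top-1m.csv.
new_domains() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # Drop the rank column and sort, since comm needs sorted input.
    cut -d, -f2 "$1" | LC_ALL=C sort -u > "$old_sorted"
    cut -d, -f2 "$2" | LC_ALL=C sort -u > "$new_sorted"
    # -13 hides lines unique to the first file and lines common to both,
    # leaving only lines unique to the second file.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Example: new_domains top-1m-yesterday.csv top-1m.csv
```

Ranks are ignored on purpose, so a domain that merely moved up or down the list does not show up as new.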

Overall I am really impressed with this data and will be using it to do more research and to track trends across the internet.  They have some more to do but it is an amazingly valuable free tool.
Also, I have recently fallen in love with sprunge for pushing data to an ad-free “pastebin” from the command line:

cat file.txt | curl -F 'sprunge=<-' http://sprunge.us

Burp Settings File

I am a huge fan of Tim Tomes and his Burp Suite Configuration Suggestions blog post. The problem is that I only use Burp a couple of times a month, so I end up facing this screen and having to reconfigure Burp on every launch:

So I built burpsettings.json that:

  • Disables Browser XSS Protection
  • Disables Burp Collaborator Server
  • Disables Intercept by Default
  • Changes Scan Mode to Thorough
  • Turns Off Anonymous Feedback

This makes my Burp startup a lot faster, and I thought I would share the config file in case it helps someone else.

Newly Registered Domain Name Keyword Search

Today I was asked if it was possible to generate a list of the domain names registered every day that contain a keyword (company name, city, trademark, etc). There are a few paid services that do this, and domainpunch.com has a web-based tool for it, but I wanted to automate it so I could use it with a slackbot. So I put together this 4 line bash script:
https://gist.github.com/jgamblin/a353c8553e5dda51784d5b0614358aed
Usage:
./domains.sh keyword
Output:
This is a super simple script, but as they say, “simplicity is the ultimate sophistication”.
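The filtering half of the idea can be sketched as below. This is not the gist's code: it assumes you already have a plain text file of newly registered domains, one per line, from whatever feed you use, and the function name search_new_domains is made up.

```shell
#!/bin/sh
# search_new_domains KEYWORD FILE
# FILE is assumed to hold one newly registered domain per line.
search_new_domains() {
    # -i: ignore case, -F: treat the keyword as a literal string,
    # --: stop option parsing so keywords starting with '-' are safe.
    grep -i -F -- "$1" "$2" | LC_ALL=C sort -u
}

# Example: search_new_domains cisco newly-registered.txt
```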

Automating DigiCert Certificate Issuance

I am a big fan of DigiCert for TLS Certificates and CA/WebPKI services.   While they have amazing customer support and are an amazing company to work with, there are not a lot of automation scripts to interact with their API available. So over the weekend and with a lot of help from Clint Wilson I built a shell script that:

  • Creates a CSR/Key pair using OpenSSL.
  • Uses the DigiCert API to:
    • Request a TLS certificate.
    • Approve the certificate.
    • Download the certificate in:
      • .zip
      • .p7b
      • .pem
      • .pem (with no root)
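The first step, creating the CSR/key pair with OpenSSL, can be sketched like this. It is a minimal illustration rather than the code from my script: the function name make_csr, the 2048-bit RSA key size and the CN-only subject are assumptions, and the DigiCert API calls are left out entirely.

```shell
#!/bin/sh
# make_csr COMMON_NAME [OUT_DIR]
# Write COMMON_NAME.key and COMMON_NAME.csr into OUT_DIR
# (default: the current directory).
make_csr() {
    cn="$1"
    out_dir="${2:-.}"
    # -newkey rsa:2048 generates a fresh key alongside the request.
    # -nodes leaves the private key unencrypted so unattended
    # scripts can use it; protect the file accordingly.
    openssl req -new -newkey rsa:2048 -nodes \
        -keyout "$out_dir/$cn.key" \
        -out "$out_dir/$cn.csr" \
        -subj "/CN=$cn"
}

# Example: make_csr www.example.com /tmp/certs
```

A real request will usually need more subject fields (organization, country and so on) than the bare common name used here.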

Here is the script in action:

Here is the code:
https://gist.github.com/jgamblin/bd04b9ef8fe3660f4a247cc7d2109df0
I have tested it on OSX, Ubuntu and CentOS7, and it is fairly cross-platform friendly. Extending this script to install the certificate should be easy, but we already had the automation built to do that so it was not necessary.
Let me know on Twitter if you have questions.

Leadership Quotes From My Mentor's Dad

An amazing mentor and leader I work with has been talking to me recently about what real leadership looks like. He shared with me a list of quotes he keeps on his desk that his dad, who had a leadership role in the military, collected and gave to him. He gave me a copy and said I was free to share them.

My [Dad’s] Rules Of Leadership:
  • Develop a vision and live it.
  • Don't lie for your people and don't lie to your people.
  • Beware of RUMINT.  It’s faster than you are.
  • Don't back away from the hard decisions, especially personnel decisions.
  • Bad news never goes down easy and it won’t get easier with time.  It’s best to get it over with.
  • Support your subordinate supervisors when they take the high road.
  • Tell people exactly what you expect of them, including the obvious.
  • Involve your people in decisions and action planning.
  • Give them credit when things work.  Give them top cover when things go awry.
  • Trust the experts.  That’s what you pay them for.
  • Avoid Bullshit.  You may get past the fans, but you won’t get past the players.
  • Knowledge may be power, but knowledge shared is power squared.
  • Set an example by taking on the hard jobs.
  • Listen-Decide-Explain-Act.
  • Old ship-driving rule: When you get in extremis, DO SOMETHING.  The worst thing you can do is nothing.  Make decisions smartly and don't vacillate.  If you are wrong, admit it, back up and turn right.
  • No, you are not always right.  Get over it.  You are not as smart as you think you are and you may not be as smart as others think you are either.
  • Never underestimate the power of the expression “Thank You”.
  • Don’t fight with your friends. You haven’t got the time.

These quotes are amazing and I will be reflecting on this list for the rest of my career and am really happy to be able to share them.
Here is a “leadership” picture so shared links look better:

10 Questions You Should Ask Every Leader

I am reading a book called “The Art of Authenticity”, and over a couple of chapters it talks about understanding what makes strong leaders and deciding who you should follow.
I have pulled these 10 questions out of those chapters:

  • What was your first leadership role?
  • When you think about the process of becoming the leader that you are today, what experiences stand out for you as turning points?
  • How do you choose people to hire for your team?
  • Do you look for resume virtues or eulogy virtues first when you make a hiring choice?
    (Resume virtues are the skills you bring to work; eulogy virtues are the things people say about you when you die.)
  • What kinds of behaviors irritate you in colleagues?
  • Whom do you admire?
  • How would you describe yourself as a leader?
  • What kinds of situations bring out the best in you?
  • What kinds of situations bring out the worst in you?
  • What is the hardest thing you have ever done as a leader?

I will be spending the next couple of weeks with my mentors and leadership team finding out their thoughts on these questions, and this is one of those non-technical things I felt inclined to share.  I am sure this book will spawn a few more of these short posts.

Easily Check Certificate Transparency Log

Certificate transparency logs are an amazing way to get a good overview of your certificate landscape, detect fraud (bad guys also use TLS) and find shadow IT and unknown cloud services. The problem is that there are not many good places to search these logs. The best I have found is from Symantec. It is slow and errors out often, but it works for what I need.
The best way I have found to get the data from this service is a simple bash script I put together that runs a curl command and downloads a .csv.
Running it is as simple as:
./ctlog.sh yourorgsname
https://gist.github.com/jgamblin/8b34ba91825a8c2859720033bfe81da8
The output should look like this:
(If it is blank the service likely timed out and you will need to rerun it.)
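Since the service times out so often, the rerun can be automated with a small retry wrapper. This is a generic sketch and not part of ctlog.sh: the function name fetch_with_retry and the RETRY_DELAY variable are made up, and the URL in the example comment is a placeholder rather than the real endpoint.

```shell
#!/bin/sh
# fetch_with_retry OUT_FILE MAX_TRIES COMMAND [ARGS...]
# Run COMMAND, saving its stdout to OUT_FILE, until the file is
# non-empty or MAX_TRIES attempts have been made.
fetch_with_retry() {
    out_file="$1"
    max_tries="$2"
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        "$@" > "$out_file" || true
        # -s is true when the file exists and is not empty.
        if [ -s "$out_file" ]; then
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "${RETRY_DELAY:-5}"
        fi
        try=$((try + 1))
    done
    return 1
}

# Example (placeholder URL, not the real service):
# fetch_with_retry certs.csv 3 curl -s "https://ctsearch.example/export?org=yourorgsname"
```

A non-empty file only means the service returned something; it may still be an error page, so it is worth eyeballing the result.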
Unless you are really on top of your game you are likely to find a valid certificate you didn’t know about.

Early Lessons Learned in Car Hacking

Ever since Charlie Miller hacked a Jeep while it was driving on the interstate I have wanted to learn more about car hacking, but I had not had a chance to get started until a month ago, when I ordered a Carloop and was ready to get hacking:

… or so I thought.  Turns out car hacking is hard… like, really-really hard. While I have not “hacked” anything yet I have learned some early lessons:

Once you get the basic setup down you will spend a lot of time in your driveway and garage doing this:

“Car Hacking” is fairly new and you will likely not find a lot of information about your car online and will have to decode (and hopefully share) a lot of the information you find.  Reddit and Twitter have some fairly active discussion groups.
Car hacking has been an amazingly fun project so far, and there are amazing new tools coming out all the time.  I just backed Macchina on Kickstarter this week and would like to pick up a canb.us.  I am sure my car hacking tool kit will continue to grow.
I will be blogging more about my adventures into car hacking over the next couple of months as I learn more and have more to share.

Getting The Most Out Of RSA

The RSA conference starts next week and, let's be honest, it is becoming known as a stuffy management conference with very little useful technical information. But if you know where to look, you can take some deep dives. I have put together a quick guide to some amazing talks and events I am looking forward to.

Talks:

BSidesSF –  Coming into town a few days early just to attend this conference.  There is so much good stuff on the schedule but I do not want to miss the Advanced Internet dataset combinations for #ThreatHunting & Attack Prediction talk.
Google Cloud Talks –  If you have cloud “stuff” in your company you need to swing by and catch some of these talks.  I am really looking forward to the Container Security Panel and, while not technical, Humanising DDoS: the technical and emotional impact of large-scale attacks on an organisation looks ridiculously intriguing.
IOActive –  IOActive always does an amazing job with their IOASIS and talks.  I am really looking forward to the Implementing Inexpensive Honeytrap Techniques  and the Hardcore Cloud Forensics talks.
DevOOPS: Attacks and Defenses for DevOps Toolchains –  This talk by Ken and Chris is the one RSA talk I will not miss.

Events:

I ♥ Cisco Umbrella Soirée – My friends at OpenDNS always do an amazing job with their RSA party and I can't wait to see what they do on Valentine's Day with 20,000 geeks stuck in San Francisco.
Forescout – One, two, three and to the Snoop Doggy Dogg is at the door Ready to make an entrance so back on up.   Snoop provided the soundtrack to my 7th grade basketball team and I am really looking forward to seeing him in person.
Tenable – Tenable is having an 80’s party on Sunday and to quote Jay-Z:
Wanna bring the 80’s back?
That’s okay with me, that’s where they made me at.
BJJ Smackdown – For $50 you can be punched in the face by Jeremiah Grossman and maybe pick up some BJJ skills.
Rsaparties.io – Has a list of about 500 more parties you can attend.
If I am missing something I should be at, or if you want to say hi next week, you can catch me on Twitter at @jgamblin.
