Tutorial 3
Date/Time: Thursday, October 18th, 2012 9:00am - 12:00pm
Duration: 3 hours
Title: Large-Scale DNS Data Analysis
Presenter: David Dagon, Georgia Institute of Technology
Abstract:
DNS data is increasingly used in security analysis, intrusion detection, and research. Even small DNS collection systems can generate enormous amounts of DNS traffic, requiring tera-scale storage. As a result, researchers looking at DNS traffic must often develop real-time, in-line analysis tools.
This tutorial will offer pragmatic advice and examples of DNS data in network measurement, security analysis, and threat identification. The focus will be on tool creation and modification (e.g., creating re-usable frameworks for real-time analysis), rather than any individual research topic (e.g., machine learning, measurement, or botnet remediation).
Participants are assumed to have strong skills in C and Python programming, familiarity with large-scale 'NoSQL' storage systems, and some familiarity with DNS resolver configuration (e.g., BIND, Unbound). For portions of the tutorial, participants will interact with local systems deployed on a LAN, and so should bring a suitable notebook or system. Depending on external network conditions, experiments will be run on existing DNS information sharing systems such as SIE. To simplify network access and to speed up development exercises, participants will be given a virtual machine image. The tutorial is designed for FreeBSD and Debian systems, but participants are free to bring their own gear and adapt.
Components of this tutorial include:
- Fundamentals. This tutorial presumes some familiarity with DNS, but will include a brief overview of the domain name system, key RFCs, and existing commercial and research-oriented information exchanges. The summary will cover both the academic and operational literature, and note current open topics in academic research.
- Policy. Consideration will be given to human subjects requirements for university research, user privacy, and license agreements for access to individual zones, IANA and RIR databases, and the like. Discussion will also focus on RFC 1262 and notification/opt-out requirements for Internet-scale DNS measurement projects.
- Passive DNS. Participants will extend existing tools that build and utilize passive DNS databases. Simple motivating examples will include building graphs of malicious domain networks, time series analysis, zone enumeration and walking, and flux network detection.
- Authority DNS Analysis. Participants will use and extend existing DNS analysis tools (pcaputils, dnscap, dnspython, scapy, and nmsg-based tools) to analyze authority DNS logs from a botnet C&C server. The exercise will identify recursive farms, closed recursive networks, and 'DNS radiation' on the Internet. Participants will also analyze TLD zones to identify spam networks.
- Large-scale DNS Measurement. Participants will build and execute an Internet-scale DNS measurement system to identify recursives, authorities and secondaries, fingerprint resolvers, and find DNS path options. Using the completed passive DNS exercise, participants will identify zone cuts, and map policy hierarchies within networks.
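To give a flavor of the passive DNS component, the following is a minimal Python sketch and not the tutorial's actual tooling. It assumes observations arrive as (timestamp, name, IP address, TTL) tuples; the sample data, the function names `build_pdns` and `flux_candidates`, and the thresholds `min_ips` and `max_ttl` are all hypothetical. It aggregates observations into a small in-memory passive DNS table and flags names that map to many distinct addresses with uniformly short TTLs, one simple heuristic sometimes used as a first pass for flux network detection.

```python
from collections import defaultdict

# Hypothetical passive DNS observations: (timestamp, name, answer IP, TTL).
# Addresses come from the documentation ranges; none of this is real data.
OBSERVATIONS = [
    (1000, "www.example.com.", "192.0.2.10", 3600),
    (2000, "WWW.example.com.", "192.0.2.10", 3600),
    (1000, "flux.example.net.", "198.51.100.1", 60),
    (1300, "flux.example.net.", "198.51.100.77", 60),
    (1600, "flux.example.net.", "203.0.113.5", 60),
    (1900, "flux.example.net.", "203.0.113.200", 60),
]


def build_pdns(observations):
    """Aggregate observations into {name: {ip: summary}}.

    Each summary records first seen, last seen, observation count,
    and the lowest TTL observed for that (name, ip) pair.
    """
    db = defaultdict(dict)
    for ts, name, ip, ttl in observations:
        key = name.lower()  # DNS names compare case-insensitively
        rec = db[key].get(ip)
        if rec is None:
            db[key][ip] = {"first": ts, "last": ts, "count": 1, "min_ttl": ttl}
        else:
            rec["first"] = min(rec["first"], ts)
            rec["last"] = max(rec["last"], ts)
            rec["count"] += 1
            rec["min_ttl"] = min(rec["min_ttl"], ttl)
    return db


def flux_candidates(db, min_ips=4, max_ttl=300):
    """Return names with at least min_ips addresses, all with short TTLs.

    The thresholds are illustrative assumptions, not tuned values.
    """
    flagged = []
    for name, ips in db.items():
        short_ttls = all(rec["min_ttl"] <= max_ttl for rec in ips.values())
        if len(ips) >= min_ips and short_ttls:
            flagged.append(name)
    return sorted(flagged)


PDNS = build_pdns(OBSERVATIONS)
FLUX = flux_candidates(PDNS)
print(FLUX)
```

A heuristic this coarse will also flag legitimate content delivery networks, so in practice it would serve only as a filter ahead of further analysis. At the data volumes discussed in this tutorial, the in-memory dictionary would give way to an external store.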
In short, this is a pragmatic, tools-oriented tutorial. It is designed for researchers interested in working with large-scale DNS data. While motivating examples focus on DNS abuse, security and botnets, the tools are of general use for those performing surveys, measurement, or other tasks involving tera-scale DNS data.
Bios:
David Dagon is a researcher at Georgia Tech, focusing on DNS security, botnets, and malware. He has written extensively on numerous DNS-based security topics, including DNS poisoning, DNS forgery resistance, and vulnerabilities in resolver architectures. He is the creator of the DNS-0x20 protocol, a DNS security measure now in wide use by DNS resolvers on the Internet. He is the co-founder of Damballa, an Atlanta-based security company that leverages DNS intelligence to protect enterprises.
Last modified: 2012-09-06 11:52:26 EDT