(cross-post Experiment, Three)
As promised, here is a report on the DNS Ops workshop I attended last week. The workshop was very interesting, though a few talks were a bit too technical for me, as I have only a partial knowledge of DNS operations. What follows, then, is a non-comprehensive list of “impressions” rather than a detailed report.
—
A Statistical Approach to Typosquatting
Of course 😉 I will start with my own talk, which reported the preliminary results of the research on typosquatting I have been conducting recently. The slides can be found here (and here as well, as I gave the same talk at the Centr technical meeting in May).
The talk seems to have generated some interest in the audience, though I think it suffered a bit from the fact that these are “early results” and much work still needs to be done before we can claim we really understand what typosquatting is (at least from a technical point of view). The talk also raised a few questions about Nominet’s involvement in typosquatting. Just to be clear, at the moment Nominet is interested in my work only from a research point of view and is not taking any position for or against any registrar, registrant or other party that might consider itself the subject of my work.
DNS monitoring, use and misuse
According to Sebastian Castro (CAIDA), in 2007 only 510 unique IP addresses generated 30% of the traffic at the root servers. Of these, 144 (called Heavy Hitters) sent more than 10 queries/sec, and 11 sent more than 40 queries/sec.
These are impressive numbers, which might say something about the kind of traffic that takes place on the Internet every day.
Later on, Shintaro Nakagami from NTT Communications, one of the major ISPs in Japan, reported that only 15% of the queries hitting their name servers were legitimate. This doesn’t mean that the others are necessarily malicious: many of them are simply malformed queries or are generated by misconfigured web servers. However…
Finally, Young Sun La (NIDA, Korea) showed an impressive tool that they use at NIDA for monitoring queries to the .kr name servers in real time. It even sends SMS messages to sysadmins if an urgent problem arises. Have a look at the slides for an idea of how it works. I think I heard that the software will be released for download, but I might have misunderstood.
Heatmaps
How do you conveniently represent the IPv4 space? With a Hilbert Curve, for example, or, as Roy Arends (Nominet) suggests, with a Z-order curve. The resulting graph is more intuitive to read and can easily be extended to work in a 3D space.
Check out his interactive tool (on the Nominet website) and his slides. In particular, go to slide number 9 and watch the heatmap of… women below 30 and earning more than $100,000/year in Manhattan!!
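To give an idea of how a Z-order curve lays the IPv4 space out on a plane, here is a minimal Python sketch. It is only my illustration of the general technique, not the code behind the tool above: the function name `ip_to_cell`, the `order` parameter and the choice of which bits go to which axis are all assumptions of mine. The idea is to take the leading bits of the address and split them alternately between the x and y coordinates, so that addresses sharing a prefix end up in the same square region of the map.

```python
import ipaddress


def ip_to_cell(address: str, order: int = 8) -> tuple[int, int]:
    """Map an IPv4 address to an (x, y) cell on a 2**order by 2**order grid.

    The top 2*order bits of the address are de-interleaved: even-numbered
    bits go to x and odd-numbered bits go to y (an arbitrary convention
    chosen for this sketch). With order=8, each cell is one /16 network.
    """
    if not 1 <= order <= 16:
        raise ValueError("order must be between 1 and 16")
    n = int(ipaddress.IPv4Address(address))
    prefix = n >> (32 - 2 * order)  # keep only the leading 2*order bits
    x = y = 0
    for i in range(order):
        x |= ((prefix >> (2 * i)) & 1) << i       # even bits -> x
        y |= ((prefix >> (2 * i + 1)) & 1) << i   # odd bits  -> y
    return x, y


def heatmap_counts(addresses, order: int = 8) -> dict:
    """Count how many addresses fall in each grid cell."""
    counts = {}
    for address in addresses:
        cell = ip_to_cell(address, order)
        counts[cell] = counts.get(cell, 0) + 1
    return counts
```

Calling `heatmap_counts` on a list of source addresses from a query log gives the per-cell counts that a plotting library could then colour. Extending this to 3D would mean dealing the bits out to three coordinates instead of two.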
Privacy issues in DNS
Karsten Nohl (University of Virginia) talked about the privacy issues related to the use of DNS caches. When users query the DNS they leave pieces of information in many caches, and they have to trust several entities (ISPs, registries, backbone operators, etc.) not to release or sell that information.
DNS operators cache the results of user queries, i.e., the IP addresses corresponding to certain domain names, in order to answer later queries more efficiently. This information is anonymous, i.e., the cache does not record the IP address that made the query (in theory), but in practice certain domain names are looked up by only one person (or a small group of people). At present, it is relatively easy for a malicious party to trace the online behaviour of a user by querying specific DNS servers and checking whether a specific domain name is present in their cache.
Such an attack can be used to identify the individuals that access a specific web site: knowing the IP address gives the geographic location of a user, but knowing his/her online behaviour might disclose much more personal information. Alternatively, it might be possible to track a specific user.
This scenario might become even more critical with the large-scale deployment of RFIDs. RFID tags have unique identifiers but are too small to store information (e.g., product information, price, etc.), so they will rely on the DNS to look this data up. The tags will then be indexed by the DNS, and it will be easy to identify single users.
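The talk did not go into implementation details as far as I can report, so the following is only a rough Python sketch of one commonly described way to check a cache: send a query with the "recursion desired" flag switched off, so that the server can only answer from what it already holds. The function names (`build_query`, `answer_count`, `looks_cached`) are mine, and the whole thing is an assumption about how such a probe could work rather than the method presented. Many resolvers refuse this kind of query from outsiders, and it should only be tried against servers you run yourself.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234,
                recursion_desired: bool = False) -> bytes:
    """Build a DNS query for the A record of `name`.

    With recursion_desired=False the RD flag is cleared, asking the
    server not to go and fetch the answer on our behalf.
    """
    flags = 0x0100 if recursion_desired else 0x0000  # RD is bit 8 of the flags
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def answer_count(response: bytes) -> int:
    """Read ANCOUNT (bytes 6-7 of the header) from a DNS response."""
    return struct.unpack("!H", response[6:8])[0]


def looks_cached(resolver_ip: str, name: str, timeout: float = 2.0) -> bool:
    """Guess whether `name` is in the resolver's cache (hypothetical helper).

    Sends a non-recursive query over UDP and reports whether any answer
    records came back. This sketch does not check the response ID or
    the response code.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return answer_count(response) > 0
```

A non-empty answer to such a query suggests that somebody using that resolver looked the name up recently, which is exactly the kind of leak described above.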