

Some of you already know what has happened: a change of platform, a tidy-up of my posts and the opening of an English section of the blog.

The migration is almost complete and, once the post-holiday hangover has passed, I will start writing again as much as before, and more… that is, not too frequently, but who knows, the autumn wind may bring me a little extra inspiration too! 😉

ICANN and new TLDs

(cross-post Experiment, Three)

I finally decided that I will not write about ICANN’s latest decision to liberalise the “market” of generic TLDs. Instead, I will point you to Nominet’s company blog: Nominet is the .uk registry, and this post offers a very interesting insight into ICANN’s process and decision.

Web 2.0 and Databases, can the two worlds meet?

(cross-post Experiment, Three)

A few weeks ago, I had an interesting conversation with Paolo about why web 2.0 tools are still struggling to find their way into the academic world. Back in September last year I attended the panel What Web 2.0 Has To Do With Databases?, which investigated why the database community has been left behind in web 2.0 research.

Following Paolo’s suggestion, I am posting the notes I took at the time. While I am well aware that the two topics are different, I think they are related: the people who consider blogs, wikis, etc. a “waste of time” are also the ones missing the opportunity to do research in such an interesting field.

  • Sihem Amer-Yahia (Yahoo!)
  • Alon Halevy (Google)
  • AnHai Doan (University of Wisconsin)
  • Gerhard Weikum (Max-Planck Institute for Informatics, Germany)
  • Gustavo Alonso (ETH, Zurich)

The abstract can be found here.
Here is Alon Halevy’s post on the panel: read, in particular, these two comments (1, 2) which, in my opinion, summarise the situation quite well.
Is the database community ready to accept the new challenges coming from the Web 2.0 world? The risk of “missing the train” is very high, considering that commercial interest in these technologies is leaving academic research behind.


  • Web 2.0 is about people, unstructured data, imprecise queries, information retrieval.
  • Web 2.0 is not about structure and quality.

Unstructured data and applications are pervasive and companies exploit them heavily, but:

  • A “holistic approach” is lacking (all current solutions are ad-hoc solutions)
  • The “structured methodology”, typical of the database community, should be brought into the Web 2.0.

Database people were not fully convinced by Web 2.0 and the two worlds seemed quite distant. In general, they do not believe that databases as we know them (their structure, methodologies, best practices, etc.) will ever lose their centrality in any information management application. In their view, even web 2.0 is only a “cool application” that will eventually be replaced by something else, whereas databases will still be in place.

This is quite a conservative point of view, and even those who say that “traditional DBMSs are dead” (Michael Stonebraker among others, but he’s not the only one) seem, in practice, to be a bit sceptical about databases losing their centrality.

Everybody seemed to agree that tight schema integration is a buzzword that does not work in the real world, despite having been studied for several years in both industry and academia.

Web 2.0 seems a good compromise for achieving “real” integration, though this happens at the data level (and should probably be called “data reconciliation” instead). At the schema level, someone argued that real integration is not possible because there are no strong stakeholders demanding it (these will be neither the people on the street nor Google or Yahoo).

Google pushes forward the concept of a dataspace (btw, Halevy’s dataspace) that includes all users’ data. The physical system is left in the background, almost a legacy from the past: data matters, databases are needed for storage, reliability, etc. (are we talking about cloud computing?).

Someone’s comment: companies are keen on groups that do research on Web 2.0 and even encourage them to do it. However, Web 2.0 is about people and data: if the big companies do not release the data they have, how can the DB community do research on it (and what should they analyse)?

The two worlds seemed very distant, and the main reason probably lies in their different backgrounds: databases are about structure, methodology and algorithms. Web 2.0 is based on randomness (well, some form of it), no predefined schema and, above all, unpredictable social interactions, which are kept away from databases. It is no surprise that communication between the two is particularly difficult.

Italian TLD and malicious web sites

(cross-post Experiment, Three)

Mapping the Mal Web, Revisited (McAfee, June 4).

A new security report from McAfee on the spread of malicious web sites among different TLDs has just been released. Very informative and detailed, the report supplements last year’s report. Some of the key findings:

  • .ro (Romania) and .ru (Russia) are the riskiest European TLDs, i.e., the probability of finding a malicious web site is higher when surfing one of those TLDs.
  • The risk related to .biz (business) and .cn (China) is also increasing (compared to last year).
  • .it (Italy) has worsened, but is still “a safe place”.
  • .hk (Hong Kong) is the riskiest TLD.

The “Hong Kong” case, in particular, deserves closer attention:

Bonnie Chun, an official [from the .hk] TLD, acknowledged that they had made some decisions that inadvertently encouraged the scammers:
1. “We enhanced our domain registration online process thus making it more user-friendly. Instances include the capability for registering several domains at one time, auto-copying of administrative contact to technical contact and billing contact, etc. Phishers usually registered eight or more domains at one time.
2. We offered great domain registration discounts, such as buy-one, get-two domains.
3. Our overseas service partners promoted .hk domains in overseas markets.”

In a previous post I talked about the recent increase in phishing activity in the .uk registry, which, in that particular case, took advantage of Nominet’s automatic registration process.

Another country, another problem: the .it registry will implement automatic registration procedures by the end of the year, and I read a couple of weeks ago on Swartzy’s blog that the IIT/CNR is also launching an advertising campaign for .it domains.

I am curious to see whether, by analogy with what happened in Hong Kong, we will see an increase in malicious activity in the .it TLD.

DNS Ops Workshop

(cross-post Experiment, Three)

As promised, here is a report on the DNS Ops workshop I attended last week. The workshop was very interesting, though a few talks were a bit too technical for me, as I only have a partial knowledge of DNS operations. What follows, then, is a non-comprehensive list of “impressions” rather than a detailed report.

A Statistical Approach to Typosquatting
Of course 😉 I will start with my talk, which reports the preliminary results of the research on typosquatting I have been conducting recently. The slides can be found here (and here as well, as I gave the same talk at the Centr technical meeting in May).

The talk seems to have generated a bit of interest in the audience, though I think it suffered a bit from the fact that these are “early results” and much work still needs to be done before we can claim we really understand what typosquatting is (at least from a technical point of view). The talk also raised a few questions about Nominet’s involvement in typosquatting. Just to be clear, at the moment Nominet is interested in my work only from a research point of view and is not taking any position for or against any registrar, registrant or any other party that might consider itself the subject of my work.

DNS monitoring, use and misuse
According to Sebastian Castro (CAIDA), in 2007 only 510 unique IP addresses generated 30% of the traffic at the root servers and 144 of them (called Heavy Hitters) sent more than 10 queries/sec and in 11 cases more than 40 queries/sec.

These are impressive numbers, which might tell us something about the kind of traffic that takes place daily on the Internet.

Later on, Shintaro Nakagami from NTT Communications, one of the major ISPs in Japan, reported that only 15% of the queries hitting their name servers were legitimate. This doesn’t mean that the others are necessarily malicious: many of them, for example, are simply malformed queries or are generated by misconfigured web servers. However…

Finally, Young Sun La (NIDA, Korea) showed an impressive tool that they use at NIDA for monitoring queries to the .kr name servers in real time. It even sends SMS messages to sysadmins if an urgent problem arises. Have a look at the slides for an idea of how it works. I think I heard that the software will be released for download, but I might have misunderstood.

How do you conveniently represent the IPv4 space? With a Hilbert Curve, for example, or, as Roy Arends (Nominet) suggests, with a Z-order curve. The resulting graph is more intuitive to read and can easily be extended to work in a 3D space.

Check out his interactive tool (from Nominet’s website) and his slides. In particular, go to slide number 9 and watch the heatmap of… women under 30 earning more than $100,000/year in Manhattan!!
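To give an idea of how a Z-order layout works, here is a minimal sketch in Python. It is not the code behind Roy’s tool: the function names, the choice of plotting /24 prefixes and the resulting 4096×4096 grid are my own assumptions for illustration. The idea is to treat the address as an integer and de-interleave its bits, so that even bit positions give the x coordinate and odd bit positions give the y coordinate.

```python
import ipaddress
from typing import Tuple


def deinterleave(value: int, bits: int) -> Tuple[int, int]:
    """Split the low 2*bits bits of value into (x, y).

    Even bit positions go to x, odd bit positions go to y.
    """
    x = y = 0
    for i in range(bits):
        x |= ((value >> (2 * i)) & 1) << i
        y |= ((value >> (2 * i + 1)) & 1) << i
    return x, y


def ip_to_zorder_xy(address: str, prefix_len: int = 24) -> Tuple[int, int]:
    """Map an IPv4 address to a grid cell along a Z-order curve.

    Only the first prefix_len bits are used (an illustrative assumption),
    so every address in the same prefix falls into the same cell.
    """
    if prefix_len % 2 != 0 or not 0 < prefix_len <= 32:
        raise ValueError("prefix_len must be an even number between 2 and 32")
    value = int(ipaddress.IPv4Address(address)) >> (32 - prefix_len)
    return deinterleave(value, prefix_len // 2)


if __name__ == "__main__":
    for addr in ("0.0.0.0", "0.0.1.0", "0.0.2.0", "0.0.3.0"):
        print(addr, ip_to_zorder_xy(addr))
```

With this layout, all the addresses that share a prefix of even length end up in the same aligned square of the picture, which is what makes the heatmap easy to read. Adding a third coordinate only requires taking every third bit instead of every second one.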

Privacy issues in DNS
Karsten Nohl (University of Virginia) talked about the privacy issues related to the use of DNS caches. When users query the DNS they leave pieces of information in many caches, and they have to trust several entities (ISPs, registries, backbone operators, etc.) not to release or sell that information.

DNS operators cache the results of user queries, i.e., the IP addresses corresponding to certain domain names, in order to retrieve them more efficiently. This information is anonymous, i.e., they do not record the IP address that made the query (in theory), but in practice certain domain names are used only by one person (or a small group of people). At present, it is relatively easy for a malicious party to trace the online behaviour of some users simply by querying specific DNS servers and checking whether a specific name is present in their cache.

Such an attack can be used to identify the individuals that access a specific web site: knowing the IP address gives the geographic location of a user, but knowing his/her online behaviour might disclose much more personal information. Alternatively, it might be possible to track a specific user.
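The reasoning above can be illustrated with a toy model. This is only a sketch under my own assumptions, not anything shown in the talk: the class `ToyResolverCache`, its methods and the name `rare-site.example` are hypothetical, and everything happens in memory with no real DNS traffic. The point is simply that a cache remembers a name for the duration of its TTL, so anyone who can ask “is this name cached?” learns that somebody behind that resolver looked it up recently.

```python
class ToyResolverCache:
    """In-memory stand-in for a caching resolver (hypothetical model)."""

    def __init__(self):
        # Maps a name to the time at which its cached record expires.
        self._expires = {}

    def is_cached(self, name, now):
        """What an outside observer can learn: is the name still cached?"""
        return self._expires.get(name, float("-inf")) > now

    def resolve(self, name, now, ttl=300):
        """A user lookup: on a cache miss, the name is cached for ttl seconds."""
        hit = self.is_cached(name, now)
        if not hit:
            self._expires[name] = now + ttl
        return hit


if __name__ == "__main__":
    cache = ToyResolverCache()
    name = "rare-site.example"  # a name visited by very few people

    print(cache.is_cached(name, now=0))    # nobody has asked for it yet
    cache.resolve(name, now=10, ttl=300)   # a user behind this resolver visits
    print(cache.is_cached(name, now=100))  # the observer sees a recent lookup
    print(cache.is_cached(name, now=400))  # the record has expired by now
```

If the name is one that only a handful of people would ever look up, a positive answer says a lot about who is sitting behind that resolver.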

This scenario might become even more critical with the large-scale deployment of RFIDs. RFIDs have unique identifiers but are too small to store information (e.g., product information, price, etc.), so they will use the DNS to look up this data. Their unique identifiers will then be indexed by the DNS, and it will be easy to identify single users.