The fact that we are dealing with non-relational data makes it difficult to come up with a comprehensive definition of privacy in social networks. In general, one would like to say that properties of individual nodes should be privacy-sensitive and thus difficult to learn from the sanitized network, while aggregate properties should be learnable. But what counts as a "property of an individual node?" A natural candidate is any property of a node's d-neighborhood for some small d (for instance, the property that a user has 3 different paths of length 2 to a known Al-Qaeda operative). Unfortunately, there does not seem to be an elegant way of choosing d, because social-network graphs have a very small diameter due to the "six degrees of separation" phenomenon.
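To make the d-neighborhood notion concrete, here is a minimal sketch (toy graph and names are illustrative, not drawn from any dataset) of the example property above as a predicate on a node's 2-neighborhood: counting distinct length-2 paths to a designated target node.

```python
from collections import defaultdict

def paths_of_length_two(adj, node, target):
    # A length-2 path node -> v -> target exists for each common
    # neighbor v of `node` and `target` (excluding the endpoints).
    return sum(1 for v in adj[node]
               if target in adj[v] and v != target and v != node)

# Toy undirected graph: `u` shares three neighbors with target `t`.
adj = defaultdict(set)
for a, b in [("u", "v1"), ("u", "v2"), ("u", "v3"),
             ("v1", "t"), ("v2", "t"), ("v3", "t")]:
    adj[a].add(b)
    adj[b].add(a)

print(paths_of_length_two(adj, "u", "t"))  # 3 distinct length-2 paths
```

Note that such a predicate depends only on the subgraph within distance 2 of the node, which is what makes it a candidate "individual" property; the difficulty is that for even modestly larger d, a d-neighborhood in a small-diameter graph covers much of the network.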
A related approach is differential privacy, which in the social-network context would require that the graph look roughly the same if any single node is removed. It is not obvious how to define node removal, and far from clear how to achieve differential privacy on graph-structured data, because aggregate properties of a graph can change substantially with the removal of a single node.
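The last point, that one node's removal can swing an aggregate statistic, is easy to demonstrate. The following sketch (a contrived hub-and-cycle graph, chosen only to make the effect stark) shows the triangle count collapsing when a single hub node is deleted, which is exactly the high-sensitivity behavior that makes node-level differential privacy difficult:

```python
from itertools import combinations

def triangles(adj):
    # Count triangles by brute-force check of vertex triples
    # (fine for toy graphs; not meant to scale).
    nodes = list(adj)
    return sum(1 for a, b, c in combinations(nodes, 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def remove_node(adj, x):
    # One possible reading of "node removal": delete x and its edges.
    return {u: nbrs - {x} for u, nbrs in adj.items() if u != x}

# Hub h attached to a 5-cycle v0..v4: every cycle edge forms a
# triangle with h, and no triangle exists without h.
n = 5
adj = {f"v{i}": {f"v{(i + 1) % n}", f"v{(i - 1) % n}", "h"}
       for i in range(n)}
adj["h"] = {f"v{i}" for i in range(n)}

print(triangles(adj))                    # 5 triangles, all through the hub
print(triangles(remove_node(adj, "h")))  # 0 after removing one node
```

A differentially private release mechanism would have to add noise proportional to this worst-case change, which here is the entire statistic.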
To keep the model simple and tractable, we do not use richer formalisms that may be suitable in some situations. For example, a multi-graph is a better model for social networks representing phone calls between individuals. We ignore the complex structure of node and edge attributes that may be relevant to privacy, such as "X knows Y through Z." We use only "public" and "private" as privacy labels, even though some networks allow finer-grained levels such as "viewable by friends" or even "viewable by friends of friends."
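As a small illustration of what the simple-graph model discards (made-up call records, for illustration only), a simple graph records only whether two individuals ever called each other, while a multi-graph additionally preserves call multiplicities:

```python
from collections import Counter

# Hypothetical call log; each entry is one phone call.
calls = [("alice", "bob"), ("alice", "bob"),
         ("bob", "carol"), ("alice", "bob")]

# Simple graph: at most one (undirected) edge per pair.
simple_edges = {frozenset(e) for e in calls}

# Multi-graph, modeled here as edge multiplicities.
multi_edges = Counter(frozenset(e) for e in calls)

print(len(simple_edges))                         # 2 distinct pairs
print(multi_edges[frozenset(("alice", "bob"))])  # 3 calls preserved
```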
Arvind Narayanan 2009-03-19