Challenges of defining privacy

The fact that we are dealing with non-relational data makes it difficult to formulate a comprehensive definition of privacy in social networks. In general, one would like to say that properties of individual nodes should be privacy-sensitive and thus difficult to learn from the sanitized network, while aggregate properties should be learnable. But what counts as a "property of an individual node"? A natural candidate is any property of a $k$-neighborhood for some small $k$ (for instance, the property that a user has 3 different paths of length 2 to a known Al-Qaeda operative). Unfortunately, there does not seem to be an elegant way of choosing $k$, because social-network graphs have a very small diameter due to the "six degrees of separation" phenomenon [82]: even a modest $k$ can cover a large fraction of the graph.
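As a concrete illustration of a $k$-neighborhood property for $k = 2$, the example above (counting length-2 paths between two nodes) can be computed from adjacency sets alone. The graph representation and node labels below are our own illustrative choices, not part of the paper:

```python
# Hypothetical illustration: a 2-neighborhood property -- the number of
# distinct length-2 paths between two nodes of an undirected graph.
# The graph is stored as a dict mapping each node to its set of neighbors.

def paths_of_length_2(adj, u, v):
    """Count distinct paths u -> w -> v with w != u, v."""
    return len((adj[u] & adj[v]) - {u, v})

# A toy graph in which node "a" has exactly 3 length-2 paths to node "t",
# one through each of the common neighbors x, y, z.
adj = {
    "a": {"x", "y", "z"},
    "t": {"x", "y", "z"},
    "x": {"a", "t"},
    "y": {"a", "t"},
    "z": {"a", "t"},
}

print(paths_of_length_2(adj, "a", "t"))  # -> 3
```

A length-2 path between $u$ and $v$ is exactly a common neighbor, which is why a set intersection suffices; properties over larger $k$ would require traversing progressively larger, and soon overlapping, neighborhoods.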

A related approach is differential privacy [23], which in the social-network context would require that the graph look roughly the same if any single node is removed. It is not obvious how to define node removal, and far from clear how to achieve differential privacy on graph-structured data, because aggregate properties of a graph can change substantially with the removal of a single node.
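The sensitivity problem mentioned above can be made concrete with a small sketch (our own example, not from the paper): an aggregate statistic such as the number of connected components can change drastically when a single node is removed.

```python
# Minimal sketch of why node-level differential privacy is hard on graphs:
# removing one well-placed node can change an aggregate property a lot.
# Graphs are dicts mapping each node to its set of neighbors.

def components(adj):
    """Count connected components via iterative depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return count

def remove_node(adj, u):
    """Return a copy of the graph with node u and its incident edges deleted."""
    return {v: nbrs - {u} for v, nbrs in adj.items() if v != u}

# Star graph: a hub "h" connected to 5 leaves.
star = {"h": {"l0", "l1", "l2", "l3", "l4"}}
for leaf in list(star["h"]):
    star[leaf] = {"h"}

print(components(star))                     # -> 1
print(components(remove_node(star, "l0")))  # -> 1 (a leaf barely matters)
print(components(remove_node(star, "h")))   # -> 5 (the hub matters a lot)
```

Any differentially private release of this statistic would have to add noise proportional to the worst case (the hub), drowning out the signal; this asymmetry between typical and worst-case nodes is what makes the definition hard to apply to graph data.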

Even when the privacy policy is defined as a simple labeling of attributes (as we do in Section 4.4), the policy can be global or granular. With a global policy, the same privacy label applies to a given attribute in every node (e.g., email addresses are either public for all members, or private for all members). Similarly, the edges in the network are either all public, or all private. With granular policies, the privacy setting can be different for each edge and each attribute of each node.
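The distinction between global and granular policies can be sketched as two different labelings (all names below are our own illustrative choices, not from Section 4.4):

```python
# Illustrative sketch of the two policy granularities. A global policy
# assigns one label per attribute (and one for all edges) network-wide;
# a granular policy labels each attribute of each node, and each edge,
# independently.

PUBLIC, PRIVATE = "public", "private"

# Global policy: email is private for everyone, degree public for everyone.
global_policy = {"email": PRIVATE, "degree": PUBLIC, "__edges__": PRIVATE}

def visible_global(policy, attr):
    return policy[attr] == PUBLIC

# Granular policy: one label per (node, attribute) pair and per edge.
granular_policy = {
    ("alice", "email"): PUBLIC,             # Alice exposes her email
    ("bob", "email"): PRIVATE,              # Bob does not
    (("alice", "bob"), "__edge__"): PRIVATE,
}

def visible_granular(policy, node, attr):
    # Unlabeled attributes default to private in this sketch.
    return policy.get((node, attr), PRIVATE) == PUBLIC

print(visible_global(global_policy, "email"))               # -> False
print(visible_granular(granular_policy, "alice", "email"))  # -> True
print(visible_granular(granular_policy, "bob", "email"))    # -> False
```

Under the global policy a single lookup answers the question for every node at once, whereas the granular policy requires a per-node (and per-edge) lookup; the default-private fallback is one possible design choice, not prescribed by the paper.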

A global policy is sufficient most of the time. In most contexts, the network operator promises users that none of their data will be released in a personally identifiable way, implying a privacy policy where all edges and all attributes are private. In other contexts, some attributes might be intuitively understood to be public (e.g., node degree) and others private.

Many online social-network services such as Facebook allow users to configure their individual privacy policy with a high level of granularity. This might become a common practice in the future, but so far it appears that the vast majority of users do not change their default settings [34, 46]. There is also some ambiguity in modeling user preferences as formal privacy policies: for instance, an edge may be considered public by one endpoint and private by the other.

To keep the model simple and tractable, we do not use richer formalisms that may be suitable for some situations. For example, a multi-graph is a better model for social networks representing phone calls between individuals. We ignore the complex structure of node and edge attributes that may be relevant to privacy, such as "X knows Y through Z." We only use "public" and "private" as privacy labels, even though some networks allow more levels, such as "viewable by friends" or even by friends of friends.

Arvind Narayanan 2009-03-19