Wednesday, February 25, 2015

Multi-Ego-Centred Communities

In social networks, communities are groups of people who share common features. Given a set of nodes, the most common way of identifying communities consists in classifying them into classes which may be predefined or not. This is what traditional classification and clustering approaches do respectively.

In the context of real-world graphs, community detection generally aims at finding a partition of nodes, i.e. disjoint communities where each node belongs to exactly one community. However it is unrealistic to apprehend that a user belongs to only a single community. Normally a person is part of several overlapping groups. Overlapping communities therefore should be allowed. However computing all overlapping groups leads to numerous problems including time and space complexity of the algorithm and difficulty in the interpretation of obtained results.

An interesting compromise consists in focusing on the groups related to one specific node, referred to as ego-centred communities. Despite promising initial results, ego-centred community detection is still a difficult problem because a single node can still belong to numerous groups. Therefore, communities identified by a set of nodes i.e multi-ego-centred communities have gained primary focus. It has been established that for real-world graphs, a small set of nodes is generally sufficient to define a unique community, which is not the case with a single node usually.

A recently developed proximity measure for ego-centred communities is the carryover opinion metric. For a given node of interest, the framework consists in first setting the opinion of this node to one and the opinion of all other nodes to zero. Then, at each time step, the opinion of every node is averaged with the opinion of one of its neighbours. The opinion of the node of interest is then reset to one. Thus, its opinion does not change throughout the process and remains equal to one. The speed of convergence is used to characterize the to what extent nodes are similar to the node of interest.
Two conjectures are needed to carry on: 
Conjecture 1: after a sufficient number of iterations, the ranking of nodes according to their opinion no longer changes. 
Conjecture 2: after a sufficient number of iterations, the difference between the opinion of two nodes decreases proportionally to the difference between the opinions of any other two nodes.1
Clearly this measure takes into account the whole depth of the graph, is parameter-free and is fast to compute.

The question is: how may the communities shared by a set of nodes be unfolded? The idea is that a node belonging to both a community of node1 AND a community of node2 has to be somewhat similar to both node1 AND to node2. The following steps are taken:
1) For all nodes, evaluate the proximity to node1 and to node2.
2) The proximity to the set {node1, node2} is then given by the minimum, or by the geometric mean of the similarities to node1 and the similarities to node2. This quantity measures to what extent node is close to node1 AND node2.
The method is easily generalizable to a set of more than two nodes.  

 In order to identify the potential community centred on both node1 and node2, the minimum of the carryover values obtained from node1 and from node2 for each node, u , of the graph is computed. The minimum value of both scores is used to measure the belonging of u to the community of (node1 and node2). These minimum values are sorted and the minimum carryover curve is plotted. An irregularity in the decrease, i.e. a plateau followed by a strong decrease, indicates that all nodes before the decrease constitute a community of (node1 and node2). The automated detection of this plateau/strong decrease pattern can be done by searching for the maximum slope and keeping the outcome if the slope is larger than a given threshold.

There are other methods for finding ego-centred communities, all of them based on the optimisation of a quality function. Quality function techniques, due to the non-convexity of the optimisation problem, often lead to small communities, while this approach does not suffer from this drawback.


No comments:

Post a Comment