Tracking evolution of communities in a time-varying network is a key problem for which many algorithms have been proposed and developed. Mentioned below are few of them.
1.1 Classical Approach
The basic idea of this approach is to detect temporal clusters at each time step and then establish relationships between clusters for tracking community evolution over time. This approach is also known as two-stage approach. Figure (1) below illustrates the result of applying a longitudinal approach to a dynamic network across three time steps.
In first phase, clusters at each time step are detected: at time t, there are two clusters, and then there are three clusters at time t + 1 and four clusters at time t + 2. In second phase, the relationship between clusters at different time steps is established, which is shown by colors. Through the above results, it can be seen how the community structure of this graph evolves from the time step t to the time step t + 2. For the first phase, a graph clustering algorithm can be applied while for the second phase, a suitable matching metric can be used.
Different metrics can be used to track the community evolution. A matching metric is a similarity function, which measures how similar two communities are.E.g. Hopcroft et al.[2] used following match function. Let C and C' be two clusters, their match value is written as follows: match(C;C') = min { (C intersection C' / C), (C intersection C' / C')}
The definition ensures that a high matching value (close to 1) occurs when two clusters have many common nodes and are roughly of the same size. In [1], Palla et al. defined a relative overlap between communities at successive timestep. This is also known as Jaccard Index. The relative overlap value between two communities X and Y is written as follows: J(X; Y ) = {X intersection Y} / {X union Y}.
Two communities are matched if they share the highest matching value. Thus matching metric is a natural choice to connect temporal clusters over time. However, it may lead to noisy results where some nodes often change their community memberships. In those cases, some advanced approaches (e.g. Core-based methods, Union graph-based methods) are used.
1.1.1 Core-Based Methods
If a partition is significant, it will be recovered even if the structure of the graph is modified, as long as the modification is not too extensive. Instead, if a partition is not significant, we may observe that a minimal perturbation of the graph is enough to disrupt its group memberships. A significant cluster, that is, a significant group of nodes, is often defined as a community core. We can reduce noisy results by matching community cores. This is the main principle of core-based methods. The matching metric (as described in last section) is often applied. Two temporal clusters are matched if their community cores share the highest similarity value.
In [2] Hopcroft et al. have proposed the concept of natural communities, which are significant clusters that have high stability against modification of graph structure. Given a temporal graph, by applying 5% of perturbations, a set of modified graphs are produced, each of which has 95% of core nodes. Each natural community is identified by the partitions corresponding to these modified graphs, which has the best match value with clusters in those partitions.
Although core-based approach can smooth variances caused by peripheral nodes, its results still suffer from some limits such as the parameters used in matching metrics. In addition, if we only track evolution of community cores, there is a risk of missing important structural changes which are related to peripheral nodes.
1.1.2 Union-Graph-Based Methods
Another important early work [1] by Palla et al.for detecting community evolution is related to the union graph. Figure (2) depicts the concept of union graph. Each union graph merges two graphs (union of their links) present at contiguous time steps.
The basic idea of this algorithm is to construct a joint graph consisting of union of links for every consecutive snap-shot t and t+1 and extract CPM [9] community structure of this joint network. As by adding links to a network, the CPM communities can only grow, merge or remain unchanged, any community from either the t or the t + 1 snap-shot is contained in exactly one community in the joint graph. Thus, the communities in the joint graph provide a natural connection between the communities at t and at t + 1. If a community in the joint graph contains a single community from t and a single community from t+1, then they are matched. If the joint group contains more than one community from either time steps, the community pairs with highest relative node overlap [1] are considered. The technique is validated by applying it to two social systems: a graph of phone calls between customers of a mobile phone company over one year and a collaboration network between scientists spanning a period of 142 months. The main disadvantage of this technique is that the CPM algorithm fails to detect community structure of networks with few cliques. In addition, some parameters are used to determine how community changes due to the application of similarity metric.
1.1.3 Conclusion
The algorithms discussed are two-stage approaches, which at step (1) detect clusters at each time step independently of the results at any other time step and at step (2) infers relationships between clusters at different time steps. Such natural process often produces signi cant variations between partitions that are close in time, especially when the datasets are noisy. Since the first phase is independent of the past history, smooth transitions are impossible.
1.2 Evolutionary Clustering
An evolutionary clustering approach follows a principle of detecting community structure based on the current graph topology information at a given time t and on the community structure at previous time steps. In [3], Chakrabarti et al. has proposed the outline of the evolutionary algorithms.
- At each timestamp t it produces a clustering C(t) of the objects seen so far
- For clustering, agglomerative hierarchical clustering , k-means clustering approach can be used
- Distance from C(t) and C(t-1) will be evaluated
- Snapshot quality of C(t) will be evaluated against the objects that appeared in timestamp t
The overall quality of the partition C(t) at timestamp t is a combination of current snapshot quality and history cost. Obviously, to obtain a good partition it should have high snapshot quality data for current timestamp and at the same time, it must have low history cost i.e. the partitions are aligned with earlier timestamps.
1.2.1 Conclusion
There exist many other evolutionary clustering approaches. Chi et al. [4] used similar ideas and proposed the first evolutionary spectral clustering algorithms. They introduced graph cut as a metric for measuring community structures and community evolution. For instance, information theory has been used to detect community evolution in dynamic graphs. Sun et al [5] applied the MDL to nd the minimum encoding cost to describe a time sequence of graphs and their partitions into communities. The basic principle of this method is to encode the graph topology into a compression information with the minimum cost of the description. This method enables to provide meaningful information on community evolution. As opposed to two-stage approaches, evolutionary clustering does not encounter the matching problem. However, these methods are using parameters. Also, results of the evolutionary clustering are generally too strongly correlated with community history which may occult structural changes.
1.3 Coupling Graph Clustering
Coupling graph clustering approach is based on a coupling graph as shown in Figure (3) A coupling graph is a graph linking a sequence of graphs over several time steps by adding coupling edges between the same nodes at different time steps. Given a coupling graph, a subgraph which describes all interactions at a specific time step is called a slice. The underlying idea is once the coupling graph built (encompassing the time dimension as edges) to use an efficient standard static community detection heuristic.
The approach was first used by Jdidia et al. in [6] by building a temporal graph and then applying classical community detection algorithm Walktrap [7]. Mucha et al. [8] proposed a coupling graph based approach by optimizing a modified modularity. The modified modularity balances the contribution of community memberships to each slice and the cost for changing community memberships. The major advantage of this algorithm is to smooth community evolution.
1.3.1 Conclusion
This idea of coupling graph clustering simplifies the problem of detecting community evolution. However, it introduces the problems on construction of coupling graphs: how to assign weight on coupling edges? What is the length of coupling window ? These questions make the use of the coupling graph based approaches limited.
1.4 Summary
As observed, that each of above-mentioned approaches has it's own limitations. Two-stage approach aims at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. Communities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information and essential dynamic information is blurred such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. As opposed to two-stage approaches, evolutionary clustering does not encounter the matching problem. However, most methods are using parameters. Furthermore, in evolutionary clustering results are generally too strongly correlated with community history which may occult structural changes. As a result to overcome these problems some sort of formalism is done in the context of time-directed datasets (such as citation networks) and this gives rise to representation of data differently and temporal communities are extracted from these data.
References
[1] Gergely Palla, Albert-Laszlo Barabasi, and Tamas Vicsek. Quantifying social group evolution. Nature, 446:664{667, April 2007.
[2] John Hopcroft, Omar Khan, Brian Kulis, and Bart Selman. Tracking evolving communities in large linked networks. Proceedings of the National Academy of Sciences, 101:5249{5253, April 2004.
[3] Deepayan Chakrabarti, Ravi Kumar, and Andrew Tomkins. Evolutionary clustering. KDD '06. ACM.
[4] Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, and Belle L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In Pavel Berkhin, Rich Caruana, and Xindong Wu, editors, KDD, KDD '06. ACM.
[5] Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S Yu. Graphscope: parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 687{696. ACM, 2007.
[6] Manel Ben Jdidia, Cline Robardet, and Eric Fleury. Communities detection and analysis of their dynamics in collaborative networks. In Dynamic Communities: from Connectivity to Information Society workshop co-located with ICDIM'07, pages 11{17, Oct 2007.
[7] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In Computer and Information Sciences-ISCIS 2005, pages 284{293. Springer, 2005.
[8] Peter J. Mucha, Thomas Richardson, Kevin Macon, Mason A. Porter, and Jukka-Pekka Onnela. Community structure in time-dependent, multiscale, and multiplex networks, 2009.
[9] Gergely Palla, Imre Der enyi, Ill es Farkas, and Tam as Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814{818, 2005.
No comments:
Post a Comment