Saturday, February 28, 2015

Markov Chains and the importance of directionality in detecting interest communities

A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another on a state space. It is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property.

The changes of state of the system are called transitions. The probabilities associated with various state changes are called transition probabilities. The process is characterized by a state space, a transition matrix describing the probabilities of particular transitions, and an initial state (or initial distribution) across the state space. By convention, it is assumed that all possible states and transitions have been included in the definition of the process, so there is always a next state and the process does not terminate.
Markov time is the time the process spends in a state before transitioning to another state.
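As a minimal illustration of the Markov property, the sketch below simulates a two-state chain. The "weather" states and transition probabilities are made up for the example; each step depends only on the current state, never on the history.

```python
import random

# A toy two-state Markov chain (hypothetical states and probabilities).
# transition[s] maps the current state to {next_state: probability}.
transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(state):
    """Sample the next state using only the current state (Markov property)."""
    r = random.random()
    cumulative = 0.0
    for s, p in transition[state].items():
        cumulative += p
        if r < cumulative:
            return s
    return s  # guard against floating-point round-off

def simulate(start, steps):
    """Run the chain for `steps` transitions and return the visited states."""
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1]))
    return chain

print(simulate("sunny", 5))  # a list of 6 states
```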

The importance of directionality in detecting interest communities

Directionality is a crucial ingredient in many complex networks in which information, energy or influence are transmitted. In such directed networks, analysing flows (and not only the strength of connections) is crucial to reveal important features of the network that might go undetected if the orientation of connections is ignored. Clearly, it is not the same to ‘follow’ a widely known personality on Twitter as to be followed by one.

The riots of 6–10 August 2011 in England were followed by an intense public debate about the role and influence of social media during the unrest. Politicians, journalists, pundits and bloggers alike weighed in on the issue, but few arguments were based on data. A few months after the riots, The Guardian made available to the public a list of the 1000 ‘most influential’ (i.e. the most re-tweeted) Twitter users during the riots. The list compiled by The Guardian comprised a diverse set of Twitter users, including newspapers, broadcasting services and news agencies, as well as individual accounts of journalists, politicians, entertainers, global and local activists, and members of the public. To examine the role these users played, an independent study (explained below) analysed the communities among the 1000 listed users, based on the directed Twitter follower network.

Flow-based ‘interest communities’: a view of the network at different resolutions

To gain a structured view of the communities in the network at different levels of resolution, Markov Stability community detection is used. A key advantage of Markov Stability is that it is based on a quantitative criterion that relies on flow propagation and containment, and thus identifies flow communities. The communities so found correspond to ‘interest communities’, inasmuch as information, interest and influence are propagated, retained and reinforced within them along the edges. If edge directionality is ignored, the community structure is blurred and the analysis severely hindered. A second advantage of the method is that the network is scanned for structure at all scales, so that flow communities relevant at different levels of resolution are found.
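The criterion behind the method can be sketched in a few lines. The snippet below is a simplified discrete-time version (not the authors' implementation), assuming a strongly connected directed graph and an integer Markov time t: a partition scores highly when a random walk started inside a community tends to remain there after t steps.

```python
import numpy as np

def markov_stability(A, communities, t):
    """Sketch of discrete-time Markov Stability for a directed graph.

    R(t) = trace(H^T (Pi P^t - pi pi^T) H), where P is the random-walk
    transition matrix of adjacency A, pi its stationary distribution,
    Pi = diag(pi), and H the community indicator matrix.
    Assumes the graph is strongly connected so pi exists and is unique.
    """
    A = np.asarray(A, dtype=float)
    n = len(A)
    P = A / A.sum(axis=1, keepdims=True)        # row-stochastic walk matrix
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    H = np.zeros((n, max(communities) + 1))
    H[np.arange(n), communities] = 1.0          # community indicator matrix
    Pt = np.linalg.matrix_power(P, t)
    return float(np.trace(H.T @ (np.diag(pi) @ Pt - np.outer(pi, pi)) @ H))

# Two directed 3-cycles joined in both directions: the natural 2-community
# partition retains flow better than lumping all nodes together.
A = [[0, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [1, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 1],
     [1, 0, 0, 1, 0, 0]]
print(markov_stability(A, [0, 0, 0, 1, 1, 1], 1) > markov_stability(A, [0] * 6, 1))  # True
```

Scanning t from small to large values reproduces the resolution sweep described above: short Markov times favour many small communities, long times favour a few large ones.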


Interest communities at all scales as detected by Markov Stability.
(a) The number of communities at each Markov time (t). The inset shows the network with nodes and edges colored according to their community at four illustrative Markov times. Two of these partitions at different resolutions are shown in more detail.
(b) At relatively short Markov times (t1 = 0.15), we find 149 communities (coarse-grained network view in the centre). Three examples of communities in this partition are ‘police and crime reporting’ (top), ‘Hackney’ (bottom), ‘the Daily Telegraph’ (left) shown with their members and their self-description word clouds.
(c) At longer Markov times (t4 = 7.0) we find four communities (coarse-grained view in the centre): three large communities broadly corresponding to ‘UK’ (bottom-right), ‘international’ (top), ‘celebrities/entertainment’ (bottom-left) and a small one corresponding to the ‘BBC’ (right), which remains as a distinct community across a large span of Markov times.

As a visual aid to interpret the theme of the communities, ‘word clouds’ are created from the most frequently used words in the Twitter self-biographies of the users in each community.

The importance of directionality in detecting interest communities




Communities containing the account of BBC Radio 4’s Today programme (bbcr4today) in the undirected (top, diamonds) and directed (bottom, circles) versions of the network at Markov times t = 0.86, t = 1.3 and t = 7.0, along with their word clouds. The middle panel shows the size of the Today programme’s community in both versions of the network for Markov times between 0.1 and 10.

As stated above, the BBC is an example of a flow community that stands out in its persistence. The figure above shows how the community of the BBC’s Today programme (a morning news broadcast with a broad audience) remains consistently grouped across many levels of resolution in the analysis of the directed network: from an early Markov time, BBC-related accounts are grouped together and remain so all the way up to the top levels of resolution, with consistent word clouds throughout. This phenomenon depends strongly on the directionality of the flows: the nodes in the BBC community are among the most important in the network (high in-degree and PageRank), attracting flow (attention) from elsewhere in the network and retaining it for long periods of Markov time.
In a symmetrized (undirected) network, by contrast, such communities can go undetected: as shown in the figure above, the corresponding undirected community of the BBC’s Today programme is quickly blurred across Markov times and gets mixed with a variety of users with little in common, consisting mainly of politicians from the Labour Party and journalists.
Interestingly, this drastic difference between directed and undirected communities is not observed for all communities in the network. There are communities, such as the one including Guardian columnist George Monbiot, which behave in an essentially similar manner in both cases across levels of resolution (see the figure below).
This difference between communities that are sensitive or insensitive to directionality persists across time scales, signalling the fact that some groupings (such as the BBC community) are fundamentally based on retention of directed flows, while others (such as the Monbiot community) follow from a balanced flow and, thus, can be captured by standard undirected measures. We note that the directed Markov Stability method is able to detect both types of communities simultaneously.



Directed and undirected communities containing the account of George Monbiot (georgemonbiot) obtained from the undirected (top, diamonds) and directed (bottom, circles) networks at Markov times t = 0.8603, t = 1.3049 and t = 7.0, along with their word clouds. We can compare these results with those obtained in figure 2 for BBC Radio 4’s Today programme.


Reference:

http://rsif.royalsocietypublishing.org/content/royinterface/11/101/20140940.full.pdf

http://en.wikipedia.org/wiki/Markov_chain

Evolution of Online User Behavior During a Social Upheaval

Social media represent powerful tools of mass communication and information diffusion. They played a pivotal role during recent social uprisings and political mobilizations across the world. Here we present a study of the Gezi Park movement in Turkey through the lens of Twitter. We analyze over 2.3 million tweets produced during the 25 days of protest that occurred between May and June 2013. We first characterize the spatio-temporal nature of the conversation about the Gezi Park demonstrations, showing that similarity in trends of discussion mirrors geographic cues. We then describe the characteristics of the users involved in this conversation and what roles they played. We study how roles and individual influence evolved during the period of the upheaval. This analysis reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. We conclude by observing how the online and offline worlds are tightly intertwined, showing that exogenous events, such as political speeches or police actions, affect social media conversations and trigger changes in individual behavior.
Our analysis is based on data collected from Twitter. The dataset collected for our study comes from a 10% random sample of all tweets streamed. The observation period covers 27 days, from May 25th to June 20th, 2013. The short period prior to the protest's inception is used as a baseline to define user activity and interests. Our sample not only contains information about the tweets, but also meta-data about the users, including their screen names, follower/followee counts, self-reported locations, and more. Additionally, for content posted with a GPS-enabled smartphone, we have access to the geographic location from which the tweets were generated. The hashtags were manually divided into three categories: general-interest hashtags, local protest-related ones, and finally those used by government supporters. Overall, we collected 2,361,335 tweets associated with the Gezi Park movement, generated by 855,616 distinct users and containing a total of 64,668 unique hashtags. Among these 2.3 million tweets, 1,475,494 are retweets and 47,163 are replies from one user to another. Also, 43,646 tweets have latitude/longitude coordinates. We adopt this subset of geolocated tweets to study the spatio-temporal nature of the protest.

Spatio-temporal cues of the conversation

Our first analysis aims at determining the extent to which the discussion about Gezi Park attracted individual attention inside the national boundaries of Turkey, where the movement began, and how much of this conversation spread worldwide.

Geographic distribution of tweets in our sample related to the discussion of Gezi Park events. The histograms represent the total volume by latitude and longitude. Content production crossed the Turkish national boundaries and spread to Europe, North and South America.





Distribution of top 10 languages in tweets about the protest. Language information was extracted from the tweet meta-data.








Each location is described by a frequency vector of occurrences of the observed trends. The similarity between pairs of cities is calculated as the cosine similarity of their trend-frequency vectors. Above the matrix we show the dendrogram produced by hierarchical clustering, in which the separation into three clusters is apparent. These clusters neatly correspond to three different geographic areas of Turkey. Physical proximity seems to play a crucial role in determining the similarity of topical interests of individuals, consistent with other recent results.
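As a toy illustration of this step, the snippet below computes pairwise cosine similarities between hypothetical trend-frequency vectors for three cities. The city names and counts are made up for the sketch; the paper's 12-city data is not reproduced here.

```python
import numpy as np

# Hypothetical trend-frequency vectors: counts of how often each of five
# trends appeared in each city's stream (illustrative numbers only).
trends = {
    "Istanbul": np.array([120, 80, 5, 3, 40]),
    "Ankara":   np.array([100, 90, 8, 2, 35]),
    "Izmir":    np.array([10, 5, 70, 60, 4]),
}

def cosine(u, v):
    """Cosine similarity between two trend-frequency vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The pairwise similarity matrix would then be fed to hierarchical clustering.
cities = list(trends)
for i, a in enumerate(cities):
    for b in cities[i + 1:]:
        print(a, b, round(cosine(trends[a], trends[b]), 3))
```

Cities with similar trend profiles (here Istanbul and Ankara) end up in the same cluster of the dendrogram.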

We wanted to determine whether the activity on social media mirrored on-the-ground events, and whether bursts of online attention coincided with real-world protest actions. We analyzed the time series of the volume of tweets, retweets and replies occurring during the 27-day-long observation window.






The discussion was driven by bursts of attention that largely corresponded to major on-the-ground events, similar to what has been observed during other social protests. It is also worth noting that the numbers of tweets and retweets are comparable throughout the entire duration of the conversation, suggesting a balance between content production and consumption. In the middle panel of Figure 4 we report the number of users involved in the conversation at a given time, and the cumulative number of distinct users over time (dashed red line); similarly, in the bottom panel of the figure, we show the total number of hashtags related to Gezi Park observed at a given time, and the cumulative number of distinct hashtags over time. We note that approximately 60% of all users observed during the entire discussion joined in the very first few days, whereas additional hashtags emerged at a more regular pace over a longer period. This suggests that the conversation acquired traction immediately, and exploded when the first on-the-ground events and police action occurred.

User roles and their evolution

Our second experiment aims at investigating what roles users played in the Gezi Park conversation and how they exercised their influence on others. We also seek to understand whether such roles changed over time, and, if so, to what extent such transformation reshaped the conversation.


 The dark cells along the diagonal indicate that most users have a balanced ratio of ingoing and outgoing ties. Users below the diagonal follow more than they are followed. Note that most users are allowed to follow at most 1000 people. Finally, above the diagonal, we observe users with many followers. Note the presence of extremely popular users with hundreds of thousands or even millions of followers.
The y-axis shows the ratio between the number of followees and followers of a given user; the x-axis shows the ratio between the number of retweets produced by a user and the number of times other users retweet that user. In other words, the vertical dimension represents social connectivity, whereas the horizontal dimension accounts for information diffusion. We can draw a vertical line to separate influential users on the left (i.e., those whose content is most often retweeted by others) from information consumers on the right (those who mostly retweet other people’s content). Influential users can be further divided into two classes: those with more followers than followees (bottom-left) and those with fewer followers (top-left), which we call hidden influentials. Similarly, information consumers can be divided into two groups: rebroadcasters with a large audience (bottom-right), and common users (top-right). This gives a static picture of aggregated data over the 27-day observation period. To study how roles evolve as events unfold, we carried out a longitudinal analysis whose results are provided in the next figure.
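The four roles can be sketched as a simple rule on these two ratios. In the sketch below the thresholds at 1 are illustrative assumptions, not the paper's calibrated boundaries:

```python
def classify_role(followers, followees, retweets_made, times_retweeted):
    """Assign one of the four roles from the two ratios described above.

    Thresholds at ratio 1 are illustrative; max(..., 1) avoids division by
    zero for users who were never retweeted or have no followers.
    """
    social = followees / max(followers, 1)               # y-axis
    diffusion = retweets_made / max(times_retweeted, 1)  # x-axis
    if diffusion < 1:  # retweeted by others more than they retweet: producer
        return "influential" if social < 1 else "hidden influential"
    else:              # mostly retweets others' content: consumer
        return "rebroadcaster" if social < 1 else "common user"

print(classify_role(2_000_000, 300, 10, 5_000))  # influential
print(classify_role(500, 900, 40, 200))          # hidden influential
```

Re-running such a classification day by day gives the longitudinal picture described next.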


First, we observed that the classes of information producers (influentials and hidden influentials) are relatively stable over time; together they include more than 50% of users every day, suggesting that many individuals in the conversation had large audiences, and the content they produced was heavily rebroadcast by others (information consumers as well as other influentials). On the other hand, information consumers show strong fluctuations: starting from an initial configuration with stable roles (May 29–31), common users and rebroadcasters subsequently exhibit large aggregate displacements in the role space (June 1–4). We also note a redistribution of the users in each role: at the beginning of the protest a large fraction were common users and rebroadcasters, while, as time passed and events unfolded, these two classes shrank. This suggests that common users and rebroadcasters acquired visibility and influence over time.

Online behavior and exogenous factors

Our concluding analysis focused on the way on-the-ground events affected online user behavior. 
     
Many users changed their screen names five or more times, an unusual observation that attracted our attention. Among the many users who changed screen names, this chart plots the fractions who adopted different nicknames over time in response to external events.




Conclusion

Our analysis of the spatial dynamics of the communication brought two interesting findings. First, we observed that the discussion about Gezi Park events spread worldwide, and a sustained number of tweets was produced over time outside of Turkey, in Europe, North and South America. International attention was underscored by hashtags related to Gezi Park trending at the worldwide level. Second, we observed that local trends followed geographic and political patterns. Among the 12 cities whose trends we monitored, three clear geographic clusters emerge. This result is consistent with our recent analysis of geospatial spreading patterns of Twitter trends. Focusing on users, we identified four types of roles (common users, rebroadcasters, influentials and hidden influentials) and tracked their evolution over time as events unfolded. As time passed, the discussion about Gezi Park became more democratic, with an increased number of influential users. Our analysis concluded by studying an effect of real-world events, such as political speeches, on online user behavior. We found that individuals responded to such external provocations by exhibiting collective actions, namely changing their Twitter screen names to reflect sobriquets attributed to them by their political leaders.

Reference:

Center for Complex Networks and Systems Research School of Informatics and Computing, Indiana University, Bloomington, USA

http://cnets.indiana.edu/blog/2014/06/26/best-paper-award-at-websci14/


Wednesday, February 25, 2015

Multi-Ego-Centred Communities

In social networks, communities are groups of people who share common features. Given a set of nodes, the most common way of identifying communities consists in classifying them into classes which may be predefined or not. This is what traditional classification and clustering approaches do respectively.

In the context of real-world graphs, community detection generally aims at finding a partition of the nodes, i.e. disjoint communities where each node belongs to exactly one community. However, it is unrealistic to assume that a user belongs to only a single community: normally a person is part of several overlapping groups, so overlapping communities should be allowed. Computing all overlapping groups, however, leads to numerous problems, including the time and space complexity of the algorithm and difficulty in interpreting the obtained results.

An interesting compromise consists in focusing on the groups related to one specific node, referred to as ego-centred communities. Despite promising initial results, ego-centred community detection is still a difficult problem because a single node can still belong to numerous groups. Therefore, communities identified by a set of nodes, i.e. multi-ego-centred communities, have gained primary focus. It has been established that for real-world graphs a small set of nodes is generally sufficient to define a unique community, which is usually not the case with a single node.

A recently developed proximity measure for ego-centred communities is the carryover opinion metric. For a given node of interest, the framework consists in first setting the opinion of this node to one and the opinion of all other nodes to zero. Then, at each time step, the opinion of every node is averaged with the opinion of one of its neighbours, after which the opinion of the node of interest is reset to one. Thus, its opinion does not change throughout the process and remains equal to one. The speed of convergence is used to characterize to what extent nodes are similar to the node of interest.
Two conjectures are needed to proceed:
Conjecture 1: after a sufficient number of iterations, the ranking of nodes according to their opinion no longer changes.
Conjecture 2: after a sufficient number of iterations, the difference between the opinions of two nodes decreases proportionally to the difference between the opinions of any other two nodes.
This measure takes into account the whole depth of the graph, is parameter-free and is fast to compute.
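The process described above can be sketched as follows. This is an expected-value variant, averaging each node with the mean of all its neighbours rather than one randomly sampled neighbour, which preserves the ranking the metric relies on; the graph and iteration count are illustrative.

```python
def carryover_opinion(adj, ego, n_iter=20):
    """Expected-value sketch of the carryover opinion process.

    adj: dict mapping node -> list of neighbours (undirected graph).
    Each step, every node's opinion is averaged with the mean opinion of
    its neighbours; the ego's opinion is then reset to 1. All opinions
    converge to 1 eventually, so the ranking at finite times (not the
    limit) carries the similarity information.
    """
    opinion = {v: 0.0 for v in adj}
    opinion[ego] = 1.0
    for _ in range(n_iter):
        new = {}
        for v, nbrs in adj.items():
            mean_nbr = sum(opinion[u] for u in nbrs) / len(nbrs)
            new[v] = (opinion[v] + mean_nbr) / 2
        new[ego] = 1.0  # the ego's opinion never changes
        opinion = new
    return opinion

# Two triangles joined by one bridge edge: nodes in the ego's triangle
# rank higher than nodes across the bridge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = carryover_opinion(adj, ego=0)
```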

The question is: how may the communities shared by a set of nodes be unfolded? The idea is that a node belonging to both a community of node1 AND a community of node2 has to be somewhat similar to both node1 AND to node2. The following steps are taken:
1) For all nodes, evaluate the proximity to node1 and to node2.
2) The proximity to the set {node1, node2} is then given by the minimum, or by the geometric mean, of the similarity to node1 and the similarity to node2. This quantity measures to what extent a node is close to both node1 AND node2.
The method is easily generalizable to a set of more than two nodes.  

In order to identify the potential community centred on both node1 and node2, the minimum of the carryover values obtained from node1 and from node2 is computed for each node u of the graph. The minimum of the two scores measures the belonging of u to the community of (node1 and node2). These minimum values are sorted and the minimum carryover curve is plotted. An irregularity in the decrease, i.e. a plateau followed by a strong decrease, indicates that all nodes before the decrease constitute a community of (node1 and node2). The automated detection of this plateau/strong-decrease pattern can be done by searching for the maximum slope and keeping the outcome if the slope is larger than a given threshold.
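The combination step and the plateau detection can be sketched as below; the slope threshold is an illustrative value, not one prescribed by the method.

```python
def multi_ego_scores(score1, score2):
    """Proximity to the set {node1, node2}: the minimum of the two
    carryover scores of each node (step 2 above)."""
    return {v: min(score1[v], score2[v]) for v in score1}

def cut_community(scores, slope_threshold=2.0):
    """Sort combined scores and cut at the steepest drop.

    Returns the nodes before the drop if the drop is large enough,
    else None (no clear plateau/strong-decrease pattern). The threshold
    value 2.0 is an illustrative assumption.
    """
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    values = [s for _, s in ranked]
    # Ratio between consecutive ranked values; the largest ratio marks
    # the end of the plateau.
    drops = [(values[i] / max(values[i + 1], 1e-12), i)
             for i in range(len(values) - 1)]
    best_drop, cut = max(drops)
    if best_drop < slope_threshold:
        return None
    return {v for v, _ in ranked[: cut + 1]}
```

For scores such as {a: 0.9, b: 0.85, c: 0.8, d: 0.1, e: 0.05}, the sharp drop after c yields the community {a, b, c}.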

There are other methods for finding ego-centred communities, all of them based on the optimisation of a quality function. Quality function techniques, due to the non-convexity of the optimisation problem, often lead to small communities, while this approach does not suffer from this drawback.


Gene Networks and Gene Clustering : Recent Advancements

Human genetic clustering analysis uses mathematical cluster analysis of the degree of similarity of genetic data between individuals and groups to infer population structures and assign individuals to groups that often correspond with their self-identified geographical ancestry. A gene network, in this sense, is the complex network relating the genes of people around the world.
We later illustrate that similar techniques can be used for analyzing Human Genetic Disease Networks which are of great use for mankind. 


Current state of the art:



A gene network is an example of a small-world network. So first of all I want to discuss what exactly a small-world network is, and then how we can use such networks for some much-needed applications.

"A small world network is a class of random graphs where most nodes are not neighbors of one another, but most nodes can be reached from every other by a small number of hops or steps (Perhaps the best known example of this is the 'six degrees of separation' feature discovered by Stanley Milgram in 1967.) . A small world network, where nodes represent people and edge connect people that know each other, captures the small world phenomenon of strangers being linked by a mutual acquaintance. The social network, the connectivity of the Internet, and gene networks all exhibit small-world network characteristics. The small world phenomenon (also known as the small world effect) is the hypothesis that everyone in the world can be reached through a short chain of social acquaintances. The small world problem asks for the probability that two people picked at random have at least one acquaintance in common."

   

         
                   Six degrees of separation.


Now let's discuss a few ideas on how to construct gene networks.

Considering people as nodes, the degree of similarity between their genes determines the edges between nodes.

It has been found that people cluster together based on community and region, and we can analyze the inter-community relationships. We can infer the origin of a particular community from these relationships. Apart from this, we can find common traits, for example that the average height in certain regions is greater than in others. Analyzing these patterns and relationships can reveal some interesting results.
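A minimal sketch of this construction, with entirely hypothetical binary marker profiles: individuals are connected when their profiles agree on more than a threshold fraction of markers, and the two regional groups separate into two clusters.

```python
import numpy as np

# Hypothetical binary genetic-marker profiles for six individuals
# (made-up data, purely for illustration).
profiles = np.array([
    [1, 1, 0, 1, 0, 1],   # individuals 0-2: one regional group
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],   # individuals 3-5: another regional group
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
])

def similarity(a, b):
    """Fraction of markers on which two individuals agree."""
    return float(np.mean(a == b))

# Connect two individuals when similarity exceeds a (hypothetical) threshold.
threshold = 0.6
edges = [(i, j)
         for i in range(len(profiles))
         for j in range(i + 1, len(profiles))
         if similarity(profiles[i], profiles[j]) > threshold]
print(edges)  # two dense groups emerge
```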


Recent Advances have made Gene Clustering More Practical:


1) More than 200,000 people had already had their genomes sequenced as of 2014. Illumina, a Silicon Valley company and a leader in this field, says that this year 228,000 more people will sequence their DNA. With more sequencing, we have more data in which to recognize patterns.

2) One argument for quick action is that the amount of genome data is exploding. The largest labs can now sequence human genomes to a high polish at the pace of two per hour. (The first genome took about 13 years.)

3) The cost of genome sequencing has gone down from the order of a billion dollars to a few thousand dollars; see the graph below:



http://www.genome.gov/sequencingcosts/

Few more implications / Proposed applications of gene networks :


Suppose a friend of mine, Rahul, is suffering from a genetic disorder without a name. Doctors have tried to solve the case but are finding it difficult to cure him. There might be people around the world who have suffered from a similar disorder, but how can we find them? Since the disorder is genetic, we can hope to find related genetic data. In this case clustering/gene networks might help to find similar patients. After finding similar patients, doctors will have an idea of the history of patients who suffered from a similar disorder: how they were cured, what their average life expectancy was, and so on. Recent studies have shown that the Human Disease Network and Disease Gene Network shown below are useful (found in Nature).


    Human disease network (HDN).

In the HDN, each node corresponds to a distinct disorder, colored based on the disorder class to which it belongs, with the names of the 22 disorder classes shown on the right. A link between disorders in the same disorder class is colored with the corresponding dimmer color, and links connecting different disorder classes are gray. The size of each node is proportional to the number of genes participating in the corresponding disorder (see key), and the link thickness is proportional to the number of genes shared by the disorders it connects. We indicate the names of disorders with 10 or more associated genes.





Disease gene network (DGN).


In the DGN, each node is a gene, with two genes being connected if they are implicated in the same disorder. The size of each node is proportional to the number of disorders in which the gene is implicated. Nodes representing genes with links to multiple classes are colored dark grey, whereas unclassified genes are colored light grey.  Only nodes with at least one link are shown.

References :

http://en.wikipedia.org/wiki/Human_genetic_clustering

http://www.nature.com

http://www.genome.gov/sequencingcosts/

https://books.google.co.in/



Wednesday, February 4, 2015

Complex Network Modeling in Maritime Search and Rescue Operation

 

        A complex network is a class of complex-system model that describes a dynamical set of objects: their structure varies with time and can be defined in probabilistic terms. We can find this class of complex system in various fields. Perhaps the most popular network that comes to mind with the term 'complex network' is the social network, online as well as offline. There are a few others, like citation networks, telecommunication networks and so on. But there are other systems that are similarly complex in their structure and dynamics. One such scenario is the collective behavior of floating drifters at sea.

        Drifting floaters are one of the major subjects of Search and Rescue (SAR) operations. One scenario is the rescue of a lost vessel's crew members; drift and visibility are important factors for such rescue operations. Another is the search for lifeboats, washed-off containers or even World War II mines, since such drifting objects are hazardous for shipping.

         Now by floating drifters we mean a group of objects floating in the sea, each visible within some distance. In the context of SAR, we can consider plane wreckage floating in the sea as floating drifters after a plane has crashed into the sea.

         Let's consider V floating drifters in the sea, each visible within a given range, and let the line of sight between two floating drifters be an edge E. This network evolves with time, as the position of each drifter changes with the motion of waves and winds, so the layout of the entire graph changes over time. With the periodic motion of the waves, a drifter can go out of sight, losing an edge, or come back into sight, creating an edge. The set (V, E) forms a complex network that evolves in time. Initially the structure of the network may be rather regular, but at later times it breaks down due to irregular wind, wave and current loads.
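This evolving visibility graph can be sketched as follows; the grid start, random displacements and visibility range are illustrative stand-ins for real wave, wind and current dynamics.

```python
import math
import random

def visibility_graph(positions, visible_range):
    """Edges between drifters within line-of-sight distance of each other."""
    n = len(positions)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(positions[i], positions[j]) <= visible_range}

# Drifters start on a regular 3x3 grid; random displacements stand in for
# wave, wind and current loads (purely illustrative dynamics).
positions = [(float(x), float(y)) for x in range(3) for y in range(3)]
E0 = visibility_graph(positions, visible_range=1.5)

random.seed(1)
for _ in range(10):  # ten time steps of random drift
    positions = [(x + random.uniform(-0.4, 0.4), y + random.uniform(-0.4, 0.4))
                 for x, y in positions]
E10 = visibility_graph(positions, visible_range=1.5)
print(len(E0), len(E10))  # the edge set changes as the drifters move
```

As in the figure described below, the initially regular graph loses and gains edges over time and can split into several components.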

 

 

        

 

       In the above picture we can see how the network of floating drifters has evolved over time. At time t1 it was very regular, and as time elapses the layout of the network changes due to the effect of waves, winds and current. At time t3 it becomes a graph of two components; at a later time they can be joined again by a link if a drifter comes back into visible sight.

         Now the sea surface can also be modeled as the superposition of a finite number of harmonic waves. The disturbance of the waves is determined by a two-dimensional disturbance energy spectrum S(ω,θ), where ω is the angular frequency of the wave and θ is the angle between the direction of the running wave and the wind direction.
        
        There are various models of the disturbance spectra, expressed in terms of the energy density as a function of the angular frequency and direction of propagation. An example of such a spectrum is the Pierson-Moskowitz spectrum. The continuous spectrum of disturbance can be approximated by a finite number of harmonics. In order to provide quick generation of harmonic heights and velocities, the two-dimensional Fast Fourier Transform (FFT) can be used. With the help of all these mathematical tools, the planar displacements and velocities of floating drifters and water particles can be obtained. These attributes help us understand the dynamics of the floating drifters.
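As a sketch, the one-dimensional Pierson-Moskowitz density and the amplitudes of a finite set of harmonics approximating it can be written as below; the constants are the standard ones for a fully developed sea, while the frequency range and number of bins are illustrative choices.

```python
import math

ALPHA, BETA, G = 8.1e-3, 0.74, 9.81  # standard Pierson-Moskowitz constants

def pierson_moskowitz(omega, wind_speed):
    """One-dimensional Pierson-Moskowitz energy density S(omega) for a
    fully developed sea; wind_speed is U measured at 19.5 m height."""
    return (ALPHA * G**2 / omega**5) * math.exp(-BETA * (G / (wind_speed * omega))**4)

def harmonic_amplitudes(wind_speed, omega_min=0.3, omega_max=3.0, n=32):
    """Approximate the continuous spectrum by n harmonics: the amplitude
    of each harmonic follows from the energy in its frequency bin,
    a_i = sqrt(2 * S(omega_i) * d_omega)."""
    d_omega = (omega_max - omega_min) / n
    omegas = [omega_min + (i + 0.5) * d_omega for i in range(n)]
    return [(w, math.sqrt(2 * pierson_moskowitz(w, wind_speed) * d_omega))
            for w in omegas]
```

Summing these harmonics with random phases gives a synthetic sea surface from which drifter displacements can be driven.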


Application: In Search and Rescue Plans


     Considering all the factors, including the direction of wind, waves and gravity, the probable trajectory of the floating drifters can be approximated, which helps in rescue operations. For example, when a plane crashes in mid-air and its wreckage falls into the sea from high altitude, the wreckage will be spread across miles of sea. So the helicopter needs to follow the direction of the main wave initially, and after that predict to some extent the probable location of other parts of the plane, considering natural parameters like wind, sea waves, current, etc.




  Again, from the above picture: say that due to strong waves some containers have been washed off the vessel. After a time the crew notices this and sends a helicopter to recover the containers. The helicopter starts to fly in the direction of the main wave with a fixed velocity until a container is spotted. It then continues the search for the next closest visible container, or goes again in the direction of the main motion. It should be remembered that high velocity and a large visibility range do not guarantee recovery of the containers, because a greedy algorithm can make wrong decisions; this is critical for the Travelling Salesman Problem.

 

Reference: this link



Community Detection in Time-Varying Network


Tracking the evolution of communities in a time-varying network is a key problem, for which many algorithms have been proposed and developed. Mentioned below are a few of them.

1.1 Classical Approach

The basic idea of this approach is to detect temporal clusters at each time step and then establish relationships between clusters to track community evolution over time. This approach is also known as the two-stage approach. Figure (1) below illustrates the result of applying a longitudinal approach to a dynamic network across three time steps.



In the first phase, clusters at each time step are detected: at time t there are two clusters, then three clusters at time t + 1 and four clusters at time t + 2. In the second phase, the relationship between clusters at different time steps is established, shown here by colours. From these results it can be seen how the community structure of the graph evolves from time step t to time step t + 2. For the first phase, any graph clustering algorithm can be applied, while for the second phase a suitable matching metric is used.

Different metrics can be used to track community evolution. A matching metric is a similarity function that measures how similar two communities are. For example, Hopcroft et al. [2] used the following match function: for two clusters C and C', their match value is match(C, C') = min( |C ∩ C'| / |C|, |C ∩ C'| / |C'| ).
The definition ensures that a high matching value (close to 1) occurs when two clusters have many common nodes and are roughly of the same size. In [1], Palla et al. defined a relative overlap between communities at successive time steps, also known as the Jaccard index. The relative overlap between two communities X and Y is J(X, Y) = |X ∩ Y| / |X ∪ Y|.
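Both metrics are straightforward to compute over node sets; a minimal Python sketch:

```python
def match(c1, c2):
    """Hopcroft et al. match value: the smaller of the two containment
    fractions |C ∩ C'|/|C| and |C ∩ C'|/|C'|."""
    c1, c2 = set(c1), set(c2)
    inter = len(c1 & c2)
    return min(inter / len(c1), inter / len(c2))

def jaccard(x, y):
    """Palla et al. relative overlap (Jaccard index): |X ∩ Y| / |X ∪ Y|."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)
```

For example, two clusters {1, 2, 3} and {2, 3, 4} share two of their three nodes each, giving a match value of 2/3 and a Jaccard index of 2/4 = 0.5.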

Two communities are matched if they share the highest matching value, so matching metrics are a natural choice for connecting temporal clusters over time. However, they may lead to noisy results in which some nodes frequently change their community membership. In those cases, more advanced approaches (e.g. core-based methods or union-graph-based methods) are used.

1.1.1 Core-Based Methods


If a partition is significant, it will be recovered even if the structure of the graph is modified, as long as the modification is not too extensive. Conversely, if a partition is not significant, a minimal perturbation of the graph may be enough to disrupt its group memberships. A significant cluster, that is, a significant group of nodes, is often called a community core. Noisy results can be reduced by matching community cores; this is the main principle of core-based methods. The matching metric described in the last section is often applied: two temporal clusters are matched if their community cores share the highest similarity value.

In [2], Hopcroft et al. proposed the concept of natural communities: significant clusters that are highly stable against modifications of the graph structure. Given a temporal graph, a set of modified graphs is produced by perturbing 5% of the structure, so that each modified graph retains 95% of the original. A natural community is then identified as the cluster that has the best match value across the partitions of these modified graphs.
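The perturbation idea can be sketched in a few lines of Python. This is an illustrative stand-in, not Hopcroft et al.'s actual procedure: it uses connected components in place of a real clustering algorithm, and the 0.5 stability threshold is an assumption.

```python
import random

def components(nodes, edges):
    """Connected components via union-find (a stand-in for a real
    graph clustering algorithm)."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def stable_core(nodes, edges, fraction=0.05, trials=20, threshold=0.5, seed=0):
    """Nodes whose community keeps at least `threshold` of its original
    members when `fraction` of the edges is removed at random."""
    rng = random.Random(seed)
    edges = list(edges)
    k = max(1, int(fraction * len(edges)))
    def community_of(es):
        return {n: frozenset(c) for c in components(nodes, es) for n in c}
    base = community_of(edges)
    core = set(nodes)
    for _ in range(trials):
        kept = rng.sample(edges, len(edges) - k)   # drop k random edges
        pert = community_of(kept)
        core = {n for n in core
                if len(pert[n] & base[n]) / len(base[n]) >= threshold}
    return core

# A 4-clique survives the loss of any single edge, so all of it is "core".
clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
core = stable_core([0, 1, 2, 3], clique)
```

A loosely attached peripheral node, by contrast, would be separated from its community by a single unlucky edge removal and dropped from the core.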

Although the core-based approach can smooth out variance caused by peripheral nodes, its results still suffer from limitations, such as the parameters used in the matching metrics. In addition, if only the evolution of community cores is tracked, there is a risk of missing important structural changes involving peripheral nodes.


1.1.2 Union-Graph-Based Methods


Another important early work on detecting community evolution, by Palla et al. [1], is based on the union graph. Figure (2) depicts the concept: each union graph merges two graphs (taking the union of their links) present at contiguous time steps.



The basic idea of this algorithm is to construct a joint graph consisting of the union of links from every pair of consecutive snapshots t and t + 1, and to extract the CPM [9] community structure of this joint network. Since adding links to a network can only make CPM communities grow, merge or remain unchanged, any community from either the t or the t + 1 snapshot is contained in exactly one community of the joint graph. Thus, the communities in the joint graph provide a natural connection between the communities at t and at t + 1. If a community in the joint graph contains a single community from t and a single community from t + 1, they are matched. If the joint community contains more than one community from either time step, the community pairs with the highest relative node overlap [1] are matched. The technique was validated on two social systems: a graph of phone calls between customers of a mobile phone company over one year, and a collaboration network between scientists spanning a period of 142 months. The main disadvantage of this technique is that the CPM algorithm fails to detect community structure in networks with few cliques. In addition, parameters are needed to determine how communities change under the similarity metric.
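The union-graph matching step can be sketched in plain Python. The communities here are given as plain node sets; running CPM itself is beyond a few lines, so the joint-graph communities are taken as inputs.

```python
def union_graph(edges_t, edges_t1):
    """Joint graph: the union of the links present at steps t and t + 1."""
    return set(edges_t) | set(edges_t1)

def match_communities(comms_t, comms_t1, joint_comms):
    """Pair communities from t and t + 1 that fall inside the same joint
    community, picking the pair with the highest relative overlap
    (Jaccard index) when several qualify."""
    def jaccard(x, y):
        return len(x & y) / len(x | y)
    pairs = []
    for jc in joint_comms:
        inside_t = [c for c in comms_t if c <= jc]     # communities contained in jc
        inside_t1 = [c for c in comms_t1 if c <= jc]
        if inside_t and inside_t1:
            best = max(((a, b) for a in inside_t for b in inside_t1),
                       key=lambda ab: jaccard(*ab))
            pairs.append(best)
    return pairs
```

For example, a community {1, 2, 3} at time t that grows into {1, 2, 3, 4} at time t + 1 sits inside the single joint community {1, 2, 3, 4} and is matched to its successor.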


1.1.3 Conclusion


The algorithms discussed above are two-stage approaches, which (1) detect clusters at each time step independently of the results at any other time step and (2) infer relationships between clusters at different time steps. Such a process often produces significant variations between partitions that are close in time, especially when the datasets are noisy. Since the first phase is independent of the past history, smooth transitions are impossible.

1.2 Evolutionary Clustering


An evolutionary clustering approach detects community structure based on the graph topology at a given time t and on the community structure at previous time steps. In [3], Chakrabarti et al. proposed the following outline for evolutionary algorithms.

  • At each timestamp t, produce a clustering C(t) of the objects seen so far
  • For the clustering itself, agglomerative hierarchical clustering or k-means can be used
  • The distance between C(t) and C(t-1) (the history cost) is evaluated
  • The snapshot quality of C(t) is evaluated against the objects that appeared at timestamp t

The overall quality of the partition C(t) at timestamp t is a combination of the current snapshot quality and the history cost. To obtain a good partition, the snapshot quality for the current timestamp should be high, while the history cost should be low, i.e. the partition should remain aligned with those of earlier timestamps.
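This trade-off can be written as a single objective. The Python sketch below is illustrative only: the pair-counting history cost and the default weight alpha = 0.5 are assumptions, not the specific choices made in [3].

```python
from itertools import combinations

def pair_disagreement(label_a, label_b):
    """History cost: the fraction of node pairs that are grouped together
    in one partition but separated in the other."""
    nodes = sorted(set(label_a) & set(label_b))
    pairs = list(combinations(nodes, 2))
    bad = sum((label_a[u] == label_a[v]) != (label_b[u] == label_b[v])
              for u, v in pairs)
    return bad / len(pairs) if pairs else 0.0

def overall_cost(snapshot_quality, label_t, label_prev, alpha=0.5):
    """Combined objective: (1 - alpha) penalises a poor fit to the current
    snapshot, alpha penalises drift away from the previous partition.
    Lower is better."""
    return ((1 - alpha) * (1 - snapshot_quality)
            + alpha * pair_disagreement(label_t, label_prev))
```

A partition identical to its predecessor with perfect snapshot quality has cost zero; moving any node between clusters raises the history term, so smooth transitions are rewarded.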


1.2.1 Conclusion


Many other evolutionary clustering approaches exist. Chi et al. [4] used similar ideas and proposed the first evolutionary spectral clustering algorithms, introducing graph cuts as a metric for measuring community structure and community evolution. Information theory has also been used to detect community evolution in dynamic graphs: Sun et al. [5] applied the minimum description length (MDL) principle to find the minimum encoding cost for describing a time sequence of graphs and their partitions into communities. The basic principle of this method is to encode the graph topology as compressed information with the minimum description cost, which yields meaningful information on community evolution. As opposed to two-stage approaches, evolutionary clustering does not encounter the matching problem. However, these methods rely on parameters, and their results are generally so strongly correlated with community history that structural changes may be obscured.


1.3 Coupling Graph Clustering


The coupling graph clustering approach is based on a coupling graph, as shown in Figure (3). A coupling graph links a sequence of graphs over several time steps by adding coupling edges between copies of the same node at different time steps. Given a coupling graph, the subgraph describing all interactions at a specific time step is called a slice. The underlying idea is that, once the coupling graph is built (encoding the time dimension as edges), an efficient standard static community detection heuristic can be applied to it.
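Constructing such a coupling graph is simple; here is a minimal Python sketch in which each node is tagged with its slice index and unweighted coupling edges tie copies of the same node at consecutive time steps (how to weight those edges is precisely the open question with this family of methods).

```python
def coupling_graph(slices):
    """Build one static graph from a sequence of snapshots.

    Each snapshot is a list of (u, v) edges. Nodes become (node, t) pairs,
    intra-slice edges stay within a slice, and a coupling edge joins
    (n, t) to (n, t + 1) whenever node n appears in both slices."""
    edges = set()
    for t, slice_edges in enumerate(slices):
        for u, v in slice_edges:
            edges.add(((u, t), (v, t)))          # within-slice edge
    nodes_at = [set(u for e in s for u in e) for s in slices]
    for t in range(len(slices) - 1):
        for n in nodes_at[t] & nodes_at[t + 1]:
            edges.add(((n, t), (n, t + 1)))      # coupling edge
    return edges
```

Any static community detection algorithm can then be run on the returned edge set; a community that spans several slices directly describes a temporal community.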


The approach was first used by Jdidia et al. [6], who built a temporal graph and then applied the classical community detection algorithm Walktrap [7]. Mucha et al. [8] proposed a coupling-graph-based approach that optimizes a modified modularity, which balances the contribution of community memberships within each slice against the cost of changing community membership across slices. The major advantage of this algorithm is that it smooths community evolution.


1.3.1 Conclusion


The idea of coupling graph clustering simplifies the problem of detecting community evolution. However, it introduces new problems in the construction of the coupling graph: how should the coupling edges be weighted? What should the length of the coupling window be? These open questions limit the use of coupling-graph-based approaches.

1.4 Summary


As observed, each of the above-mentioned approaches has its own limitations. The two-stage approach aims at extracting community partitions from successive graph snapshots and then connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches nonetheless achieve longitudinal rather than truly dynamic community detection. Communities are fundamentally defined by the repetition of interactions among a set of nodes over time. Under this definition, analysing the data as successive snapshots induces a significant loss of information: essential dynamic phenomena are blurred, such as communities based on repeated inter-temporal interactions, nodes switching from one community to another over time, or the possibility that a community survives while its members are entirely replaced over a longer period. As opposed to two-stage approaches, evolutionary clustering does not encounter the matching problem; however, most of these methods rely on parameters, and their results are generally so strongly correlated with community history that structural changes may be obscured. To overcome these problems, a formalism has been developed for time-directed datasets (such as citation networks) in which the data are represented differently and temporal communities are extracted directly from that representation.

References


[1] Gergely Palla, Albert-László Barabási, and Tamás Vicsek. Quantifying social group evolution. Nature, 446:664–667, April 2007.

[2] John Hopcroft, Omar Khan, Brian Kulis, and Bart Selman. Tracking evolving communities in large linked networks. Proceedings of the National Academy of Sciences, 101:5249–5253, April 2004.

[3] Deepayan Chakrabarti, Ravi Kumar, and Andrew Tomkins. Evolutionary clustering. In KDD '06. ACM, 2006.

[4] Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino, and Belle L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In KDD '07. ACM, 2007.

[5] Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S. Yu. GraphScope: parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 687–696. ACM, 2007.

[6] Manel Ben Jdidia, Céline Robardet, and Eric Fleury. Communities detection and analysis of their dynamics in collaborative networks. In Dynamic Communities: from Connectivity to Information Society workshop, co-located with ICDIM '07, pages 11–17, October 2007.

[7] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In Computer and Information Sciences - ISCIS 2005, pages 284–293. Springer, 2005.

[8] Peter J. Mucha, Thomas Richardson, Kevin Macon, Mason A. Porter, and Jukka-Pekka Onnela. Community structure in time-dependent, multiscale, and multiplex networks, 2009.

[9] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.

Human health care as a complex network






Science is a mixed field of knowledge, and the science most important to our daily lives is medical science. The aims of medical science are: 1. to examine the signs and symptoms based on the patient's complaint and the doctor's knowledge base; 2. to draw up a differential diagnosis; 3. to perform tests (blood, cerebrospinal fluid, X-rays, MRI, CT scan, etc.) to reach a confirmed diagnosis of the cause; 4. to prevent disease from spreading, heal damaged cells and maintain a proper balance between all the ions and materials that make up our physical body; and, most importantly, to save lives and keep us healthy.

A complex network is a different entity from other mathematical models such as lattices or random graphs. Complex networks display non-trivial topological features, with patterns of connection between their elements that are neither purely regular nor purely random: a heavy tail in the degree distribution, a high clustering coefficient, assortativity or disassortativity among vertices, community structure, hierarchical structure, reciprocity, a distinctive triad significance profile, and other features. Medical science is an example of a complex network in which some parts are classical and obey all the rules, some parts do not, and some parts are totally unknown.

The functionality of every living being is the result of interactions between inter-cellular and intra-cellular components and between external and internal components. Recent research shows that the potential complexity of the human interaction network is daunting: approximately 25,000 protein-encoding genes, on the order of 1,000 metabolites, and an undefined number of distinct proteins and functional RNA and DNA molecules. These also vary with sex, age (child, young or aged), geographical area, local environment, race, family history, and so on. At every moment, reactions take place between external disease agents and internal cellular components, and a complex network can be built over such a system.
The nodes of the network can be the following:

1. Examining the signs and symptoms
After taking the patient's complaint about their problems, the following things are checked:

a) Taking the patient's history properly

i) Personal history: the patient's habits, sleeping pattern, eating, and bowel and bladder habits.

ii) Family history: some conditions run in families (e.g. genetic abnormalities) or can be transmitted within them (e.g. HIV, hepatitis), so the family history (parents, grandparents, siblings, other family members) is very important.

iii) Surgical history: whether the patient has a history of surgery that can recur or cause secondary abnormalities, e.g. a history of a vertical abdominal incision (in abdominal surgery) can cause an incisional hernia.

iv) Traumatic history: trauma can cause memory loss (brain trauma), rupture of a body organ (e.g. splenic rupture in an accident, which can change how the body functions), etc.

b) General survey: temperature (fever), pallor, cyanosis, jaundice, clubbing, edema, swelling, pulse, blood pressure, etc.

c) Local examination: local examination of body parts, e.g. a local patch or discoloration in skin diseases, or tenderness (pain on touch) in the right iliac fossa, which in young females can be due to acute appendicitis.

2. Differential diagnosis:
All these features can act as nodes of the network used to make an initial differential diagnosis of the patient's condition.

3. Confirmed diagnosis:
The initial differential diagnosis leads to the most certain diagnosis by ruling out alternatives, using features collected from tests and the doctor's knowledge base:

i) Blood tests: hemoglobin level (a decrease suggests anemia, blood loss, leukemia, etc.), WBC count (infection level), RBC count, peripheral smear test, sodium, potassium and other ion tests, etc.

ii) X-ray: shows bone fractures or cracks, or is used in the diagnosis of pneumonia, lung consolidation, abdominal distension, etc.

iii) CT scan: diagnosis of emphysema, fibrosis of the lungs, brain trauma and other brain diseases.

iv) MRI: cardiac diseases, brain diseases, liver diseases.

v) ECG: for diagnosis of the cardiovascular system: myocardial ischemia, cardiomyopathies, heart attacks, etc.

There are also other useful tests: urine tests, EEG, CECT, esophagoscopy, and so on.
Depending on these, there are physiological, pathological, microbiological, biochemical and other changes in the cellular components, which can all act as nodes, with the connections between them acting as edges; together they create part of the total complex network.


4. Preventing diseases from spreading:
Disease prevention can be achieved by
a) Vaccination:

i) Scheduled vaccination of children: oral polio vaccine, measles, BCG, etc., given routinely.
ii) Other vaccinations: hepatitis vaccination, DNA vaccination, immunization.

b) Medicines :
      Proper medicines are used in optimum doses to prevent diseases from spreading (antibiotics are used to stop bacterial growth, anti-fungals to stop the growth of fungi, etc.). At every dosing interval, the physiological, pathological, microbiological and biochemical condition of the cellular components is checked through signs, symptoms and tests.

c) Surgical :
    Operative treatment is necessary to cure certain diseases: cholecystectomy (removal of the gall bladder) for gallstone disease, appendicectomy for appendicitis, orthopaedic operations for bone fractures, bone marrow transplantation, etc.

d) Radiotherapy: can be given to treat cancers, tumours, etc.

In the network, 1, 2, 3 and 4 are the components, within which a, b, c or i, ii, iii, etc. act as sub-components, and the individual features are the nodes.
In all the above cases, one must always be aware of the status of the human being as a whole, i.e. whether the situation is improving or not, judged by the features and cell conditions mentioned above. If the situation does not improve, or deteriorates, then the mode of treatment must be changed.
    So we can think of an intelligent complex network system whose nodes are the examinations of signs and symptoms, the tests for diagnosis, the prevention of disease spread, the healing of damaged cells, and the maintenance of a proper balance between all the ions and materials that make up our physical body.
    The edges between these nodes determine the overall features present in a particular human body, race or geographical area. The values of the nodes and edges determine the values of those features, and together they form a linear or nonlinear dependence equation whose value reflects the overall situation of a particular human being in a particular condition. There are many factors that may be undetermined but can still change the overall result significantly; their values can sometimes be estimated from other node or edge values of the complex network, but most of the time predicting them with high probability is genuinely difficult and requires a larger knowledge base and more data. Developing this type of intelligent complex network for human disease is a significant challenge for the modern world.
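As a toy sketch of this idea, the fragment below represents findings and diagnoses as nodes with weighted edges and ranks diagnoses by a linear combination of feature values. Every feature name, edge weight and the scoring rule itself are invented purely for illustration and have no clinical validity.

```python
# Observed feature nodes with their values (1.0 = present, 0.0 = absent).
findings = {
    "fever": 1.0,
    "pallor": 0.0,
    "wbc_count_high": 1.0,
    "rif_tenderness": 1.0,   # tenderness in the right iliac fossa
}

# Weighted edges from finding nodes to candidate-diagnosis nodes
# (weights are arbitrary illustrative numbers).
edges = {
    ("fever", "acute_appendicitis"): 0.4,
    ("rif_tenderness", "acute_appendicitis"): 0.8,
    ("wbc_count_high", "acute_appendicitis"): 0.5,
    ("pallor", "anemia"): 0.9,
}

def score(diagnosis):
    """Linear combination of feature values along the incoming edges."""
    return sum(findings[f] * w for (f, d), w in edges.items() if d == diagnosis)

# Rank candidate diagnoses by their node value in this tiny network.
ranking = sorted({d for _, d in edges}, key=score, reverse=True)
```

A real system of this kind would need vastly more nodes, learned rather than hand-set weights, and nonlinear dependencies, which is exactly the challenge described above.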
