My PhD thesis in ICTEAM: Detecting Communities


Every day, we make many choices, like which
TV show to watch, or our holiday destinations. And to do so, we often follow advice from
people within our communities, such as our friends, our family and colleagues. Nowadays, more and more of such recommendations
are also being provided by computers. But how can a computer detect to which communities
I belong, so that appropriate recommendations can be
made? As an applied mathematician, one way to study
this problem is to use graph theory. A graph is a mathematical representation of
a network, composed of nodes, connected by edges describing their pairwise interactions. Graphs are used to represent many complex
systems: power grids, food webs, communication networks and so on. One interesting property of many graphs is
that nodes tend to form dense, highly connected clusters. Those dense structures correspond to the communities in your network, and arise naturally from
social behaviors. But the identification of such communities
is a real challenge and was one of the main topics of my PhD thesis. To tackle this problem, you first need to
define a measure that quantifies, given a partition of your graph, how good it is. For
example, you might be interested in computing the density of the partition. Then, you apply a procedure to find the best
possible partition, in general maximizing your quality measure. The real challenge is speed, we must design
methods that are sufficiently fast to be applied on real networks with millions of nodes and
edges. In my thesis, we proposed a new algorithm
that has many benefits: it can be applied to optimize any quality measure, it has a
low computational cost and is highly parallelizable. We start with an initial configuration of
nodes where they are all independent one from another. The first step is to let each node choose its best neighbor, the one for which merging
both nodes into a single community will lead to the largest increase in the density. This creates a new graph with edges that represent
those selections. By construction, this resulting graph contains
many small structures, called weakly connected components, and we simply define the communities
as the sets of nodes spanned by each of those connected components. Then, we apply different rules to update those
assignments, allowing the system to explore the space of possible solutions. Those corrections are decided locally, to
keep a low computational cost, but may still have a dramatic impact on the global structure
of the communities After a small number of iterations, our algorithm
converges to a stable and locally optimal solution. When compared to the state-of-the-art, our
method is able to extract communities of similar quality, but it is much faster. In networks with 1 million nodes, communities
are detected on average in 300 seconds, which is up to 20 times faster than with other software. This has many practical applications. For
example, a provider of video-on-demand can analyze its network of users linked to the
movies they watched. With our approach, we are now able to detect
communities in the underlying graph, grouping individuals based on the type of movies they
mostly enjoy. The result of this is to be able to make efficient
recommendations for the movies that you have not seen by looking for missing edges within
your community. Actually, community detection is only one
part of the story and we also analyzed the more general problem of role detection, but
if you are interested in that topic, I’ll let you discover that in my thesis.




Comments
  1. very good way to explain.. can you share your email id or whatsapp number through which i can clear my doubts

Leave a Reply

Your email address will not be published. Required fields are marked *