Monday, May 23, 2011

Data Structures, Data Structures, Data Structures...

When I met with my mentor, Aric, last week to talk about the project, the topic that surprisingly required the most thought was what data structure a community detection algorithm should return.

I had initially thought that a dictionary keyed by node, with each node's detected community as the value, would be sufficient, but Aric asked the obvious questions: what about overlapping communities, hierarchies, and link communities?
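To make the limitation concrete, here is a minimal sketch of that first idea, using a made-up four-node example (the node names, labels, and the helper `communities_from_partition` are all hypothetical, not code from the project). Each key holds exactly one label, so a node that should sit in two communities has nowhere to go, and nothing records how communities nest.

```python
def communities_from_partition(partition):
    """Invert a node -> label dict into label -> set of nodes (hypothetical helper)."""
    groups = {}
    for node, label in partition.items():
        groups.setdefault(label, set()).add(node)
    return groups


# Hypothetical partition: every node maps to exactly one community label.
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
print({label: sorted(nodes) for label, nodes in communities_from_partition(partition).items()})

# Trying to add "b" to community 1 as well just overwrites its old label,
# so "b" silently leaves community 0 instead of belonging to both.
partition["b"] = 1
print({label: sorted(nodes) for label, nodes in communities_from_partition(partition).items()})
```

The first print shows `{0: ['a', 'b'], 1: ['c', 'd']}` and the second `{0: ['a'], 1: ['b', 'c', 'd']}`: the overlap is lost rather than stored.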

Re-reviewing some of the methods in my proposal, along with a few others we might want to include, shows that communities on a network can have just as complicated a structure as the network itself. For this reason, I've settled on a custom Python class (Communities) to store the information. A class like this should make it easier to abstract some of the important operations (Is a node a member of a specific community? Do two communities overlap?) rather than leaving the user to figure out how some combination of iterables I've come up with is supposed to fit together. Check out the class in the repo.
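As a rough illustration of the kind of interface described above, here is a minimal sketch of a class that supports overlapping node communities. It is hypothetical: the method names (`is_member`, `communities_of`, `overlap`) and the internal layout are my own assumptions for this example, not the actual Communities class in the repo. It stores each community as a frozenset of nodes plus a reverse index from node to labels, so a node can belong to several communities at once.

```python
from collections import defaultdict


class Communities:
    """Hypothetical container for possibly overlapping node communities."""

    def __init__(self, communities):
        # communities: mapping of label -> iterable of nodes
        self._members = {label: frozenset(nodes) for label, nodes in communities.items()}
        # Reverse index: node -> set of labels of every community containing it.
        self._index = defaultdict(set)
        for label, nodes in self._members.items():
            for node in nodes:
                self._index[node].add(label)

    def is_member(self, node, label):
        """Return True if node belongs to the community named label."""
        return node in self._members.get(label, frozenset())

    def communities_of(self, node):
        """Return the labels of all communities containing node (possibly empty)."""
        # .get avoids inserting unknown nodes into the defaultdict.
        return set(self._index.get(node, ()))

    def overlap(self, label1, label2):
        """Return the nodes shared by two communities (empty if they are disjoint)."""
        return self._members[label1] & self._members[label2]


comms = Communities({"left": ["a", "b", "c"], "right": ["c", "d"]})
print(comms.is_member("a", "left"))        # True
print(sorted(comms.communities_of("c")))   # ['left', 'right']
print(sorted(comms.overlap("left", "right")))  # ['c']
```

This sketch only covers overlapping communities of nodes; hierarchies and link communities would need additional structure on top of it.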

Bitbucket repo

Since coding officially starts today, I've forked the mainline branch of networkx to work exclusively on community detection algorithms. You can find the repo here.