Monday, May 23, 2011

Data Structures, Data Structures, Data Structures...

When I met with my mentor, Aric, last week to talk about the project, the topic that required the most thought surprisingly was: What data structure a community detection algorithm should return?

I had initially thought that a dictionary keyed by node containing a nodes found community would be sufficient, but Aric asked the obvious questions, what about overlapping communities, hierarchies, link communities?

Re-reviewing some of the methods in my proposal as well as some other methods we might want to include shows that communities on a network can have just as complicated a structure as the network itself. For this reason I've settled on a custom Python class (Communities) to store the information.  It should be easier to abstract some of the important functions in a class like this (is a node a member of a specific community, do two communities overlap?), rather than having the user try to figure out how some combination of iterables I've come up with is supposed to go together. Check out the class in the repo.

Bitbucket repo

Being that coding officially starts today, I've forked off the mainline branch of networkx to work exclusively on community detection algorithms. You can find the repo here.

Wednesday, April 27, 2011

Implement Community Detection Algorithms in NetworkX in GSoC!

Rejoice!

My second application has been accepted into Google Summer of Code! This means better community detection algorithms in NetworkX are coming!

Tuesday, April 12, 2011

Patching NetworkX

As part of the Python Software Foundations requirements for Prospective students, I've been diving back into the NetworkX code base. I've contributed patches into NetworkX before, (as detailed in my proposals), but I had been slacking a little over the school year with class work and research pressing down. It was nice to find a request on the mailing list and dive into a ticket. The request concerns producing various graph products, and can be found waiting for inclusion (and for me to make a real patch out of it) here. It wasn't terribly hard to code up, but Aric (the project maintainer) reminded me of a few bad habits I tend to have when coding for myself.
  1. Factoring my code into every smaller functions. Python apparently has fairly high function call overhead, so calling a function is a loop is worse than having a function which loops.
  2. Remembering all the possible use cases. I tend to work on Graphs without too many attributes, so remembering other people might use what I code in different ways is important.
  3. Test cases. While the code only took me about an hour to generate, thinking about effective ways to test it took a bit longer. I like settling on random networks and checking to make sure the graph product conditions are met.
Too bad this patch wasn't totally relevant to my GSoC proposal, but I am sure there will be plenty of opportunities for me later.

Thursday, April 7, 2011

Specificity and Proposals

It was suggested that my community detection proposal include specific algorithms I am planning to implement. I have rewritten it accordingly. Perhaps it's just my style, but I liked the open ended feel of the original proposal. I think its primarily a style thing, but talking about specific algorithms makes the proposal feel dry, while leaving it open ended let me talk about interfacing with the community and how it would benefit NetworkX. I suppose it's all moot if it gets accepted, NetworkX benefits either way…

Wednesday, April 6, 2011

Lessons learned from OpenGL NetworkX Visualizations and my current proposal.

Shortly before the Google Summer of Code announcement I commenced writing an openGL NetworkX visualization function. I was inspired by Ian Johnson who had an interesting post about about using openCL in python and had a great class for making a 3D OpenGL scene. I had also recently come across some network data that I wanted to display in 3 dimensions, and nothing I found could easily do so. So I dropped most everything else I should have been doing (apologies to my advisor), and went to work. I posted the result to the NetworkX mailing list a few days later. The code certainly works, and for modest sized, sparse graphs (<10000 nodes) it does a fair job of displaying network data in 2 or 3 dimensions.
I was of course thrilled to see that one of the GSoC suggestions for NetworkX was to improve visualization support. I was convinced I was a perfect fit, and began hammering out a proposal and thinking about how the project might go. And as I did, I slowly began to realize how I might have gone about things better.
  1. The world probably doesn't need another Network visualization tool. Tools like Gephi and cytoscape produce some pretty amazing results, and more general tools like matplotlib and mayavi even if they aren't designed to display networks handle all the under the hood graphics and speed optimizations.
  2. Interactivity is important. Because of how I set up OpenGL, or perhaps limitations of PyOpenGL (I am certainly no expert), my work required everything to be set up ahead of time. This makes it hard to visualize graph generation algorithms in real time, or change colors based on paritions, like in gephi, or do anything fun other than rotate around a scene.
  3. The model wasn't general enough. Rather than just translating directly from a NetworkX class to OpenGL, I should have thought about how to generally interface with a drawing system. This is reflected in the proposal to develop the visualization engine.
After some discussion with the NetworkX developers, I am going to retool my proposal to focus more on an interface with Gephi as well as improvements to the matplotlib drawing functions. Hopefully this will be of some more use.

Tuesday, April 5, 2011

Implementing Community Detection Algorithms in Network

I just finished my second proposal(html, pdf). This is a topic I've seen mentioned on the NetworkX mailing list a number of times, and was one of the first things I was interested in when I started studying complex networks. Hopefully, we can bring together a set of algorithms from the social networks and the physics community.