Visualizing Graphs with Pajek

November 3, 2010

-Anandarup Mukherjee, Sumit Goswami

Pajek is a software tool developed for visualizing large networks. A network that exists only as a database in machine-readable form is hard to inspect directly. To gain insight into such networks, we prefer to view them in graphical form. A visual layout makes it easier to study the relations, differences and similarities among the communities that form in it. Pajek provides tools for analyzing and visualizing networks such as citation networks, the Internet and social networks.

‘Pajek’ is the Slovenian word for spider. The software is free for non-commercial use and can be downloaded from the Internet. It is used offline and helps in the visualization, simplification and optimization of networks, large or small. It takes data in file formats such as .NET, .TXT and .MAT. The graph created from the input data can be viewed in various ways and optimized for ease of understanding using the built-in optimization algorithms based on energy, eigenvalues, etc.

The latest release, Pajek 1.28, includes several powerful algorithms such as Kamada-Kawai (a force-based algorithm for graph drawing) and Fruchterman-Reingold. The entire graph is simulated as if it were a physical system, with edges behaving as springs and nodes behaving as electrically charged particles; the final layout corresponds to the state in which this system is in mechanical equilibrium. The main advantages of these algorithms are simplicity, flexibility, intuitiveness and interactivity. The disadvantages include poor local minima and a running time that grows quickly as the number of nodes increases.

Other useful features of Pajek include finding the closest vertices, the smallest angle, the shortest and longest lines, the number of crossings, etc. These are highlighted in the Draw window itself, and the corresponding numerical values are shown in the console window at the same time.
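The spring-and-charge model described above can be illustrated with a minimal Python sketch. It is not Pajek's implementation of Kamada-Kawai or Fruchterman-Reingold; the function name `force_layout`, its parameters and the force formulas (inverse-square repulsion, linear springs of rest length `k`) are assumptions chosen only to show how a layout settles toward equilibrium.

```python
import math


def force_layout(n, edges, steps=300, k=1.0, step_size=0.05, max_move=0.1):
    """Toy spring-and-charge layout. Returns a list of (x, y) tuples."""
    # Deterministic start: spread the vertices evenly on a unit circle.
    pos = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i in range(n)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        # Charges: every pair of vertices repels with strength k^2 / distance^2.
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = k * k / (dist * dist)
                force[i][0] += push * dx / dist
                force[i][1] += push * dy / dist
                force[j][0] -= push * dx / dist
                force[j][1] -= push * dy / dist
        # Springs: every edge pulls (or pushes) its ends toward rest length k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist - k
            force[u][0] -= pull * dx / dist
            force[u][1] -= pull * dy / dist
            force[v][0] += pull * dx / dist
            force[v][1] += pull * dy / dist
        # Move each vertex a small, capped step along its net force.
        for i in range(n):
            fx, fy = force[i]
            size = math.hypot(fx, fy)
            scale = step_size if size * step_size <= max_move else max_move / size
            pos[i][0] += scale * fx
            pos[i][1] += scale * fy
    return [(x, y) for x, y in pos]


# Example: lay out a four-vertex cycle (vertices are numbered from 0 here).
layout = force_layout(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

The double loop over all vertex pairs is what makes the running time climb as the number of nodes grows, and the fixed starting positions show why such methods can settle in a poor local minimum: a different start may lead to a different final picture.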

Applications of Pajek
The major applications of Pajek are to give the user powerful visualization tools, to implement a selection of efficient sub-quadratic algorithms for analysis, and to provide abstraction by decomposing a large network into several smaller networks that can be analyzed separately. It also aims at good-quality results (at least for graphs of medium size, up to 50-100 vertices).
Besides, Pajek aims for flexibility, since force-directed algorithms can fulfill additional aesthetic criteria, and for interactivity. By drawing the intermediate stages of the graph, the user can follow how the graph evolves, seeing it unfold from a tangled mess into a good-looking configuration. Pajek also supports multi-relational networks, 2-mode networks (bipartite valued graphs) and temporal networks.

Implementation Platform
Pajek takes input in the various formats mentioned above. For ease of understanding and availability, we use .TXT input in our implementation example. The only requirements are a basic text editor, such as Notepad, for saving the data file as .TXT, and the Pajek software itself.

A system with high processing speed is recommended for large networks, as the graph-resolving algorithms load the processor heavily and the system eventually slows down. Beginners are advised to start with small graphs and increase the size gradually. In our experiments, an input file with 50,000 vertices on a system with 8 GB of RAM took about 15 minutes just to draw the initial mesh, without running any resolving algorithm.

Data Objects in Pajek
An object is any entity that can be manipulated by the commands of a programming language, such as a value, variable, function or data structure. Objects allow the same piece of code to handle very different kinds of data, as long as each of them provides the proper method. Pajek has six types of data objects. They are as follows:

Network: The default extension is ‘.NET’. A network is visualized using vertices and lines. It can take input in four forms:
1) Arcs/Edges: the relation between each pair of nodes is listed explicitly.
2) Arcslist/Edgeslist: each entry gives a node followed by the list of nodes it is linked to.
3) Matrix: the relations between nodes are given in matrix form, where the presence of an edge between two nodes is denoted by ‘1’ and its absence by ‘0’.
4) UCINET, GEDCOM, chemical formats, etc.: formats taken over from other tools. GEDCOM (GEnealogical Data COMmunication) is used mainly for genealogical simplification, such as generating a family tree to find out how a person is related to forefathers or other relatives.

Additional information for drawing the network can be included in the input file as well.

Partitions: A partition tells us which class each vertex belongs to. The default extension is ‘.CLU’.

Clusters: A cluster is a subset of vertices. Clustering assigns a set of observations to subsets (called clusters) so that observations in the same cluster are similar in some sense. The default extension is ‘.CLS’.

Permutation: A permutation reorders the vertices (i.e. it rearranges their sequence). The default extension is ‘.PER’.

Hierarchies: A hierarchy shows the relation between vertices ordered by priority level. It starts from the root level (highest priority) and extends to the remotest branch of the graph. The default extension is ‘.HIE’.

Vectors: A vector assigns a numerical property to each vertex. This helps in reading the data quickly just by looking at a vertex, and it allows operations based on the value attached to each vertex. The default extension is ‘.VEC’.
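To make two of these objects concrete, the following Python sketch builds the text of a partition file and a vector file. It assumes the plain-text layout commonly used for these files, a `*Vertices n` header followed by one value per vertex in vertex order; the function names `pajek_partition` and `pajek_vector` are hypothetical helpers, and the exact layout should be checked against the Pajek manual.

```python
def pajek_partition(classes):
    """Text of a partition (.CLU) file: one class number per vertex."""
    lines = [f"*Vertices {len(classes)}"]
    lines += [str(int(c)) for c in classes]
    return "\n".join(lines) + "\n"


def pajek_vector(values):
    """Text of a vector (.VEC) file: one numerical value per vertex."""
    lines = [f"*Vertices {len(values)}"]
    lines += [f"{float(v):.4f}" for v in values]
    return "\n".join(lines) + "\n"


# Example: four vertices, the first two in class 1 and the last two in class 2,
# with an arbitrary numerical property attached to each vertex.
partition_text = pajek_partition([1, 1, 2, 2])
vector_text = pajek_vector([0.25, 0.5, 1.0, 3.0])
```

Both files are matched to the network by position: the i-th value belongs to the i-th vertex of the network.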

Generating a Sample Graph
Step-1: Input data in standard format
Enter the data in the appropriate format in the text editor, similar to the figure given below, without any blank lines, and save the file as .txt or .net. Open Pajek and browse for the file from the ‘Networks’ tab.
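As a hedged illustration of such an input file, the Python sketch below turns a small 0/1 matrix into network text with a `*Vertices` section and an `*Edges` section. The helper names `matrix_to_edges` and `pajek_network`, the vertex labels and the line weight of 1 are assumptions made for this example; the network itself has no special meaning.

```python
def matrix_to_edges(matrix):
    """List the undirected edges (1-based) of a symmetric 0/1 matrix."""
    size = len(matrix)
    return [(i + 1, j + 1)
            for i in range(size)
            for j in range(i + 1, size)
            if matrix[i][j] == 1]


def pajek_network(labels, edges):
    """Text of a small undirected network: a vertex list, then an edge list."""
    lines = [f"*Vertices {len(labels)}"]
    for number, label in enumerate(labels, start=1):
        lines.append(f'{number} "{label}"')
    lines.append("*Edges")
    for u, v in edges:
        lines.append(f"{u} {v} 1")  # third column: line weight, assumed to be 1
    return "\n".join(lines) + "\n"


# Example: four vertices joined in a ring, given as a matrix of 1s and 0s.
ring = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
text = pajek_network(["A", "B", "C", "D"], matrix_to_edges(ring))
print(text, end="")
```

The printed text contains no blank lines, so it can be saved from the text editor as a .txt or .net file and opened in Pajek.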

Step-2: Graph generation
Press CTRL+G (Draw) to generate the graph. As an example, we have taken a random network without any special meaning, purely for the sake of understanding. After the graph has been generated, it can be optimized using the algorithms available in the Draw window itself (Draw Window -> Layout -> Energy).

Step-3: Graph and relationship visualisation
The Draw window itself provides various useful and time-saving tools (Draw Window -> Info), which can give a lot of information about the graph, especially if it is a huge one. The results are printed on the console and the corresponding nodes are highlighted on the graph itself.
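To give a feel for what two of these measurements mean, here is a small Python sketch that finds the closest pair of vertices and the shortest and longest lines in a drawing. It is only an illustration under assumed inputs (a dictionary of vertex coordinates and a list of lines), not a description of how Pajek computes them; the names `closest_vertices` and `line_lengths` are hypothetical.

```python
import math
from itertools import combinations


def closest_vertices(pos):
    """Return (distance, u, v) for the two vertices drawn nearest each other."""
    return min((math.dist(pos[u], pos[v]), u, v)
               for u, v in combinations(sorted(pos), 2))


def line_lengths(pos, edges):
    """Return the shortest and the longest line as (length, u, v) tuples."""
    lengths = [(math.dist(pos[u], pos[v]), u, v) for u, v in edges]
    return min(lengths), max(lengths)


# Example drawing: vertex number -> (x, y) position, plus the lines drawn.
positions = {1: (0, 0), 2: (3, 4), 3: (0, 1), 4: (3, 16)}
lines = [(1, 2), (2, 4), (1, 3)]

nearest = closest_vertices(positions)
shortest, longest = line_lengths(positions, lines)
```

Checking every pair of vertices is affordable for a small drawing like this one; for a huge graph the same question becomes expensive, which is where built-in tools save time.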

Conclusion
This small yet powerful tool is very handy for network analysis and optimization, and it is freeware. The list of possible uses for Pajek is endless, limited only by human imagination. It is being used in social network analysis, communication network optimization, etc. Future trends point to large-scale use of this software in fields such as chemical technology, biotechnology and genealogy.
