Large-Scale Graph Processing System

Main Members

 Toyotaro Suzumura  IBM Research Ireland
 Associate Professor, University Collage Doublin
 Anthony Ventresque  Researcher, University Collage Doublin  
 Elvis Liu  Researcher, University Collage Doublin  
 Derek Greene  Researcher, University Collage Doublin  
 Cristina Montanola  Researcher, Bercelona Supercomputing Center  
 Georgios Theodoropoulos  Researcher, Durham University  
 Stanislav Sobolevsky  Researcher, Massachusetts Institute of Technology  
 Miyuru Dayarathna  Researcher, Nanyang Technological University  
 Shin'ichiro Takizawa  Assistant Professor, RIKEN AICS  Individual
 Koji Ueno  Doctor's Course, Tokyo Institute of Technology  

Research Theme :
Development of a real-time, large-scale, graph-streaming system, and graph optimization libraries

The research focuses on carrying out research and development concerning the basic software for real-time and precise processing of large-scale graph data. In order to implement real-time analytic processing of data streaming, from a very large number of sensors, and precise analytic processing of graphs, we are developing (1) a real-time, large-scale, graph-streaming processing system, and (2) an optimization library for large-scale graphs. These two processing systems complement each other. The former system emphasizes real-time processing over precision, while the latter library and the processing system for executing it provide precise solutions by processing the entire body of the graph data. Both processing systems run simultaneously on a supercomputer, which feeds back the results from the latter large-scale, graph-processing system to the real-time, large-scale, graph-stream processor on a regular basis to enhance the precision.

Real-time, large-scale, graph-stream processing system
We are developing a real-time, large-scale, graph-stream processing system that receives a large-scale flow of graph data, moment-to-moment, from an external network and processes all the data in memory, without accumulating any data in external storage. Using the previously described supercomputer, we are designing and implementing this system using the number of sensors to be processed, the amount of traffic, and the graph processing needed to run in several tens of millions of levels of parallelism with high scalability and fault tolerance. This provides a data-flow programming model for application developers. The compiler takes an application written as a data flow and generates code that is optimized for a low-level computing environment and network environment. Furthermore, in order to realize a high-speed data-stream processing system, we develop the processing system whixh enable the user to transparently use accelerators and high-speed networks such as Infiniband. The basis of this processing system consists of performing real-time analysis processing on in-memory sub-graphs while also implementing the interface to rapidly reference any necessary sub-graph by fetching the sub-graph as needed from the entire graph, which is kept in large-scale-graph datastore. We implemented clustering, centrality analysis, and indexing for real-time graph algorithms.

Currently, the scaling of real-time data-stream processing has not yet been verified beyond a few hundred processor cores, and software systems have not yet been implemented for performing real-time graph processing on large amounts of streaming sensor data in an extremely large-scale, mixed vector and scalar distributed parallel processing environment. Nevertheless, we are implementing the processing performance optimization and real-time graph-stream processing described below.

1. Performance optimization of the streaming processing system.
Usually applications that perform streaming processing are represented as a data flow whose vertex corresponds to a calculation, and whose edge corresponds to a stream that flows between the vertices. In a typical stream processing application, the calculation consisting of a single vertex itself relatively light. In order to rapidly process this data flow in a distributed parallel environment on hundreds of thousands of processors, inter-node communication must somehow be eliminated. Also, in order to balance the computation and communication costs by optimizing the data flow performance, the graph is divided into sub-graphs, and each sub-graph must be optimally distributed on its own individual node. In order to do this we used run-time profiling, and the large-scale graph-analysis library built by Fujisawa et al. research group to execute large-scale data-flow graph clustering. In a mixed vector and scalar environment, in order to exploit accelerators like a graphics processing unit (GPU) for real-time graph processing, currently data must be copied from the network to CPU memory and then from CPU memory to GPU memory. However, to implement real-time streaming-data processing, data must be continuously copied at a high speed from the network into the GPU memory without going through the user’s space.

2. Real-time graph-stream processing.
Generally, graph processing requires having data for the entire graph. However, a different strategy for implementing real-time graph analysis is to receive each portion of the graph (new vertices, updated edges, etc.) as it is being streamed from the network (the graph stream) and incrementally process the differences between the streaming data and the graph data already storedin memory. This type of processing system implements an incremental processing model centered on graph analysis, such as connectivity analysis, centrality analysis, clustering, the shortest-path problem, etc. Furthermore, in order to implement a parallel distributed processing system with multiple hundreds of thousands of cores that access a large-scale graph datastore (Hitoshi Sato’s research group), we are implementing a design, whereby only a portion of the graph required for processing is kept in the memory, while the rest of the graph data that does not require updating is migrated stepwise into the lower levels of large-scale storage.

Graph-optimization library in X10
Presently, when building a high-performance optimization system for a large-scale mixed environment composed of vector and scalar processors, a heavy burden is placed on developers, because they need to have a command over multiple programming languages and development environments, such as MPI, threads, OpenMP, CUDA, etc. In our proposed system, we will provide a high-productivity development environment using post-petascale supercomputers for high-speed execution of graph-analysis algorithms (shortest-path problem, centrality analysis, betweenness centrality, etc.) and graph-optimization algorithms (semidefinite programming, mixed integer programming, etc.) by building a partitioned global address space (PGAS) language extension processing system and libraries. X10, an extension of the base PGAS language, allows the generation of large numbers of asynchronous lightweight threads. Graph algorithms are particularly difficult to execute efficiently on distributed memory machines and only a few examples exist that execute these algorithms efficiently on PGAS. Furthermore, one of the characteristics of the PGAS language is that it allows developers to write a concerted code for a mixed vector and scalar environment, and although an experimental version of X10 supports code generation for accelerators such as GPGPUs, this feature has not yet reached the stage of practical use. Our system will extend X10 code generation to allow the use of the accelerator by the optimization algorithms and other graph algorithms previously mentioned. We will also extend the processing system by optimizing the back-end communication system in order to facilitate the scalability to post-petascale processing. On the other hand, in order to analyze the hierarchical data, the system software must be able to support the transfer of data between storage layers. We are developing X10 system libraries to perform these large-scale datastore read/write functions.