U.S. Department of Energy
Center for Adaptive Supercomputing - Multithreaded Architectures

An Efficient XMT Database System

High-performance computing has always been concerned with processing huge datasets. Examples of applications that use such datasets are social network analysis, Internet knowledge discovery, and certain bioinformatics applications such as genome mapping.

More recently, however, very large datasets have emerged that lack any physical or logical locality that could be used to partition them across the distributed-memory nodes of a typical massively parallel processor (MPP) or cluster architecture.

Applications that use such datasets are widely expected to be the “sweet spot” for the Cray XMT, given its large shared memory, fine-grain synchronization, and the latency tolerance and abundant parallelism provided by its highly multithreaded processor architecture.

This task aims to prototype several aspects of the infrastructure needed to enable broad use of the XMT for this class of applications. Issues being addressed include:

  • What data structures provide the most efficient underlying support for a large dataset, a relational database, or a graph-oriented database on the XMT?
  • What is the optimal XMT implementation of the most demanding database operations, such as joins and range queries?
  • For graph-oriented datasets, what are the optimal data structures for representing graphs, what are the optimal algorithms for operating on them, and what are the optimal methods for synchronizing concurrent accesses to them? Which assumptions change if the graph data structures are designed to support a graph-oriented database system that users query interactively?
  • What infrastructure on the XMT would allow a variety of applications, using a variety of large datasets, to be easily implemented and used?
