====== Structured Streaming Data System (SSDS) Wiki ======

Data-intensive HPC applications are becoming increasingly important, adding substantial challenges to the already daunting input/output requirements of MPP codes. A well-known example is the interpretation of data from seismic exploration. In these applications, I/O problems arise from two sources. The first is the large data volumes produced by seismic sensing. The second is that this data must be manipulated to meet simulation requirements: time-series data from multiple sensor locations must be translated into a format ready for 3-D subsurface reconstruction. Translation steps include data filtering, data transformation, and trace stacking. Similarly, in online collaboration systems, visualization data must be converted and/or filtered to meet client needs.

The low level of abstraction presented by current I/O systems makes it even harder for scientists and engineers to attain high-performance I/O for data-intensive MPP applications. This research will create higher-level I/O abstractions for developers. Specifically, the SSDS framework we propose models I/O as **I/O Graphs** that "connect" application components with input or output mechanisms, such as file systems. These connections are based on metadata constructed offline by autonomous metabots. I/O Graphs can be programmed to realize application-specific I/O functionality, such as data filtering and conversion, data remeshing, and similar tasks. Their management is automated, including the mapping of their logical graph nodes to the underlying physical MPP and distributed machine resources. SSDS will improve I/O performance in two ways. First, it integrates the computational I/O actions of I/O Graphs with the backend file systems that store high-volume data and with the I/O actions that applications already perform. Second, it moves metadata management offline into metabots.
===== People =====

==== Georgia Tech ====

  * Karsten Schwan
  * Greg Eisenhauer
  * Ada Gavrilovska
  * Matt Wolf
  * Hasan Abbasi
  * Jay Lofstead
  * Vibhore Kumar

==== University of New Mexico ====

  * Barney Maccabe
  * Patrick Bridges
  * Patrick Widener
  * Mary Payne

==== Other collaborators ====

  * Ron Oldfield, Sandia National Laboratories
  * Pete Wyckoff, Ohio Supercomputing Center

===== Meeting Notes =====

  * {{notes:HECURA-SC06.doc|SC'06 discussion notes}} (from Karsten)
  * {{notes:120406.rtf|12-04-06 conference call notes}}
  * {{notes:minutes-020107.txt|02-01-07 conference call notes}} (UNM and Matt Wolf: overview of target applications)
  * Notes from a meeting with Joe Kniss and Terran Lane (UNM) on [[other potential SSDS applications]].

===== Documents and other resources =====

  * Scott Klasky's comments on [[upcoming ORNL "big-science" projects]]
  * Ron Oldfield's {{docs:oldfield-examples-013007.pdf|description of example applications}} for I/O graphs: data permutation, seismic imaging, FMRI analysis
  * Matthew and Jack's brainstorming on [[Metabot Microbenchmarks]]
  * A first stab at understanding Metabots and a possible partitioning of the problem: {{:metabot_motivation_0.2.pdf|Motivating Metabots}}
  * The original HECURA {{docs:proposal.pdf|proposal document}}
  * UNM CS Student Conference (2007) {{:csposter.pdf|poster}} submission
  * [[Asynchronous I/O API|ADIOS - ADaptable IO System]]
  * [[Metabot Ideas]]
  * Jay Lofstead wrote up some notes on his [[ORNL summer (2007) internship]]
  * {{:super_computing_demo_gui_requirements.doc|Super Computer Demo GUI Requirements}}
  * Matt Wolf / Scott Klasky on [[GTC visualization for SC]]
  * {{:s02_tutorial.pdf|SC07 tutorial on parallel I/O}}
  * {{http://www.llnl.gov/icc/lc/siop/downloads/download.html|LLNL IO benchmarks}}
  * Scott Klasky's [[mesh creation example code]] for GTC visualization
  * [[Metabots:Lists|Metabots]]
  * [[IOgraph/Metabot Ideas from M3D (CPES Meeting 12/5/07)]]
  * [[Rough Metabot framework ideas]]
  * [[Starting LWFS Servers]]
  * [[Metabot Controller API]]
  * [[Chunking Metabot Design Doc]]

===== Acknowledgements =====

The SSDS project is supported through the National Science Foundation's HECURA program (award #0621538).