ADaptable IO System (ADIOS), initially developed at ORNL by Jay Lofstead and Scott Klasky, Summer 2007.

This was built on work previously done by:
 * Hasan Abbasi, Karsten Schwan, and Matt Wolf (GT) - PBIO, Portals & InfiniBand
 * Ciprian Docan, Manish Parashar (Rutgers) - DART
 * Chen Jin (ORNL) - data tagging format and code for reading and writing

The major goals of the API are threefold:
 - Provide a simplified, easy-to-use API for scientists to write their IO operations
 - Deliver enhanced I/O and code performance through both asynchronous techniques and best-practice implementations of IO routines
 - Provide a stable interface platform for IO experimentation with existing scientific codes at scale, without requiring any changes to those codes

The API consists of two main parts:
 - The programmatic interface, written for Fortran and also usable from C and other C-linkable languages.
 - An XML configuration file for defining the IO types and methods.

Programmatic Interface:
 * adios_init (filename, ...) - [required] load the XML configuration file, creating internal representations of the various data types and defining the methods used for writing. For now, there are additional parameters for various MPI elements; these are supposed to be transparently compatible between Fortran and C, but are not.
 * adios_open (io_handle, group_name, filename, mode) - [required] prepare a data type for subsequent calls that write data using the io_handle. Mode is one of "r" (read), "w" (write), "a" (append), or "u" (update, a future feature).
 * adios_write (io_handle, field_name, var) - [required] submit a data element for writing and associate it with the given field_name for this type. This does NOT actually perform the write. Scalars are duplicated; vectors are referenced, so any changes made to a vector before adios_close is called will be reflected in the written data.
 * adios_get_write_buffer (io_handle, field_name, size, buffer) - [optional] get a buffer of the given size that the transport level will use for the given field. If size == 0, the size is calculated automatically from what is known about the datatype in the XML file and any additional elements provided (such as array dimension elements). To return this buffer, make a normal call to adios_write with the same io_handle, field_name, and the returned buffer.
 * adios_set_path (io_handle, path) - [optional] set the HDF5-style path for all vars in a group. This overrides whatever path is specified in the XML file.
 * adios_set_path_var (io_handle, path, var) - [optional] set the HDF5-style path for the specified var in the group. This overrides whatever path is specified in the XML file.
 * adios_read (io_handle, field_name, var) - submit a buffer (var) to read a data element into. This does NOT actually perform the read; the buffer is populated on the call to adios_close.
 * adios_get_data_size (size, io_handle) - get the data size for this io_handle. This is primarily useful for appending one type to the end of an existing file of another type (used in conjunction with adios_open_append). It can also predict the size of a read from what the datatype describes. If a buffer is not provided for a read or write, the element is omitted regardless of the write flag in the config.xml file.
 * adios_close (io_handle) - [required] trigger the building of the buffer for transfer, then return control to the caller. At this point all of the data has been copied and will be sent as-is downstream. [experimental] If the handle was opened for reading, this call fetches the data, parses it, and populates the provided buffers. Reading is currently hard-coded to use POSIX I/O calls.
 * adios_end_iteration () - [optional] a tick counter the IO routines use to time how quickly they are emptying the buffers.
 * adios_start_calculation () - [optional?] an indicator that it is now an ideal time to do bulk data transfers, as the code will not be performing IO for a while.
 * adios_end_calculation () - [optional?] an indicator that it is no longer a good time to do bulk data transfers, as the code is about to start communicating with other nodes, which could cause conflicts.
 * adios_allocate_buffer () - [required/optional] tell the API to allocate the write buffers now. The buffer settings in the configuration file determine the size and whether this call is required.
 * adios_finalize () - [required] clean up anything remaining before the code exits.
 * adios_get_methods (methods) - get the linked list of methods.
 * adios_get_types (types) - get the linked list of types.

XML file format and elements:

The file is formatted as an XML document. Each element is listed below, followed by its attributes and their descriptions.

 - adios-config - root element for the entire file
   * host-language - [optional] Default "Fortran". Either "Fortran" or "C". This indicates whether MPI handles must be converted: since this API is written in C, MPI handles passed in from Fortran need to be converted.
 - adios-group - a grouping element for a datatype used for a write operation (such as a restart or diagnostics data set)
   * name - the name used to select this type from within the code
   * coordination-communicator - [optional] the name of the var that contains the communicator used for coordinated writes
   * coordination-var - [optional] the name of the var that can be used to perform the grouping/coordination downstream from the compute nodes
 - global-bounds - [optional] enclosing var element(s) within a global-bounds element specifies how those var(s) map into a global space. Use the coordination-* attributes of the adios-group to collate the vars into a single whole.
   * dimensions - the global array size for each dimension. Follows the same convention as the var dimensions attribute (below).
   * offsets - the offset the enclosed var(s) should have in this global space
 - var - non-vector data types
   * name - name of this element
   * path - HDF5-style path
   * type - data type. Currently supported values (size): byte (1 byte), integer (4 bytes), real (4 bytes), string, real*8 (8 bytes), double (8 bytes), integer*4 (4 bytes), integer*8 (8 bytes), long (8 bytes), real*4 (4 bytes), complex (16 bytes: 2 doubles)
   * dimensions - a comma-separated list of numbers and/or names of var elements that together determine the size of this item.
   * write - [optional] Default "yes". Either "yes" or "no". If set to "no", this is an informational element that is not written; it is intended for grouping or dataset usage.
   * copy-on-write - [optional] Default "no". Either "yes" or "no". If set to "yes", the transport layer must ensure that whatever is passed as the value for this item has been stored elsewhere by the time the call to adios_write returns. Otherwise, only a pointer is stored, for more efficient memory usage.
 - attribute
   * name - name of the attribute
   * path - HDF5-style path of the element (var) or group to which this attribute is attached
   * type - [optional, default "string"] data type of this attribute
   * var - [optional] name of the var through which this value will be provided. Must be unique across the entire adios-group.
   * value - [optional] value for the attribute. Either var or value must be provided, but not both.
 - mesh
   * type - determines the expected contents and must be one of these 4 values (expected contents): "uniform" (dimensions, origin, spacing), "rectilinear" (dimensions, coordinates-multi-var or coordinates-single-var), "structured" (nspace, dimensions, points-single-var or points-multi-var), or "unstructured" (points, plus one or more of uniform-cells and mixed-cells).
   * time-varying - whether this mesh changes over time. Valid values are "yes" and "no"; defaults to "no". If the mesh does not vary, it should generally be written only the first time writes are done.
   "uniform"
     - dimensions
       * value - magnitude of the space to mesh
     - origin
       * value - origin of the space to mesh
     - spacing
       * value - spacing (size) of each mesh element
   "rectilinear"
     - dimensions
       * value - number of points in each dimension
     - coordinates-single-var
       * value - a single multi-dimensional array that lists the points for all dimensions
     - coordinates-multi-var
       * value - comma-separated list of array vars that list the points for each dimension
   "structured"
     - nspace
       * value - number of dimensions in the mesh
     - dimensions
       * value - count of points in each dimension
     - points-single-var
       * value - a single multi-dimensional array that lists the points for all dimensions
     - points-multi-var
       * value - comma-separated list of array vars that list the points for each dimension
   "unstructured"
     - points
       * components - number of dimensions in each point
       * number-of-points - how many points will be provided
       * value - one-dimensional array of values that will be interpreted in components-sized groups as coordinates. Points are numbered from 1.
     - uniform-cells
       * count - number of cells to look for in the value
       * data - a list of points that correspond to entries in the points value element. There are no shape entries in this list.
       * type - the VTK cell shape used to interpret the data
     - mixed-cells
       * count - number of cells to look for in the value
       * data - a one-dimensional integer list of point counts and point lists for the cells
       * types - the list of VTK cell shape types used to interpret the data
 - method - mapping a writing method to a data type, including any initialization. One or more of these should be provided for each data group; if more than one is provided, all will be used.
   * group - corresponds to a datatype specified earlier in the file
   * method - a string indicating the method to use. Currently supported values: MPI, PBIO, DART, POSIX, NULL (no io).
   * priority - [optional] a numeric priority that helps the IO methods schedule this write relative to others that may currently be pending
   * iterations - [optional] the number of iterations between writes of this type, used to gauge how quickly this data should be evacuated from the compute node
   * base-path - [optional] the root path to use as a starting point for writes. In most cases, this is prepended to filenames.
   * parameters - [optional] a string passed to the method for initialization.
 - buffer - internal buffer sizing and creation time
   * size-MB - the number of MB to allocate for buffering. Either size-MB or free-memory-percentage is required.
   * free-memory-percentage - the percentage of free RAM to allocate for buffering. Either size-MB or free-memory-percentage is required.
   * allocate-time - either 'now' or 'oncall', indicating when the buffer should be allocated. With 'oncall', allocation waits until the programmer has allocated all memory needed for the calculation and then calls adios_allocate_buffer ().

NOTES:
 - Name elements in the XML file are just strings. The only restrictions apply when an item is used in a dataset dimension: its name must not contain a comma and must contain at least one non-numeric character. This makes it possible to use expressions as dimensions.
 - It is critical that the first item in your XML file be the standard XML declaration (<?xml version="1.0"?>) for the file to be parsed properly.
 - There is a mailing list for those interested in developments of this API, available at [[http://caip.rutgers.edu/mailman/listinfo/aio]]
 - The format for the .bp files is the [[Tagged Binary Format]]
 - A semi-public SVN repository of the code will be hosted at UNM and will be available soon.
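The adios_write rules in the Programmatic Interface section (scalars duplicated, vectors referenced until adios_close, plus the copy-on-write var attribute) can be easier to follow with a small model. The sketch below is hypothetical and not ADIOS code: mock_write_scalar, mock_write_array, and mock_close are illustrative names, and doubles stand in for every data type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the deferred-write rules described for adios_write.
 * None of these names are part of ADIOS. */
#define MAX_FIELDS 8

struct pending_field {
    const char *name;
    double scalar;        /* scalars: value duplicated at write time */
    const double *array;  /* arrays: caller's memory, or an owned copy */
    double *owned;        /* non-NULL only when copy-on-write made a copy */
    size_t count;         /* 0 marks a scalar */
};

struct mock_handle {
    struct pending_field fields[MAX_FIELDS];
    size_t nfields;
};

static void mock_write_scalar(struct mock_handle *h, const char *name, double value)
{
    assert(h->nfields < MAX_FIELDS);
    struct pending_field *f = &h->fields[h->nfields++];
    *f = (struct pending_field){ .name = name, .scalar = value };
}

static void mock_write_array(struct mock_handle *h, const char *name,
                             const double *data, size_t count, int copy_on_write)
{
    assert(h->nfields < MAX_FIELDS && count > 0);
    struct pending_field *f = &h->fields[h->nfields++];
    *f = (struct pending_field){ .name = name, .array = data, .count = count };
    if (copy_on_write) {
        /* copy-on-write="yes": snapshot the data before returning */
        f->owned = malloc(count * sizeof *f->owned);
        if (f->owned == NULL)
            abort();
        memcpy(f->owned, data, count * sizeof *f->owned);
        f->array = f->owned;
    }
}

/* The "close": only now is data gathered into the output buffer, so
 * referenced arrays reflect any changes made after mock_write_array. */
static size_t mock_close(struct mock_handle *h, double *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < h->nfields; i++) {
        struct pending_field *f = &h->fields[i];
        if (f->count == 0) {
            assert(n < cap);
            out[n++] = f->scalar;
        } else {
            assert(n + f->count <= cap);
            memcpy(out + n, f->array, f->count * sizeof *out);
            n += f->count;
        }
        free(f->owned); /* free(NULL) is a no-op */
    }
    h->nfields = 0;
    return n;
}
```

If a caller writes a scalar and two arrays (the second with copy-on-write), then changes all three before closing, only the referenced array's change shows up in the output. This is the memory-versus-safety trade-off the copy-on-write attribute exposes.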
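adios_get_write_buffer (with size == 0) and adios_get_data_size both rely on computing a size from a var's type and its comma-separated dimensions, where each entry is either a literal number or the name of another var. A minimal sketch of that calculation follows, assuming the byte sizes from the var type list and a hypothetical dim_var table holding the current values of named vars; var_bytes and type_bytes are illustrative names, not ADIOS functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Element sizes in bytes, taken from the var type list ("string" is
 * variable-length and deliberately absent). */
static const struct { const char *name; size_t bytes; } TYPE_SIZES[] = {
    {"byte", 1},   {"integer", 4}, {"real", 4},      {"real*4", 4},
    {"real*8", 8}, {"double", 8},  {"integer*4", 4}, {"integer*8", 8},
    {"long", 8},   {"complex", 16},
};

/* Hypothetical: the current value of a scalar var that a dimension may name. */
struct dim_var { const char *name; long value; };

static size_t type_bytes(const char *type)
{
    for (size_t i = 0; i < sizeof TYPE_SIZES / sizeof TYPE_SIZES[0]; i++)
        if (strcmp(TYPE_SIZES[i].name, type) == 0)
            return TYPE_SIZES[i].bytes;
    return 0;
}

/* Computes the byte size of a var from its type and comma-separated
 * dimensions (literal numbers and/or var names); "" means a scalar.
 * Returns 0 and sets *bytes on success, or -1 on an unknown type or an
 * unresolvable dimension. Whitespace around entries is not handled. */
static int var_bytes(const char *type, const char *dims,
                     const struct dim_var *vars, size_t nvars, size_t *bytes)
{
    assert(type != NULL && dims != NULL && bytes != NULL);
    size_t total = type_bytes(type);
    if (total == 0)
        return -1;
    for (const char *p = dims; *p != '\0';) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        long extent = -1;
        if (len > 0 && strspn(p, "0123456789") >= len) {
            extent = strtol(p, NULL, 10);          /* literal number */
        } else {
            for (size_t i = 0; i < nvars; i++)     /* name of another var */
                if (strlen(vars[i].name) == len &&
                    strncmp(vars[i].name, p, len) == 0)
                    extent = vars[i].value;
        }
        if (extent < 0)
            return -1;
        total *= (size_t)extent;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    *bytes = total;
    return 0;
}
```

For example, with nx = 10 and ny = 20, a double var with dimensions "nx,ny" comes to 10 * 20 * 8 = 1600 bytes.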
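The first NOTES entry restricts names used in a dataset dimension: no comma, and at least one non-numeric character. The comma is the list separator, and an all-digit name could not be told apart from a literal extent such as "64". A minimal check is sketched below; usable_as_dimension_name is a hypothetical helper, and it assumes "numeric" means a decimal digit, so a name such as "nx+1" is accepted as an opaque string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check of the dimension-name rule from the NOTES:
 * no comma, and at least one character that is not a decimal digit. */
static bool usable_as_dimension_name(const char *name)
{
    assert(name != NULL);
    if (strchr(name, ',') != NULL)
        return false;                  /* would split the dimension list */
    for (const char *p = name; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return true;               /* found a non-numeric character */
    return false;                      /* empty, or all digits */
}
```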
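The global-bounds element describes how each process's var maps into a larger global array through its dimensions and offsets attributes. The sketch below shows that mapping for a 2-D case. It assumes C row-major order (a Fortran writer would index column-major) and uses a hypothetical place_block helper; it is not how any ADIOS method actually assembles data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copy a process-local 2-D block (lrows x lcols) into
 * a global array (grows x gcols) starting at (row_off, col_off), mirroring
 * the global-bounds dimensions/offsets attributes. Row-major order assumed. */
static void place_block(double *global, size_t grows, size_t gcols,
                        const double *local, size_t lrows, size_t lcols,
                        size_t row_off, size_t col_off)
{
    (void)grows; /* only checked by the assertion below */
    assert(row_off + lrows <= grows && col_off + lcols <= gcols);
    for (size_t r = 0; r < lrows; r++)
        for (size_t c = 0; c < lcols; c++)
            global[(row_off + r) * gcols + (col_off + c)] = local[r * lcols + c];
}
```

In a real run, each process would contribute its own block, and the coordination-* attributes of the adios-group identify which pieces belong together. Here, two 2 x 2 blocks placed at offsets (0, 2) and (2, 0) of a 4 x 4 array show the index arithmetic.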
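A "uniform" mesh needs only dimensions, origin, and spacing, because every point's coordinates can be recomputed as origin + index * spacing along each axis. A "rectilinear" mesh, by contrast, lists the coordinates explicitly. A minimal sketch follows, assuming dimensions counts points per axis and using a hypothetical uniform_mesh struct rather than any ADIOS type.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 3

/* Hypothetical representation of a "uniform" mesh. */
struct uniform_mesh {
    size_t ndims;
    size_t dims[MAX_DIMS];     /* points along each axis (assumed meaning) */
    double origin[MAX_DIMS];
    double spacing[MAX_DIMS];
};

/* Computes the coordinates of the point with integer index idx[] into xyz[]. */
static void uniform_point(const struct uniform_mesh *m, const size_t *idx, double *xyz)
{
    for (size_t d = 0; d < m->ndims; d++) {
        assert(idx[d] < m->dims[d]);
        xyz[d] = m->origin[d] + (double)idx[d] * m->spacing[d];
    }
}
```

For a 3 x 4 mesh with origin (1.0, -2.0) and spacing (0.5, 0.25), the point at index (2, 3) lies at (2.0, -1.25).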
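The mixed-cells data attribute is a single integer list in which each cell starts with its point count, followed by that many point indices, numbered from 1 into the points element. A reader can walk such a list as sketched below. count_mixed_cells is a hypothetical helper; comparing its result against the count attribute is one plausible consistency check, not documented ADIOS behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: walk a mixed-cells data list laid out as
 * [n0, p..., n1, p..., ...]. Point indices are numbered from 1.
 * Returns the number of cells parsed, or -1 if the list is malformed. */
static long count_mixed_cells(const int *data, size_t len, size_t npoints)
{
    assert(data != NULL || len == 0);
    long cells = 0;
    size_t i = 0;
    while (i < len) {
        int n = data[i++];
        if (n <= 0 || (size_t)n > len - i)
            return -1;                      /* bad count or truncated list */
        for (int k = 0; k < n; k++, i++)
            if (data[i] < 1 || (size_t)data[i] > npoints)
                return -1;                  /* index outside 1..npoints */
        cells++;
    }
    return cells;
}
```

For instance, {3, 1, 2, 3, 4, 2, 3, 4, 5} over five points describes a 3-point cell followed by a 4-point cell. Their VTK shapes would be given separately in the types attribute.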
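The method element notes that when more than one method is mapped to a group, all of them are used. One way to picture this is a table of function pointers that all receive the same finished buffer. The sketch below is hypothetical: method_entry, dispatch, fake_posix, and null_method are illustrative stand-ins, not ADIOS transports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a transport method. */
typedef int (*method_fn)(const char *group, const void *buf, size_t len);

struct method_entry {
    const char *group;   /* the method element's group attribute */
    const char *name;    /* the method element's method attribute */
    method_fn write;
};

static size_t posix_bytes_seen; /* bookkeeping for the stand-in below */

static int fake_posix(const char *group, const void *buf, size_t len)
{
    (void)group; (void)buf;
    posix_bytes_seen += len;     /* pretend the bytes were written */
    return 0;
}

static int null_method(const char *group, const void *buf, size_t len)
{
    (void)group; (void)buf; (void)len;
    return 0;                    /* "NULL (no io)": accept and discard */
}

/* Calls every method mapped to group; returns how many were invoked,
 * or -1 if any of them reported an error. */
static int dispatch(const struct method_entry *m, size_t n,
                    const char *group, const void *buf, size_t len)
{
    assert(group != NULL);
    int used = 0, failed = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(m[i].group, group) != 0)
            continue;
        if (m[i].write(group, buf, len) != 0)
            failed = 1;
        used++;
    }
    return failed ? -1 : used;
}
```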
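The buffer element sizes the internal buffer either as a fixed size-MB or as a free-memory-percentage. A minimal sketch of that choice follows. It makes several assumptions the text does not settle: 1 MB = 1024 * 1024 bytes, size-MB takes precedence if both are given, and the caller supplies the current free-memory figure, since measuring free RAM is platform-specific. buffer_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of sizing the buffer from the buffer element.
 * size_mb > 0 selects a fixed size (1 MB assumed to be 1024 * 1024 bytes);
 * otherwise free_pct percent of free_bytes is used. */
static uint64_t buffer_bytes(uint64_t size_mb, double free_pct, uint64_t free_bytes)
{
    if (size_mb > 0)
        return size_mb * 1024u * 1024u;
    assert(free_pct > 0.0 && free_pct <= 100.0);
    return (uint64_t)((double)free_bytes * free_pct / 100.0);
}
```

With allocate-time 'now', a size like this would be computed and allocated at initialization. With 'oncall', the same computation would be deferred until adios_allocate_buffer () is called, after the calculation's own allocations, so a percentage of free memory reflects what is actually left.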