ADaptable IO System (ADIOS) initially developed at ORNL by Jay Lofstead and Scott Klasky, Summer 2007
This was built on work previously done by:
Hasan Abbasi, Karsten Schwan, and Matt Wolf (GT) - PBIO portals & InfiniBand
Ciprian Docan, Manish Parashar (Rutgers) - DART
Chen Jin (ORNL) - data tagging format and code for reading and writing
The major goals of the API are threefold:
Provide a simplified, easy-to-use API for scientists to write their IO operations
Deliver enhanced IO and code performance through both asynchronous techniques and best-practice implementations of IO routines
Provide a stable interface platform for experimentation in the IO space for existing scientific codes at scale without requiring any changes to the scientific codes
The API consists of two main parts:
A programmatic interface for Fortran that is also usable from C and other C-linkable languages.
An XML configuration file for defining the IO types and methods.
Programmatic Interface:
adios_init (filename, ...) - [required] load the XML configuration file, creating internal representations of the various data types and defining the methods used for writing. For now, additional parameters define various MPI elements; these are supposed to be transparently compatible between Fortran and C, but are not.
adios_open (io_handle, group_name, filename, mode) - [required] prepare a data type for subsequent calls to write data using the io_handle. Mode is one of “r” (read), “w” (write), “a” (append), “u” (update [a future feature]).
adios_write (io_handle, field_name, var) - [required] submit a data element for writing and associate it with the given field_name for this type. This does NOT actually perform the write. Scalars are duplicated, vectors are referenced. Any changes to vectors before adios_close is called will be reflected in the written data.
adios_get_write_buffer (io_handle, field_name, size, buffer) - [optional] for the given field, get a buffer of the given size that will be used for it at the transport level. If size == 0, the size is calculated automatically from what is known about the datatype in the XML file and any additional elements provided (such as array dimension elements). To return this buffer, just do a normal call to adios_write using the same io_handle, field_name, and the returned buffer.
adios_set_path (io_handle, path) - [optional] set the HDF-5-style path for all vars in a group. This overrides whatever is specified in the XML file.
adios_set_path_var (io_handle, path, var) - [optional] set the HDF-5-style path for the specified var in the group. This overrides whatever is specified in the XML file.
adios_read (io_handle, field_name, var) - submit a buffer space (var) for reading a data element into. This does NOT actually perform the read. Actual population of the buffer space will happen on the call to adios_close.
adios_get_data_size (size, io_handle) - gets the size for this io_handle. This is primarily useful for appending one type to the end of an existing file of another type (used in conjunction with adios_open_append). Another use is to predict the size for a read based on the knowledge in the datatype. If a buffer is not provided for read or write, then the element is dropped regardless of the write flag in the config.xml file.
adios_close (io_handle) - [required] triggers the building of the buffer for transfer and then returns control to the caller. At this point, all of the data has been copied and will be sent downstream as-is. [experimental] If the handle was opened for read, this call fetches the data, parses it, and populates the provided buffers. This is currently hard-coded to use POSIX IO calls.
adios_end_iteration () - [optional] a tick counter for the IO routines to time how fast they are emptying the buffers.
adios_start_calculation () - [optional?] an indicator that it is now an ideal time to do bulk data transfers as the code will not be performing IO for a while.
adios_end_calculation () - [optional?] an indicator that it is no longer a good time to do bulk data transfers as the code is about to start doing communication with other nodes causing possible conflicts.
adios_allocate_buffer () - [required/optional] tells the API to allocate the write buffers now. This is used in conjunction with the configuration file to determine the size and whether or not this call is required.
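The write semantics described above (adios_write only registers data, scalars are duplicated, vectors are referenced, and adios_close builds the outgoing buffer) can be illustrated with a small toy model. This is a sketch in Python, not the ADIOS implementation and not a binding to it: the ToyHandle class, its method names, and the file name "restart.bp" are all hypothetical and exist only to mimic the documented behavior.

```python
# Toy model only: NOT the ADIOS library. ToyHandle and its methods are
# hypothetical names that imitate the documented write/close semantics.
import copy


class ToyHandle:
    """Imitates an io_handle between adios_open and adios_close."""

    def __init__(self, group_name, filename, mode):
        self.group_name = group_name
        self.filename = filename
        self.mode = mode
        self.pending = {}    # field_name -> duplicated scalar or vector reference
        self.written = None  # filled in by close()

    def write(self, field_name, var):
        # Nothing is written here. Scalars are duplicated at call time;
        # vectors (modelled as lists) are only referenced.
        if isinstance(var, list):
            self.pending[field_name] = var
        else:
            self.pending[field_name] = copy.copy(var)

    def close(self):
        # Only now is the outgoing buffer built; all data is copied.
        self.written = copy.deepcopy(self.pending)
        return self.written


handle = ToyHandle("restart", "restart.bp", "w")
step = 10
temperature = [1.0, 2.0, 3.0]
handle.write("step", step)
handle.write("temperature", temperature)

step = 11              # the stored scalar copy is unaffected
temperature[0] = 99.0  # changing the vector before close() IS reflected

result = handle.close()
temperature[1] = -1.0  # after close() the data has been copied; no effect
print(result)
```

Under these assumptions the printed result holds step 10 and a temperature vector whose first entry is 99.0, which is the practical consequence of the "vectors are referenced" rule: a vector passed to adios_write should not be reused as scratch space until adios_close has returned.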
XML file format and elements: each entry below has the form <element-name attr1 attr2 ...>, followed by descriptions of its attributes. The entries are laid out in the same order and nesting as in an XML document.
<adios-config> - root element for the entire file
host-language - [optional]. Default “Fortran”. Either “Fortran” or “C”. This is an indicator for MPI handle conversion. Since this API was written in C, if it is being called from Fortran, the MPI handles need to be converted.
<adios-group name coordination-communicator coordination-var> - a grouping element for a datatype used for a write operation (such as a restart or diagnostics data set)
name - the name used to select this type from within the code
coordination-communicator - [optional] the name of the var that contains the communicator used for coordinated writes
coordination-var - [optional] the name of the var that can be used to perform the grouping/coordination downstream from the compute nodes
<global-bounds dimensions offsets> - [optional] enclosing var element(s) within a global-bounds specifies how those var(s) map into a global space. Use the coordination-* attributes of the adios-group to collate the vars into a single whole.
<var name path type dimensions write copy-on-write/> - non-vector data types
name - name of this element
path - HDF-5-style path
type - data type. Currently supported values (size): byte (1-byte), integer (4-byte), real (4-byte), string, real*8 (8-byte), double (8-byte), integer*4 (4-byte), integer*8 (8-byte), long (8-byte), real*4 (4-byte), complex (16-byte (2 doubles))
dimensions - a comma separated list of numbers and/or names that correspond to var elements to determine the size of this item.
write - [optional] Default “yes”. Either “yes” or “no”. If set to “no”, this is an informational element that is not written; it is intended for either grouping or dataset usage.
copy-on-write - [optional] Default “no”. Either “yes” or “no”. If set to “yes”, the transport layer is required to ensure that whatever is passed as the value for this item is stored elsewhere by the time the call to adios_write returns. Otherwise, a pointer is stored for more efficient memory usage.
</global-bounds>
<attribute name path type var value/>
name - name of the attribute
path - HDF-5-style path of the element (var) or group to which this attribute is attached
type - [optional, default=”string”] data type of this attribute
var - [optional] var value name this value will be provided through. Must be unique across the entire adios-group
value - [optional] value for the attribute.
Either var or value must be provided, but not both.
<mesh type time-varying>
type - this changes the expected contents and must be one of these 4 values (expected contents): “uniform” (dimensions, origin, spacing), “rectilinear” (dimensions, coordinates-multi-var or coordinates-single-var), “structured” (nspace, dimensions, points-single-var or points-multi-var), or “unstructured” (points, one or more of uniform-cells and mixed-cells).
time-varying - whether this mesh changes over time. Valid values are “yes” and “no”. It defaults to “no”. If the mesh does not vary, it should generally be written only the first time writes are done.
“uniform” <dimensions value/>
<origin value/>
<spacing value/>
“rectilinear” <dimensions value/>
<coordinate-single-var value/>
<coordinate-multi-var value/>
“structured” <nspace value/>
<dimensions value/>
<points-single-var value/>
<points-multi-var value/>
“unstructured” <points components number-of-points value/>
components - number of dimensions in each point
number-of-points - how many points will be provided
value - one dimensional array of values that will be interpreted in components-sized groups as coordinates. Numbered from 1
<uniform-cells count data type/>
count - number of cells to look for in the value
data - a list of points that correspond to entries in the points value element. There are no shape entries in this list
type - the vtk cell shape to interpret the data using
<mixed-cells count data types/>
count - number of cells to look for in the value
data - a one dimensional integer list of point count and point lists for the cells
types - the list of the vtk cell shape types for interpreting the data
</mesh>
</adios-group>
<method type method priority iterations>parameters</method> - mapping a writing method to a data type including any initialization. One or more of these should be provided for each data-group. If more than one is provided, all will be used.
group - corresponds to a datatype specified earlier in the file
method - a string indicating the method to use. Currently supported values: MPI, PBIO, DART, POSIX, NULL (no io).
priority - [optional] a numeric priority for the IO methods to better schedule this write with others that may be pending currently
iterations - [optional] a number of iterations between writes of this type used to gauge how quickly this data should be evacuated from the compute node
base-path - [optional] the root path to use as a starting point for writes. This will be prepended to filenames, in most cases.
parameters - [optional] a string passed to the method for initialization.
</method>
<buffer size-MB free-memory-percentage allocate-time/> - internal buffer sizing and creation time
</adios-config>
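To make the layout above concrete, here is a minimal sketch of a configuration file together with a few lines of Python that read it back. Everything in the sample is an assumption made for illustration: the group, var, and path names are invented, the method element is bound to its group with a group attribute (following the attribute description above, although the element signature lists type), and the runtime values for nx and ny are made up. The parsing uses Python's standard xml.etree.ElementTree module, not the ADIOS parser, and the byte sizes come from the type list above.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration using only elements and attributes described
# above; all names and values are invented for this sketch.
CONFIG = """<?xml version="1.0"?>
<adios-config host-language="C">
  <adios-group name="restart">
    <var name="nx" path="/" type="integer"/>
    <var name="ny" path="/" type="integer"/>
    <var name="temperature" path="/fields" type="double" dimensions="nx,ny"/>
    <var name="scratch" path="/fields" type="real" dimensions="4" write="no"/>
    <attribute name="units" path="/fields/temperature" value="K"/>
  </adios-group>
  <method group="restart" method="POSIX"/>
</adios-config>
"""

# Sizes in bytes, taken from the list of supported types ("string" has no
# fixed size and is left out of this sketch).
TYPE_SIZES = {"byte": 1, "integer": 4, "integer*4": 4, "real": 4, "real*4": 4,
              "real*8": 8, "double": 8, "integer*8": 8, "long": 8, "complex": 16}


def written_sizes(root, group_name, runtime_values):
    """Return {var_name: byte_count} for the vars that would be written."""
    group = next(g for g in root.iter("adios-group")
                 if g.get("name") == group_name)
    sizes = {}
    for var in group.iter("var"):
        if var.get("write", "yes") == "no":
            continue  # informational element, not written
        count = 1
        dims = var.get("dimensions")
        if dims:
            for dim in dims.split(","):
                dim = dim.strip()
                # A dimension is a literal number or the name of another var.
                count *= int(dim) if dim.isdigit() else runtime_values[dim]
        sizes[var.get("name")] = count * TYPE_SIZES[var.get("type")]
    return sizes


root = ET.fromstring(CONFIG)
sizes = written_sizes(root, "restart", {"nx": 3, "ny": 5})
print(root.get("host-language"), sizes)
```

With nx = 3 and ny = 5, the temperature var accounts for 3 * 5 * 8 = 120 bytes, and scratch is skipped because write is set to "no". A calculation of this kind is roughly what a size of 0 passed to adios_get_write_buffer asks the library to do, although the actual implementation may differ.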
NOTES:
Name elements in the XML file are just strings. The only restrictions are that if the item is to be used in a dataset dimension, it must not contain a comma and must contain at least one non-numeric character. This is useful for putting expressions as various dimensions.
It is critical that the first item in your XML file be the standard <?xml version="1.0"?> declaration in order for the file to be parsed properly.
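The two notes above translate into simple checks that can be run on a configuration before handing it to the library. The following Python sketch is only an illustration: both function names are hypothetical and neither is part of ADIOS.

```python
# Hypothetical helper functions; neither is part of ADIOS.

XML_DECLARATION = '<?xml version="1.0"?>'


def starts_with_xml_declaration(config_text):
    """True if the declaration is the very first item in the file."""
    return config_text.startswith(XML_DECLARATION)


def usable_as_dimension_name(name):
    """True if a name may be used in a dataset dimension: it contains no
    comma and at least one non-numeric character."""
    has_comma = "," in name
    has_non_numeric = any(not ch.isdigit() for ch in name)
    return (not has_comma) and has_non_numeric


for candidate in ["nx", "nx*2+1", "128", "nx,ny"]:
    print(repr(candidate), usable_as_dimension_name(candidate))

print(starts_with_xml_declaration('<?xml version="1.0"?>\n<adios-config/>'))
print(starts_with_xml_declaration('\n<?xml version="1.0"?>\n<adios-config/>'))
```

Under these rules "nx" and the expression "nx*2+1" are accepted, while "128" (purely numeric) and "nx,ny" (contains a comma) are rejected. The second declaration check fails because a blank line precedes the declaration.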
A semi-public SVN repository of the code will be hosted at UNM and will be available soon.