Implementation Overview
mppnccombine-fast is organised as one “Writer” rank and one or more “Reader”
ranks. The Writer rank handles all writing to the output file, while the
Reader ranks read data from the many files to be collated and send that data
to the Writer rank.
The main slowdown in copying compressed variables is that the HDF5 library has
to de-compress them during the read, and re-compress them during the write.
mppnccombine-fast works around this by using HDF5 1.10.2’s direct IO functions
H5DOwrite_chunk() and H5DOread_chunk() to copy the compressed data from one
file to the other directly, rather than going through the
de-compress/re-compress cycle.
To get an even larger speedup, MPI is used to run the reads and writes in separate processes, since HDF5 IO calls are blocking.
Since the NetCDF4 library is much nicer to use, but doesn’t provide public access to the underlying HDF5 file, we need to do a bit of musical chairs with the files, swapping between NetCDF4 and HDF5 modes by re-opening the files.
Writer Rank
The Writer starts out by copying the dimensions, attributes and any uncollated
variables from the first of the listed input files using the NetCDF API in
init() and copy_contiguous(). It then re-opens the file using the HDF5 API and
enters the ‘Async Write Loop’ in run_async_writer().
This loop polls for incoming MPI messages from the Reader processes and performs the requested action for each one (e.g. writing a compressed chunk directly to the file at a given location). Once a Reader has finished reading all of its input files it sends a close message to the Writer rank. Once all close messages have been received, the Writer rank closes the output file and exits.
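The termination rule of this loop can be sketched in plain C. This is only an illustration under stated assumptions: the message kinds, the message struct, and run_writer_loop() are hypothetical names, and an in-memory array stands in for the MPI messages that the real Writer would receive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; a real Writer would receive these over MPI. */
typedef enum { MSG_WRITE_CHUNK, MSG_CLOSE } msg_kind;

typedef struct {
    msg_kind kind;
    int      reader;  /* which Reader sent the message */
} message;

/* Process messages in arrival order until every Reader has sent MSG_CLOSE.
 * Returns the number of chunk writes handled, or -1 if the inbox runs out
 * before all Readers have closed. */
int run_writer_loop(const message *inbox, size_t n_messages, int n_readers) {
    assert(n_readers > 0);
    int open_readers = n_readers;  /* Readers that have not yet closed */
    int writes = 0;
    size_t next = 0;

    while (open_readers > 0) {
        if (next == n_messages) {
            return -1;  /* nothing left to poll, but Readers are still open */
        }
        const message *m = &inbox[next++];
        switch (m->kind) {
        case MSG_WRITE_CHUNK:
            ++writes;        /* real code would write the chunk to the file */
            break;
        case MSG_CLOSE:
            --open_readers;  /* this Reader will send nothing further */
            break;
        }
    }
    /* Every Reader has closed: the output file would be closed here. */
    return writes;
}
```

Messages from different Readers may interleave freely; the loop only cares that one close message eventually arrives per Reader.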
Reader Ranks
The Readers distribute input files amongst themselves using a shared atomic
counter. When a Reader is ready for a new file it gets the next value from the
counter, then in copy_chunked() opens that file using NetCDF to query its
attributes and to discover and copy collated variables.
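The counter-based work distribution can be illustrated with a small C sketch. It is an assumption-laden model, not the project's code: file_counter, file_counter_init() and claim_next_file() are hypothetical names, and a C11 atomic within one process stands in for whatever shared counter the MPI ranks actually use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared counter. In an MPI program the
 * counter would live somewhere every Reader rank can update atomically;
 * here a C11 atomic models it inside a single process. */
typedef struct {
    atomic_size_t next;   /* index of the next unclaimed input file */
    size_t        total;  /* number of input files to collate */
} file_counter;

void file_counter_init(file_counter *c, size_t total) {
    assert(c != NULL);
    atomic_init(&c->next, 0);
    c->total = total;
}

/* Claim the next input file. Returns 1 and stores the file index in *index
 * if a file was available, or 0 once every file has been handed out. */
int claim_next_file(file_counter *c, size_t *index) {
    /* Fetch-and-add returns the previous value, so no two callers can
     * ever receive the same index. */
    size_t i = atomic_fetch_add(&c->next, 1);
    if (i >= c->total) {
        return 0;
    }
    *index = i;
    return 1;
}
```

Because each Reader claims a file only when it is ready for one, faster Readers naturally process more files, and no Reader needs to know in advance how the files are divided.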
Depending on the chunking and alignment of the file the Reader will decide to
copy the data either in uncompressed form using NetCDF with
copy_netcdf_variable_chunks() or by directly copying the compressed chunks,
re-opening the file in HDF5 mode with copy_hdf5_variable_chunks().
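The text does not spell out the exact chunking and alignment test, so the following C sketch shows one plausible form of it. Everything here is an assumption made for illustration: the tile_layout struct, chunks_are_aligned() and output_chunk_offset() are hypothetical, and a real check would probably also need to compare compression settings and deal with partial chunks at the edges of a file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one collated variable in one input file. */
typedef struct {
    size_t        ndims;
    const size_t *tile_start;  /* where this file's data begins in the output variable */
    const size_t *in_chunk;    /* chunk shape in the input file */
    const size_t *out_chunk;   /* chunk shape in the output file */
} tile_layout;

/* Assumed criteria for copying compressed chunks verbatim: the chunk shapes
 * match, and the file's data starts on an output chunk boundary in every
 * dimension. Returns 1 if both hold, 0 otherwise. */
int chunks_are_aligned(const tile_layout *t) {
    assert(t != NULL);
    for (size_t d = 0; d < t->ndims; ++d) {
        if (t->out_chunk[d] == 0) return 0;                     /* invalid chunk shape */
        if (t->in_chunk[d] != t->out_chunk[d]) return 0;        /* shapes differ */
        if (t->tile_start[d] % t->out_chunk[d] != 0) return 0;  /* misaligned start */
    }
    return 1;
}

/* Map a chunk offset inside the input file to the matching offset in the
 * output variable by shifting it by the file's start position. */
void output_chunk_offset(const tile_layout *t, const size_t *in_offset,
                         size_t *out_offset) {
    assert(t != NULL);
    for (size_t d = 0; d < t->ndims; ++d) {
        out_offset[d] = t->tile_start[d] + in_offset[d];
    }
}
```

When the check fails, a compressed input chunk would straddle several output chunks, so the data has to be de-compressed and written by logical location instead.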
Once all available files have been read the Reader sends a final close message to the Writer and exits.
The Async Write Loop
The Async Write Loop is set up to handle a number of messages that the Readers will send to the Writer:
open_variable_async(): Obtain a handle to a variable in the output file
variable_info_async(): Obtain chunking and compression information for a variable
write_uncompressed_async(): Write uncompressed data to a given logical location in the variable
write_chunk_async(): Write a compressed chunk directly to the output file at a given chunk location
close_variable_async(): Return the variable handle
close_async(): Reports that the Reader will not send any more messages
The Writer asynchronously polls for these messages in run_async_writer(), then
actions them in receive_open_variable_async() etc.
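One common way to route each message type to its own handler is a table of function pointers indexed by message tag. The C sketch below shows that pattern only; the tag names, writer_state, the on_* handlers and dispatch() are all hypothetical, and the handlers merely count calls rather than touching a file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags, one per message type the Readers can send. */
typedef enum {
    TAG_OPEN_VARIABLE,
    TAG_VARIABLE_INFO,
    TAG_WRITE_UNCOMPRESSED,
    TAG_WRITE_CHUNK,
    TAG_CLOSE_VARIABLE,
    TAG_CLOSE,
    TAG_COUNT  /* number of tags, not a message itself */
} msg_tag;

typedef struct {
    int handled[TAG_COUNT];  /* how many messages of each tag were actioned */
} writer_state;

typedef void (*handler_fn)(writer_state *);

/* Stub handlers: real ones would receive the message payload and act on it. */
static void on_open_variable(writer_state *s)      { s->handled[TAG_OPEN_VARIABLE]++; }
static void on_variable_info(writer_state *s)      { s->handled[TAG_VARIABLE_INFO]++; }
static void on_write_uncompressed(writer_state *s) { s->handled[TAG_WRITE_UNCOMPRESSED]++; }
static void on_write_chunk(writer_state *s)        { s->handled[TAG_WRITE_CHUNK]++; }
static void on_close_variable(writer_state *s)     { s->handled[TAG_CLOSE_VARIABLE]++; }
static void on_close(writer_state *s)              { s->handled[TAG_CLOSE]++; }

static const handler_fn handlers[TAG_COUNT] = {
    [TAG_OPEN_VARIABLE]      = on_open_variable,
    [TAG_VARIABLE_INFO]      = on_variable_info,
    [TAG_WRITE_UNCOMPRESSED] = on_write_uncompressed,
    [TAG_WRITE_CHUNK]        = on_write_chunk,
    [TAG_CLOSE_VARIABLE]     = on_close_variable,
    [TAG_CLOSE]              = on_close,
};

/* Route one message to its handler. Returns 1 if the tag was recognised
 * and handled, 0 if it was out of range. */
int dispatch(writer_state *s, int tag) {
    assert(s != NULL);
    if (tag < 0 || tag >= TAG_COUNT) {
        return 0;
    }
    handlers[tag](s);
    return 1;
}
```

Keeping the polling code separate from the per-message handlers means a new message type only needs a new tag and a new table entry.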