Installation

For installation instructions, see the README at https://github.com/coecms/mppnccombine-fast

Usage

mppnccombine-fast is an MPI program and requires at least two MPI ranks to run:

mpirun -n 2 mppnccombine-fast --output output.nc input.nc.000 input.nc.001

Variables in the input files whose dimensions have a domain_distribution attribute are collated. All other dimensions, variables and attributes are copied from the first input file.
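As a rough illustration of this rule, the Python sketch below splits one file's variables into a collated set and a copied set. It is not the program's own code: it works on a hypothetical in-memory description of the file's metadata rather than a real netCDF API. It also makes two assumptions. A dimension "has" a domain_distribution attribute when the coordinate variable with the same name carries it, and a variable is collated if any of its dimensions is distributed.

```python
# Hypothetical metadata model (not a real netCDF API): each variable maps to
# its dimension names and its attributes.
# Assumption: a dimension counts as distributed when the coordinate variable
# of the same name carries a domain_distribution attribute.


def split_variables(variables):
    """Return (collated, copied) variable names for one input file."""
    # Dimensions whose coordinate variable carries domain_distribution
    distributed_dims = {
        name
        for name, var in variables.items()
        if var["dims"] == (name,) and "domain_distribution" in var["attrs"]
    }
    collated, copied = [], []
    for name, var in variables.items():
        # Assumption: one distributed dimension is enough to collate a variable
        if any(dim in distributed_dims for dim in var["dims"]):
            collated.append(name)
        else:
            copied.append(name)
    return collated, copied


example = {
    "xt_ocean": {"dims": ("xt_ocean",), "attrs": {"domain_distribution": [1, 10, 5, 10]}},
    "time": {"dims": ("time",), "attrs": {}},
    "temp": {"dims": ("time", "xt_ocean"), "attrs": {}},
    "average_T1": {"dims": ("time",), "attrs": {}},
}
collated, copied = split_variables(example)
print(collated)  # ['xt_ocean', 'temp']
print(copied)    # ['time', 'average_T1']
```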

The domain_distribution values are expected in the format written by the MOM model, an array of four integers using 1-based array indices:

First index of this dimension in the full dataset

Last index of this dimension in the full dataset

First index of this dimension in this file’s data

Last index of this dimension in this file’s data

A domain_distribution of [1, 10, 5, 10] states that the full dimension has a length of 10, and this file contains the 6 values from index 5 to index 10 inclusive (a zero-based offset of 4).
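The index arithmetic in this example can be sketched in Python. The helper name file_slice is hypothetical. It assumes both index pairs are 1-based and inclusive, as the example implies, and it converts them into the full dimension length plus a zero-based slice locating this file's data within the full dimension.

```python
def file_slice(domain_distribution):
    """Convert a 1-based, inclusive [global_start, global_end, local_start, local_end]
    into (full_length, zero-based slice of the full dimension held by this file)."""
    global_start, global_end, local_start, local_end = domain_distribution
    full_length = global_end - global_start + 1
    # Shift to zero-based offsets relative to the start of the full dataset
    start = local_start - global_start
    stop = local_end - global_start + 1  # exclusive upper bound for slicing
    return full_length, slice(start, stop)


length, part = file_slice([1, 10, 5, 10])
print(length, part, part.stop - part.start)  # 10 slice(4, 10, None) 6

# Place this file's values into a buffer for the full dimension
full = [None] * length
full[part] = [5, 6, 7, 8, 9, 10]  # hypothetical data, labelled by 1-based index
print(full)  # [None, None, None, None, 5, 6, 7, 8, 9, 10]
```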

Globbing inputs

Input files may be listed either individually or as an escaped shell glob. A glob keeps the history attribute in the output file short and avoids problems with very long command lines when there are thousands of input files:

mpirun -n 2 mppnccombine-fast --output output.nc input.nc.\*

Changing compression settings

By default, chunk sizes and compression settings are taken from the first input file, though they can be overridden using flags. Note that the optimised copying routines can only be used when an input file's compression settings match those of the output, and when the input file's data chunks align with the chunks in the output file. For example, if a variable in the output file has chunk sizes [10, 15, 30], then the input file's offset in the full dataset must be [m*10, n*15, o*30], where m, n and o are integers.
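Both conditions can be written as simple checks. The following is a minimal Python sketch, not the program's own code. The function names are hypothetical, offsets are assumed to be zero-based, and compression settings are modelled as plain dictionaries compared for equality.

```python
def chunks_align(file_offset, chunk_sizes):
    """True if a file's zero-based offset in the full dataset is a whole
    multiple of the output chunk size along every dimension."""
    if len(file_offset) != len(chunk_sizes):
        raise ValueError("offset and chunk sizes must have the same rank")
    return all(off % size == 0 for off, size in zip(file_offset, chunk_sizes))


def can_use_fast_copy(input_settings, output_settings, file_offset, chunk_sizes):
    # Hypothetical settings dicts, e.g. {"deflate_level": 4, "shuffle": True};
    # the real program's representation of compression settings may differ.
    return input_settings == output_settings and chunks_align(file_offset, chunk_sizes)


chunks = [10, 15, 30]
print(chunks_align([20, 45, 0], chunks))  # True: m=2, n=3, o=0
print(chunks_align([20, 40, 0], chunks))  # False: 40 is not a multiple of 15

settings = {"deflate_level": 4, "shuffle": True}
print(can_use_fast_copy(settings, {"deflate_level": 1, "shuffle": True}, [20, 45, 0], chunks))  # False
```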

If only some of the chunks in an input file align with the output, those chunks still use the fast path and the rest are copied by the slower route, so partial chunks on the edges of the dataset are fine.
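A one-dimensional Python sketch of this per-chunk decision follows. It assumes, for illustration only, that the input file uses the same chunk size as the output and that its chunks start at the file's first element. An input chunk is treated as fast-path eligible when its extent in the full dataset coincides exactly with one output chunk. A short chunk can only do that when it ends at the edge of the dataset. The function name classify_chunks is hypothetical.

```python
def classify_chunks(start, stop, chunk, full_length):
    """1-D sketch: split a file covering [start, stop) of the full dimension
    into input chunks of size `chunk`, and report which ones coincide exactly
    with an output chunk (fast path) and which do not (slow path)."""
    result = []
    for lo in range(start, stop, chunk):
        hi = min(lo + chunk, stop)  # input chunk may be cut short at the file's end
        # The output chunk containing `lo` starts at the previous multiple of `chunk`
        out_lo = (lo // chunk) * chunk
        out_hi = min(out_lo + chunk, full_length)  # output chunks are short only at the dataset edge
        fast = (lo, hi) == (out_lo, out_hi)
        result.append(((lo, hi), "fast" if fast else "slow"))
    return result


# File ends at the dataset edge: the short final chunk still matches
print(classify_chunks(10, 25, 10, 25))  # [((10, 20), 'fast'), ((20, 25), 'fast')]
# File ends mid-chunk inside the dataset: the short final chunk does not match
print(classify_chunks(0, 15, 10, 25))   # [((0, 10), 'fast'), ((10, 15), 'slow')]
```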