From: Ken on
I am trying to write single-precision 3D array (M x N x K) data to an HDF5 file, and I need to append the data as a series of new arrays onto an existing dataset within the HDF5 file. From what I have learned so far, this is accomplished by extending the dataset dimensions, selecting a hyperslab, and writing the data into the newly extended region.

I have two datasets, each with dimensions of 48 x 100 x 2. My initial HDF5 file writes properly, and the first (48x100x2) dataset is correct.

I am running into two problems:

1. My newly appended data do not appear to match the MATLAB dimensions. As I scroll through the arrays in the dataset within the HDF5 file, the values that should be constant (value = 2.0) move upward along the dimension of length 100.

2. I have not been able to figure out a way to append a new (48x100x2) block without first combining it with a dummy (48x100x2) block. In other words, I need a (96x100x2) array in memory in order to append an additional 48 arrays to a (48x100x2) dataset.

Here is my code:


%----------------------------------------------------------------------------------------
% Initialize Data
%----------------------------------------------------------------------------------------

testdata = single(ones(48,100,2));
testdata2 = testdata .* 2;

data_initialize = zeros(size(testdata2));
data_combined = [data_initialize; testdata2];

filename = 'test3dim.h5';
dsetname = 'my_dataset';

dims(1) = 48;
dims(2) = 100;
dims(3) = 2;

newdims(1) = 96;
newdims(2) = 100;
newdims(3) = 2;

chunk(1) = 48;
chunk(2) = 100;
chunk(3) = 2;

%----------------------------------------------------------------------------------------
% Create Initial HDF5 File
%----------------------------------------------------------------------------------------

%
% Create a new file using the default properties.
%
fileID = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
%
% Create dataspace with unlimited dimensions.
%
maxdims = {'H5S_UNLIMITED', 'H5S_UNLIMITED', 'H5S_UNLIMITED'};
space = H5S.create_simple (3, dims, maxdims);
%
% Create the dataset creation property list, add the gzip
% compression filter and set the chunk size.
%
dcpl = H5P.create('H5P_DATASET_CREATE');
H5P.set_deflate(dcpl, 9);
H5P.set_chunk(dcpl, chunk);
%
% Create the compressed unlimited dataset.
%
datasetID = H5D.create(fileID, dsetname, 'H5T_NATIVE_FLOAT', space, dcpl);
%
% Write the data to the dataset.
%
datatypeID = H5T.copy('H5T_NATIVE_FLOAT');
H5D.write(datasetID, datatypeID,'H5S_ALL', 'H5S_ALL','H5P_DEFAULT', testdata);


%
% Close and release resources.
%
H5T.close(datatypeID);
H5P.close(dcpl);
H5D.close(datasetID);
H5S.close(space);
H5F.close(fileID);

% =================New HDF5 File Created ===================




%----------------------------------------------------------------------------------------
% Open Existing HDF5 File and Append Data to Dataset
%----------------------------------------------------------------------------------------

% Open Existing HDF5 File
fileID = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
% Open Existing Dataset
datasetID = H5D.open(fileID, dsetname);

% Extend Existing Dataset Dimensions
H5D.extend(datasetID, newdims);
space = H5D.get_space(datasetID);

% Setup Hyperslab
start = [0,0,0];
count= [48,100,2];
stride = [1,1,1];
block = [];
H5S.select_hyperslab(space, 'H5S_SELECT_NOTB', start, stride, count, block);

% Write Data to newly extended dimensions
H5D.write(datasetID, 'H5T_NATIVE_FLOAT', 'H5S_ALL', space,'H5P_DEFAULT', data_combined);

H5D.close(datasetID);
H5S.close(space);
H5F.close(fileID);
% ===============New HDF5 File Appended ===================



Any assistance with the two problems stated at the top of this message would be greatly appreciated.

Many Thanks!
From: Dinesh Iyer on
Hello Ken,
The issue you are facing is due to the difference in memory ordering between C and MATLAB. The HDF5 library uses C-style ordering for multidimensional arrays, while MATLAB uses FORTRAN-style ordering.

For more information, please refer to point 4 in the README file at the link below:
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/api18-m.html
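
As a small standalone illustration of the ordering difference (my own example with hypothetical variable names, not taken from your script):

matlab_dims = [48 100 2];            % size as MATLAB reports it (column-major)
h5_dims     = fliplr(matlab_dims);   % [2 100 48] in the library's row-major order
% With the flipped dimensions, MATLAB's column-major element layout matches the
% file's row-major layout exactly, so the array can be written without a permute;
% tools such as h5dump will then report the dataset as 2 x 100 x 48.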

I am providing you with the modified code:

%----------------------------------------------------------------------------------------
% Initialize Data
%----------------------------------------------------------------------------------------
%%
testdata = single(ones(48,100,2));
testdata2 = testdata .* 2;

data_initialize = zeros(size(testdata2));
data_combined = [data_initialize; testdata2];

filename = 'test3dim_mod.h5';
dsetname = 'my_dataset';

dims(1) = 48;
dims(2) = 100;
dims(3) = 2;

newdims(1) = 96;
newdims(2) = 100;
newdims(3) = 2;

chunk(1) = 48;
chunk(2) = 100;
chunk(3) = 2;

%----------------------------------------------------------------------------------------
% Create Initial HDF5 File
%----------------------------------------------------------------------------------------

%
% Create a new file using the default properties.
%
fileID = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
%
% Create dataspace with unlimited dimensions.
%
maxdims = {'H5S_UNLIMITED', 'H5S_UNLIMITED', 'H5S_UNLIMITED'};
h_dims = fliplr(dims);
h_maxdims = fliplr(maxdims);
space = H5S.create_simple (3, h_dims, h_maxdims);
%
% Create the dataset creation property list, add the gzip
% compression filter and set the chunk size.
%
dcpl = H5P.create('H5P_DATASET_CREATE');
H5P.set_deflate(dcpl, 9);

h5_chunk = fliplr(chunk);
H5P.set_chunk(dcpl, h5_chunk);
%
% Create the compressed unlimited dataset.
%
datasetID = H5D.create(fileID, dsetname, 'H5T_NATIVE_FLOAT', space, dcpl);
%
% Write the data to the dataset.
%
datatypeID = H5T.copy('H5T_NATIVE_FLOAT');
H5D.write(datasetID, datatypeID,'H5S_ALL', 'H5S_ALL','H5P_DEFAULT', testdata);


%
% Close and release resources.
%
H5T.close(datatypeID);
H5P.close(dcpl);
H5D.close(datasetID);
H5S.close(space);
H5F.close(fileID);

% =================New HDF5 File Created ===================

%%
%----------------------------------------------------------------------------------------
% Open Existing HDF5 File and Append Data to Dataset
%----------------------------------------------------------------------------------------

% Open Existing HDF5 File
fileID = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
% Open Existing Dataset
datasetID = H5D.open(fileID, dsetname);

% Extend Existing Dataset Dimensions
h5_newdims = fliplr(newdims);
H5D.extend(datasetID, h5_newdims);
space = H5D.get_space(datasetID);

% Setup Hyperslab
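% One block (count = 1 in each dimension) of size 48 x 100 x 2 in MATLAB order;
% every index vector is flipped into the library's C-style (row-major) order.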
start = [0 0 0]; h5_start = fliplr(start);
stride = [1 1 1]; h5_stride = fliplr(stride);
count = [1 1 1]; h5_count = fliplr(count);
block = [48 100 2]; h5_block = fliplr(block);
H5S.select_hyperslab(space, 'H5S_SELECT_NOTB', h5_start, h5_stride, h5_count, h5_block);

% Write Data to newly extended dimensions
H5D.write(datasetID, 'H5T_NATIVE_FLOAT', 'H5S_ALL', space,'H5P_DEFAULT', data_combined);

H5D.close(datasetID);
H5S.close(space);
H5F.close(fileID);
% ===============New HDF5 File Appended ===================
From: Ken on
Dinesh,

Many thanks for your post. It was quite helpful.

The solution you presented produced clean results, but the dataset dimensions were not what I was attempting to achieve. However, studying your solution helped me resolve my primary problem by focusing my attention on the dimensions and on the start, stride, count, and block settings.

I ended up reverting back to my original script, but with two modifications:

1. I added a line to rearrange the data to be written:
data_combined=permute(data_combined,[3 2 1]);

2. I modified the hyperslab setup along the lines of your script:
start = [0 0 0];
stride = [1 1 1];
count = [1 1 1];
block = [48 100 2];


These modifications produced the dataset I was attempting to obtain: two columns, 100 rows, and 96 pages (arrays). Here is the modified code:

%----------------------------------------------------------------------------------------
% Initialize Data
%----------------------------------------------------------------------------------------

testdata = single(ones(48,100,2));
testdata2 = testdata .* 2;

data_initialize = zeros(size(testdata2));
data_combined = [data_initialize; testdata2];
data_combined = permute(data_combined, [3 2 1]);

filename = 'test3dim10.h5';
dsetname = 'my_dataset';

dims(1) = 48;
dims(2) = 100;
dims(3) = 2;

newdims(1) = 96;
newdims(2) = 100;
newdims(3) = 2;

chunk(1) = 48;
chunk(2) = 100;
chunk(3) = 2;

%----------------------------------------------------------------------------------------
% Create Initial HDF5 File
%----------------------------------------------------------------------------------------


%
% Create a new file using the default properties.
%
fileID = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
%
% Create dataspace with unlimited dimensions.
%
maxdims = {'H5S_UNLIMITED', 'H5S_UNLIMITED', 'H5S_UNLIMITED'};
space = H5S.create_simple (3, dims, maxdims);
%
% Create the dataset creation property list, add the gzip
% compression filter and set the chunk size.
%
dcpl = H5P.create('H5P_DATASET_CREATE');
H5P.set_deflate(dcpl, 9);

H5P.set_chunk(dcpl, chunk);

%
% Create the compressed unlimited dataset.
%
datasetID = H5D.create(fileID, dsetname, 'H5T_NATIVE_FLOAT', space, dcpl);
%
% Write the data to the dataset.
%
datatypeID = H5T.copy('H5T_NATIVE_FLOAT');
H5D.write(datasetID, datatypeID,'H5S_ALL', 'H5S_ALL','H5P_DEFAULT', testdata);


%
% Close and release resources.
%
H5T.close(datatypeID);
H5P.close(dcpl);
H5D.close(datasetID);
H5S.close(space);
H5F.close(fileID);

% ====================New HDF5 File Created ================




%----------------------------------------------------------------------------------------
% Open Existing HDF5 File and Append Data to Dataset
%----------------------------------------------------------------------------------------

% Open Existing HDF5 File
fileID = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
% Open Existing Dataset
datasetID = H5D.open(fileID, dsetname);

% Extend Existing Dataset Dimensions
H5D.extend(datasetID, newdims);
space = H5D.get_space(datasetID);

% Setup Hyperslab
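% One block (count = 1 in each dimension) covering a full 48 x 100 x 2 slab,
% matching the unflipped dimensions used when the dataset was created.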
start = [0 0 0];
stride = [1 1 1];
count = [1 1 1];
block = [48 100 2];
H5S.select_hyperslab(space, 'H5S_SELECT_NOTB', start, stride, count, block);

% Write Data to newly extended dimensions
H5D.write(datasetID, 'H5T_NATIVE_FLOAT', 'H5S_ALL', space,'H5P_DEFAULT', data_combined);

H5D.close(datasetID);
H5S.close(space);
H5F.close(fileID);
% ==========New HDF5 File Appended =========================



The next question I have is:

Is there any way to append the second set of 48 arrays (or pages) to the dataset without first writing a set of 48 "dummy" arrays as filler? I'd like to append an additional 48 arrays without first making that dimension 96 long in memory.

I tried changing the "start" value (for example, start = [48 0 0];, among other attempts), but I could not get the data to write.

Would I need to use H5S.offset_simple for this?
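
For reference, here is the pattern I have been experimenting with, pieced together from the H5S/H5D documentation (an untested sketch reusing the variables from the script above; the separate memory dataspace and the 'H5S_SELECT_SET' operator are my guesses at what my attempts were missing):

fileID    = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
datasetID = H5D.open(fileID, dsetname);

% Grow the unlimited dimension from 48 to 96.
H5D.extend(datasetID, newdims);
filespace = H5D.get_space(datasetID);

% Select only the newly added region of the file.
start  = [48 0 0];
stride = [1 1 1];
count  = [1 1 1];
block  = [48 100 2];
H5S.select_hyperslab(filespace, 'H5S_SELECT_SET', start, stride, count, block);

% Memory dataspace describing just the 48 new pages.
memspace = H5S.create_simple(3, block, block);

new_data = permute(testdata2, [3 2 1]);   % match the layout used above
H5D.write(datasetID, 'H5T_NATIVE_FLOAT', memspace, filespace, 'H5P_DEFAULT', new_data);

H5S.close(memspace);
H5S.close(filespace);
H5D.close(datasetID);
H5F.close(fileID);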

Thanks again!