reading HDF5 dataset subset [Matlab]

From: Ashish Uthama on 3 Nov 2009 15:10

On Tue, 03 Nov 2009 14:11:03 -0500, Tess <tess.brandon(a)noaa.gov> wrote:

> Thanks; I've seen these examples but haven't been able to get them to
> apply to my problem successfully. Is 'hyperslab' just a fancy word for
> subset or chunk of the data? Does it matter at all how the data set is
> chunked?
>
> I ran the following code and received the following error:
> fileID=H5F.open(file);
> dset=H5D.open(fileID,'/WeeklySST'); % this is a 53x512x512 data set which I
> % read successfully as a whole using H5D.read
> space=H5D.get_space(dset);
> start=[0 0 0];
> stride=[50 50 50] % making it symmetrical b/c I'm still figuring out how HDF5 flips
> % dimensions
> count=[]; block=[]; % "If count and block are specified as [], the count and block
> % size default to a single element."
> H5S.select(space,'H5S_SELECT_SET',start,stride,count,block);
>
> ??? Error using ==> hdf5lib2
> The HDF5 library encountered an error: "unable to set hyperslab
> selection"
>
> Error in ==> H5S.select_hyperslab at 35
> H5ML.hdf5lib2('H5Sselect_hyperslab', spaceID, op, start, stride, count,
> block);
>
> I am running 7.8.0 R2009a.
>
> Thanks again,
> Tess

1. As I understand it, chunking concerns the internals of the file, i.e.
how the data is stored and organized inside the file. It appears to be
required to enable 'extendible' data sets:
http://www.zib.de/zibdoc/zib/mpp/prog/hdf5/Tutor/extend.html

A hyperslab is what you appear to think of as a 'chunk of data': it is a
subset of the data. I don't think the chunking of the file matters when you
are reading a hyperslab.

I am sure there is more info somewhere in the HDF documentation.

2. Using [] results in the arguments defaulting to a single element. That's
not what you want if you want to select a single element (look at the code
below).

3. You need to use H5S.select_hyperslab (not H5S.select).

4. You need to specify a memory space ID which matches the size of the
data selected (see code).

Here's something which might help you understand better and get started
with your own code:

% Used to create the sample data; each element ought to be unique:
% hdf5write('o.h5','/DS1',reshape(1:53*512*512,[53 512 512]));

fileID=H5F.open('o.h5','H5F_ACC_RDONLY','H5P_DEFAULT');
dset=H5D.open(fileID,'/DS1'); % this is a 53x512x512 data set
space=H5D.get_space(dset);

start=[7 18 26]; % one-based MATLAB indices; converted to zero-based below
stride=[1 1 1];
count=[1 1 1]; block=[ ];

% start-1 takes care of HDF5's zero-based indexing; fliplr takes care of the
% row/column-major reversal (please see the ReadMe.txt in my previous post).
H5S.select_hyperslab(space,'H5S_SELECT_SET',fliplr(start-1),...
    fliplr(stride),fliplr(count),fliplr(block));

% Memory space matching the selection; see ReadMe.txt for an explanation
% of the need for flipping.
mem_space = H5S.create_simple(3, fliplr(count), []);

rdata = H5D.read(dset,'H5T_NATIVE_INT',mem_space,space,'H5P_DEFAULT');

H5D.close(dset);
H5S.close(space);
H5S.close(mem_space);
H5F.close(fileID);

disp(rdata)

% Cross-check with the expected element:
r=hdf5read('o.h5','/DS1');
r(7,18,26)

From: Tess on 3 Nov 2009 15:55

I think you hit on it with #4 -- I didn't know how to set the memory space.

Apologies for accidentally leaving out the "_hyperslab" in "H5S.select_hyperslab" when copying from my workspace.

Just to be clear, I already wrote these files to be chunked -- that's not the issue. The issue is whether a file needs to be written with hyperslabs in order to use H5S.select_hyperslab, etc. to read hyperslabs? Should I be using H5S.select_elements instead if I wrote the dataset to the file in one piece originally?

From: Tess on 3 Nov 2009 16:09

I just ran your code and realized that while you defined the memory space to be rank 3, [x y z], the output is only one value. The way you've defined start, stride and count, you should have ended up with a 1x3 vector. I need to be able to subset, say, a [1x512x512] slice of data.

From: Ashish Uthama on 3 Nov 2009 16:11

On Tue, 03 Nov 2009 15:55:17 -0500, Tess <tess.brandon(a)noaa.gov> wrote:

> I think you hit on it with #4 -- I didn't know how to set the memory
> space.
>
> Apologies for accidentally leaving out the "_hyperslab" in
> "H5S.select_hyperslab" when copying from my workspace.
>
> Just to be clear, I already wrote these files to be chunked -- that's
> not the issue. The issue is whether a file needs to be written with
> hyperslabs in order to use H5S.select_hyperslab, etc. to read
> hyperslabs? Should I be using H5S.select_elements instead if I wrote
> the dataset to the file in one piece originally?

I am curious to know why you enabled chunking when you were writing?

I hope you got my point earlier that chunking and hyperslabs are not the
same. Chunking has to do with how the HDF5 library internally organizes
parts of the dataset. The hyperslab concept is at the application level;
it is independent of the chunking in the file.

Have a quick look at '4.5 Storage strategies' here
http://www.hdfgroup.org/HDF5/doc/UG/UG_frame10Datasets.html

You can use any of the selection/read methods (hyperslab/elements) on an
HDF5 file irrespective of how it was written/chunked.
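
A rough sketch of that independence, in case it helps: it writes the same small array twice into one file, once with the default contiguous layout and once chunked, then reads the same hyperslab from both. The file name chunk_demo.h5, the dataset names, the 8x10x12 size and the 4x5x3 chunk size are all made up for illustration; only the H5* calls themselves are the usual low-level MATLAB wrappers.

```matlab
% Sketch only: file name, dataset names, and sizes are illustrative assumptions.
fname  = fullfile(tempdir, 'chunk_demo.h5');
data   = reshape(1:8*10*12, [8 10 12]);
h5dims = fliplr(size(data));              % HDF5 sees the dimensions reversed

fid    = H5F.create(fname, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
fspace = H5S.create_simple(3, h5dims, []);

% Dataset 1: default (contiguous) storage layout.
dset = H5D.create(fid, '/contiguous', 'H5T_NATIVE_DOUBLE', fspace, 'H5P_DEFAULT');
H5D.write(dset, 'H5ML_DEFAULT', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', data);
H5D.close(dset);

% Dataset 2: same data, stored in 4x5x3 chunks (flipped like the dims).
dcpl = H5P.create('H5P_DATASET_CREATE');
H5P.set_chunk(dcpl, fliplr([4 5 3]));
dset = H5D.create(fid, '/chunked', 'H5T_NATIVE_DOUBLE', fspace, dcpl);
H5D.write(dset, 'H5ML_DEFAULT', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', data);
H5D.close(dset);
H5P.close(dcpl);
H5S.close(fspace);

% Read the same hyperslab from both datasets.
start  = [3 2 4];     % one-based MATLAB indices
stride = [1 1 1];
count  = [2 3 5];     % rows 3:4, columns 2:4, pages 4:8
names  = {'/contiguous', '/chunked'};
result = cell(1, 2);
for k = 1:2
    dset  = H5D.open(fid, names{k});
    space = H5D.get_space(dset);
    H5S.select_hyperslab(space, 'H5S_SELECT_SET', fliplr(start-1), ...
        fliplr(stride), fliplr(count), []);
    mem_space = H5S.create_simple(3, fliplr(count), []);
    result{k} = H5D.read(dset, 'H5ML_DEFAULT', mem_space, space, 'H5P_DEFAULT');
    H5S.close(mem_space);
    H5S.close(space);
    H5D.close(dset);
end
H5F.close(fid);

% The storage layout should make no difference to what is read back.
sameData = isequal(result{1}, result{2});
expected = data(3:4, 2:4, 4:8);
```

The read loop is identical for both datasets; nothing in the selection code refers to the chunk size.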

From: Ashish Uthama on 3 Nov 2009 16:19

On Tue, 03 Nov 2009 16:09:01 -0500, Tess <tess.brandon(a)noaa.gov> wrote:

> I just ran your code and realized that while you defined the memory
> space to be rank 3, [x y z], the output is only one value. The way
> you've defined start, stride and count, you should have ended up with a
> 1x3 vector. I need to be able to subset, say, a [1x512x512] slice of
> data.

With this:
start=[7 18 26];
stride=[1 1 1];
count=[1 1 1]; block=[ ];

Count is the number of data points along that specific dimension.
Total number of actual data points selected is 1x1x1 = 1
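
To make the arithmetic concrete, here is a small sketch in plain MATLAB (no file needed). The names numSelected, idx, sliceCount and numInSlice are only for illustration, and block is written out as ones on the assumption that this is what [] defaults to.

```matlab
% Sketch only: variable names are illustrative, values are from the thread.
start  = [7 18 26];   % one-based MATLAB indices
stride = [1 1 1];
count  = [1 1 1];
block  = [1 1 1];     % assumed default for block = []: single-element blocks

% Elements selected = product over dimensions of count(d) * block(d).
numSelected = prod(count .* block);

% One-based indices picked along each dimension (valid while block is all ones).
idx = cell(1, numel(start));
for d = 1:numel(start)
    idx{d} = start(d) + stride(d) * (0:count(d)-1);
end

% The same calculation for a full 1x512x512 slice.
sliceCount = [1 512 512];
numInSlice = prod(sliceCount);
```

With count=[1 1 1] each dimension contributes exactly one index, so the selection is the single element (7,18,26) and not a 1x3 vector.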

Did you consider using this:

start=[1 1 1];
stride=[1 1 1];
count=[1 512 512]; block=[];
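
Put together, a slice read might look roughly like the sketch below. To keep it quick to run it uses a made-up 4x5x6 dataset in a temporary file called slice_demo.h5 rather than your 53x512x512 one, and it reads row 2 instead of row 1 so the start offset is visible; those names and sizes are assumptions, not anything from your file.

```matlab
% Sketch only: file name, dataset name, and the 4x5x6 size are illustrative.
fname = fullfile(tempdir, 'slice_demo.h5');
data  = reshape(1:4*5*6, [4 5 6]);        % small stand-in for 53x512x512

% Write the sample data (dimensions flipped for HDF5).
fid    = H5F.create(fname, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
fspace = H5S.create_simple(3, fliplr(size(data)), []);
dset   = H5D.create(fid, '/DS1', 'H5T_NATIVE_DOUBLE', fspace, 'H5P_DEFAULT');
H5D.write(dset, 'H5ML_DEFAULT', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', data);
H5S.close(fspace);
H5D.close(dset);
H5F.close(fid);

% Re-open read-only and select one row across all columns and pages.
fid   = H5F.open(fname, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dset  = H5D.open(fid, '/DS1');
space = H5D.get_space(dset);

start  = [2 1 1];     % one-based MATLAB index of the first element
stride = [1 1 1];
count  = [1 5 6];     % 1 row, all 5 columns, all 6 pages
block  = [];

% start-1 for zero-based indexing, fliplr for the dimension reversal.
H5S.select_hyperslab(space, 'H5S_SELECT_SET', fliplr(start-1), ...
    fliplr(stride), fliplr(count), fliplr(block));

% The memory space has the same (flipped) shape as the selection.
mem_space = H5S.create_simple(3, fliplr(count), []);
rdata = H5D.read(dset, 'H5ML_DEFAULT', mem_space, space, 'H5P_DEFAULT');

H5S.close(mem_space);
H5S.close(space);
H5D.close(dset);
H5F.close(fid);

% Cross-check against ordinary MATLAB indexing on the original array.
expected = data(2, :, :);
matches  = isequal(rdata(:), expected(:));
```

For the real dataset the only changes should be the file and dataset names and count=[1 512 512]. On releases that ship the high-level h5read (R2011a and later), h5read(fname,'/DS1',[2 1 1],[1 5 6]) should return the same slice without the manual flipping.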
