From: Markus on
I have a large number of data points n(x,y,z,t), where n is the refractive index. The data is produced by a Large Eddy Simulation model (a type of CFD calculation). The simulation was performed on a cluster and the results were dumped to file. I am now post-processing the data. I know that the time average should be cylindrically symmetric, so I would like to calculate ñ(rho,z) and n_var(rho,z), where ñ denotes the average and n_var the variance. The time series is not long enough for ñ(x,y,z) to look symmetric. The grid has structure but is neither rectangular nor symmetric. There are approximately 7 million grid points and 50 time steps, so the amount of data is considerable.

What I have tried is to create a grid (rgrid,zgrid) and then bin all the ñ(x,y,z) values into the rectangles defined by this grid. Calculating the mean of the values in each bin is simple. The problem is that this method (at least in my implementation) is very slow, as there is a loop over all points in the rectangular grid (rgrid,zgrid). The variance can in principle be calculated by taking the variance of the values in each rectangle. However, if the underlying distribution varies inside the rectangle, this will overestimate the variance. To compensate for this I would need to interpolate ñ(rgrid,zgrid) to ñ(rho,z) for each of the 7 million grid points, calculate the variance at that grid point, and then average the variances in the rectangle. I think this would give an approximately correct value, but it seems very inefficient and slow. Can anyone give me a better algorithm or implementation than the one I have suggested here?
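The overestimate described above can be made explicit with the standard variance decomposition. The notation here is an assumption introduced for illustration: B is one rectangle containing m grid points, T is the number of time steps, N_{it} is the value at point i and time t, \mu_i is the time mean at point i, and \mu_B is the mean of the \mu_i over the rectangle. Variances are normalized by the number of samples (not by one less).

```latex
\frac{1}{mT}\sum_{i\in B}\sum_{t=1}^{T}\left(N_{it}-\mu_B\right)^2
  = \underbrace{\frac{1}{m}\sum_{i\in B}\left[\frac{1}{T}\sum_{t=1}^{T}\left(N_{it}-\mu_i\right)^2\right]}_{\text{mean of the per-point temporal variances}}
  + \underbrace{\frac{1}{m}\sum_{i\in B}\left(\mu_i-\mu_B\right)^2}_{\text{spread of the time means inside } B}
```

The second term is the part that inflates the pooled variance when the mean varies inside the rectangle. The first term needs no interpolation, but it uses the per-point time mean \mu_i as the reference, which is itself noisy for a short time series, so it is only a rough substitute for the interpolated ñ(rho,z).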

Here is some code that tries to perform my slow algorithm (I think it works):
x,y,z are px1 vectors where p is the number of gridpoints
N is a px50 array of the values

[~,rho]=cart2pol(x,y);
nmean=mean(N,2);

rgrid=0:rstep:max(rho);
zgrid=0:zstep:max(z);
% in future version the grid may need to be nonequidistant as the original sampling varies

rlim=[0 (rgrid(1:end-1)+rgrid(2:end))/2 max(rho)];
zlim=[0 (zgrid(1:end-1)+zgrid(2:end))/2 max(z)];

nmeancyl=zeros(length(zgrid),length(rgrid));
nvarcyl=zeros(length(zgrid),length(rgrid));

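for ii=1:length(zgrid)
    for jj=1:length(rgrid)
        % Logical mask of the points inside rectangle (ii,jj). The lower edges are
        % inclusive so that points at rho=0, z=0 or exactly on a bin boundary are
        % not dropped (points exactly at max(rho) or max(z) still are).
        ind=zlim(ii)<=z & z<zlim(ii+1) & rlim(jj)<=rho & rho<rlim(jj+1);
        nmeancyl(ii,jj)=mean(nmean(ind));
        % I would like to actually calculate the value at zgrid,rgrid. Maybe by
        % fitting some surface to the data inside each rectangle?
        % Variance relative to the bin mean, normalized by the squared bin mean
        nvarcyl(ii,jj)=mean(mean(N(ind,:).^2-nmeancyl(ii,jj)^2)/nmeancyl(ii,jj)^2);
        % I would like to include the spatial variation of nmeancyl inside each rectangle
    end
end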
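
A possible vectorized alternative to the double loop is sketched below. It is not a tested solution for the real data set. It assumes a MATLAB release that provides discretize (R2015a or later), uses small synthetic stand-in data in place of the real x, y, z and N, and picks arbitrary values for rstep and zstep. Each point is assigned a bin index once, and accumarray then sums per bin, so no pass over all points is needed per rectangle. The variance computed here is the bin average of the per-point temporal variance, which differs from the pooled estimate in the loop above.

```matlab
% Synthetic stand-in data (assumption: replace with the real x, y, z, N)
rng(0);
p = 2000;                      % number of grid points
T = 50;                        % number of time steps
x = 2*rand(p,1) - 1;
y = 2*rand(p,1) - 1;
z = rand(p,1);
N = 1 + 0.01*randn(p,T);
rstep = 0.25;                  % arbitrary illustrative bin widths
zstep = 0.25;

[~,rho] = cart2pol(x,y);
rgrid = 0:rstep:max(rho);
zgrid = 0:zstep:max(z);
rlim = [0 (rgrid(1:end-1)+rgrid(2:end))/2 max(rho)];
zlim = [0 (zgrid(1:end-1)+zgrid(2:end))/2 max(z)];

% Bin index of every point; the last bin includes its right edge
ir = discretize(rho, rlim);
iz = discretize(z, zlim);
subs = [iz ir];
sz = [length(zgrid) length(rgrid)];

nmean  = mean(N,2);            % time mean at each point
nvarpt = var(N,1,2);           % temporal variance at each point (normalized by T)

% Sum per bin and divide by the number of points; empty bins give NaN
counts   = accumarray(subs, 1, sz);
nmeancyl = accumarray(subs, nmean,  sz) ./ counts;
nvarcyl  = accumarray(subs, nvarpt, sz) ./ counts;

% Relative variance, matching the normalization used in the loop version
nrelvarcyl = nvarcyl ./ nmeancyl.^2;
```

For the full 7 million points the memory cost is a few index and value vectors of length p plus the small output arrays. A nonequidistant grid only changes how rlim and zlim are built.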