Transformation for combining data sets [Matlab]

From: Michelle on 31 Mar 2010 20:02

Hi all!
So I have multiple sets of the same type of data, but due to biology there are of course some environmental differences between the sets. I have controls for each data set (or run) that I use as references. I would like to "transform" all the data I have (say, using the controls to work out the transformation) and then apply that transformation to the respective data sets before running statistical analysis.
For each data set I have successfully applied PCA to compress the data, then used Mahalanobis distance to look for outliers. I would like to be able to do this across all the runs I have. At the moment, without any correction, the data cluster or separate by run.
I think people deal with this problem when combining microarray data sets. The idea seems simple enough, but I don't really know how to implement it. So if anyone has any ideas, that would be greatly appreciated!
Thanks

From: Arthur Goldsipe on 1 Apr 2010 08:59

Hi Michelle,

The problem of normalizing biological data sets is a big one! Unfortunately, I don't think there's one approach that works for everyone. In fact, sometimes the variation between runs is indicative of a real problem. So, I would start by looking at your data sets and trying to understand how you convince yourself that the variation between data sets is not important. Presumably, that has something to do with the controls in each data set. Perhaps that analysis will give you some insight into how to normalize the individual data sets in a way that will allow you to make comparisons across data sets.
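
As one possible starting point, here is a minimal sketch of control-based normalization: each run is centered and scaled per feature using only that run's controls, so every sample ends up expressed in units of "control standard deviations". It is written in plain Python for illustration, and the same arithmetic carries over to MATLAB array operations. The function names and the toy numbers are hypothetical, not from Michelle's data. The sketch assumes the batch effect is a per-feature shift and stretch, so it will not remove nonlinear differences between runs.

```python
from statistics import mean, stdev


def fit_control_scaling(controls):
    """Return per-feature (mean, std) computed from one run's control samples."""
    columns = list(zip(*controls))  # transpose rows -> feature columns
    return [(mean(col), stdev(col)) for col in columns]


def apply_scaling(samples, scaling):
    """Express each sample in units of control standard deviations."""
    return [
        [(value - mu) / sigma for value, (mu, sigma) in zip(row, scaling)]
        for row in samples
    ]


# Hypothetical toy data: two runs, two features, three controls each.
# Run B is run A shifted and stretched, mimicking a simple batch effect.
run_a_controls = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
run_b_controls = [[11.0, 30.0], [13.0, 34.0], [15.0, 38.0]]
run_a_samples = [[4.0, 16.0]]
run_b_samples = [[17.0, 42.0]]

# Each run is normalized against its own controls only.
norm_a = apply_scaling(run_a_samples, fit_control_scaling(run_a_controls))
norm_b = apply_scaling(run_b_samples, fit_control_scaling(run_b_controls))
```

In this toy case the two samples sit at the same position relative to their own controls, so they coincide after normalization and can be pooled before PCA or clustering. A control feature with zero variance would cause a division by zero, and with only a few controls per run the estimated standard deviations will be noisy, so treat this as a first check rather than a finished method.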

If that's too general an answer, then you probably need to dig into the specifics of your experiments. Presumably, other people have dealt with this kind of data before. If you can't find people you work with who can give you some guidance, then maybe you can try sharing more details with us about your experiments and data. Biologists are a minority on the MATLAB newsgroups, but you never know who will read your message...

--Arthur

From: Michelle on 1 Apr 2010 12:40

Hi,
I am dealing with whole plants, grown under conditions as tightly controlled as possible, but of course they suffer from biological variance that unfortunately I can't control. So that is the limit of the experimental control I have. Essentially I am doing spectroscopy on the plants and running cluster analysis to group them. This works for each batch (with a control grown at the same time), but now I would like to see if I could normalize the batches and cluster them together. In theory I was imagining transforming the basis set like in linear algebra, except I don't think the differences are linear. So if anyone has any "buzz" words I should try to look up, I would be grateful.
Thanks again!
