From: jihane on
Albert:
My colleague has done something on another system. Do you think I could run it in Mathematica, even though the functions differ from one piece of software to the other?

Here is a sample of the data:
-0.13923438E+00 -0.22521242E+00 0.10765536E+01
-0.13928019E+00 -0.22522102E+00 0.10765295E+01
-0.13934083E+00 -0.22523038E+00 0.10765673E+01
-0.13940084E+00 -0.22523966E+00 0.10766749E+01
-0.13944457E+00 -0.22524883E+00 0.10768325E+01
-0.13946098E+00 -0.22525747E+00 0.10769989E+01
-0.13944666E+00 -0.22526383E+00 0.10771308E+01
-0.13940556E+00 -0.22526550E+00 0.10771986E+01
-0.13934693E+00 -0.22526184E+00 0.10771959E+01
-0.13928294E+00 -0.22525565E+00 0.10771388E+01
-0.13922668E+00 -0.22525267E+00 0.10770591E+01
-0.13919051E+00 -0.22525826E+00 0.10769958E+01
-0.13918413E+00 -0.22527380E+00 0.10769836E+01
-0.13921118E+00 -0.22529592E+00 0.10770431E+01
-0.13926551E+00 -0.22531886E+00 0.10771747E+01
-0.13933089E+00 -0.22533810E+00 0.10773579E+01

From: David Bailey on
jihane wrote:
> Thank you for all your replies.
> To give more details about my file: it is a file of numerical data, presented in 3 columns for the x, y and z axes (they are acceleration measurements). My computer is a 64-bit machine with 8 GB of RAM. Why is my file that huge? Well, the measurements are taken 1000 times per second. I can't ignore parts of the data; I need to analyze all of it. I can't think of any other useful detail I could provide. What I want to do with this data is some basic calculations and a couple of plots.
>
> Thank you again for all the great help!
>

OK, well if you want Mathematica to run in 64-bit mode (under Windows),
you need to install the 64-bit version of Windows on your machine. This
will then run both 64-bit and 32-bit applications. When you install
Mathematica, it will recognise this configuration and install the
64-bit version. Together with your 8GB of RAM, this will let you
process much larger problems with Mathematica.

Even so, that is a lot of data, and if you plot it all at once, you will
have an absurd number of plot points. I can imagine that perhaps you
want to plot it in sections, or maybe smooth it and plot the result. In
either case, you can read it in sections using:

stream=OpenRead[file];

followed by:

ReadList[stream,Real,nn]

where nn is the number of values you want to read in one block. I
reckon you have approx 250 million numbers in that file, and at 8 bytes
each, that will be 2GB of storage to hold the result. So it is not
impossible that you could store that many numbers, but your memory
might run short while reading the data in.
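Putting the two calls together, a minimal sketch of a block-reading loop (the file name and block size here are just placeholders) might be:

```mathematica
(* Read the file in blocks of nn numbers, processing each block
   as it arrives instead of holding everything in memory. *)
nn = 300000;                      (* values per block; tune as needed *)
stream = OpenRead["data.txt"];    (* placeholder file name *)
While[(block = ReadList[stream, Real, nn]) =!= {},
  (* process the block here, e.g. accumulate sums for statistics *)
  Print[Length[block]]
];
Close[stream];
```

Each pass through the loop holds only one block in memory, so the peak memory use is set by nn rather than by the file size.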

Be aware also that lists of numbers can be stored in two ways in
Mathematica - packed and unpacked. They look the same, but packed arrays
are much more efficient and occupy less storage. I tried a little
experiment, and ReadList applied to your data seems to return an
unpacked array, so you may want to read it in with:

Developer`ToPackedArray[ReadList[stream,Real,nn]]

Unless you intend to read the data in over and over again, there isn't
going to be too much point in converting the file to binary format.

My advice would be to start with a small portion of the file - either a
small time range, or every Nth sample, as appropriate - and develop
your code using that small file, bearing in mind that you will need to
scale the result up later.

I suggest you describe what you want to do with the data - in
particular, whether it would be easy to handle in blocks - because that
would be the obvious approach.

David Bailey
http://www.dbaileyconsultancy.co.uk

From: Bill Rowe on
On 4/7/10 at 3:21 AM, jihane.ajaja(a)mail.mcgill.ca (jihane) wrote:

>Thank you for all your replies. To give more details about my file:
>it is a file of numerical data, presented in 3 columns for the x, y
>and z axes (they are acceleration measurements). My computer is a
>64-bit machine with 8 GB of RAM. Why is my file that huge? Well, the
>measurements are taken 1000 times per second. I can't ignore parts of
>the data; I need to analyze all of it. I can't think of any other
>useful detail I could provide. What I want to do with this data is
>some basic calculations and a couple of plots.

A few thoughts. Your other message indicated each value in your
file is represented with ~16 characters including the character
that acts as the column separator. A bit of experimentation on
my system (also a 64 bit system) with ByteCount indicates
Mathematica stores a single real value using 16 bytes. So, the
4GB file you have will occupy a bit more than 4GB of internal
memory since I assume the data will be in an array. Mathematica
has some overhead associated with internal storage of arrays in
addition to the storage for each element of the array.

Since this is more than half the RAM you have available, it is
clear any operation that would make a copy of the array will
fail due to insufficient memory. In fact, when you consider RAM
needed by your operating system, the Mathematica program and
anything else you might have running, it is pretty easy to see
there won't be much you can do with that much data read into RAM.

Further, even if you could actually create a simple time plot of
all the data after it is read into RAM, the resolution of your
screen and printer is not sufficient to display that much data
at once. For example, my system has a horizontal display width
of 1280 pixels. If I were to plot a data array with, say, 10,000
data points, it is clear some of those points must overlap. And
note, 10,000 real triplets need less than 500 KB of storage. So,
10,000 data points is clearly a small subset of your data.

Your only real choices to work with as much data as you've
described, are either to intelligently downsample the data or
break the data up into smaller blocks and work with those. And
note, this really isn't a limitation Mathematica imposes. It is
an inherent limitation of the amount of RAM you have available
and real display sizes.
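As a rough sketch of the downsampling option (the data here is random, just standing in for the real measurements), taking every 100th triplet of a packed array is a one-liner with Span:

```mathematica
(* Stand-in for the real n x 3 acceleration array. *)
data = RandomReal[{-1, 1}, {1000000, 3}];
sampled = data[[1 ;; -1 ;; 100]];  (* keep every 100th row *)
Dimensions[sampled]                (* {10000, 3} *)
```

The sampled array stays packed, so its memory footprint shrinks in proportion to the sampling stride.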


From: Albert Retey on
Hi,

> My colleague has done something on another system. Do you think I
> could run it in Mathematica, even though the functions differ from
> one piece of software to the other?

No, of course you cannot run code that was written for the other
system. But if your colleague converted the text file to a binary file
with the other system, then you can probably import the binary file he
generated directly, provided the format is supported. Since the text
representation of a floating-point number is usually much longer than
the binary representation of a double, my guess is that on your 64-bit
system the import of the binary file would just work.
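For example, if the binary file simply contains the values as IEEE double-precision numbers in row order (an assumption - check what the other system actually wrote), something like this should read it:

```mathematica
(* Read a flat file of 64-bit doubles and regroup into x,y,z triplets.
   "data.bin" is a placeholder name; "Real64" assumes IEEE doubles. *)
raw = BinaryReadList["data.bin", "Real64"];
data = Partition[raw, 3];
```

BinaryReadList returns a packed array directly for uniform numeric types, which also sidesteps the text-parsing overhead.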

If you want/need to import the text file directly, I think reading
larger chunks with OpenRead/ReadList/Close, converting each chunk to an
array of numbers (ensuring they are stored as packed arrays with
Developer`ToPackedArray) and joining all the chunks at the end has a
good chance of working. Alternatively, you could start by counting the
lines, preallocate a list to hold the data with
data=Table[0.,{numberoflines},{3}], then read the data and fill that
array; this is probably the most memory-efficient way to import the
large file. It will then take a little experimentation with the chunk
size to make the code fast...

> Here is a sample of the data: -0.13923438E+00 -0.22521242E+00
> 0.10765536E+01 -0.13928019E+00 -0.22522102E+00 0.10765295E+01
> -0.13934083E+00 -0.22523038E+00 0.10765673E+01 -0.13940084E+00
> -0.22523966E+00 0.10766749E+01 -0.13944457E+00 -0.22524883E+00
> 0.10768325E+01 -0.13946098E+00 -0.22525747E+00 0.10769989E+01
> -0.13944666E+00 -0.22526383E+00 0.10771308E+01 -0.13940556E+00
> -0.22526550E+00 0.10771986E+01 -0.13934693E+00 -0.22526184E+00
> 0.10771959E+01 -0.13928294E+00 -0.22525565E+00 0.10771388E+01
> -0.13922668E+00 -0.22525267E+00 0.10770591E+01 -0.13919051E+00
> -0.22525826E+00 0.10769958E+01 -0.13918413E+00 -0.22527380E+00
> 0.10769836E+01 -0.13921118E+00 -0.22529592E+00 0.10770431E+01
> -0.13926551E+00 -0.22531886E+00 0.10771747E+01 -0.13933089E+00
> -0.22533810E+00 0.10773579E+01
>

Is there a line break after every third number? If so, the following
would work for this data and should be fairly memory efficient, but
probably slow. For decent speed you will certainly need a much larger
value for chunksize:

fname = ToFileName[{$HomeDirectory, "Desktop"}, "data.txt"];

chunksize = 3;  (* tiny, for testing; use e.g. 100000 for real data *)

(* First pass: count the lines. *)
stream = OpenRead[fname];
lines = {1};
numlines = 0;
While[lines =!= {},
 lines = ReadList[stream, {Number, Number, Number}, chunksize];
 numlines += Length[lines];
];
Close[stream];
Print[numlines];

(* Second pass: fill a preallocated packed array in place. *)
stream = OpenRead[fname];
lines = {1};
numline = 1;
data = Developer`ToPackedArray[Table[0., {numlines}, {3}]];
While[lines =!= {},
 lines = ReadList[stream, {Number, Number, Number}, chunksize];
 Do[
  data[[numline + i - 1, k]] = lines[[i, k]],
  {i, Length[lines]}, {k, 3}
 ];
 numline += Length[lines];
];
Close[stream];
ByteCount[data]
Developer`PackedArrayQ[data]


hth,

albert

From: John Fultz on
On Thu, 8 Apr 2010 08:02:40 -0400 (EDT), Bill Rowe wrote:
> On 4/7/10 at 3:21 AM, jihane.ajaja(a)mail.mcgill.ca (jihane) wrote:
>
>> Thank you for all your replies. To give more details about my file:
>> it is a file of numerical data, presented in 3 columns for the x, y
>> and z axes (they are acceleration measurements). My computer is a
>> 64-bit machine with 8 GB of RAM. Why is my file that huge? Well, the
>> measurements are taken 1000 times per second. I can't ignore parts
>> of the data; I need to analyze all of it. I can't think of any other
>> useful detail I could provide. What I want to do with this data is
>> some basic calculations and a couple of plots.
>>
> A few thoughts. Your other message indicated each value in your
> file is represented with ~16 characters including the character
> that acts as the column separator. A bit of experimentation on
> my system (also a 64 bit system) with ByteCount indicates
> Mathematica stores a single real value using 16 bytes. So, the
> 4GB file you have will occupy a bit more than 4GB of internal
> memory since I assume the data will be in an array. Mathematica
> has some overhead associated with internal storage of arrays in
> addition to the storage for each element of the array.
>
> Since this is more than half the RAM you have available, it is
> clear any operation that would make a copy of the array will
> fail due to insufficient memory. In fact, when you consider RAM
> needed by your operating system, the Mathematica program and
> anything else you might have running, it is pretty easy to see
> there won't be much you can do with that much data read into RAM.
>
> Further, even if you could actually create a simple time plot of
> all the data after it is read into RAM, the resolution of your
> screen and printer is not sufficient to display that much data
> at once. For example, my system has a horizontal display width
> of 1280 pixels. If I were to plot a data array with say 10,000
> data points, it is clear some of those points must overlap. And
> note, 10,000 real triplets would need less than ~500 KB to
> store. So, 10,000 data points is clearly a small subset of your data.
>
> Your only real choices to work with as much data as you've
> described, are either to intelligently downsample the data or
> break the data up into smaller blocks and work with those. And
> note, this really isn't a limitation Mathematica imposes. It is
> an inherent limitation of the amount of RAM you have available
> and real display sizes.

I wouldn't dispute your general remarks about the wisdom of down-sampling. But
let me just correct one statement you made. You correctly point out that a Real
consumes 16 bytes. But it is not correct to generalize from this that an array
of Reals will consume 16 bytes per Real. If the array is constructed in such
a way that the numbers pack, then you'll get 8 bytes per Real, plus a small
number of additional bytes for the array as a whole. Some evidence to show
this...

In[1]:= ByteCount[RandomReal[]]

Out[1]= 16

In[2]:= ByteCount[Table[RandomReal[], {1000}]]

Out[2]= 8124

Although I'm aware of some of the issues, I'm not the best person to discuss the
details of dealing with packed arrays. I will say that the typical experience
should be that, as long as you're dealing with uniform data (all Real or all
Integer), Mathematica should automatically pack things for you.
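A quick experiment along those lines - uniform machine reals pack automatically, while mixing in an exact integer prevents packing:

```mathematica
(* A homogeneous list of machine reals is stored packed... *)
Developer`PackedArrayQ[Table[RandomReal[], {1000}]]          (* True *)

(* ...but one exact integer in the list defeats the packing. *)
Developer`PackedArrayQ[Append[Table[RandomReal[], {10}], 1]] (* False *)
```

This is why reading mixed or malformed rows from a file can silently double the memory cost of the resulting array.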

Of course, packed arrays only throw off your numbers by a factor of two, so I'll
reemphasize that this doesn't invalidate your general conclusions about how to
tackle the problem.

Sincerely,

John Fultz
jfultz(a)wolfram.com
User Interface Group
Wolfram Research, Inc.