From: jihane on 7 Apr 2010 03:21

Albert: My colleague has done something on another system. Do you think I could run it in Mathematica even if the functions differ from one piece of software to the other? Here is a sample of the data:

-0.13923438E+00 -0.22521242E+00 0.10765536E+01
-0.13928019E+00 -0.22522102E+00 0.10765295E+01
-0.13934083E+00 -0.22523038E+00 0.10765673E+01
-0.13940084E+00 -0.22523966E+00 0.10766749E+01
-0.13944457E+00 -0.22524883E+00 0.10768325E+01
-0.13946098E+00 -0.22525747E+00 0.10769989E+01
-0.13944666E+00 -0.22526383E+00 0.10771308E+01
-0.13940556E+00 -0.22526550E+00 0.10771986E+01
-0.13934693E+00 -0.22526184E+00 0.10771959E+01
-0.13928294E+00 -0.22525565E+00 0.10771388E+01
-0.13922668E+00 -0.22525267E+00 0.10770591E+01
-0.13919051E+00 -0.22525826E+00 0.10769958E+01
-0.13918413E+00 -0.22527380E+00 0.10769836E+01
-0.13921118E+00 -0.22529592E+00 0.10770431E+01
-0.13926551E+00 -0.22531886E+00 0.10771747E+01
-0.13933089E+00 -0.22533810E+00 0.10773579E+01
From: David Bailey on 8 Apr 2010 08:00

jihane wrote:
> Thank you for all your replies. To give more details about my file: it is a file with numerical data, presented in 3 columns for the x, y and z axes (they are acceleration measurements). My computer is a 64-bit machine with 8 GB of RAM. Why is my file that huge? Well, the measurements are done 1000 times per second. I can't ignore parts of the data. I need to analyze all of it. I don't think of any other useful detail I could provide. What I want to do with this data is some basic calculations and a couple of plots.
>
> Thank you again for all the great help!

OK, well if you want Mathematica to run in 64-bit mode (under Windows), you need to load the 64-bit version of Windows on your machine. This will then run both 64- and 32-bit applications. When you install Mathematica it will recognise this configuration and load the 64-bit version. Together with your 8 GB of RAM, you will be able to process much larger problems with Mathematica.

Even so, that is a lot of data, and if you plot it all at once, you will have an absurd number of plot points. I can imagine that perhaps you want to plot it in sections, or maybe smooth it and plot the result. In either case, you can read it in sections using:

stream = OpenRead[file];

followed by:

ReadList[stream, Real, nn]

where nn is the number of values you want to read in one block. I reckon you have approximately 250 million numbers in that file, and at 8 bytes each, that will be 2 GB of storage to hold the result. So it is not impossible that you could store that many numbers, but your memory might run short while reading the data in.

Be aware also that lists of numbers can be stored in two ways in Mathematica - packed and unpacked. They look the same, but packed arrays are much more efficient and occupy less storage.
I tried a little experiment, and ReadList applied to your data seems to return an unpacked array, so you may want to read it in with:

Developer`ToPackedArray[ReadList[stream, Real, nn]]

Unless you intend to read the data in over and over again, there isn't going to be much point in converting the file to binary format.

My advice would be to start with a small portion of the file - either a small time range or every Nth sample, as appropriate - and develop your code using that small file, bearing in mind that you will need to scale the result up later. I suggest you describe what you want to do with that data - would it be easy to handle it in blocks? - because that would be the obvious approach.

David Bailey
http://www.dbaileyconsultancy.co.uk
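[Editor's note: the OpenRead/ReadList loop David describes can be sketched in Python for readers following along outside Mathematica. This is an illustrative analogue, not the poster's code; the file name, block size nn, and the three-floats-per-line format are assumptions based on the sample data above.]

```python
# Python analogue of reading a large 3-column file in blocks of nn lines,
# rather than loading the whole file into memory at once.
from itertools import islice

def read_blocks(path, nn=100_000):
    """Yield blocks of up to nn (x, y, z) float triples from a text file."""
    with open(path) as stream:
        while True:
            lines = list(islice(stream, nn))  # read at most nn lines
            if not lines:
                break                          # end of file reached
            yield [tuple(float(v) for v in line.split()) for line in lines]
```

Each block can then be smoothed or plotted independently, which keeps peak memory proportional to nn rather than to the full file size.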
From: Bill Rowe on 8 Apr 2010 08:02

On 4/7/10 at 3:21 AM, jihane.ajaja(a)mail.mcgill.ca (jihane) wrote:

> Thank you for all your replies. To give more details about my file:
> it is a file with numerical data, presented in 3 columns for x y and
> z axis. (they are acceleration measurements). My computer is a 64
> bits machine with 8 GB of RAM. Why my file is that huge? well the
> measurements are done 1000 times per second. I can't ignore parts of
> the data. I need to analyze all of it. I don't think of any other
> useful detail I could provide. What I want to do with this data is
> do some basic calculations and generate a couple of plots.

A few thoughts. Your other message indicated each value in your file is represented with ~16 characters, including the character that acts as the column separator. A bit of experimentation on my system (also a 64-bit system) with ByteCount indicates Mathematica stores a single real value using 16 bytes. So, the 4 GB file you have will occupy a bit more than 4 GB of internal memory, since I assume the data will be in an array. Mathematica has some overhead associated with internal storage of arrays in addition to the storage for each element of the array.

Since this is more than half the RAM you have available, it is clear any operation that would make a copy of the array will fail due to insufficient memory. In fact, when you consider the RAM needed by your operating system, the Mathematica program and anything else you might have running, it is pretty easy to see there won't be much you can do with that much data read into RAM.

Further, even if you could actually create a simple time plot of all the data after it is read into RAM, the resolution of your screen and printer is not sufficient to display that much data at once. For example, my system has a horizontal display width of 1280 pixels. If I were to plot a data array with say 10,000 data points, it is clear some of those points must overlap.
And note, 10,000 real triplets would need less than ~500 KB to store. So, 10,000 data points is clearly a small subset of your data. Your only real choices to work with as much data as you've described, are either to intelligently downsample the data or break the data up into smaller blocks and work with those. And note, this really isn't a limitation Mathematica imposes. It is an inherent limitation of the amount of RAM you have available and real display sizes.
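[Editor's note: the "intelligently downsample" option Bill mentions can be as simple as averaging consecutive blocks of samples so a long record fits the screen's horizontal resolution. A minimal sketch in plain Python; the block size n is a free parameter the reader would tune to their display width.]

```python
# Downsample by block-averaging: replace each run of n consecutive samples
# with its mean, reducing 250 million points to a plottable number.
def block_average(samples, n):
    """Average consecutive blocks of n values (last partial block included)."""
    return [sum(samples[i:i + n]) / len(samples[i:i + n])
            for i in range(0, len(samples), n)]
```

For a 1280-pixel-wide plot of a million samples, n of roughly 800 already brings the point count down to the display's resolution.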
From: Albert Retey on 8 Apr 2010 08:03

Hi,

> My collegue has done something on another system. You think I could
> run it on mathematica even if the functions are different from a
> software to the other?

No, of course you cannot run code that was written for the other system. But if he converted the text file to a binary file with the other system, then you can probably import the binary file he generated directly, if the format is supported. Since the text representation of a floating point number is usually much longer than the binary representation of a double, my guess is that on your 64-bit system the import of the binary file would just work.

If you want/need to import the text file directly, I think reading larger chunks with OpenRead/ReadList/Close, converting each chunk to arrays of numbers (ensuring they are stored as packed arrays with Developer`ToPackedArray) and joining all the chunks at the end has a good chance of working. Maybe you can also start by counting the lines, preallocating a list to hold the data with data = Table[0., {numberoflines}, {3}], then reading the data and filling the array; that will probably be the most memory-efficient way to import the large file. It will then take a little experimentation with the chunk size to make the code fast...
> Here is a sample of the data: -0.13923438E+00 -0.22521242E+00
> 0.10765536E+01 -0.13928019E+00 -0.22522102E+00 0.10765295E+01
> -0.13934083E+00 -0.22523038E+00 0.10765673E+01 -0.13940084E+00
> -0.22523966E+00 0.10766749E+01 [...]

Are there line breaks after every third number? If yes, for this data the following would work and should be rather memory efficient, but probably slow. For decent speed you will certainly need a much larger value for chunksize:

fname = ToFileName[{$HomeDirectory, "Desktop"}, "data.txt"];
chunksize = 3;

(* first pass: count the number of rows *)
stream = OpenRead[fname];
lines = {1};
numlines = 0;
While[lines =!= {},
  lines = ReadList[stream, {Number, Number, Number}, chunksize];
  numlines += Length[lines];
];
Close[stream];
Print[numlines];

(* second pass: preallocate a packed array and fill it in place *)
stream = OpenRead[fname];
lines = {1};
numline = 1;
data = Developer`ToPackedArray[Table[0., {numlines}, {3}]];
While[lines =!= {},
  lines = ReadList[stream, {Number, Number, Number}, chunksize];
  Do[
    data[[numline + i - 1, k]] = lines[[i, k]],
    {i, Length[lines]}, {k, 3}
  ];
  numline += Length[lines];
];
Close[stream];

ByteCount[data]
Developer`PackedArrayQ[data]

hth,
albert
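[Editor's note: Albert's two-pass strategy - count the rows first, then preallocate and fill - translates directly to other languages. A Python sketch using the stdlib array module, whose contiguous 8-byte-double storage plays the role of a Mathematica packed array. The file name and 3-column format are assumptions from the thread.]

```python
# Two-pass load: pass 1 counts rows, pass 2 fills a preallocated packed
# buffer of C doubles (8 bytes each), avoiding any intermediate list of rows.
from array import array

def load_preallocated(path):
    with open(path) as f:
        numlines = sum(1 for _ in f)             # pass 1: count rows
    # preallocate 3 doubles per row, zero-initialized (8 bytes per double)
    data = array('d', bytes(8 * 3 * numlines))
    with open(path) as f:                        # pass 2: fill in place
        for i, line in enumerate(f):
            x, y, z = (float(v) for v in line.split())
            data[3 * i], data[3 * i + 1], data[3 * i + 2] = x, y, z
    return numlines, data
```

Peak memory is then essentially the final packed buffer itself, which is the point of preallocating rather than appending.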
From: John Fultz on 9 Apr 2010 03:32

On Thu, 8 Apr 2010 08:02:40 -0400 (EDT), Bill Rowe wrote:

> A bit of experimentation on my system (also a 64 bit system) with
> ByteCount indicates Mathematica stores a single real value using 16
> bytes. So, the 4GB file you have will occupy a bit more than 4GB of
> internal memory since I assume the data will be in an array.
> [...]

I wouldn't dispute your general remarks about the wisdom of down-sampling. But let me just correct one statement you made. You correctly point out that a Real consumes 16 bytes. But it is not correct to generalize from this that an array of Reals will consume 16 bytes per Real. If the array is constructed in such a way that the numbers pack, then you'll get 8 bytes per Real, plus a small number of additional bytes for the array as a whole. Some evidence to show this:

In[1]:= ByteCount[RandomReal[]]
Out[1]= 16

In[2]:= ByteCount[Table[RandomReal[], {1000}]]
Out[2]= 8124

Although I'm aware of some of the issues, I'm not the best person to discuss the details of dealing with packed arrays. I will say that the typical experience should be that, as long as you're dealing with uniform data (all Real or all Integer), Mathematica should automatically pack things for you.

Of course, unpacked storage only throws your numbers off by a factor of two, so I'll re-emphasize that this doesn't invalidate your general conclusions about how to tackle the problem.

Sincerely,

John Fultz
jfultz(a)wolfram.com
User Interface Group
Wolfram Research, Inc.
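[Editor's note: the boxed-versus-packed distinction John describes is not unique to Mathematica. A short Python sketch of the same effect: each element of a Python list is a boxed float object carrying per-object overhead, while the stdlib array module stores raw 8-byte doubles, much like a packed array. The exact boxed size varies by CPython build, so the sketch compares rather than hardcodes it.]

```python
# Boxed vs packed storage of 1000 identical doubles.
import sys
from array import array

boxed = [0.5] * 1000        # list of references to boxed float objects
packed = array('d', boxed)  # contiguous buffer of raw 8-byte doubles

print(sys.getsizeof(0.5))   # size of one boxed float object (24 on 64-bit CPython)
print(packed.itemsize)      # 8 bytes per element once packed
```

As in Mathematica, the factor is small enough not to change the overall conclusion, but it matters when the data barely fits in RAM.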