From: per isakson on
per isakson wrote:
>
>
> Francis Burton wrote:
>>
>>
>> In article <4524edd1$1(a)news1.ethz.ch>,
>> Michael Wild <themiwi.REMOVE.THIS(a)student.ethz.ch> wrote:
>>> or if you are on linux/unix/mac and this is not a single incident, then
>>> you might want to read up on a tool called awk.
>>
>> Or, even simpler, tr(1).
>>
>> If the poster really wants to do it in Matlab, he could try
>> something like:
>>
>> tname = tempname;
>> fidi=fopen(fname, 'r');
>> fido=fopen(tname, 'w');
>> while feof(fidi) == 0
>> line = fgetl(fidi);
>> line = strrep(line, ',', '.');
>> fprintf(fido, '%s\n', line);
>> end
>> fclose(fidi);
>> fclose(fido);
>> m = dlmread(tname); % or other 'import' of choice
>> delete(tname);
>>
>> Not the most efficient method, but easy to understand.
>>
>> Francis
>>
>
> If the size of the text file is less than half of the available
> memory, here is a function that creates a file with decimal points.
> All commas are replaced by points.
>
> >> tic, comma2point( 'd:\slask\comma.txt', 'd:\slask\point.txt' ), toc
> Elapsed time is 0.050921 seconds.
>>>
>
> function comma2point( strInFileSpec, strOutFileSpec )
>
> fid = fopen( strInFileSpec, 'r' );
> str = fread( fid, inf, '*char' );
> fclose( fid );
>
> str = strrep( str', ',', '.' );
>
> fid = fopen( strOutFileSpec, 'w' );
> fwrite( fid, str, 'char' );
> fclose( fid );
>
> end
>
> / per

The timing is rather meaningless without knowing that the size of
strInFileSpec was 1MB. That is fast enough for me.

/ per
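
For files that do not fit comfortably in memory, the same replacement can be done one chunk at a time. The following is only a minimal sketch in C, not the function posted above; the name comma_to_point_stream and the 64 KiB buffer size are assumptions chosen for illustration.

```c
#include <stdio.h>

/* Copy `in` to `out`, replacing every ',' with '.', one fixed-size chunk
 * at a time, so memory use does not depend on the file size.
 * Returns the number of commas replaced, or -1 on an I/O error.
 * The function name and the 64 KiB chunk size are illustrative choices. */
long comma_to_point_stream(FILE *in, FILE *out)
{
    static char buf[64 * 1024];
    long replaced = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == ',') {
                buf[i] = '.';
                replaced++;
            }
        }
        if (fwrite(buf, 1, n, out) != n)
            return -1;          /* short write */
    }
    return ferror(in) ? -1 : replaced;
}
```

Both streams would typically be opened with fopen in binary mode ("rb" and "wb") so that line endings pass through untouched. Like the other solutions in this thread, it converts ALL commas, not only decimal separators.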
From: Rune Allnor on

per isakson skrev:

> > Elapsed time is 0.050921 seconds.

> The timing is rather meaningless without knowing that the size of
> strInFileSpec was 1MB. That is fast enough for me.

I had a similar problem as the OP a few months ago. We were
issuing ASCII data files somewhere, and had done that for a
couple of weeks when somebody discovered the very same
problem as the OP has. Commas as decimal separators where
there ought to have been dots.

We were kindly asked to take all our 3GB of data files back,
some 100-150 files of 5MB to 30MB size, and convert them
ASAP.

Somebody had used the find..replace trick before I came back
on watch, and had spent 12 hours on the job. Unfortunately,
the converted files were in an in-house repository and not up
to date with the book-keeping of deliveries. So I had the choice
of either going through 150 files and trying to get them correctly
logged, or converting the files I knew to be correctly logged.
I chose the latter.

The brute-force, naive matlab code worked, but was ridiculously
slow. It read one line at a time, did the conversion, and wrote
the result to the destination. It was easy to code, though: it took
me 5 min or so to get the first draft up and running. When I checked
the converted files and saw that they were correct, I just let what
was supposed to be a first test and verification run process
the full data set. The job took some 3 hrs in all. Some of the
most tense hours I can remember...

BTW, the root cause of the problem was the "regional
settings" in the Windows OS. In some areas of the world,
the convention is to use a comma as the decimal separator.
Once all PCs in the processing loop were set to "English",
we had no more problems. Maybe something for the OP
to be aware of?

Rune
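
If the program that writes the files is under your control, the same effect can often be had without touching the OS settings. In standard C, number formatting follows the LC_NUMERIC locale category, and the "C" locale always uses a dot. A minimal sketch, assuming the writer is a C program; the name format_with_dot and the "%.5e" format are made up for illustration:

```c
#include <locale.h>
#include <stdio.h>

/* Format `value` into `buf` with '.' as the decimal separator, whatever
 * the user's regional settings are, by pinning LC_NUMERIC to the "C"
 * locale first. Note that this changes the locale for the whole process
 * and does not restore the previous one.
 * Returns what snprintf returns, or -1 if the locale could not be set.
 * The function name and the "%.5e" format are illustrative choices. */
int format_with_dot(char *buf, size_t size, double value)
{
    if (setlocale(LC_NUMERIC, "C") == NULL)
        return -1;
    return snprintf(buf, size, "%.5e", value);
}
```

A C program only picks up the regional settings after it calls setlocale(LC_ALL, "") or similar, so a writer that never does this already produces dots.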

From: Miroslav Balda on
Tobias wrote:
>
>
> Hi,
>
> I guess you import the data as text and convert it then to numbers.
> Try 'strrep' before you convert the text to numbers.
>
> Tobias
>
> Jake the Snake schrieb:
>> Hello,
>>
>> I have a huge amount of numbers in a .txt file. The numbers are
>> in the form 2,43252e+1. I need to replace the , with . How should I
>> do this? I'd prefer some import method that does this during the
>> import procedure.
>>
>> -Janne
>

Try the ffread.m from FEX #9034.

Mira
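
For what "converting during the import" can look like at the lowest level, here is a rough sketch in C. It has nothing to do with how ffread.m works internally; parse_decimal_comma is a hypothetical helper that handles one token at a time and assumes the program runs in the default "C" locale, where strtod expects a dot.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one number written with a decimal comma, e.g. "2,43252e+1".
 * The token is copied, its first ',' is turned into '.', and strtod
 * does the rest. Assumes the default "C" locale (no setlocale call).
 * Returns 0 on success, -1 if the token is empty, too long, or not
 * entirely a number. The function name is an illustrative choice. */
int parse_decimal_comma(const char *token, double *value)
{
    char tmp[64];
    char *comma, *end;
    size_t len = strlen(token);

    if (len == 0 || len >= sizeof tmp)
        return -1;
    memcpy(tmp, token, len + 1);        /* include the terminating NUL */

    comma = strchr(tmp, ',');
    if (comma != NULL)
        *comma = '.';

    *value = strtod(tmp, &end);
    return (end != tmp && *end == '\0') ? 0 : -1;
}
```

Because only the token being parsed is modified, the file on disk stays as it was, and commas used elsewhere in the file (as field separators, say) are left alone.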
From: Titus Edelhofer on
Hi Rune, Michael, per, ...
what about memmapfile? It's simple and fast for this kind of problem (here
for R2006b):

function comma2point(filename)
file=memmapfile(filename,'writable',true);
comma=uint8(',');
point=uint8('.');
file.Data((file.Data==comma)') = point;
delete(file)

Timing: 20MB file in 0.7 seconds

Cheers,
Titus
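
Outside Matlab, the same in-place idea can be sketched with POSIX mmap. This is an assumption-laden illustration, not a translation of memmapfile: it is POSIX only (no Windows), the name comma_to_point_inplace is made up, and it overwrites the original file, so keep a backup.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace every ',' with '.' directly in the file at `path`, through a
 * shared writable memory mapping. POSIX only.
 * Returns the number of commas replaced, or -1 on error.
 * The function name is an illustrative choice. */
long comma_to_point_inplace(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping survives the close */
    if (data == MAP_FAILED)
        return -1;

    long replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == ',') {
            data[i] = '.';
            replaced++;
        }
    }
    return munmap(data, len) == 0 ? replaced : -1;
}
```

Since a comma and a dot are both one byte, the file size never changes, which is what makes the in-place approach possible here.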


"Rune Allnor" <allnor(a)tele.ntnu.no> schrieb im Newsbeitrag
news:1160194867.118331.87740(a)e3g2000cwe.googlegroups.com...
>
> Rune Allnor skrev:
>> Jake the Snake skrev:
>> > Rune Allnor wrote:
>> > >
>> > > If you have one file and the decimal separators are the only commas
>> > > in the file, load the file into some ascii text editor (notepad,
>> > > emacs, maybe even matlab's editor) and use the "find..replace"
>> > > function.
>> > >
>> > > It will take you 2 seconds to do and the computer may be running
>> > > for a couple of minutes. Don't spend more of YOUR time (the computer
>> > > run-time is irrelevant if this truly is a one-off incident) with
>> > > this than necessary, unless there are other complicating factors or
>> > > you expect or fear that this number format will occur again.
>> > >
>> > > Rune
>> > >
>> > Well that's just the thing. I have constant stream of those files
>> > coming in and each has something like half a million commas. I've tried
>> > notepad and Word. Notepad crashes and Word does it, but it takes 15
>> > minutes. Ok, I CAN do this, but I was just wondering if I can do it
>> > more efficiently with some selfmade code.
>>
>> I have some C++ code that does the conversion in some 20 MB of
>> ASCII numbers in just less than a second. I have tried to get it to
>> compile as a MEX file, but there are some memory issues with matlab.
>
> The memory issues are solved, so below is the MEXable C++ version.
> I always thought that the matlab LCC C compiler handled C++ code.
> It seems, unfortunately, it doesn't. So you need an external C++
> compiler to get this to run... but then, it shouldn't be too hard
> to convert this to C.
>
> Oh well.
>
> Save the code to some cpp file, say, convert.cpp,
> mex it, and use the routine as
>
> convert('input.txt','output.txt');
>
> Note that the routine converts ALL commas in a file to dots.
>
> On my computer (1.4 GHz) 20+ MB worth of text is scanned
> and all commas converted in 1 s to 2 s.
>
> Rune
>
> /*************************************************************/
> #include <fstream>
> #include "mex.h"
>
> using namespace std;
>
> extern void _main();
>
> void mexFunction(
> int nlhs,
> mxArray *plhs[],
> int nrhs,
> const mxArray *prhs[]
> )
> {
> char* fname0;
> char* fname1;
> ifstream fin;
> ofstream fout;
>
> if (nrhs != 2) {
> mexErrMsgTxt("Exactly two input arguments required.");
> }
>
> if((!mxIsChar(prhs[0]))||(!mxIsChar(prhs[1]))){
> mexErrMsgTxt("Inputs must be of type char.");
> }
>
> int N0 = mxGetN(prhs[0]);
> int M0 = mxGetM(prhs[0]);
> int N1 = mxGetN(prhs[1]);
> int M1 = mxGetM(prhs[1]);
>
> if((M0!=1)||(M1!=1)){
> mexErrMsgTxt("Inputs must have exactly one row.");
> }
>
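> // mxArrayToString allocates the returned string itself, so no
> // separate mxMalloc is needed (it would only be leaked).
> fname0 = mxArrayToString(prhs[0]);
> fname1 = mxArrayToString(prhs[1]);
>
> fin.open(fname0,ios_base::binary);
> // A failed open sets failbit, not badbit, so test is_open().
> if (!fin.is_open())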
> {
> mexErrMsgTxt("Could not open source file.");
> }
> fin.seekg(0,ios::end);
> long int buffersize = fin.tellg();
> fin.seekg(0,ios::beg);
>
> fout.open(fname1,ios_base::binary);
>
> char* buffer = (char*) mxMalloc(buffersize);
> long int n;
>
> fin.read(buffer,buffersize);
> for (n=0;n<buffersize;n++)
> {
> if (buffer[n]==',')
> {
> buffer[n] = '.';
> }
> }
>
> fout.write(buffer,buffersize);
> fout.close();
> fin.close();
>
> mxFree(fname0);
> mxFree(fname1);
> mxFree(buffer);
>
> return;
> }
>


From: Steve Amphlett on
<snip, many complicated solutions...

Here's a simple C solution:

#include <stdio.h>
int main(void)
{
int c; /* int, not char, so that EOF can be told apart from data bytes */
while((c=getchar())!=EOF) putchar(c==','? '.' : c);
}

Usage:

a.out < filein > fileout

Or for a load of files (in C shell):

foreach file ( * )
a.out < $file > temp
mv temp $file
end