From: xlr82sas on
On Feb 26, 10:09 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> On Feb 26, 7:29 pm, xlr82sas <xlr82...(a)aol.com> wrote:
>
>
>
>
>
> > On Feb 26, 9:06 am, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
>
> > > Again, piping is easy. I'll try and toss something together this weekend and
> > > pipe delimited, Excel, and XML.
>
> > > I am a bit unclear on what it is you need. If I dump the data to a delimited
> > > file, for example, I treat the values as doubles (do not print hex
> > > representation) so you should have better representation anyway. You can
> > > always convert them as needed.
>
> > > Alan
>
> > > Alan Churchill
> > > Savianwww.savian.net
> > > Office:   (719) 687-5954
> > > Cell:      (719) 310-4870
>
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
>
> > > xlr82sas
> > > Sent: Thursday, February 25, 2010 7:43 PM
> > > To: SA...(a)LISTSERV.UGA.EDU
> > > Subject: Re: dsread - Windows command-line utility for SAS7BDAT files
>
> > > Hi Alan and Chis
>
> > >     What interests me is communications with open source products,
> > > like R, perl and MySQL. Export and import needs to be lossless. My
> > > understanding is that SAS -ODBC/OleDB... requires you to bring up
> > > SAS(ODBC server) and the server has to be closed manually. I feel this
> > > is unsatisfactory. The crux of the issue is IEEE floating point. I
> > > think HEX16. would be lossless, all  I have to in R is convert the
> > > character  hex16. to  floating point. The nice thing about dsread is
> > > it is light, fast and you do not need SAS and it looks like it
> > > supports pipes.
>
> > > On Feb 25, 12:15 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > > > On Feb 25, 10:43 am, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > > > On Feb 25, 6:45 am, ChrisBLong <ch...(a)oview.co.uk> wrote:
>
> > > > > > Hi,
>
> > > > > > Readers of this group may be interested in dsread, available
> > > athttp://www.oview.co.uk/dsread
>
> > > > > > It's a command-line utility that understands the SAS7BDAT file
> > > > > > format.  It lets you examine the structure of datasets conveniently
> > > > > > from the command-line, and converts SAS7BDAT data into valid CSV
> > > > > > format for import into other software.
>
> > > > > > All comments and suggestions gratefully received,
>
> > > > > > Chris.
>
> > > > > Hi Chris,
>
> > > > >    Congratulations for reading SAS datasets. WPS has powerfull
> > > > > capabilities. Thanks!!! Competition is great.
>
> > > > >    Even though you only create csv's, I see this as a great product
> > > > > because you do not need SAS and with pipes users can programtically
> > > > > get at SAS data from other languages. Also CSVs are also very amenable
> > > > > to EXCEL.
>
> > > ===========================================================================­­­
> > > ===================================================
>
> > > > > Just some thoughts:
>
> > > > >    Any chance you could create a lossless output format, like SAS
> > > > > export datasets, but allow for longer names and
> > > > > character values greater than 200 bytes. This would open up SAS
> > > > > datasets to other languages. The format would have to be open.
>
> > > > >   This could be a really big deal, if instead of a csv, you created R
> > > > > dataframes, if called from R. An even bigger deal would be if you
> > > > > created a SAS dataset from an R dataframe.
>
> > > > >   XML would be another nice output.
>
> > > > >   A silient ODBC would also be great.
>
> > > > >   I bet you can use pipes whith yow command line interface.
>
> > > > My apologies.
>
> > > > I assumed your site was somehow affiliated with WPS.
> > > > It looks like it may not be.
>
> > > > Which makes what you have done all the more remarkable.
>
> > > > I was hoping you honored formats because I wanted to associate hex16
> > > > with the numeric columns so I could create a lossless csv, but it did
> > > > not work.- Hide quoted text -
>
> > > > - Show quoted text -- Hide quoted text -
>
> > > - Show quoted text -
>
> > Hi,
>
> >   I think you may have misunderstood me, when I said pipe, I did not
> > mean pipe delimited.
>
> >  I meant
>
> >  filename pyp pipe "dsread.exe < data.sas7bdat";
>
> >   I do plan on using dsread this weekend.
>
> >   I will share my experieces with you.- Hide quoted text -
>
> > - Show quoted text -
>
> FYI
>
> I just posted this to the R group
>
> If I do the following
>
> sprintf("%A",pi)
> "0X1.921FB54442D18"
>
> I have this 16 byte character string
>
> hx<-"400921FB54442D18"
>
> This is the exact hex16 representation of PI in
> IEEE float that R uses in Intel 32bit(little endian) Windows
> SAS uses the same representation. 11 bit exponent and 53 bit
> mantissa.
>
> I want to do is recreate the float exactly from the 16 char hex
>
> something like
>
> MyPI<-readChar(hx,numeric(),16)
>
> or in SAS
>
> MyPI=input("400921FB54442D18",hex16.);
> put MyPI=;
>
> MYPI=3.1415926536
>
> What I am trying to do is set up a lossless
> transfer method from SAS to R- Hide quoted text -
>
> - Show quoted text -

Hi,

You utility solves the 200 byte, 8 char name and potential precision
errors with other methods of transfering SAS datasets to perl and R.
Thanks.

Importing SAS datasets(sas7bdat) into R
(32 bit windows 2000, 32 bit SAS 9.2 and
32 bit R version 2.9.0 (2009-04-17)

Here is what I want to accomplish, the double floats below show data
from SAS to R.
They are exactly the same in R and SAS memory, bit for bit.

R Internal SAS Internal
16 Byte Float 16 byte Float

3FFAAAAAAAAAAAAB 3FFAAAAAAAAAAAAB
4002AAAAAAAAAAAB 4002AAAAAAAAAAAB
400D555555555555 400D555555555555
3FF6666666666666 3FF6666666666666
3FFCCCCCCCCCCCCD 3FFCCCCCCCCCCCCD
400199999999999A 400199999999999A
4004CCCCCCCCCCCD 4004CCCCCCCCCCCD
3FF4924924924925 3FF4924924924925
3FF9249249249249 3FF9249249249249
3FFDB6DB6DB6DB6E 3FFDB6DB6DB6DB6E
4001249249249249 4001249249249249
3FF2E8BA2E8BA2E9 3FF2E8BA2E8BA2E9
3FF5D1745D1745D1 3FF5D1745D1745D1
3FF8BA2E8BA2E8BA 3FF8BA2E8BA2E8BA
3FFBA2E8BA2E8BA3 3FFBA2E8BA2E8BA3
3FF2762762762762 3FF2762762762762
3FF4EC4EC4EC4EC5 3FF4EC4EC4EC4EC5
3FF7627627627627 3FF7627627627627
3FF9D89D89D89D8A 3FF9D89D89D89D8A
1.7976931348623E 1.7976931348623E
0010000000000000 0010000000000000

I don't believe this high accuracy transfer is possible with any
other method except ODBC,
but SAS ODBC is unsatisfactory for me. If you use CSV with the maximum
assured decimal
precision(15 significant digits?). The CSV decimal numbers will only
approximate the double floats.

I consider the Csv to be corrupt if the relative of absolute
difference using the decimal
Csv numbers and the memory floats is greater than 10^-12. There are
two sources of error first
the SAS floats are decimally rounded and converted to decimal then the
rounded decimal
approximations are converted into R floats.

Status of R Internal CSV
Csv 16 Byte Float

Csv corrupt 3FFAAAAAAAAAAAAB 1.66666666666667 >10^-12 different
Csv corrupt 4002AAAAAAAAAAAB 2.33333333333333
Csv corrupt 400D555555555555 3.66666666666667
Csv OK 3FF6666666666666 1.4
Csv OK 3FFCCCCCCCCCCCCD 1.8
Csv OK 400199999999999A 2.2
Csv OK 4004CCCCCCCCCCCD 2.6
Csv corrupt 3FF4924924924925 1.28571428571429
Csv corrupt 3FF9249249249249 1.57142857142857
Csv corrupt 3FFDB6DB6DB6DB6E 1.85714285714286
Csv corrupt 4001249249249249 2.14285714285714
Csv corrupt 3FF2E8BA2E8BA2E9 1.18181818181818
Csv corrupt 3FF5D1745D1745D1 1.36363636363636
Csv corrupt 3FF8BA2E8BA2E8BA 1.54545454545455
Csv corrupt 3FFBA2E8BA2E8BA3 1.72727272727273
Csv corrupt 3FF2762762762762 1.15384615384615
Csv corrupt 3FF4EC4EC4EC4EC5 1.30769230769231
Csv corrupt 3FF7627627627627 1.46153846153846
Csv corrupt 3FF9D89D89D89D8A 1.61538461538462
Csv corrupt 1.7976931348623E 1.7976931348623E+308
Csv corrupt 0010000000000000 2.2250738585072E-308

Bacground

1. Provide absolutely loss less transfer
of character(max 32756 bytes per character variable) and numeric
data from SAS to R
Since SAS has only two datatypes so this code should be
exhaustive.

2. This code is useful because:
a. The SAS ODBC driver requires the user to not only have
SAS but the user must bring up a SAS session and
the session has to be closed manually. (SAS issue not a
foreign issue)
b. The foreign package also requires interaction with SAS. (SAS
issue)
c. SASxport only supports 8 character SAS names and a max of
200 byte character values. (This is a SAS issue not a SASxport
issue)
d. SASxport creates floating point doubles that have an 8 bit
exponent
and 56 bit mantissa while IEEE is 11 bit exponent and 53 bit
mantissa
(sometimes defined slightly differently depending of where you
consider
the sign bits). This results is the loss of some very small
and
very large numbers. ( SAS issue not a SASxport issue)

3. How this code overcomes the issues above for import only.

You need the dsread exec in the previous mesage. Also the input
SAS dataset must have
16 byte character representations for the floats. I am working with
the developer to see what we
can do about this..
He will make it an option on the invocation to do the hex conversion
for numerics.

Here is the R code run inside a SAS datastep. Actually I can interact
with the output of the R code
in the same dataqstep. It is also possible to run perl, SAS procs and
other SAS languages in the same datastep.
Note the input pipe, no physical CSV file is produced).

If there is interest I can provide the code that executes R.

data _null_;
length pgm $1250;
pgm=compbl("
library (SASxport);
library (foreign);
hexdigits <- function(s) {;
digits <- 0:15;
names(digits) <- c(0:9, LETTERS[1:6]);
digits[strsplit(s, '')[[1]]];
};
bytes <- function(s) {;
digits <- matrix(hexdigits(s), ncol=2, byrow=TRUE);
digits;
as.raw(digits %*% c(16,1));
};
todouble <- function(bytes) {;
con <- rawConnection(bytes);
val <- readBin(con, 'double', endian='big');
close(con);
val;
};
x <-c(1:21);
rc<-c(1:21);
ln<-c(1:21);
z<-read.table(pipe('C:\\tip\\dsread.exe -v C:\\tip\
\fix.sas7bdat'),header=TRUE,sep=',',colClasses='character');
st<-z$STR;
lin<-z$LIN;
d<-as.numeric(z$DECIMAL_REPRESENTATION);
h<-as.character(z$HEXIDECIMAL_REPRESENTATION);
for ( i in 1:21 ) {;
x[i] <- todouble(bytes(h[i]));
rc[i] <- if (((abs( x[i] - d[i] ) > 1E-12 )) || ;
(abs((x[i] - d[i])/x[i] ) > 1E-12 )) 0 else 1;
ln[i] <- nchar(st[i], type = 'bytes');
};
R_ntrnl <-h ;
SASntrnl <-h ;
R_deciml <-sprintf('%.14e',x);
SAS_deciml <-sprintf('%.14e',x);
Csv_stmat <-z$DECIMAL_UNTOUCHED;
Corrupt <-rc;
datfrm <-
data.frame(R_ntrnl ,SASntrnl ,R_deciml ,SAS_deciml ,Csv_stmat ,Corrupt,ln,lin);
write.xport(datfrm,file='C:\\utl\
\datfrm.xpt',autogen.formats=FALSE);
");
call rxeq(pgm);
call getxpt('datfrm');
run;

SAS code to create fix.sas7bdat

options xsync xwait;run;
%let fac=1000;
data "c:\tip\fix.sas7bdat"(drop=prime nonprime byt);
retain byt 0 str;
length str $%eval(&fac * 32);
do prime=3,5,7,11,13;
do nonprime=2,4,6,8;
byt+&fac;
str=repeat(byte(64+byt/&fac),byt);
decimal_representation =nonprime/prime+1;
hexidecimal_representation=put(decimal_representation,hex16.);
decimal_untouched =cats(put(round(decimal_representation,
1e-14),best32.));
lin=length(str);
if decimal_representation ne 3 then output;
end;
end;
decimal_representation =constant('big');
hexidecimal_representation=put(constant('big'),e20.);
decimal_untouched =cats(put(decimal_representation,e20.));
str=repeat('@',%eval(&fac * 30));
lin=length(str);
output;
decimal_representation =constant('small');
hexidecimal_representation=put(constant('small'),hex16.);
decimal_untouched =cats(put(decimal_representation,e20.));
str=repeat('@',%eval(&fac * 32));
lin=length(str);
output;
format _numeric_ e20.;
run;
From: Alan Churchill on
I put in support for the command-line interface this weekend and have it so
it exports to Excel, delimited, and XML. I did not put in the lossless as it
requires a bit more effort and I had to work on some other stuff. The
command-line will be:

sds.exe /parms="c:\temp\parms.xml"

Parms.xml will look like this:

<?xml version="1.0" encoding="utf-8" ?>
<Export>
<Xml file="c:\temp\sas7bdat.xml"/>
<Excel file="c:\temp\sas7bdat.xls">
<WorksheetName>Sas Data</WorksheetName>
<StartCell>R2C2</StartCell>
</Excel>
<Delimited file="c:\temp\sas7bdat.txt" delimiter="\t" headers="true"
surroundWithQuotes="true"/>
<Contents file="c:\temp\sas7bdatContents.txt"/>
</Export>

I used an xml file since the number of command-line switches will grow a lot
as additional features are added, especially import into SAS.

I will release this later this week (hopefully) since I am also trying to
put a SAS dataset viewer in it as well.

Alan

Alan Churchill
Savian
www.savian.net
Office: (719) 687-5954
Cell: (719) 310-4870


-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
xlr82sas
Sent: Friday, February 26, 2010 7:56 PM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: dsread - Windows command-line utility for SAS7BDAT files

On Feb 26, 9:06 am, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
> Again, piping is easy. I'll try and toss something together this
> weekend and pipe delimited, Excel, and XML.
>
> I am a bit unclear on what it is you need. If I dump the data to a
> delimited file, for example, I treat the values as doubles (do not
> print hex
> representation) so you should have better representation anyway. You
> can always convert them as needed.
>
> Alan
>
> Alan Churchill
> Savianwww.savian.net
> Office: (719) 687-5954
> Cell: (719) 310-4870
>
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
>
> xlr82sas
> Sent: Thursday, February 25, 2010 7:43 PM
> To: SA...(a)LISTSERV.UGA.EDU
> Subject: Re: dsread - Windows command-line utility for SAS7BDAT files
>
> Hi Alan and Chis
>
> What interests me is communications with open source products,
> like R, perl and MySQL. Export and import needs to be lossless. My
> understanding is that SAS -ODBC/OleDB... requires you to bring up
> SAS(ODBC server) and the server has to be closed manually. I feel this
> is unsatisfactory. The crux of the issue is IEEE floating point. I
> think HEX16. would be lossless, all I have to in R is convert the
> character hex16. to floating point. The nice thing about dsread is
> it is light, fast and you do not need SAS and it looks like it
> supports pipes.
>
> On Feb 25, 12:15 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > On Feb 25, 10:43 am, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > On Feb 25, 6:45 am, ChrisBLong <ch...(a)oview.co.uk> wrote:
>
> > > > Hi,
>
> > > > Readers of this group may be interested in dsread, available
> athttp://www.oview.co.uk/dsread
>
> > > > It's a command-line utility that understands the SAS7BDAT file
> > > > format. It lets you examine the structure of datasets
> > > > conveniently from the command-line, and converts SAS7BDAT data
> > > > into valid CSV format for import into other software.
>
> > > > All comments and suggestions gratefully received,
>
> > > > Chris.
>
> > > Hi Chris,
>
> > > Congratulations for reading SAS datasets. WPS has powerfull
> > > capabilities. Thanks!!! Competition is great.
>
> > > Even though you only create csv's, I see this as a great
> > > product because you do not need SAS and with pipes users can
> > > programtically get at SAS data from other languages. Also CSVs are
> > > also very amenable to EXCEL.
>
> ======================================================================
> =====� ===================================================
>
> > > Just some thoughts:
>
> > > Any chance you could create a lossless output format, like SAS
> > > export datasets, but allow for longer names and character values
> > > greater than 200 bytes. This would open up SAS datasets to other
> > > languages. The format would have to be open.
>
> > > This could be a really big deal, if instead of a csv, you
> > > created R dataframes, if called from R. An even bigger deal would
> > > be if you created a SAS dataset from an R dataframe.
>
> > > XML would be another nice output.
>
> > > A silient ODBC would also be great.
>
> > > I bet you can use pipes whith yow command line interface.
>
> > My apologies.
>
> > I assumed your site was somehow affiliated with WPS.
> > It looks like it may not be.
>
> > Which makes what you have done all the more remarkable.
>
> > I was hoping you honored formats because I wanted to associate hex16
> > with the numeric columns so I could create a lossless csv, but it
> > did not work.- Hide quoted text -
>
> > - Show quoted text -- Hide quoted text -
>
> - Show quoted text -

Thanks for the hex16 consideration. With this I should be able to try
'lossless' communications with R.
I won't have to shorten variable names to 8 bytes or restrict char length to
200.
I like the fact that I can get character lengths from your 'contents'
feature, users do not have to worry about traucation.

Thanks again

Regards
From: Alan Churchill on
I am pretty sure I have this operational. Keep in mind that the need for
lossless has to happen at the dataset level since the data will be read in
before you want to export it. Hence, it is a property you can set at the
dataset level and it will store the numerics as a string of hex values.

It then will output the value in all of the exported values (delimited,
Excel, XML).

It will be in the next release.

Alan

Alan Churchill
Savian
www.savian.net
Office: (719) 687-5954
Cell: (719) 310-4870


-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
xlr82sas
Sent: Monday, March 01, 2010 2:18 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: dsread - Windows command-line utility for SAS7BDAT files

On Feb 26, 10:09 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> On Feb 26, 7:29 pm, xlr82sas <xlr82...(a)aol.com> wrote:
>
>
>
>
>
> > On Feb 26, 9:06 am, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
>
> > > Again, piping is easy. I'll try and toss something together this
> > > weekend and pipe delimited, Excel, and XML.
>
> > > I am a bit unclear on what it is you need. If I dump the data to a
> > > delimited file, for example, I treat the values as doubles (do not
> > > print hex
> > > representation) so you should have better representation anyway.
> > > You can always convert them as needed.
>
> > > Alan
>
> > > Alan Churchill
> > > Savianwww.savian.net
> > > Office: (719) 687-5954
> > > Cell: (719) 310-4870
>
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf
> > > Of
>
> > > xlr82sas
> > > Sent: Thursday, February 25, 2010 7:43 PM
> > > To: SA...(a)LISTSERV.UGA.EDU
> > > Subject: Re: dsread - Windows command-line utility for SAS7BDAT
> > > files
>
> > > Hi Alan and Chis
>
> > > What interests me is communications with open source products,
> > > like R, perl and MySQL. Export and import needs to be lossless. My
> > > understanding is that SAS -ODBC/OleDB... requires you to bring up
> > > SAS(ODBC server) and the server has to be closed manually. I feel
> > > this is unsatisfactory. The crux of the issue is IEEE floating
> > > point. I think HEX16. would be lossless, all I have to in R is
> > > convert the character hex16. to floating point. The nice thing
> > > about dsread is it is light, fast and you do not need SAS and it
> > > looks like it supports pipes.
>
> > > On Feb 25, 12:15 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > > > On Feb 25, 10:43 am, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > > > On Feb 25, 6:45 am, ChrisBLong <ch...(a)oview.co.uk> wrote:
>
> > > > > > Hi,
>
> > > > > > Readers of this group may be interested in dsread, available
> > > athttp://www.oview.co.uk/dsread
>
> > > > > > It's a command-line utility that understands the SAS7BDAT
> > > > > > file format. It lets you examine the structure of datasets
> > > > > > conveniently from the command-line, and converts SAS7BDAT
> > > > > > data into valid CSV format for import into other software.
>
> > > > > > All comments and suggestions gratefully received,
>
> > > > > > Chris.
>
> > > > > Hi Chris,
>
> > > > > Congratulations for reading SAS datasets. WPS has powerfull
> > > > > capabilities. Thanks!!! Competition is great.
>
> > > > > Even though you only create csv's, I see this as a great
> > > > > product because you do not need SAS and with pipes users can
> > > > > programtically get at SAS data from other languages. Also CSVs
> > > > > are also very amenable to EXCEL.
>
> > > ==================================================================
> > > =========��� ===================================================
>
> > > > > Just some thoughts:
>
> > > > > Any chance you could create a lossless output format, like
> > > > > SAS export datasets, but allow for longer names and character
> > > > > values greater than 200 bytes. This would open up SAS datasets
> > > > > to other languages. The format would have to be open.
>
> > > > > This could be a really big deal, if instead of a csv, you
> > > > > created R dataframes, if called from R. An even bigger deal
> > > > > would be if you created a SAS dataset from an R dataframe.
>
> > > > > XML would be another nice output.
>
> > > > > A silient ODBC would also be great.
>
> > > > > I bet you can use pipes whith yow command line interface.
>
> > > > My apologies.
>
> > > > I assumed your site was somehow affiliated with WPS.
> > > > It looks like it may not be.
>
> > > > Which makes what you have done all the more remarkable.
>
> > > > I was hoping you honored formats because I wanted to associate
> > > > hex16 with the numeric columns so I could create a lossless csv,
> > > > but it did not work.- Hide quoted text -
>
> > > > - Show quoted text -- Hide quoted text -
>
> > > - Show quoted text -
>
> > Hi,
>
> > I think you may have misunderstood me, when I said pipe, I did not
> > mean pipe delimited.
>
> > I meant
>
> > filename pyp pipe "dsread.exe < data.sas7bdat";
>
> > I do plan on using dsread this weekend.
>
> > I will share my experieces with you.- Hide quoted text -
>
> > - Show quoted text -
>
> FYI
>
> I just posted this to the R group
>
> If I do the following
>
> sprintf("%A",pi)
> "0X1.921FB54442D18"
>
> I have this 16 byte character string
>
> hx<-"400921FB54442D18"
>
> This is the exact hex16 representation of PI in IEEE float that R uses
> in Intel 32bit(little endian) Windows SAS uses the same
> representation. 11 bit exponent and 53 bit mantissa.
>
> I want to do is recreate the float exactly from the 16 char hex
>
> something like
>
> MyPI<-readChar(hx,numeric(),16)
>
> or in SAS
>
> MyPI=input("400921FB54442D18",hex16.);
> put MyPI=;
>
> MYPI=3.1415926536
>
> What I am trying to do is set up a lossless transfer method from SAS
> to R- Hide quoted text -
>
> - Show quoted text -

Hi,

You utility solves the 200 byte, 8 char name and potential precision
errors with other methods of transfering SAS datasets to perl and R.
Thanks.

Importing SAS datasets(sas7bdat) into R
(32 bit windows 2000, 32 bit SAS 9.2 and
32 bit R version 2.9.0 (2009-04-17)

Here is what I want to accomplish, the double floats below show data from
SAS to R.
They are exactly the same in R and SAS memory, bit for bit.

R Internal SAS Internal
16 Byte Float 16 byte Float

3FFAAAAAAAAAAAAB 3FFAAAAAAAAAAAAB
4002AAAAAAAAAAAB 4002AAAAAAAAAAAB
400D555555555555 400D555555555555
3FF6666666666666 3FF6666666666666
3FFCCCCCCCCCCCCD 3FFCCCCCCCCCCCCD
400199999999999A 400199999999999A
4004CCCCCCCCCCCD 4004CCCCCCCCCCCD
3FF4924924924925 3FF4924924924925
3FF9249249249249 3FF9249249249249
3FFDB6DB6DB6DB6E 3FFDB6DB6DB6DB6E
4001249249249249 4001249249249249
3FF2E8BA2E8BA2E9 3FF2E8BA2E8BA2E9
3FF5D1745D1745D1 3FF5D1745D1745D1
3FF8BA2E8BA2E8BA 3FF8BA2E8BA2E8BA
3FFBA2E8BA2E8BA3 3FFBA2E8BA2E8BA3
3FF2762762762762 3FF2762762762762
3FF4EC4EC4EC4EC5 3FF4EC4EC4EC4EC5
3FF7627627627627 3FF7627627627627
3FF9D89D89D89D8A 3FF9D89D89D89D8A
1.7976931348623E 1.7976931348623E
0010000000000000 0010000000000000

I don't believe this high accuracy transfer is possible with any other
method except ODBC, but SAS ODBC is unsatisfactory for me. If you use CSV
with the maximum assured decimal
precision(15 significant digits?). The CSV decimal numbers will only
approximate the double floats.

I consider the Csv to be corrupt if the relative of absolute difference
using the decimal Csv numbers and the memory floats is greater than 10^-12.
There are two sources of error first the SAS floats are decimally rounded
and converted to decimal then the rounded decimal approximations are
converted into R floats.

Status of R Internal CSV
Csv 16 Byte Float

Csv corrupt 3FFAAAAAAAAAAAAB 1.66666666666667 >10^-12 different
Csv corrupt 4002AAAAAAAAAAAB 2.33333333333333
Csv corrupt 400D555555555555 3.66666666666667
Csv OK 3FF6666666666666 1.4
Csv OK 3FFCCCCCCCCCCCCD 1.8
Csv OK 400199999999999A 2.2
Csv OK 4004CCCCCCCCCCCD 2.6
Csv corrupt 3FF4924924924925 1.28571428571429
Csv corrupt 3FF9249249249249 1.57142857142857
Csv corrupt 3FFDB6DB6DB6DB6E 1.85714285714286
Csv corrupt 4001249249249249 2.14285714285714
Csv corrupt 3FF2E8BA2E8BA2E9 1.18181818181818
Csv corrupt 3FF5D1745D1745D1 1.36363636363636
Csv corrupt 3FF8BA2E8BA2E8BA 1.54545454545455
Csv corrupt 3FFBA2E8BA2E8BA3 1.72727272727273
Csv corrupt 3FF2762762762762 1.15384615384615
Csv corrupt 3FF4EC4EC4EC4EC5 1.30769230769231
Csv corrupt 3FF7627627627627 1.46153846153846
Csv corrupt 3FF9D89D89D89D8A 1.61538461538462
Csv corrupt 1.7976931348623E 1.7976931348623E+308
Csv corrupt 0010000000000000 2.2250738585072E-308

Bacground

1. Provide absolutely loss less transfer
of character(max 32756 bytes per character variable) and numeric data
from SAS to R
Since SAS has only two datatypes so this code should be exhaustive.

2. This code is useful because:
a. The SAS ODBC driver requires the user to not only have
SAS but the user must bring up a SAS session and
the session has to be closed manually. (SAS issue not a foreign
issue)
b. The foreign package also requires interaction with SAS. (SAS
issue)
c. SASxport only supports 8 character SAS names and a max of
200 byte character values. (This is a SAS issue not a SASxport
issue)
d. SASxport creates floating point doubles that have an 8 bit exponent
and 56 bit mantissa while IEEE is 11 bit exponent and 53 bit
mantissa
(sometimes defined slightly differently depending of where you
consider
the sign bits). This results is the loss of some very small and
very large numbers. ( SAS issue not a SASxport issue)

3. How this code overcomes the issues above for import only.

You need the dsread exec in the previous mesage. Also the input SAS
dataset must have
16 byte character representations for the floats. I am working with the
developer to see what we can do about this..
He will make it an option on the invocation to do the hex conversion for
numerics.

Here is the R code run inside a SAS datastep. Actually I can interact with
the output of the R code in the same dataqstep. It is also possible to run
perl, SAS procs and other SAS languages in the same datastep.
Note the input pipe, no physical CSV file is produced).

If there is interest I can provide the code that executes R.

data _null_;
length pgm $1250;
pgm=compbl("
library (SASxport);
library (foreign);
hexdigits <- function(s) {;
digits <- 0:15;
names(digits) <- c(0:9, LETTERS[1:6]);
digits[strsplit(s, '')[[1]]];
};
bytes <- function(s) {;
digits <- matrix(hexdigits(s), ncol=2, byrow=TRUE);
digits;
as.raw(digits %*% c(16,1));
};
todouble <- function(bytes) {;
con <- rawConnection(bytes);
val <- readBin(con, 'double', endian='big');
close(con);
val;
};
x <-c(1:21);
rc<-c(1:21);
ln<-c(1:21);
z<-read.table(pipe('C:\\tip\\dsread.exe -v C:\\tip\
\fix.sas7bdat'),header=TRUE,sep=',',colClasses='character');
st<-z$STR;
lin<-z$LIN;
d<-as.numeric(z$DECIMAL_REPRESENTATION);
h<-as.character(z$HEXIDECIMAL_REPRESENTATION);
for ( i in 1:21 ) {;
x[i] <- todouble(bytes(h[i]));
rc[i] <- if (((abs( x[i] - d[i] ) > 1E-12 )) || ;
(abs((x[i] - d[i])/x[i] ) > 1E-12 )) 0 else 1;
ln[i] <- nchar(st[i], type = 'bytes');
};
R_ntrnl <-h ;
SASntrnl <-h ;
R_deciml <-sprintf('%.14e',x);
SAS_deciml <-sprintf('%.14e',x);
Csv_stmat <-z$DECIMAL_UNTOUCHED;
Corrupt <-rc;
datfrm <-
data.frame(R_ntrnl ,SASntrnl ,R_deciml ,SAS_deciml ,Csv_stmat
,Corrupt,ln,lin);
write.xport(datfrm,file='C:\\utl\
\datfrm.xpt',autogen.formats=FALSE);
");
call rxeq(pgm);
call getxpt('datfrm');
run;

SAS code to create fix.sas7bdat

options xsync xwait;run;
%let fac=1000;
data "c:\tip\fix.sas7bdat"(drop=prime nonprime byt);
retain byt 0 str;
length str $%eval(&fac * 32);
do prime=3,5,7,11,13;
do nonprime=2,4,6,8;
byt+&fac;
str=repeat(byte(64+byt/&fac),byt);
decimal_representation =nonprime/prime+1;
hexidecimal_representation=put(decimal_representation,hex16.);
decimal_untouched =cats(put(round(decimal_representation,
1e-14),best32.));
lin=length(str);
if decimal_representation ne 3 then output;
end;
end;
decimal_representation =constant('big');
hexidecimal_representation=put(constant('big'),e20.);
decimal_untouched =cats(put(decimal_representation,e20.));
str=repeat('@',%eval(&fac * 30));
lin=length(str);
output;
decimal_representation =constant('small');
hexidecimal_representation=put(constant('small'),hex16.);
decimal_untouched =cats(put(decimal_representation,e20.));
str=repeat('@',%eval(&fac * 32));
lin=length(str);
output;
format _numeric_ e20.;
run;
From: xlr82sas on
On Mar 1, 1:52 pm, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
> I am pretty sure I have this operational. Keep in mind that the need for
> lossless has to happen at the dataset level since the data will be read in
> before you want to export it. Hence, it is a property you can set at the
> dataset level and it will store the numerics as a string of hex values.
>
> It then will output the value in all of the exported values (delimited,
> Excel, XML).
>
> It will be in the next release.
>
> Alan
>
> Alan Churchill
> Savianwww.savian.net
> Office:   (719) 687-5954
> Cell:      (719) 310-4870
>
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
>
> xlr82sas
> Sent: Monday, March 01, 2010 2:18 AM
> To: SA...(a)LISTSERV.UGA.EDU
> Subject: Re: dsread - Windows command-line utility for SAS7BDAT files
>
> On Feb 26, 10:09 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > On Feb 26, 7:29 pm, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > On Feb 26, 9:06 am, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
>
> > > > Again, piping is easy. I'll try and toss something together this
> > > > weekend and pipe delimited, Excel, and XML.
>
> > > > I am a bit unclear on what it is you need. If I dump the data to a
> > > > delimited file, for example, I treat the values as doubles (do not
> > > > print hex
> > > > representation) so you should have better representation anyway.
> > > > You can always convert them as needed.
>
> > > > Alan
>
> > > > Alan Churchill
> > > > Savianwww.savian.net
> > > > Office:   (719) 687-5954
> > > > Cell:      (719) 310-4870
>
> > > > -----Original Message-----
> > > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf
> > > > Of
>
> > > > xlr82sas
> > > > Sent: Thursday, February 25, 2010 7:43 PM
> > > > To: SA...(a)LISTSERV.UGA.EDU
> > > > Subject: Re: dsread - Windows command-line utility for SAS7BDAT
> > > > files
>
> > > > Hi Alan and Chis
>
> > > >     What interests me is communications with open source products,
> > > > like R, perl and MySQL. Export and import needs to be lossless. My
> > > > understanding is that SAS -ODBC/OleDB... requires you to bring up
> > > > SAS(ODBC server) and the server has to be closed manually. I feel
> > > > this is unsatisfactory. The crux of the issue is IEEE floating
> > > > point. I think HEX16. would be lossless, all  I have to in R is
> > > > convert the character  hex16. to  floating point. The nice thing
> > > > about dsread is it is light, fast and you do not need SAS and it
> > > > looks like it supports pipes.
>
> > > > On Feb 25, 12:15 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > > > > On Feb 25, 10:43 am, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > > > > On Feb 25, 6:45 am, ChrisBLong <ch...(a)oview.co.uk> wrote:
>
> > > > > > > Hi,
>
> > > > > > > Readers of this group may be interested in dsread, available
> > > > athttp://www.oview.co.uk/dsread
>
> > > > > > > It's a command-line utility that understands the SAS7BDAT
> > > > > > > file format.  It lets you examine the structure of datasets
> > > > > > > conveniently from the command-line, and converts SAS7BDAT
> > > > > > > data into valid CSV format for import into other software.
>
> > > > > > > All comments and suggestions gratefully received,
>
> > > > > > > Chris.
>
> > > > > > Hi Chris,
>
> > > > > >    Congratulations for reading SAS datasets. WPS has powerfull
> > > > > > capabilities. Thanks!!! Competition is great.
>
> > > > > >    Even though you only create csv's, I see this as a great
> > > > > > product because you do not need SAS and with pipes users can
> > > > > > programtically get at SAS data from other languages. Also CSVs
> > > > > > are also very amenable to EXCEL.
>
> > > > ==================================================================
> > > > =========­­­ ===================================================
>
> > > > > > Just some thoughts:
>
> > > > > >    Any chance you could create a lossless output format, like
> > > > > > SAS export datasets, but allow for longer names and character
> > > > > > values greater than 200 bytes. This would open up SAS datasets
> > > > > > to other languages. The format would have to be open.
>
> > > > > >   This could be a really big deal, if instead of a csv, you
> > > > > > created R dataframes, if called from R. An even bigger deal
> > > > > > would be if you created a SAS dataset from an R dataframe.
>
> > > > > >   XML would be another nice output.
>
> > > > > >   A silient ODBC would also be great.
>
> > > > > >   I bet you can use pipes whith yow command line interface.
>
> > > > > My apologies.
>
> > > > > I assumed your site was somehow affiliated with WPS.
> > > > > It looks like it may not be.
>
> > > > > Which makes what you have done all the more remarkable.
>
> > > > > I was hoping you honored formats because I wanted to associate
> > > > > hex16 with the numeric columns so I could create a lossless csv,
> > > > > but it did not work.- Hide quoted text -
>
> > > > > - Show quoted text -- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > Hi,
>
> > >   I think you may have misunderstood me, when I said pipe, I did not
> > > mean pipe delimited.
>
> > >  I meant
>
> > >  filename pyp pipe "dsread.exe < data.sas7bdat";
>
> > >   I do plan on using dsread this weekend.
>
> > >   I will share my experieces with you.- Hide quoted text -
>
> > > - Show quoted text -
>
> > FYI
>
> > I just posted this to the R group
>
> > If I do the following
>
> > sprintf("%A",pi)
> > "0X1.921FB54442D18"
>
> > I have this 16 byte character string
>
> > hx<-"400921FB54442D18"
>
> > This is the exact hex16 representation of PI in IEEE float that R uses
> > in Intel 32bit(little endian) Windows SAS uses the same
> > representation. 11 bit exponent and 53 bit mantissa.
>
> > I want to do is recreate the float exactly from the 16 char hex
>
> > something like
>
> > MyPI<-readChar(hx,numeric(),16)
>
> > or in SAS
>
> > MyPI=input("400921FB54442D18",hex16.);
> > put MyPI=;
>
> > MYPI=3.1415926536
>
> > What I am trying to do is set up a lossless transfer method from SAS
> > to R- Hide quoted text -
>
> > - Show quoted text -
>
> Hi,
>
>   You utility solves the 200 byte, 8 char name and potential precision
> errors with other methods of transfering SAS datasets to perl and R.
> Thanks.
>
> Importing SAS datasets(sas7bdat) into R
> (32 bit windows 2000, 32 bit SAS 9.2 and
> 32 bit R version 2.9.0 (2009-04-17)
>
>  Here is what I want to accomplish, the double floats below show data from
> SAS to R.
>  They are exactly the same in R and SAS memory, bit for bit.
>
>   R Internal         SAS Internal
>   16 Byte Float      16 byte Float
>
> 3FFAAAAAAAAAAAAB  3FFAAAAAAAAAAAAB
> 4002AAAAAAAAAAAB  4002AAAAAAAAAAAB
> 400D555555555555  400D555555555555
> 3FF6666666666666  3FF6666666666666
> 3FFCCCCCCCCCCCCD  3FFCCCCCCCCCCCCD
> 400199999999999A  400199999999999A
> 4004CCCCCCCCCCCD  4004CCCCCCCCCCCD
> 3FF4924924924925  3FF4924924924925
> 3FF9249249249249  3FF9249249249249
> 3FFDB6DB6DB6DB6E  3FFDB6DB6DB6DB6E
> 4001249249249249  4001249249249249
> 3FF2E8BA2E8BA2E9  3FF2E8BA2E8BA2E9
> 3FF5D1745D1745D1  3FF5D1745D1745D1
> 3FF8BA2E8BA2E8BA  3FF8BA2E8BA2E8BA
> 3FFBA2E8BA2E8BA3  3FFBA2E8BA2E8BA3
> 3FF2762762762762  3FF2762762762762
> 3FF4EC4EC4EC4EC5  3FF4EC4EC4EC4EC5
> 3FF7627627627627  3FF7627627627627
> 3FF9D89D89D89D8A  3FF9D89D89D89D8A
> 1.7976931348623E  1.7976931348623E
> 0010000000000000  0010000000000000
>
> I don't believe this high accuracy transfer is possible  with any other
> method except ODBC, but SAS ODBC is unsatisfactory for me. If you use CSV
> with the maximum assured decimal
> precision(15 significant digits?). The CSV decimal numbers will only
> approximate the double floats.
>
> I consider the Csv to be corrupt if the relative of absolute difference
> using the decimal Csv numbers and the memory floats is greater than 10^-12.
> There are two sources of error first the SAS floats are decimally rounded
> and converted to decimal then the rounded decimal approximations are
> converted into R floats.
>
> Status of     R Internal            CSV
> Csv           16 Byte Float
>
> Csv corrupt 3FFAAAAAAAAAAAAB   1.66666666666667    >10^-12 different
> Csv corrupt 4002AAAAAAAAAAAB   2.33333333333333
> Csv corrupt 400D555555555555   3.66666666666667
> Csv OK      3FF6666666666666   1.4
> Csv OK      3FFCCCCCCCCCCCCD   1.8
> Csv OK      400199999999999A   2.2
> Csv OK      4004CCCCCCCCCCCD   2.6
> Csv corrupt 3FF4924924924925   1.28571428571429
> Csv corrupt 3FF9249249249249   1.57142857142857
> Csv corrupt 3FFDB6DB6DB6DB6E   1.85714285714286
> Csv corrupt 4001249249249249   2.14285714285714
> Csv corrupt 3FF2E8BA2E8BA2E9   1.18181818181818
> Csv corrupt 3FF5D1745D1745D1   1.36363636363636
> Csv corrupt 3FF8BA2E8BA2E8BA   1.54545454545455
> Csv corrupt 3FFBA2E8BA2E8BA3   1.72727272727273
> Csv corrupt 3FF2762762762762   1.15384615384615
> Csv corrupt 3FF4EC4EC4EC4EC5   1.30769230769231
> Csv corrupt 3FF7627627627627   1.46153846153846
> Csv corrupt 3FF9D89D89D89D8A   1.61538461538462
> Csv corrupt 1.7976931348623E   1.7976931348623E+308
> Csv corrupt 0010000000000000   2.2250738585072E-308
>
> Bacground
>
>   1. Provide absolutely loss less transfer
>      of character(max 32756 bytes per character variable)  and numeric data
> from SAS to R
>      Since SAS has only two datatypes so this code should be exhaustive.
>
>   2. This code is useful because:
>      a. The SAS ODBC driver requires the user to not only have
>         SAS but the user must bring up a SAS session and
>         the session has to be closed manually. (SAS issue not a foreign
> issue)
>      b. The foreign package also requires interaction with SAS. (SAS
> issue)
>      c. SASxport only supports 8 character SAS names and a max of
>         200 byte character values. (This is a SAS issue not a SASxport
> issue)
>      d. SASxport creates floating point doubles that have an 8 bit exponent
>         and 56 bit mantissa while IEEE is 11 bit exponent and 53 bit
> mantissa
>         (sometimes defined slightly differently depending of where you
> consider
>         the sign bits). This results is the loss of some very small and
>         very large numbers. ( SAS issue not a SASxport issue)
>
>   3. How this code overcomes the issues above for import only.
>
>      You need the dsread exec in the previous mesage. Also the input SAS
> dataset must have
> 16 byte character representations for the floats. I am working with- Hide quoted text -
>
> - Show quoted text -...
>
> read more »

Hi Chris and Alan,

Looks like there is a lot of interest in the R community.

I may need to swap some bytes because it looks like your hex values
are 'little endian' and I need 'big endian'. Do not do anything until
a can test later tonight.

An additiona benefit of the hex values is that we can identify the 28
SAS missing values and set them to NaN on the R side.

You can test my list below in SAS by converting the 16 char floats to
ieee8. in SAS and then doing a put on the float. The result will be A,
B...Z, . and _.

SAS code that produced the listing is below.

Here are the 28 missing values in SAS and the corresponding floats.

A FFFFFD0000000000
B FFFFFC0000000000
C FFFFFB0000000000
D FFFFFA0000000000
E FFFFF90000000000
F FFFFF80000000000
G FFFFF70000000000
H FFFFF60000000000
I FFFFF50000000000
J FFFFF40000000000
K FFFFF30000000000
L FFFFF20000000000
M FFFFF10000000000
N FFFFF00000000000
O FFFFEF0000000000
P FFFFEE0000000000
Q FFFFED0000000000
R FFFFEC0000000000
S FFFFEB0000000000
T FFFFEA0000000000
U FFFFE90000000000
V FFFFE80000000000
W FFFFE70000000000
X FFFFE60000000000
Y FFFFE50000000000
Z FFFFE40000000000
_ FFFFFF0000000000
.. FFFFFE0000000000

data
mis;
retain A .A B .B C .C D .D E .E F .F G .G H .H I .I J .J K .K L .L
M .M
N .N O .O P .P Q .Q R .R S .S T .T U .U V .V W .W X .X Y .Y
Z .Z
_ ._
DOT .;
array mis[28] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
DOT;
do idx=1 to
28;

hex=put(mis[idx],ieee8.);

xeh=put(hex,hex16.);
put @1 mis[idx] @6
xeh;
end;
run;

From: Alan Churchill on
I have implemented a big endian flag in the next release. Look at the Export
element.

<?xml version="1.0" encoding="utf-8" ?>
<Export lossless="true" endian="Big"> <!-- Little endian is default so no
need to specify endian attribute -->
<Xml file="c:\temp\sas7bdat.xml"/>
<Excel file="c:\temp\sas7bdat.xls">
<WorksheetName>Sas Data</WorksheetName>
<StartCell>R2C2</StartCell>
</Excel>
<Delimited file="c:\temp\sas7bdat.txt" delimiter="\t" headers="true"
surroundWithQuotes="true"/>
<Contents file="c:\temp\sas7bdatContents.txt"/>
</Export>

Alan

Alan Churchill
Savian
www.savian.net
Office: (719) 687-5954
Cell: (719) 310-4870


-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
xlr82sas
Sent: Monday, March 01, 2010 7:21 PM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: dsread - Windows command-line utility for SAS7BDAT files

On Mar 1, 1:52 pm, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
> I am pretty sure I have this operational. Keep in mind that the need
> for lossless has to happen at the dataset level since the data will be
> read in before you want to export it. Hence, it is a property you can
> set at the dataset level and it will store the numerics as a string of hex
values.
>
> It then will output the value in all of the exported values
> (delimited, Excel, XML).
>
> It will be in the next release.
>
> Alan
>
> Alan Churchill
> Savianwww.savian.net
> Office: (719) 687-5954
> Cell: (719) 310-4870
>
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
>
> xlr82sas
> Sent: Monday, March 01, 2010 2:18 AM
> To: SA...(a)LISTSERV.UGA.EDU
> Subject: Re: dsread - Windows command-line utility for SAS7BDAT files
>
> On Feb 26, 10:09 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > On Feb 26, 7:29 pm, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > On Feb 26, 9:06 am, alan.church...(a)SAVIAN.NET (Alan Churchill) wrote:
>
> > > > Again, piping is easy. I'll try and toss something together this
> > > > weekend and pipe delimited, Excel, and XML.
>
> > > > I am a bit unclear on what it is you need. If I dump the data to
> > > > a delimited file, for example, I treat the values as doubles (do
> > > > not print hex
> > > > representation) so you should have better representation anyway.
> > > > You can always convert them as needed.
>
> > > > Alan
>
> > > > Alan Churchill
> > > > Savianwww.savian.net
> > > > Office: (719) 687-5954
> > > > Cell: (719) 310-4870
>
> > > > -----Original Message-----
> > > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On
> > > > Behalf Of
>
> > > > xlr82sas
> > > > Sent: Thursday, February 25, 2010 7:43 PM
> > > > To: SA...(a)LISTSERV.UGA.EDU
> > > > Subject: Re: dsread - Windows command-line utility for SAS7BDAT
> > > > files
>
> > > > Hi Alan and Chis
>
> > > > What interests me is communications with open source
> > > > products, like R, perl and MySQL. Export and import needs to be
> > > > lossless. My understanding is that SAS -ODBC/OleDB... requires
> > > > you to bring up SAS(ODBC server) and the server has to be closed
> > > > manually. I feel this is unsatisfactory. The crux of the issue
> > > > is IEEE floating point. I think HEX16. would be lossless, all I
> > > > have to in R is convert the character hex16. to floating
> > > > point. The nice thing about dsread is it is light, fast and you
> > > > do not need SAS and it looks like it supports pipes.
>
> > > > On Feb 25, 12:15 pm, xlr82sas <xlr82...(a)aol.com> wrote:
> > > > > On Feb 25, 10:43 am, xlr82sas <xlr82...(a)aol.com> wrote:
>
> > > > > > On Feb 25, 6:45 am, ChrisBLong <ch...(a)oview.co.uk> wrote:
>
> > > > > > > Hi,
>
> > > > > > > Readers of this group may be interested in dsread,
> > > > > > > available
> > > > athttp://www.oview.co.uk/dsread
>
> > > > > > > It's a command-line utility that understands the SAS7BDAT
> > > > > > > file format. It lets you examine the structure of
> > > > > > > datasets conveniently from the command-line, and converts
> > > > > > > SAS7BDAT data into valid CSV format for import into other
software.
>
> > > > > > > All comments and suggestions gratefully received,
>
> > > > > > > Chris.
>
> > > > > > Hi Chris,
>
> > > > > > Congratulations for reading SAS datasets. WPS has
> > > > > > powerfull capabilities. Thanks!!! Competition is great.
>
> > > > > > Even though you only create csv's, I see this as a great
> > > > > > product because you do not need SAS and with pipes users can
> > > > > > programtically get at SAS data from other languages. Also
> > > > > > CSVs are also very amenable to EXCEL.
>
> > > > ================================================================
> > > > == =========���
> > > > ===================================================
>
> > > > > > Just some thoughts:
>
> > > > > > Any chance you could create a lossless output format,
> > > > > > like SAS export datasets, but allow for longer names and
> > > > > > character values greater than 200 bytes. This would open up
> > > > > > SAS datasets to other languages. The format would have to be
open.
>
> > > > > > This could be a really big deal, if instead of a csv, you
> > > > > > created R dataframes, if called from R. An even bigger deal
> > > > > > would be if you created a SAS dataset from an R dataframe.
>
> > > > > > XML would be another nice output.
>
> > > > > > A silient ODBC would also be great.
>
> > > > > > I bet you can use pipes whith yow command line interface.
>
> > > > > My apologies.
>
> > > > > I assumed your site was somehow affiliated with WPS.
> > > > > It looks like it may not be.
>
> > > > > Which makes what you have done all the more remarkable.
>
> > > > > I was hoping you honored formats because I wanted to associate
> > > > > hex16 with the numeric columns so I could create a lossless
> > > > > csv, but it did not work.- Hide quoted text -
>
> > > > > - Show quoted text -- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > Hi,
>
> > > I think you may have misunderstood me, when I said pipe, I did
> > > not mean pipe delimited.
>
> > > I meant
>
> > > filename pyp pipe "dsread.exe < data.sas7bdat";
>
> > > I do plan on using dsread this weekend.
>
> > > I will share my experieces with you.- Hide quoted text -
>
> > > - Show quoted text -
>
> > FYI
>
> > I just posted this to the R group
>
> > If I do the following
>
> > sprintf("%A",pi)
> > "0X1.921FB54442D18"
>
> > I have this 16 byte character string
>
> > hx<-"400921FB54442D18"
>
> > This is the exact hex16 representation of PI in IEEE float that R
> > uses in Intel 32bit(little endian) Windows SAS uses the same
> > representation. 11 bit exponent and 53 bit mantissa.
>
> > I want to do is recreate the float exactly from the 16 char hex
>
> > something like
>
> > MyPI<-readChar(hx,numeric(),16)
>
> > or in SAS
>
> > MyPI=input("400921FB54442D18",hex16.);
> > put MyPI=;
>
> > MYPI=3.1415926536
>
> > What I am trying to do is set up a lossless transfer method from SAS
> > to R- Hide quoted text -
>
> > - Show quoted text -
>
> Hi,
>
> You utility solves the 200 byte, 8 char name and potential precision
> errors with other methods of transfering SAS datasets to perl and R.
> Thanks.
>
> Importing SAS datasets(sas7bdat) into R
> (32 bit windows 2000, 32 bit SAS 9.2 and
> 32 bit R version 2.9.0 (2009-04-17)
>
> Here is what I want to accomplish, the double floats below show data
> from SAS to R.
> They are exactly the same in R and SAS memory, bit for bit.
>
> R Internal SAS Internal
> 16 Byte Float 16 byte Float
>
> 3FFAAAAAAAAAAAAB 3FFAAAAAAAAAAAAB
> 4002AAAAAAAAAAAB 4002AAAAAAAAAAAB
> 400D555555555555 400D555555555555
> 3FF6666666666666 3FF6666666666666
> 3FFCCCCCCCCCCCCD 3FFCCCCCCCCCCCCD
> 400199999999999A 400199999999999A
> 4004CCCCCCCCCCCD 4004CCCCCCCCCCCD
> 3FF4924924924925 3FF4924924924925
> 3FF9249249249249 3FF9249249249249
> 3FFDB6DB6DB6DB6E 3FFDB6DB6DB6DB6E
> 4001249249249249 4001249249249249
> 3FF2E8BA2E8BA2E9 3FF2E8BA2E8BA2E9
> 3FF5D1745D1745D1 3FF5D1745D1745D1
> 3FF8BA2E8BA2E8BA 3FF8BA2E8BA2E8BA
> 3FFBA2E8BA2E8BA3 3FFBA2E8BA2E8BA3
> 3FF2762762762762 3FF2762762762762
> 3FF4EC4EC4EC4EC5 3FF4EC4EC4EC4EC5
> 3FF7627627627627 3FF7627627627627
> 3FF9D89D89D89D8A 3FF9D89D89D89D8A
> 1.7976931348623E 1.7976931348623E
> 0010000000000000 0010000000000000
>
> I don't believe this high accuracy transfer is possible with any
> other method except ODBC, but SAS ODBC is unsatisfactory for me. If
> you use CSV with the maximum assured decimal
> precision(15 significant digits?). The CSV decimal numbers will only
> approximate the double floats.
>
> I consider the Csv to be corrupt if the relative of absolute
> difference using the decimal Csv numbers and the memory floats is greater
than 10^-12.
> There are two sources of error first the SAS floats are decimally
> rounded and converted to decimal then the rounded decimal
> approximations are converted into R floats.
>
> Status of R Internal CSV
> Csv 16 Byte Float
>
> Csv corrupt 3FFAAAAAAAAAAAAB 1.66666666666667 >10^-12 different
> Csv corrupt 4002AAAAAAAAAAAB 2.33333333333333
> Csv corrupt 400D555555555555 3.66666666666667
> Csv OK 3FF6666666666666 1.4
> Csv OK 3FFCCCCCCCCCCCCD 1.8
> Csv OK 400199999999999A 2.2
> Csv OK 4004CCCCCCCCCCCD 2.6
> Csv corrupt 3FF4924924924925 1.28571428571429
> Csv corrupt 3FF9249249249249 1.57142857142857
> Csv corrupt 3FFDB6DB6DB6DB6E 1.85714285714286
> Csv corrupt 4001249249249249 2.14285714285714
> Csv corrupt 3FF2E8BA2E8BA2E9 1.18181818181818
> Csv corrupt 3FF5D1745D1745D1 1.36363636363636
> Csv corrupt 3FF8BA2E8BA2E8BA 1.54545454545455
> Csv corrupt 3FFBA2E8BA2E8BA3 1.72727272727273
> Csv corrupt 3FF2762762762762 1.15384615384615
> Csv corrupt 3FF4EC4EC4EC4EC5 1.30769230769231
> Csv corrupt 3FF7627627627627 1.46153846153846
> Csv corrupt 3FF9D89D89D89D8A 1.61538461538462
> Csv corrupt 1.7976931348623E 1.7976931348623E+308
> Csv corrupt 0010000000000000 2.2250738585072E-308
>
> Bacground
>
> 1. Provide absolutely loss less transfer
> of character(max 32756 bytes per character variable) and numeric
> data from SAS to R
> Since SAS has only two datatypes so this code should be exhaustive.
>
> 2. This code is useful because:
> a. The SAS ODBC driver requires the user to not only have
> SAS but the user must bring up a SAS session and
> the session has to be closed manually. (SAS issue not a
> foreign
> issue)
> b. The foreign package also requires interaction with SAS. (SAS
> issue)
> c. SASxport only supports 8 character SAS names and a max of
> 200 byte character values. (This is a SAS issue not a SASxport
> issue)
> d. SASxport creates floating point doubles that have an 8 bit
exponent
> and 56 bit mantissa while IEEE is 11 bit exponent and 53 bit
> mantissa
> (sometimes defined slightly differently depending of where you
> consider
> the sign bits). This results is the loss of some very small and
> very large numbers. ( SAS issue not a SASxport issue)
>
> 3. How this code overcomes the issues above for import only.
>
> You need the dsread exec in the previous mesage. Also the input
> SAS dataset must have
> 16 byte character representations for the floats. I am working with-
> Hide quoted text -
>
> - Show quoted text -...
>
> read more �

Hi Chris and Alan,

Looks like there is a lot of interest in the R community.

I may need to swap some bytes because it looks like your hex values are
'little endian' and I need 'big endian'. Do not do anything until a can test
later tonight.

An additiona benefit of the hex values is that we can identify the 28 SAS
missing values and set them to NaN on the R side.

You can test my list below in SAS by converting the 16 char floats to ieee8.
in SAS and then doing a put on the float. The result will be A, B...Z, . and
_.

SAS code that produced the listing is below.

Here are the 28 missing values in SAS and the corresponding floats.

A FFFFFD0000000000
B FFFFFC0000000000
C FFFFFB0000000000
D FFFFFA0000000000
E FFFFF90000000000
F FFFFF80000000000
G FFFFF70000000000
H FFFFF60000000000
I FFFFF50000000000
J FFFFF40000000000
K FFFFF30000000000
L FFFFF20000000000
M FFFFF10000000000
N FFFFF00000000000
O FFFFEF0000000000
P FFFFEE0000000000
Q FFFFED0000000000
R FFFFEC0000000000
S FFFFEB0000000000
T FFFFEA0000000000
U FFFFE90000000000
V FFFFE80000000000
W FFFFE70000000000
X FFFFE60000000000
Y FFFFE50000000000
Z FFFFE40000000000
_ FFFFFF0000000000
.. FFFFFE0000000000

data
mis;
retain A .A B .B C .C D .D E .E F .F G .G H .H I .I J .J K .K L .L M .M
N .N O .O P .P Q .Q R .R S .S T .T U .U V .V W .W X .X Y .Y Z .Z
_ ._
DOT .;
array mis[28] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ DOT;
do idx=1 to
28;

hex=put(mis[idx],ieee8.);

xeh=put(hex,hex16.);
put @1 mis[idx] @6
xeh;
end;
run;