From: Arthur Tabachneck on
Michael,

Priceless!

On Tue, 5 Jan 2010 10:58:01 -0500, Michael Raithel
<michaelraithel(a)WESTAT.COM> wrote:

<snip>
>
>Mark, as usual, you ask the tough questions. Sadly, SAS is lacking in
>the PROC MANUAL LABOR areas. However, I did find another organization
>that offers software that will actually clean your computer screen from
>the inside! Check out:
>
>http://www.raincitystory.com/flash/screenclean.swf
From: Jonathan Goldberg on
No, all the office cleaning software is in the form of
iPhone apps. Sorry, if you don't have an iPhone you're SOL.

Jonathan

p.s. I too loved the screen-cleaning app.

On Tue, 5 Jan 2010 09:22:58 -0500, Keintz, H. Mark
<mkeintz(a)WHARTON.UPENN.EDU> wrote:

>But Michael ...
>
>
>Does SAS have no office cleaning software? Are there no apps? No
>procs? No macros?
>
>Regards,
>Mark
From: Jonathan Goldberg on
Well, I can't complain that I didn't get responses to my query. True,
most of them were a bit snippy...

The messages mostly were about how big an investment a piledriver is when
all I'm looking for is a hammer. Our situation is relatively simple; we
don't have complicated normalization schemes or most other possible
complications.

The idea of a tool is to put data cleaning as much as possible *in the
hands and under the control of the people who know the data*, who are
also the people who will deal with any problems found. The need to
involve programmers slows down the projects and increases costs. Also,
there is no way whatsoever that a programmer could do data cleaning on
his/her own. I think trying to streamline that process is quite a
reasonable thing to do.

Besides, my management wants me to do it, with something we write
ourselves (which will involve non-SAS programming for the front end) or
with a third party product. So, reasonable or not, here we go.

My thanks for the pointers to Dataflux, SDD (no, I'm not familiar with all
the SI vertical-market products), and SAS Data Quality Solution.

Michael, I hope that when you finish your evaluation you will post your
conclusions here. Inquiring minds want to know! And thanks again for the
screen cleaner.

Jonathan

ps
It's true my cubicle is somewhat messy... :-)

pps.
"(believe it or not SAS programmers are not necessarily the highest paid
employees in some organizations)."

In my case I guarantee it.
From: Nathaniel Wooding on
Jonathan

If you have SAS/FSP, you could consider giving your users some proofing and analysis reports that would let them flag certain types of problems. You could then give them a tool that uses PROC FSEDIT to fix the problems manually.

Nat Wooding

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Jonathan Goldberg
Sent: Tuesday, January 05, 2010 12:36 PM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: Data Validation/Cleansing Tool Query

<snip>
From: Francois van der Walt on
Dear Jonathan and SAS-L (ers)

Data cleaning is certainly important and the value of clean data is often
underestimated. Interestingly it is for us (GJI) often the easiest service
to sell.

The biggest bang for the buck in our data cleaning process, and one that
I can recommend as an excellent starting point, is Characterise under
Enterprise Guide. I am sure it used to be a set of macros, and some
SAS-L-ers will be able to refer you to them. (If you do not have Enterprise
Guide available, let me know and I will provide you with an extract of the
macros.)

Characterise provides a frequency analysis for all alpha fields (top 30 by
default) that we use to quickly identify problems like blank fields or lots
of "N/A", "TBA", "TEST", "HJKL", etc. in fields. We ask business owners to
identify the valid versus invalid values in an extracted spreadsheet. We also
use it to generate a translation table that, for example, translates the
Australian state "Victoria", "VIC.", "V.I.C.", etc. to a consistent "VIC".
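
For readers without Enterprise Guide, the two uses described in this paragraph (a top-30 frequency listing per character field, and a translation table filled in by the business owners) can be sketched roughly as below. This is only an illustration written in Python rather than SAS, and it is not what the Characterise task itself runs; the function names `top_values` and `normalise`, the sample rows, and the translation entries are all hypothetical.

```python
from collections import Counter


def top_values(rows, field, limit=30):
    """Return the `limit` most frequent values of `field`, blanks included."""
    # Missing values (None) are counted as blanks so they show up in the report.
    counts = Counter((row.get(field) or "").strip() for row in rows)
    return counts.most_common(limit)


def normalise(value, translation):
    """Map a raw value to its agreed code; unknown values pass through unchanged."""
    key = value.strip().upper()
    return translation.get(key, value)


# Hypothetical sample data standing in for one character field.
rows = [
    {"state": "Victoria"},
    {"state": "VIC."},
    {"state": "V.I.C."},
    {"state": "N/A"},
    {"state": ""},
    {"state": "VIC."},
]

# Translation table as the business owners might return it (keys upper-cased).
translation = {"VICTORIA": "VIC", "VIC.": "VIC", "V.I.C.": "VIC"}

profile = top_values(rows, "state")
cleaned = [normalise(r["state"], translation) for r in rows]

print(profile)  # frequency listing to send out for review
print(cleaned)  # values after applying the translation table
```

Values that are not in the translation table (here "N/A" and the blank) are left untouched on purpose, so they stay visible for the data owners to rule on rather than being silently changed.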

For numeric fields, Characterise provides the number of missing values,
averages, maximums, minimums, etc.
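
A numeric summary of this kind could look something like the sketch below. It is a minimal Python illustration under my own assumptions, not the output of Characterise; the function `numeric_profile` and the sample `ages` column are made up for the example.

```python
import math


def numeric_profile(values):
    """Summarise a numeric column; None and NaN both count as missing."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": sum(present) / len(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical column; 430 is the sort of keying error a maximum exposes.
ages = [34, None, 51, 29, float("nan"), 430]

summary = numeric_profile(ages)
print(summary)
```

The point of showing minimum and maximum next to the mean is that an implausible extreme (an age of 430, say) stands out at a glance and can be handed back to whoever owns the data.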

Kind Regards
Francois (Brisbane)


On Tue, 5 Jan 2010 12:36:14 -0500, Jonathan Goldberg
<jgoldberg(a)BIOMEDSYS.COM> wrote:

<snip>