From: bnz6 on 3 Jan 2007 17:16

Jack,

I may not have a clear understanding of what you are asking or of what you are attempting to do, but it looks like SAS is working exactly as designed here.

Looking at the code you submitted, you are requesting that SAS perform a stratified (on ClientID) simple random sample of size 1 without replacement from your dataset, with 2 replicates drawn independently. This is exactly what you are given in your output dataset.

The code you submitted provides a seed value of 1234567890, which will ALWAYS produce the same results on this dataset. This explains why you ALWAYS get the same ctrlid twice for the last client ID. The replicate selection just happens to select ZA8A9ABAA twice (for replicates 1 and 2), and that will never change unless you change the seed value. If you change the seed value, you may or may not get ZA8A9ABAA twice.

I hope this helps.

Sincerely yours,

Mark J. Lamias
SAIC Statistical Consultant
Office of Informatics
National Center for Preparedness, Detection, and Control of Infectious Diseases
Coordinating Center for Infectious Diseases
US Centers for Disease Control and Prevention
w: (404) 639-0747
m: (404) 543-1394
f: (404) 639-1391

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of waterleaf sas
Sent: Wednesday, January 03, 2007 12:27 PM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: proc surveyselect problem

Please help me with the following code.

I use SRS under proc surveyselect with rep = 2; however, the last clientid (15352) always selects a duplicate ctrlid. The weird thing is: if I delete all records of clientid = 15053 (or another), it works fine.
Thanks
Jack

data tmp;
input ClientID CTRLID $ 9-18;
cards;
1010    ZBABAZZ77
1010    B9ZA0A80Z
1010    B9AA493Z6
1010    B900903A3
1010    B94096638
1021    4Z6394ZZ7
1021    ZB87AB849
1021    B94A4B864
1021    B9BA80373
1021    B94098AA3
1021    B89A0A93B
1021    B94A49430
1021    B9ZA848AA
1021    B93ZZ6770
1021    B94303607
1021    ZZA6697Z8
1021    B9AZ0A736
1021    B93BZ0604
1021    B900940BZ
1021    B9094Z8A9
1021    B90Z89Z3Z
7227    B9BA4A0Z9
7227    B906ZAZ48
7227    B9B0A9693
7227    Z6489BB0Z
7227    Z6Z934983
7227    B93A099Z0
7227    Z6Z8BAZ37
7227    4604774BZ
7227    34B8ZB34
7227    B9Z64066A
7227    B9A07BB43
14244   Z64897089
14244   B9409744A
14244   B93700A09
14244   AB8749A9B
14244   Z6B93Z047
14553   B94Z8B3BB
14553   ZBAB9AZ36
14553   B89A6A0Z7
14553   B9A48A936
14553   33B7Z4498
14553   B9046Z780
14553   B9BA8Z708
14674   B9BZ063ZZ
14674   B9BA4BAB9
14674   B9444ZA04
14674   B8940B806
14674   B9ZZZ06ZZ
14674   B9A364986
14674   B90B8ZZ6B
14674   B9B36Z848
14778   ZB7434A67
14778   B903807Z6
14778   B9AA477BA
14821   Z6B7BBB47
14821   Z6B693979
14821   B90A6Z088
14821   Z6673Z94A
15053   B940B3BAB
15053   Z6A93B49B
15053   Z679BA986
15352   ZA8A9ABAA
15352   B933Z047B
;
run;

PROC SORT;
BY clientid;
RUN;

PROC SURVEYSELECT DATA = tmp
SAMPSIZE = 1
METHOD = SRS REP = 2
SEED = 1234567890
OUT = SELECT_tmp;
STRATA clientid;
RUN;

proc freq;
tables ctrlid;
run;
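
Mark's two points (a fixed seed reproduces the same draw, and the two replicates are drawn independently) can be checked directly. The following is a minimal sketch, assuming the tmp dataset from the post has already been sorted by clientid; the seed 20070103 and the output name select_tmp2 are arbitrary choices for illustration and do not come from the thread. Because stratum 15352 holds only two records, two independent draws of size 1 pick the same record with probability 1/2, so a duplicate there is not unusual.

```sas
/* Same design as in the post, but with a different (arbitrary) seed. */
proc surveyselect data=tmp method=srs sampsize=1 rep=2
                  seed=20070103 out=select_tmp2;
   strata clientid;
run;

/* Each stratum contributes one row per replicate, so two rows in total.
   n_distinct = 1 means both replicates selected the same CTRLID;
   n_distinct = 2 means they selected different ones. */
proc sql;
   select clientid,
          count(distinct ctrlid) as n_distinct
   from select_tmp2
   group by clientid;
quit;
```

Rerunning with several seed values should show n_distinct for clientid 15352 switching between 1 and 2, while any single seed gives the same answer every time.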
From: David L Cassell on 4 Jan 2007 02:10

waterleaf.sas(a)GMAIL.COM wrote:
>
>Please help me with the following code.
>
>I use SRS under proc surveyselect with rep = 2; however, the last clientid
>(15352) always selects a duplicate ctrlid. The weird thing is: if I delete
>all records of clientid = 15053 (or another), it works fine.
>
>Thanks
>Jack
>
>data tmp;
>input ClientID CTRLID $ 9-18;
>cards;
[DATASET ELIDED BY ME]
>;
>run;
>
>PROC SORT;
>BY clientid;
>RUN;
>
>PROC SURVEYSELECT DATA = tmp
>SAMPSIZE = 1
>METHOD = SRS REP = 2
>SEED = 1234567890
>OUT = SELECT_tmp;
>STRATA clientid;
>RUN;
>
>proc freq;
>tables ctrlid;
>run;

Perhaps, if you explain what it is that you are REALLY trying to accomplish, someone here could help you more. As things are, the proc is doing exactly what you told it to do.

Why do you want 2 reps? Why do you want only 1 sample per clientid? That seems like a really bad sample design. What sort of sample are you really trying to achieve?

And do you need to worry about the fact that multiple reps may or may not give you the same point multiple times, since the selection process is independent on each replication? It seems like you were not expecting this to happen.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
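
David's questions point at a possible mismatch between the design and the intent. If the underlying goal is two different CTRLIDs for each ClientID (an assumption; the thread never states the goal), one option is to draw a single sample of size 2 per stratum instead of two replicates of size 1. METHOD=SRS selects without replacement, so the two records chosen within a stratum are always distinct. A sketch, with select_two as a hypothetical output dataset name:

```sas
/* Assumption: the aim is two distinct CTRLIDs per ClientID.
   One sample of size 2 per stratum, no REP= option.
   tmp must already be sorted by clientid. */
proc surveyselect data=tmp method=srs sampsize=2
                  seed=1234567890 out=select_two;
   strata clientid;
run;

/* Every CTRLID selected within a stratum should now appear once. */
proc freq data=select_two;
   tables clientid*ctrlid / list;
run;
```

This relies on every stratum having at least two records, which holds for the posted data (the smallest stratum, 15352, has exactly two, so both of its records would always be selected).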