From: David Johnson on 23 Jan 2007 17:36

Actually Dave, I'm not convinced that it isn't a SAS problem.

I have a track open with SAS on a similar issue involving a rename from the restructuring of a small data set. It is one of some hundreds of data sets created and modified in a work library as part of 46 programs included in a batch sequence.

At irregular times, the "rename of temporary member" message comes up, SAS goes into syntax checking mode and sets OBS to 0, and the batch manager detects an error and terminates the sequence.

It doesn't appear to be the same place twice, it isn't practical to replace every data step with a new output table followed by a delete step, and it isn't an issue with space or permissions. So most of the usual diagnoses are irrelevant.

SYNCHIO is turned on, and the tables use V7-compliant naming and structure, so a V6.12 library definition is ruled out as well.

It has been plaguing us for months, and seems very similar to the issue described here by Curtis, and to similar issues reported by other correspondents for quite some time.

The difference is that the work directory is on a virtual drive created on a RAID 5 array in a high-end workstation. Yes, I know RAID slows performance on work libraries, and it isn't my choice, but it's the way this machine has been built, and I don't have the option to change it to JBOD.

The core issue seems to be that the V8 engine, when talking to a network drive, a NAS drive or a RAID array, expects a process to be finished before it has physically completed.

How many times have we had to code delays into programs to deal with OS response times? I have a hunch this is similar, and the SAS V8 engine is expecting something the Windows architecture cannot always deliver.

Incidentally, since it is irregular, and since it is a batch process with small included code objects, I am looking at having the batch manager resubmit the same code block if it fails with an error of this type.
Now I only need to be able to reset the SAS error flags. Unfortunately, since I can't predict when the error will occur, it will be some time before I know whether the changes to the batch manager work.

Kind regards
David

On Fri, 14 Jan 2005 14:56:18 -0800, David L. Cassell <cassell.david(a)EPAMAIL.EPA.GOV> wrote:

>Curtis Amick <curtis(a)SC.RR.COM> wrote:
>> Got a difficult problem here. Recently my company upgraded network storage to an EMC NAS (Network Attached Storage) from a non-NAS system. Now, those of us who store SAS data sets on the network are encountering a serious problem. When updating data sets, sometimes (rarely) those data sets will be deleted. The error message looks like:
>> "ERROR: Rename of temporary member for (data set name) failed. File may be found in a directory (your directory)" and the permanent data set is gone.
>>
>> This happens randomly, and (apparently) only when the data set already exists. That is, when doing something like this:
>>
>> DATA NETDRIVE.DATASET; SET DATASET2; RUN;
>>
>> If netdrive.dataset already exists (it's being "updated" by work.dataset2), then this error *might* occur. If netdrive.dataset does not yet exist (it's being created by work.dataset2), then the problem will not occur.
>>
>> From SI Tech Support: They've seen this before (see SAS NOTE 005781, link here: http://support.sas.com/techsup/unotes/SN/005/005781.html ), but can't fix it because (according to the TS rep) once SAS wants to write to NAS, they "hand it off" to the network. And that's when the problem occurs.
>>
>> Here's what I think: When SAS updates a data set, it creates a temporary data set to work on, keeping the original intact. When the step ends (think PROC SORT DATA=ND.dataset; RUN; (this killed me on Saturday. Had a macro that sorted 20+ data sets, and lost 4!!! of them.)), the original data set is overwritten by the temp, taking on the name of the original.
>> And I'm thinking it's during that writing/renaming process that the storage system is losing our data sets. (SI calls it a "timing issue".) It doesn't happen when working on local drives, and, like I mentioned earlier, it hasn't happened yet when *creating* permanent data sets; only when updating.
>>
>> Some suggestions (from SITS): change engines (v8, v612) (doesn't work, not feasible), use -SYNCHIO (have tried it; doesn't seem to help), remove SAS data sets from on-line virus scanning in the NAS (our IS dept is leery of that one). Personally, I'd like to go back to the previous storage (non-NAS; the IS dept isn't thrilled with that one).
>>
>> Probably can get around this problem by programming like so:
>>
>> DATA ABC;
>> SET ND.DATASET;
>> (play with data set ABC...)
>> RUN;
>>
>> (delete ND.DATASET)
>>
>> DATA ND.DATASET;
>> SET ABC;
>> RUN;
>>
>> But I'd prefer something cleaner, less intrusive (especially for our less "sophisticated" users). Plus, we've got LOTS of programs that are run daily, weekly, monthly, etc. that contain steps like "proc sort data=ND.xxxx; run;" and/or "data ND.xxxx; set ND.xxxx abc; run;" and/or (well, you get the picture).
>>
>> To the point: has anyone else had this problem, and (if so) what did you do to solve it?
>
>I haven't seen this problem before. But I'd just like to vent.
>
>How can your IS people not be responsive on this? Go to your bosses and show them how EXPENSIVE this is going to be. If your IS won't or can't fix this problem (scrap NAS or get it fixed), then you and all your other SAS people will have to rewrite every bit of your SAS code to only create new data sets: this means sorting from the old set to a new one using the OUT= option. This will explode the disk space requirements on the network, costing the company *more* money, on top of the cost of all the programmer hours to alter and then test and then debug all the SAS code.
>Make it into a business case, and show your bosses that this problem with NAS is going to cost them hundreds of thousands of dollars in this fiscal year alone, as well as wrecking the schedule for any new programming projects (factor in all costs for that as well).
>
>There is no excuse for your IS not to have EMC all over this. EMC has a rep as a really responsive solutions provider, and I can't believe they got that rep by letting stuff like this happen.
>
>I wish I had better advice, but this isn't a SAS problem.
>
>David
>--
>David Cassell, CSC
>Cassell.David(a)epa.gov
>Senior computing specialist
>mathematical statistician
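Curtis's workaround can be written out as runnable SAS roughly as follows. This is a minimal sketch, not a tested fix: the LIBNAME path is hypothetical, ND.DATASET and WORK.ABC are the names from his post, and the SYSERR guard (delete the network member only if the WORK copy finished cleanly) is an added assumption, not part of his post. The last step shows the OUT= approach Cassell mentions, with a hypothetical BY variable ID and a hypothetical output name ND.DATASET_SORTED. That either pattern avoids the NAS failure is only suggested by Curtis's report that the error appears when an existing member is replaced.

```sas
/* Sketch of the copy / delete / re-create workaround.                */
/* Assumption: hypothetical path; point ND at your own network share. */
libname nd "\\server\share\sasdata";

/* 1. Do the update on a copy in the local WORK library */
data work.abc;
  set nd.dataset;
  /* ... changes to ABC go here ... */
run;

/* 2-3. Delete the network member and write it back as a new member, */
/*      but only if the copy step ended with SYSERR = 0 (no errors   */
/*      and no warnings). %IF needs a macro in older SAS releases.   */
%macro replace_member;
  %if &syserr = 0 %then %do;
    proc datasets lib=nd nolist;
      delete dataset;
    quit;

    data nd.dataset;
      set work.abc;
    run;
  %end;
  %else %put WARNING: copy step ended with SYSERR=&syserr, ND.DATASET left in place.;
%mend replace_member;
%replace_member

/* Cassell's alternative: sort into a new member instead of in place. */
/* BY variable ID and output name DATASET_SORTED are hypothetical.    */
proc sort data=nd.dataset out=nd.dataset_sorted;
  by id;
run;
```

Between the DELETE and the final DATA step the only copy of the data is WORK.ABC, so if that last step fails the member is still missing from ND, and WORK.ABC disappears when the session ends.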
From: Richard A. DeVenezia on 23 Jan 2007 19:12

David Johnson wrote:
> Actually Dave, I'm not convinced that it isn't a SAS problem.
>
> I have a track open with SAS on a similar issue involving a rename from the restructuring of a small data set. It is one within some hundreds of data sets created and modified in a work library as part of 46 programs included in a batch sequence.
>
> At irregular times, the "rename of temporary member" message comes up, SAS goes into syntax checking mode and sets Obs to 0, the batch manager detects an error and terminates the sequence.

Your SAS session is likely stressing the hardware, and/or your device subsystem/driver arrangement or configuration cannot keep up with what it is being called to do.

http://www.devenezia.com/downloads/sas/rename-error/

I never got to the bottom of it. Some candidates are low-level caches or timing races in device drivers issuing/handling system semaphores.

--
Richard A. DeVenezia
From: "Johnson, David" on 23 Jan 2007 20:20

Thank you Richard,

Until now, if a program exhibited misbehaviour above a given level (input data error, undefined outcome warning, SAS warning, SAS error), the Batch Manager would identify that the issue had occurred and terminate the batch. As misfortune would have it, this usually happened before the three longest-running jobs had completed, leaving the largest amount of processing to be completed manually during the day. Naturally, no problems would manifest when the machine was being more closely monitored.

Where am I going with that? If there is a systemic issue, then we might expect it to manifest repeatedly, whereas 10-12 hours later the cause of the problem may have gone. So, if I can resubmit failing processes immediately, then I will either see the rerun complete effortlessly, or see the problem persist.

While it seems that one may have a hardware or configuration problem as you suggest, there must also be a confounding factor that is usually either absent or without effect. Knowing what it is would make it more likely that I could create the problem on demand and test various fixes.

I think I have finished recoding the Batch Manager to resubmit rather than abandon failed code, and if I get any further I'll let you know.

I had scoped changes to my resource monitoring program to also trap active process/thread and memory data, but haven't coded up the APIs to do that yet. I might do that now, just to rule out the suspicion that another process, such as an AV application, may be causing the problem. I note in Curtis' original post that this was a possibility offered by SAS.

Kind regards
David

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Richard A. DeVenezia
Sent: Wednesday, 24 January 2007 11:13 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: ERROR: Rename...
Losing data sets from network drives
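The resubmit-and-reset approach David describes above might be sketched as a macro that %INCLUDEs one program, checks SYSCC, clears the error state and tries again. Everything here is an assumption rather than his actual Batch Manager: the macro name RUN_WITH_RETRY, its parameters and the file paths are hypothetical. It relies on SAS 9 system options (NOSYNTAXCHECK, so one failure does not drop the rest of the session into syntax-check mode with OBS=0) and on each included file ending its last step with RUN; or QUIT;, so that step has executed before the macro tests SYSCC.

```sas
/* Hypothetical retry wrapper for programs %INCLUDEd by a batch driver. */
/* Assumption: each included file ends its last step with RUN; or QUIT; */
options nosyntaxcheck;  /* an error no longer switches the session to syntax-check mode */

%macro run_with_retry(file=, maxtries=2);
  %local try done;
  %let try = 0;
  %let done = 0;

  %do %while (&done = 0 and &try < &maxtries);
    %let try = %eval(&try + 1);

    /* Reset the error state left behind by a previous failure */
    %let syscc = 0;
    options obs=max replace;

    %include "&file";

    /* SYSCC: 0 = clean, 4 = warnings only, higher = error */
    %if &syscc <= 4 %then %let done = 1;
    %else %put WARNING: &file failed on attempt &try of &maxtries (SYSCC=&syscc).;
  %end;

  /* SYSCC is not reset after the last failed attempt, so it still reports the error */
  %if &done = 0 %then %put ERROR: &file still failing after &maxtries attempts.;
%mend run_with_retry;

/* Usage: one call per program in the sequence (paths are hypothetical) */
%run_with_retry(file=C:\batch\prog01.sas)
%run_with_retry(file=C:\batch\prog02.sas, maxtries=3)
```

A batch manager that scans the log for ERROR lines will still see the messages from a failed first attempt, so it would need to key off the macro's final message instead of the first ERROR line it finds.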
From: LouisBB on 24 Jan 2007 01:10

Dear Curtis,

If the trouble is in the number of simultaneous open files, you could take a look at the section "SPLITTING A SAS FILE DYNAMICALLY USING THE .OUTPUT() METHOD" of SUGI 31 paper 241, "Data Step Hash Objects as Programming Tools" by Paul M. Dorfman and Koen Vyverman.

http://www2.sas.com/proceedings/sugi31/241-31.pdf

They give an example for sorted input data:

data _null_ ;
  /* a new hash instance is declared for each BY group */
  dcl hash hid (ordered: 'a') ;
  /* _n_ in the key keeps rows with identical id/transid/amt from colliding */
  hid.definekey ('id', 'transid', 'amt', '_n_') ;
  hid.definedata ('id', 'transid', 'amt' ) ;
  hid.definedone ( ) ;
  /* DoW loop: read one whole BY group per DATA step iteration */
  do _n_ = 1 by 1 until ( last.id ) ;
    set sample ;
    by id ;
    hid.add() ;
  end ;
  /* write the group to its own data set, OUT<id> */
  hid.output (dataset: 'OUT' || put (id, best.-l)) ;
run ;

And one for unsorted input data, using the hash of hashes method.

I hope this can offer an alternative, assuming you are using SAS 9.

LouisBB.

"Richard A. DeVenezia" <rdevenezia(a)wildblue.net> wrote in message news:51nmnvF1l6lvrU1(a)mid.individual.net...
> David Johnson wrote:
>> Actually Dave, I'm not convinced that it isn't a SAS problem.
>>
>> I have a track open with SAS on a similar issue involving a rename from the restructuring of a small data set. It is one within some hundreds of data sets created and modified in a work library as part of 46 programs included in a batch sequence.
>>
>> At irregular times...
From: David L Cassell on 24 Jan 2007 02:12

david.johnson(a)CBA.COM.AU wrote back:
>Actually Dave, I'm not convinced that it isn't a SAS problem.
I suspect that it *is* a SAS-related problem. But that does not make it a SAS problem. Right?

Do you have other apps which sufficiently stress the disk I/O and buffering of the system? You might have to write one yourself in C, because SAS is pretty darn efficient at read/write, and it may be overtaxing your system components. If nothing else (even highly tuned code to pump streams of data in and out of your I/O subsystems) can cause this problem, then I would have to point a finger at SAS. But if other high-end I/O apps can cause similar problems, then it's the system.

Pinning this down may be a *major* pain in the NAS. :-)

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
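Cassell suggests stressing the I/O subsystem with other code. A complementary check, and a possible way to produce the error on demand as David wants, is a SAS loop that keeps replacing one existing member, so every iteration goes through the temporary-member-and-rename step. This is a hypothetical harness, not the C program Cassell describes: the TESTLIB path, the macro names, and the iteration and row counts are assumptions, and it relies on NOSYNTAXCHECK so that one failure does not stop the later iterations.

```sas
/* Hypothetical stress test: replace an existing member over and over, */
/* so every step goes through the temporary-member-and-rename phase.   */
options nosyntaxcheck;
libname testlib "D:\sas_rename_test";   /* hypothetical path on the suspect drive */

/* (Re)create the test member */
%macro make_member(rows);
  data testlib.stress;
    do id = 1 to &rows;
      x = ranuni(0);
      output;
    end;
  run;
%mend make_member;

%macro rename_stress(iterations=1000, rows=10000);
  %local i failures;
  %let failures = 0;
  %make_member(&rows)

  %do i = 1 %to &iterations;
    options obs=max replace;

    /* Same pattern as "data ND.xxxx; set ND.xxxx; run;" in the thread */
    data testlib.stress;
      set testlib.stress;
      x = x + 1;
    run;

    /* SYSERR 0 = success, 4 = warnings; anything else counts as a failure */
    %if &syserr ne 0 and &syserr ne 4 %then %do;
      %let failures = %eval(&failures + 1);
      %put WARNING: rewrite &i of &iterations failed (SYSERR=&syserr).;
      /* A failed rename can leave the member missing; rebuild it */
      %if not %sysfunc(exist(testlib.stress)) %then %make_member(&rows);
    %end;
  %end;

  %put NOTE: &failures of &iterations in-place rewrites failed.;
%mend rename_stress;

%rename_stress(iterations=1000, rows=10000)
```

Running it once against a local disk and once against the suspect drive gives a rough comparison. Because this is SAS code, a failure here still cannot separate SAS from the storage layer; that is the question Cassell's non-SAS I/O test is meant to answer.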