From: J.B. Brown on 22 Jul 2010 10:51

Hello everyone, and thanks for taking the time to read this.

For quite some time, I have had a problem using Python's shell execution facilities in combination with a cluster computing environment (such as Sun Grid Engine (SGE)). In particular, I wish to repeatedly execute a number of commands in sub-shells or pipes within a single function, and each execution depends on the result of the previous one, so just writing a brute-force script file and executing the commands is not an option for me.

To isolate and exemplify my problem, I have created three files:
(1) one which exemplifies the spirit of the code I wish to execute in Python,
(2) one which serves as the SGE execution script file and actually calls Python to execute the code in (1), and
(3) a simple shell script which executes (2) a sufficient number of times that it fills all processors on my computing cluster and leaves an additional number of jobs in the queue.

Here is the spirit of the experiment/problem:

generateTest.py:
----------------------------------------------
# Constants
numParallelJobs = 100
testCommand = "continue"  # "os.popen( \"clear\" )"
loopSize = "1000"

# First, write the file containing the test script.
pythonScript = file( "testScript.py", "w" )
pythonScript.write(
"""
import os
for i in range( 0, """ + loopSize + """ ):
    for j in range( 0, """ + loopSize + """ ):
        for k in range( 0, """ + loopSize + """ ):
            for l in range( 0, """ + loopSize + """ ):
                """ + testCommand + """
""" )
pythonScript.close()

# Second, write the SGE script file that executes the Python script.
sgeScript = file( "testScript.sge", "w" )
sgeScript.write(
"""
#$ -cwd
#$ -N pythonTest
#$ -e /export/home/jbbrown/errorLog
#$ -o /export/home/jbbrown/outputLog
python testScript.py
""" )
sgeScript.close()

# Finally, write a script that submits the SGE script a specified number of times.
import os
launchScript = file( "testScript.sh", "w" )
for i in range( 0, numParallelJobs ):
    launchScript.write( "qsub testScript.sge" + os.linesep )
launchScript.close()
----------------------------------------------

Now, let's assume that I have about 50 processors available across 8 compute nodes, with one NFS-mounted disk. If I run the code as above, simply executing Python "continue" statements and doing nothing else, the cluster head node reports no serious NFS daemon load.

However, if I change the code to use the os.popen() call shown as a comment above, or use os.system(), the NFS daemon load on my system skyrockets within seconds of distributing the jobs to the compute nodes -- even though I'm doing nothing but executing the clear-screen command, which technically doesn't pipe any output to the location used for logging stdout. Even if I change the SGE script file to redirect standard output and error explicitly to /dev/null, I still have the same problem.

I believe the source of this problem is that os.popen() and os.system() spawn subshells which then read my shell resource files (.zshrc, .cshrc, .bashrc, etc.). But I don't see an alternative to os.popen2/3/4() or os.system(). os.exec*() cannot solve my problem, because it transfers execution to that program and stops executing the script which called os.exec*().
Without having to rewrite a considerable amount of code (which performs cross validation by repeatedly executing programs in a subshell) in a shell scripting language full of conditional statements, does anyone know of a way to execute external programs from the middle of a script without referencing the shell resource files located on an NFS-mounted directory? I have read through the help(os) documentation repeatedly, but just can't find a solution.

Even a small lead or thought would be greatly appreciated.

With thanks from humid Kyoto,
J.B. Brown
From: MRAB on 22 Jul 2010 11:31

J.B. Brown wrote:
> [snip problem description and test scripts]
>
> I believe the source of this problem is that os.popen() and os.system()
> spawn subshells which then read my shell resource files (.zshrc, .cshrc,
> .bashrc, etc.). But I don't see an alternative to os.popen2/3/4() or
> os.system(). os.exec*() cannot solve my problem, because it transfers
> execution to that program and stops executing the script which called
> os.exec*().
>
> Without having to rewrite a considerable amount of code (which performs
> cross validation by repeatedly executing programs in a subshell) in a
> shell scripting language full of conditional statements, does anyone know
> of a way to execute external programs from the middle of a script without
> referencing the shell resource files located on an NFS-mounted directory?
>
> Even a small lead or thought would be greatly appreciated.
>
Have you looked at the 'subprocess' module?
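A minimal sketch of that suggestion, with the [ "clear" ] argument list standing in for whatever external program the cross-validation code actually runs:

import subprocess

# Passing an argument list with the default shell=False execs the program
# directly in the child process -- no intermediate /bin/sh is started, so
# none of the shell resource files on the NFS home directory are read.
status = subprocess.call( [ "clear" ] )

# If the command's output is needed (as with os.popen()):
proc = subprocess.Popen( [ "clear" ],
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE )
out, err = proc.communicate()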
From: Neil Hodgson on 22 Jul 2010 19:19

J.B. Brown:
> I believe the source of this problem is that os.popen() and os.system()
> spawn subshells which then read my shell resource files (.zshrc, .cshrc,
> .bashrc, etc.). But I don't see an alternative to os.popen2/3/4() or
> os.system(). os.exec*() cannot solve my problem, because it transfers
> execution to that program and stops executing the script which called
> os.exec*().

Call fork, then call exec from the new process. Search the web for "fork exec" to find examples in C.

Neil
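In Python, that fork/exec pattern might look roughly like the sketch below, again with [ "clear" ] as a stand-in command; os.fork() creates a child, os.execvp() replaces only the child with the external program, and os.waitpid() lets the calling script continue once the child finishes, all without starting a shell:

import os

def run_without_shell( argv ):
    """Fork, exec the external program in the child, and wait for it.

    argv is an argument list such as [ "clear" ]; since no shell is
    started, no .zshrc/.cshrc/.bashrc on the NFS home directory is read.
    Returns the raw exit status from os.waitpid().
    """
    pid = os.fork()
    if pid == 0:
        # Child process: replace it with the external program.
        try:
            os.execvp( argv[0], argv )
        finally:
            os._exit( 127 )  # reached only if the exec itself failed
    # Parent process: wait, then carry on with the rest of the script.
    _, status = os.waitpid( pid, 0 )
    return status

status = run_without_shell( [ "clear" ] )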