From: Steven Lord on 12 Jan 2010 09:15

"Amir Homayoun " <a.h.javadi(a)gmail.com> wrote in message news:hihr9i$iv2$1(a)fred.mathworks.com...
> Dear Titus
>
> Thanks. I tried both "sched = findResource('scheduler', 'type', 'jobmanager');"
> and "sched = findResource('scheduler', 'configuration', 'jobmanager');" and I
> received the following error message:
>
> Warning: Could not contact any job manager lookup process. You may not
> have started a job manager, or multicast protocols may be failing on your
> network. If you are certain that a job manager is running, try findResource
> with a 'lookupURL' input.
>> In findResource>iCreateAccessor at 305
> In findResource>iFindJobManagers at 160
> In findResource>iFindScheduler at 263
> In findResource at 139
>
> I copied the file "distcompUserConfig.m" to my working directory (current
> directory) and tried again, but still nothing.
>
> I searched the forum and saw that another person had the same problem in 2007.
> Before I continue, let me tell you about my system setup. I have one Mac
> computer with 8 cores and I want to run 8 parallel jobs on the same machine.
> It is not a network connection.
> There, in 2007, you mentioned that the distributed computing engine must be
> started using the "mdce" command. I navigated to ...toolbox/distcomp/bin and
> entered the following command from the MATLAB command window
>
> !mdce install
>
> and it returned the following error message
>
> /bin/bash: mdce: command not found

Try:

!./mdce install

*snip*

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
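Why the bare "mdce" failed: MATLAB's "!" operator hands the rest of the line to the operating system shell, and bash only looks up a bare command name in the directories listed in $PATH. The current directory is normally not one of them, so a script sitting in toolbox/distcomp/bin has to be called with an explicit path such as "./mdce". The short sketch below reproduces the effect with a throwaway script; "demo_path_lookup" and "demo_tool" are hypothetical names standing in for mdce, and it assumes a typical PATH (simulated here as /usr/bin:/bin) that does not contain ".".

```shell
#!/bin/sh
# Hypothetical demo: a script in the current directory is not found by its
# bare name, but runs fine when called as ./name.
demo_path_lookup() {
  dir=$(mktemp -d) || return 1
  # "demo_tool" stands in for a script such as mdce
  printf '#!/bin/sh\necho "demo_tool ran"\n' > "$dir/demo_tool"
  chmod +x "$dir/demo_tool"
  (
    cd "$dir" || exit 1
    PATH=/usr/bin:/bin   # typical PATH without "."; only affects this subshell
    if command -v demo_tool >/dev/null 2>&1; then
      echo "bare name: found"
    else
      echo "bare name: not found"
    fi
    ./demo_tool          # explicit relative path bypasses the PATH search
  )
  rm -rf "$dir"
}

demo_path_lookup
```

With that PATH the script prints "bare name: not found" followed by "demo_tool ran", which mirrors "mdce: command not found" versus "!./mdce ...".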
From: Titus Edelhofer on 12 Jan 2010 09:32

Hi,

yes, I hadn't thought of that one as an error source.
By the way, on Mac you probably only need "!./mdce start", not "!./mdce install". The install is only on Windows ...

Titus

"Steven Lord" <slord(a)mathworks.com> wrote in message news:hii06a$4bq$1(a)fred.mathworks.com...
> *snip*
From: Amir Homayoun on 13 Jan 2010 03:55

Hi

Thanks for your reply. OK, I could finally run the engine. I had to make some changes to the file "mdce_def.sh" so that it refers to valid directory names. Here are the messages I received during and after starting the engine. I started one job manager and one worker.

>> !./startjobmanager
< no message >

>> !./startworker
Warning: Using multicast to locate the job manager because neither the command
line arguments nor the mdce_def file specifies the job manager hostname or
how to locate the job manager. This warning will not appear if you use
either the -jobmanagerhost or -multicast flag.
For more information, run the startworker command with the -help flag.

>> !./nodestatus -infolevel 3
Job manager lookup process:
    Status                                  Running
Job manager:
    Name                                    default_jobmanager
    Running on host                         Amir-H-Javadi.local < it is my name >
    Number of workers                       1
    Worker names and host names             Amir-H-Javadi.local_worker, Amir-H-Javadi.local
    Start time                              Wed Jan 13 08:20:10 GMT 2010
    Port                                    27355
    Requested job manager lookup processes
                                            Amir-H-Javadi.local:27350 < it is my name >
    Registered with job manager lookup processes on hosts
                                            Amir-H-Javadi.local:27350 < it is my name >
    Database size in bytes                  227515
    VM heap size in bytes                   5353472
    Worker lease timeout in msec            60000
    Network addresses of host               127.0.0.1
                                            fe80:0:0:0:0:0:0:1%1
                                            0:0:0:0:0:0:0:1
                                            fe80:0:0:0:21f:5bff:fe3b:cce8%4
                                            128.40.254.221 < it is the static IP address of my machine >
Worker:
    Name                                    Amir-H-Javadi.local_worker
    Running on host                         Amir-H-Javadi.local
    Status                                  Idle
    Job manager                             default_jobmanager
    Connection with job manager             Connected
    Job manager hostname                    Amir-H-Javadi.local
    Start time                              Wed Jan 13 08:21:28 GMT 2010
    Port                                    27356
    Requested job manager lookup processes  Using multicast
    Registered with job manager lookup processes on hosts
                                            Amir-H-Javadi.local:27350
    File dependencies directory
        /applications/matlab74/toolbox/distcomp/user/lib/Amir-H-Javadi.local_Amir-H-Javadi.local_worker_mlworker_log/matlabDependencyDir
    Worker startup directory
        /applications/matlab74/toolbox/distcomp/user/lib/Amir-H-Javadi.local_Amir-H-Javadi.local_worker_mlworker_log/work
    Network addresses of host               127.0.0.1
                                            fe80:0:0:0:0:0:0:1%1
                                            0:0:0:0:0:0:0:1
                                            fe80:0:0:0:21f:5bff:fe3b:cce8%4
                                            128.40.254.221 < it is the static IP address of my machine >
Summary:
The mdce service on Amir-H-Javadi.local manages the following processes:
    Job manager lookup processes            1
    Job managers                            1
    Workers                                 1

When I try to find the resources, it gives me the following warning message (in red):

>> jm = findResource('scheduler','configuration','jobmanager')
The job manager computer is unable to open a TCP connection back to this computer.
You will not be able to transfer data of size greater than 246723 bytes
between this computer and the job manager. Callback functions will also not work.
====================================================
Possible reasons for this problem are:
1. The job manager cannot resolve the short hostname of this computer.
2. This computer has multiple hostnames and the Distributed Computing Toolbox
   is using one that is unresolvable on the job manager.
3. A firewall is blocking communication between the job manager and this computer.
4. Network routers are unable to route traffic from the job manager to this computer.
Refer to the Troubleshooting section of the documentation for detailed debugging instructions.
The hostname used by the Distributed Computing Toolbox on this computer is: Amir-H-Javadi
The fully qualified hostname of this computer is unknown
The IP addresses of this computer are: 127.0.0.1, fe80:0:0:0:0:0:0:1%1,
0:0:0:0:0:0:0:1, fe80:0:0:0:21f:5bff:fe3b:cce8%4, 128.40.254.221 < it is the static IP address of my machine >

The job manager name is: default_jobmanager
The hostname of the job manager computer is: Amir-H-Javadi.local
which resolves to the fully qualified hostname: 128.40.254.221 < it is the static IP address of my machine >
The IP addresses of the job manager computer are: 127.0.0.1,
fe80:0:0:0:0:0:0:1%1, 0:0:0:0:0:0:0:1, fe80:0:0:0:21f:5bff:fe3b:cce8%4,
128.40.254.221 < it is the static IP address of my machine >
====================================================
java.rmi.UnknownHostException: Unknown host: Amir-H-Javadi; nested
exception is: java.net.UnknownHostException: Amir-H-Javadi
The cause of this problem is:
====================================================
Amir-H-Javadi
This is causing:
Unknown host: Amir-H-Javadi; nested exception is:
java.net.UnknownHostException: Amir-H-Javadi
====================================================

jm =
Jobmanager Information
======================
                   Type : jobmanager
          ClusterOsType : unix
           DataLocation : database on default_jobmanager(a)Amir-H-Javadi.l...
- Assigned Jobs
         Number Pending : 0
          Number Queued : 0
         Number Running : 0
        Number Finished : 0
- Jobmanager Specific Properties
                   Name : default_jobmanager
               Hostname : Amir-H-Javadi.local
         HostAddress(s) : fe80:0:0:0:0:0:0:1%1
                        : fe80:0:0:0:21f:5bff:fe3b:cce8%4
                        : 128.40.254.221
                  State : running
    NumberOfIdleWorkers : 1
    NumberOfBusyWorkers : 0

OK.
I can also create simple and parallel jobs, for example:

>> j = createParallelJob(jm)
j =
Parallel Job ID 5 Information
=============================
               UserName : ajavadi
                  State : pending
             SubmitTime :
              StartTime :
       Running Duration :
- Data Dependencies
       FileDependencies : {}
       PathDependencies : {}
- Associated Task(s)
         Number Pending : 0
         Number Running : 0
        Number Finished : 0
       TaskID of errors :
- Jobmanager Dependent Properties
 MaximumNumberOfWorkers : Inf
 MinimumNumberOfWorkers : 1
                Timeout : Inf
          RestartWorker : false
              QueuedFcn :
             RunningFcn :
            FinishedFcn :

But when I try to create a task, it gives me the following error message:

>> createTask(j, @Permutation, 1, {InputVar});
??? Error using ==> distcomp.job.pCreateTask at 92
The job manager could not contact this MATLAB session on hostname
Amir-H-Javadi and port 27370.
Using the findResource command to find the job manager may provide a more
detailed error message.

What should I do now? As I mentioned before, I want to run all the tasks on my local machine. Sorry that my message got so long.

Thanks again,

Have a good time
Amir

"Titus Edelhofer" <titus.edelhofer(a)mathworks.de> wrote in message <hii15v$8uf$1(a)fred.mathworks.com>...
> *snip*
From: Titus Edelhofer on 13 Jan 2010 08:17
Hi Amir,

hmm, now it becomes difficult ;-). It looks like a mismatch in host names: Amir-H-Javadi and Amir-H-Javadi.local.

There are only two suggestions left that I could give:

- try to start the worker with the jobmanagerhost parameter set:

    !./stopworker
    !./startworker -jobmanagerhost 128.40.254.221

  or

    !./startworker -jobmanagerhost Amir-H-Javadi

- if this doesn't work, contact Technical Support at The MathWorks.

Titus

"Amir Homayoun " <a.h.javadi(a)gmail.com> wrote in message news:hik1pp$a1$1(a)fred.mathworks.com...
> *snip*
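The host-name mismatch can be checked before restarting the worker: the findResource output shows the client session advertising itself as the short name Amir-H-Javadi, which fails with java.net.UnknownHostException, while Amir-H-Javadi.local resolves to 128.40.254.221. A minimal sketch of such a check follows. The "resolves" helper is hypothetical and not part of MATLAB or mdce; it assumes either getent (typical on Linux) or dscacheutil (macOS) is available, and it only tests local name resolution, not whether the job manager can actually open a TCP connection back.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the given hostname resolves on this machine.
resolves() {
  if command -v getent >/dev/null 2>&1; then
    # Linux and most Unix systems: ask the system resolver (NSS)
    getent hosts "$1" >/dev/null 2>&1
  elif command -v dscacheutil >/dev/null 2>&1; then
    # macOS: dscacheutil exits 0 even when nothing is found, so test for output
    [ -n "$(dscacheutil -q host -a name "$1" 2>/dev/null)" ]
  else
    echo "no resolver tool found" >&2
    return 2
  fi
}

# Check the short name, the .local name and the full name of this machine.
short=$(hostname -s 2>/dev/null || hostname)
for name in "$short" "$short.local" "$(hostname)"; do
  if resolves "$name"; then
    echo "$name: resolves"
  else
    echo "$name: does NOT resolve"
  fi
done
```

On a machine set up like the one in this thread, a line such as "Amir-H-Javadi: does NOT resolve" next to "Amir-H-Javadi.local: resolves" would be consistent with the UnknownHostException above.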