I did modify it, but the wrong one. :\
Emmanuel
HTCondor batch system for job submission to a cluster
Re: HTCondor batch system for job submission to a cluster
Last edited by Emmanuel on Fri 15 Nov 2024 07:15, edited 1 time in total.
I found out that I modified a copy of tw_job_run.sh... :/
However, it appears that the modifications to tw_job_run.sh (introducing pwd) don't help, and TraceWin still fails to launch tracelx64 on the cluster. I get the message "[27] Exec 'tracelx64' ->Failed [Run Failed]". Why does TraceWin fail to run tracelx64? It should let the submit script run tracelx64. Perhaps I misunderstand the message?
The path seems to be correct and does not change at runtime (checked after an update of TraceWin in the afternoon) [update: I noticed the issue again, so the TraceWin update didn't solve it]. I noticed that the number of remote directories opened on the server corresponds to the number of cores set for this server, and is not related to the number of studies requested. I wonder whether the number of remote directories also equals the number of cores when several servers are used.
I also get messages in the terminal window where I'm connected to the submission server:
Submitting job(s)
ERROR: Executable file /pole_acc_ssi/froidef/TW/TWServer/tracelx64 does not exist
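The error above suggests that the path is not visible on the machine where the job is submitted. A minimal sketch of a check that could be run there before submitting; the function name check_executable is hypothetical and is not part of TraceWin or HTCondor, and the path in the usage comment is simply the one from the error message:

```shell
# Hypothetical helper (not part of TraceWin or HTCondor): report whether
# a file exists and carries execute permission for the current user.
check_executable() {
    exe="$1"
    if [ ! -e "$exe" ]; then
        # The path does not exist at all on this machine.
        echo "missing: $exe"
        return 1
    fi
    if [ ! -x "$exe" ]; then
        # The file exists but cannot be executed by the current user.
        echo "not executable: $exe"
        return 2
    fi
    echo "ok: $exe"
    return 0
}

# Usage, with the path taken from the error message:
#   check_executable /pole_acc_ssi/froidef/TW/TWServer/tracelx64
```

If this prints "missing" on the submission server, the directory is probably not the one TraceWin copied tracelx64 into, or it is not mounted there. Running the same check inside the job script would show whether the worker nodes see the path as well.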
Emmanuel
Re: HTCondor batch system for job submission to a cluster
Dear Didier,
Now the scripts produce the "echo" output required by TraceWin. I can launch a job, get its status, and kill it with the scripts from the command line. But TraceWin still fails to launch jobs for error studies (message "Exec 'tracelx64' ->Failed [Run Failed]" in the messages of the remote computers tab).
In TraceWin.log, I also find these two lines:
Cannot find init file
From->[PROCESS_INIT] : Constructor
I can't figure out at what point TraceWin loses the path to the .ini file, so I'm still working on it.
Emmanuel
Re: HTCondor batch system for job submission to a cluster
Dear Didier,
Which protocol does TraceWin use when testing a server?
I want to use the CC-IN2P3 cluster in Lyon, which runs Slurm. I can connect to the server over ssh, but when I try to test it with TraceWin, TraceWin doesn't find it.
Thank you in advance for any help.
Emmanuel