HTCondor batch system for job submission to a cluster

https://www.dacm-logiciels.fr/tracewin
FranceEmmanuel
Initiated
Initiated
Posts: 35
Joined: Thu 22 Sep 2022 08:45
Country:
France (fr)
France

Re: HTCondor batch system for job submission to a cluster

Post by FranceEmmanuel »

For sure I modified it ;) But the wrong one. :\

Emmanuel
Last edited by FranceEmmanuel on Fri 15 Nov 2024 07:15, edited 1 time in total.

Re: HTCondor batch system for job submission to a cluster

Post by FranceEmmanuel »

I found out that I had modified a copy of tw_job_run.sh... :/
However, it appears that the modifications to tw_job_run.sh (introducing pwd) don't solve the problem, and TraceWin still fails to launch tracelx64 on the cluster. I get the message "[27] Exec 'tracelx64' ->Failed [Run Failed]". Why does TraceWin fail to run tracelx64? It should let the submit script run tracelx64. Perhaps I misunderstand the message?
The path seems to be correct and does not change during runtime (after an update of TraceWin in the afternoon) [update: I noticed that issue again, so the TW update didn't solve it]. I noticed that the number of remote directories opened on the server corresponds to the number of cores set for this server, and is not related to the number of studies requested. I wonder whether the number of remote directories also equals the number of cores when several servers are used.
I also get messages in the terminal window where I'm connected to the submission server:

Submitting job(s)
ERROR: Executable file /pole_acc_ssi/froidef/TW/TWServer/tracelx64 does not exist
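
The condor_submit error above means that the path given as the executable was not found on the submit machine at submission time. A minimal sketch of a pre-submit check, assuming a POSIX shell; the function name check_executable is hypothetical and is not part of TraceWin or HTCondor:

```shell
#!/bin/sh
# Hypothetical helper: report whether a path can be used as a job executable.
# Prints "ok", "missing" or "not-executable"; returns 0 only for "ok".
check_executable() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
        return 1
    fi
    # Must be a regular file with the execute bit set for the current user.
    if [ ! -f "$path" ] || [ ! -x "$path" ]; then
        echo "not-executable"
        return 1
    fi
    echo "ok"
}

# Example call with the path from the error message above:
# check_executable /pole_acc_ssi/froidef/TW/TWServer/tracelx64
```

Running the check on the submission server, as the same user that submits the jobs, shows whether the file is really absent there or only lacks execute permission.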


Emmanuel

Re: HTCondor batch system for job submission to a cluster

Post by FranceEmmanuel »

Dear Didier,

Now the scripts produce the outputs from the "echo" command as required by TraceWin. I can launch a job, get its status and kill it with the scripts from the command line. But TraceWin still fails to launch jobs for error studies (message "Exec 'tracelx64' ->Failed [Run Failed]" in the messages of the remote computers' tab).
In TraceWin.log, I also find these two lines:
Cannot find init file
From->[PROCESS_INIT] : Constructor
I can't figure out when TraceWin loses the path to the .ini file. So I'm still working on it.
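
The scripts themselves are not shown in this thread. As an illustration only, command-line wrappers for launching, querying and killing a job could look like the sketch below with HTCondor. The function names are hypothetical, and what is printed is a placeholder: the exact "echo" output TraceWin expects is not given here.

```shell
#!/bin/sh
# Hypothetical wrappers around real HTCondor commands.
# The printed values are placeholders, not the format TraceWin requires.

# Submit a job description file and print the first job id (e.g. "1234.0").
job_submit() {
    # -terse makes condor_submit print only the id range: "1234.0 - 1234.0"
    range=$(condor_submit -terse "$1") || return 1
    echo "${range%% *}"
}

# Print the numeric JobStatus attribute (1 = idle, 2 = running, 5 = held).
job_status() {
    condor_q "$1" -af JobStatus
}

# Remove the job from the queue.
job_kill() {
    condor_rm "$1"
}
```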

Emmanuel

Re: HTCondor batch system for job submission to a cluster

Post by FranceEmmanuel »

Dear Didier,
What type of protocol does TraceWin use when testing a server?
I want to use the CC-IN2P3 cluster in Lyon, which runs under Slurm. I can connect to the server via an ssh connection. But when I try to test it with TraceWin, it doesn't find it.
Thank you in advance for any help.
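
Which protocol TraceWin uses for its server test is not stated in this thread. Independently of TraceWin, a minimal sketch to confirm that a non-interactive SSH login to the cluster works (a program that connects without a terminal cannot answer a password prompt); can_ssh is a hypothetical name and user@host is a placeholder:

```shell
#!/bin/sh
# Hypothetical helper: check that a non-interactive SSH login works.
# BatchMode=yes disables password prompts, so the call fails instead of
# waiting for input when key-based login is not set up.
can_ssh() {
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true; then
        echo "reachable"
    else
        echo "unreachable"
        return 1
    fi
}

# Example (user@host is a placeholder):
# can_ssh user@host
```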

Emmanuel