HTCondor batch system for job submission to a cluster

https://www.dacm-logiciels.fr/tracewin
FranceEmmanuel
Apprentice
Posts: 22
Joined: Thu 22 Sep 2022 08:45
Country:
France (fr)
France

Post by FranceEmmanuel »

Hello Didier,

The batch system that allows us to use the servers dedicated to simulation in my lab is HTCondor.

In the TraceWin help, in the section "For all other clusters", twserver is meant to be started with an argument called "cluster", which, as I guess from the other cluster examples, would be a name such as ess or isipic. With HTCondor, I don't have such a name to pass to twserver. So I simply typed "cluster"; I also tried the name of the cluster's node server, but "Incorrect argument" is then returned when I try to run twserver.

By the way, when I try to submit the job to the cluster in the way recommended by our IT team, no job is launched. To submit a job, I created a test_TraceWin.submit file that is passed as the argument to the condor_submit command (download it here: https://mycore.core-cloud.net/index.php ... rgWd0qALVw ). I can't find any helpful information in the attached twserver.log.

I'm not sure that this kind of job submission is supported by TraceWin. Can you tell me how to proceed, if you have ever had to use HTCondor? Otherwise, would it be possible to support such a job submission method?

Thank you in advance for any help.

Emmanuel
Attachment: twserver.log (888 Bytes)
FranceDidier
Administrator
Posts: 959
Joined: Wed 26 Aug 2020 14:40
Country:
France (fr)
France

Re: HTCondor batch system for job submission to a cluster

Post by FranceDidier »

Hi Emmanuel,

The 3 scripts "tw_job_run.sh", "tw_job_status.sh" and "tw_job_kill.sh", given as examples in the manual, must be modified to suit your cluster.
The first step is then to check by hand that the 3 routines work correctly, without going through twserver.
- Can you launch a job via "tw_job_run.sh" and give me the screen output?
- Can you check its status with "tw_job_status.sh" and give me the screen output?
- Can you kill the job via "tw_job_kill.sh"?
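
For the by-hand check, a small wrapper can capture exactly what each script prints, so the output can be pasted into a reply. This is only a sketch: run_and_log is a hypothetical helper, not part of TraceWin, and the arguments each tw_job_*.sh script expects depend on how it was adapted to the cluster.

```shell
#!/bin/sh
# run_and_log is a hypothetical helper (not part of TraceWin).
# It runs a command, saves stdout and stderr to a log file,
# and records the exit status at the end of that file.
run_and_log() {
    log_file=$1
    shift
    echo "command: $*" >"$log_file"
    "$@" >>"$log_file" 2>&1
    status=$?
    echo "exit status: $status" >>"$log_file"
    return "$status"
}

# Demonstration with a harmless command. For the real check, replace it
# with one of the three scripts and the arguments your version expects.
demo_log="${TMPDIR:-/tmp}/tw_check_demo.log"
run_and_log "$demo_log" echo "hello from the job script"
cat "$demo_log"
```

Running the same wrapper once per script gives three log files that show whether each step works before twserver is involved.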

Regards,

Didier

Post by FranceEmmanuel »

Well, I see no standard commands equivalent to sbatch, squeue or scancel, which come with the Slurm package. I can retrieve the job ID, or use kill -9/-15 instead of scancel (I'm not sure that's equivalent). But squeue has strictly no similar command.

At least, if I simply write the following lines in tw_job_run.sh, I can launch the code and retrieve the job ID:
...
$1 $2 $3 $4 &
ID=$(pidof $full_name_code)
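
As a side note on the snippet above, the shell already stores the PID of the last background command in $!, which avoids pidof returning several PIDs when more than one instance of the program is running. The sketch below assumes a plain POSIX shell, and launch_in_background is a hypothetical helper. Note that this yields a local process ID on the submission machine, not an HTCondor job ID.

```shell
#!/bin/sh
# launch_in_background is a hypothetical helper, not part of TraceWin.
# It starts the command given as arguments in the background and stores
# the PID of that process in JOB_PID.
launch_in_background() {
    "$@" >/dev/null 2>&1 &
    JOB_PID=$!
}

# Demonstration with a harmless stand-in for the real program.
launch_in_background sleep 30
echo "local PID: $JOB_PID"

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$JOB_PID" 2>/dev/null; then
    echo "process is running"
fi

# Ask the process to terminate (SIGTERM), then reap it.
kill -15 "$JOB_PID"
wait "$JOB_PID" 2>/dev/null || true
```

A process started this way runs on the machine where the script is executed, so it does not go through the batch system at all.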

But I'm not sure that the code runs on the servers listed: I couldn't see any sign that the other servers of the cluster are used, even though they appear in the computer list. It seems that the job I can see running runs only on the submission server.

The submit file includes a list of servers. The code isn't meant to choose the servers where it runs; this is HTCondor's job. The cluster's servers aren't reachable through ssh, the base communication protocol of TraceWin. Finally, the .submit file does not run TraceWin.
So I don't know whether HTCondor and the Slurm package are compatible.
I'm still clueless about how to use this cluster for TraceWin.

Emmanuel

Post by FranceDidier »

Dear Emmanuel,

I don't think we understand each other, so it would be easier if you could give me a call.

Regards,

Didier

Post by FranceEmmanuel »

It seems I was considering things the wrong way.

I had an answer from the IT team saying that Slurm and HTCondor are two totally different batch systems.

However, they tell me that there may be ways to convert the scripts from one system to the other (see the web site here: https://portal.osg-htc.org/documentatio ... _HTCondor/).
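
For orientation, the three Slurm commands used in the example scripts each have a standard HTCondor counterpart: condor_submit, condor_q and condor_rm. The sketch below records that mapping and shows one way to extract the job (cluster) ID from the message condor_submit usually prints. The helper names slurm_to_condor and parse_cluster_id are hypothetical, the sample output line is an assumption about the local HTCondor version, and nothing here is TraceWin's own interface.

```shell
#!/bin/sh
# Hypothetical helper: print the HTCondor command that plays the same
# role as a given Slurm command.
slurm_to_condor() {
    case "$1" in
        sbatch)  echo "condor_submit" ;;  # submit a job
        squeue)  echo "condor_q" ;;       # list jobs in the queue
        scancel) echo "condor_rm" ;;      # remove a job
        *)       echo "unknown"; return 1 ;;
    esac
}

# Hypothetical helper: read condor_submit output on stdin and print the
# cluster ID found in a line such as
#   "1 job(s) submitted to cluster 1234."
parse_cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

for cmd in sbatch squeue scancel; do
    echo "$cmd -> $(slurm_to_condor "$cmd")"
done

# Sample text only (assumed format); on the cluster the input would be
# the real output of: condor_submit test_TraceWin.submit
sample='Submitting job(s).
1 job(s) submitted to cluster 1234.'
echo "cluster ID: $(printf '%s\n' "$sample" | parse_cluster_id)"
```

In an adapted tw_job_run.sh, an ID obtained this way could then be handed to condor_q or condor_rm by the status and kill scripts, provided the inputs and outputs TraceWin expects are preserved.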

I'm working on it.

Emmanuel

Post by FranceDidier »

Dear Emmanuel,

The scripts I've given are only examples; they obviously need to be adapted while keeping the inputs and outputs compatible.

Regards,

Didier

Post by FranceEmmanuel »

OK, this is something I understood from the very beginning ;)

So it requires a learning stage on my part.

Emmanuel