On 28/02/18 11:32, PEDRO PABLO GOMEZ MARTIN wrote:
Hello,
Hi Pedro,
I've been using DOMjudge since the old 3.x days, and in the last few years I've been applying a couple of changes to my new installations to overcome some problems that, IMHO, have arisen with the newer versions. I want to share them in case they are useful for others, so that they can be considered for inclusion in the mainstream version, or to get feedback if I've been doing something wrong.
The first one concerns CPU vs. wall time limits. When runguard is invoked from testcase_run.sh, the script passes it the parameters --cputime and --walltime. Years ago the walltime was set to 2*${TIMELIMIT}. But at some point the format of ${TIMELIMIT} was changed to "<soft>:<hard>" and that expression stopped working. The script was updated, and now --walltime is exactly the same as --cputime.
I ran into difficulties because of that some time ago: runguard reported a Time Limit verdict because the walltime was exceeded, even when the CPU limit was far from being reached. This occurred when the server was under high load, or the first time a big testcase had to be retrieved from the database.
The first case (high server load) can indeed happen. It is a case that we don't really consider, since in a contest setup we'd expect dedicated machines for running the judgedaemon(s) on.
The case of downloading testcases from the database shouldn't be affected, because this happens before running the submission on which the timelimit is enforced.
My solution has been to change that script (testcase_run.sh) so the walltime is scaled up. The official script is:
    runcheck ./run testdata.in program.out \
        $GAINROOT "$RUNGUARD" ${DEBUG:+-v} $CPUSET_OPT \
        ${USE_CHROOT:+-r "$PWD/.."} \
        --nproc=$PROCLIMIT \
        --no-core --streamsize=$FILELIMIT \
        --user="$RUNUSER" --group="$RUNGROUP" \
        --walltime=$TIMELIMIT --cputime=$TIMELIMIT \
        --memsize=$MEMLIMIT --filesize=$FILELIMIT \
        --stderr=program.err --outmeta=program.meta -- \
        "$PREFIX/$PROGRAM" 2>runguard.err
and my change modifies the --walltime line:
    ... --walltime=$((4*${TIMELIMIT%:*})) --cputime=$TIMELIMIT \ ...
Note that I'm using a factor of 4 instead of the 2 that DOMjudge used long ago. This change works for versions up to 5.1.x. Since DOMjudge 5.2, time limits can be floating-point values, and my expression no longer works, because Bash arithmetic $((...)) only handles integers. Now bc is needed to do floating-point arithmetic:
    ... --walltime=$(echo "scale=2;4*${TIMELIMIT%:*}" | bc) --cputime=$TIMELIMIT \ ...
This last patch is due to my colleague Joan Rodríguez, and it adds a new package dependency on the judgehosts: the bc program.
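As a sketch of the same computation without the extra bc dependency, the bash function below (hypothetical, not part of DOMjudge) keeps only the soft limit with ${TIMELIMIT%:*} and then does the floating-point multiplication in awk. Swapping bc for awk is an assumption made here, not part of Joan's patch, on the premise that awk is already installed on the judgehosts; the factor of 4 is Pedro's choice and is left as a parameter.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of DOMjudge.
# Usage: walltime_from_timelimit TIMELIMIT [FACTOR]
# TIMELIMIT is assumed to use the "<soft>:<hard>" format seen in testcase_run.sh.
walltime_from_timelimit() {
    local timelimit=$1
    local factor=${2:-4}            # default scale factor of 4, as in the patch
    local soft=${timelimit%:*}      # drop the trailing ":<hard>" part, keep <soft>
    # $(( )) is integer-only, so multiply in awk instead of bc.
    # LC_ALL=C keeps "." as the decimal separator in the output.
    LC_ALL=C awk -v t="$soft" -v f="$factor" 'BEGIN { printf "%.2f\n", t * f }'
}
```

In testcase_run.sh the walltime argument could then read --walltime=$(walltime_from_timelimit "$TIMELIMIT"); with TIMELIMIT=2.5:3.75 this gives 10.00, and a plain value without a colon is used as the soft limit unchanged.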
Note that although we set the CPU and wall time options equal, you can still tweak the difference between the soft and hard time limits with the "timelimit overshoot" configuration setting. You can set the hard time limit a lot larger than the soft (real) one, and then both the CPU and wall time will only trigger killing the program when the hard limit is reached. The CPU time is still used to determine whether the submission should get a time limit exceeded verdict. This effectively gives the same behaviour; the only disadvantage of this method is that a slow solution will now also be allowed to consume much more CPU time before it gets killed.
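To make the overshoot idea concrete, here is a minimal bash sketch that derives a hard limit from a soft limit. The helpers hard_timelimit and overshoot_term are hypothetical and are not DOMjudge's own code. They assume the overshoot syntax described in DOMjudge's configuration: "Xs" for X seconds, "Y%" for a percentage of the soft limit, or two such terms joined by "+" (sum), "|" (maximum) or "&" (minimum), as in "1s|10%".

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the overshoot syntax handled here is an assumption.

# overshoot_term SOFT TERM: value of a single term "Xs" or "Y%" in seconds.
overshoot_term() {
    case "$2" in
        *s) echo "${2%?}" ;;                       # "1s"  -> 1
        *%) LC_ALL=C awk -v t="$1" -v p="${2%?}" \
                'BEGIN { print t * p / 100 }' ;;   # "10%" -> 10% of SOFT
        *)  echo "unsupported overshoot term: $2" >&2; return 1 ;;
    esac
}

# hard_timelimit SOFT SPEC: prints SOFT plus the overshoot described by SPEC.
hard_timelimit() {
    local soft=$1 spec=$2 op="" a b
    case "$spec" in
        *+*)   op='+' ;;
        *'|'*) op='|' ;;
        *'&'*) op='&' ;;
    esac
    if [ -z "$op" ]; then
        # Single term: treat it as "term + 0".
        a=$(overshoot_term "$soft" "$spec") || return 1
        b=0
        op='+'
    else
        a=$(overshoot_term "$soft" "${spec%%"$op"*}") || return 1
        b=$(overshoot_term "$soft" "${spec#*"$op"}") || return 1
    fi
    LC_ALL=C awk -v s="$soft" -v a="$a" -v b="$b" -v op="$op" 'BEGIN {
        if (op == "+")      o = a + b               # sum
        else if (op == "|") o = (a > b) ? a : b     # maximum
        else                o = (a < b) ? a : b     # minimum
        printf "%.3f\n", s + o
    }'
}
```

Under these assumptions, "1s|10%" turns a 2 s soft limit into a 3 s hard limit (hard_timelimit 2 '1s|10%' prints 3.000) but a 20 s soft limit into 22 s, which shows how a generous overshoot lets slow solutions keep burning CPU time until the hard limit.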
The second change that I've been applying to DOMjudge is in runguard itself, and it is also related to CPU time, although I'm unsure about the underlying cause of my problems. Sometimes, when evaluating Java submissions, we got a Run Error verdict instead of Time Limit. We solved this by increasing the real CPU-time hard limit in runguard. The official runguard adds one second to the CPU time limit, and for some reason that seems to be insufficient in some cases, so we changed it to add 2 seconds. In the runguard.c source code:
    if ( use_cputime ) {
        /* The CPU-time resource limit can only be specified in
           seconds, so round up: we can measure actual CPU time used
           more accurately. Also set the real hard limit one second
           higher: at the soft limit the kernel will send SIGXCPU at
           the hard limit a SIGKILL. The SIGXCPU can be caught, but is
           not by default and gives us a reliable way to detect if the
           CPU-time limit was reached. */
        rlim_t cputime_limit = (rlim_t)ceil(cputime[1]);
        verbose("setting hard CPU-time limit to %d(+2) seconds",(int)cputime_limit); /* CHANGED */
        lim.rlim_cur = cputime_limit;
        lim.rlim_max = cputime_limit+2;                                              /* CHANGED */
        setlim(CPU);
    }
This change avoids those false RUN ERROR verdicts we sporadically suffered. Unfortunately I cannot provide a way to reproduce the error, because it is quite erratic. We use virtual machines as judgehosts, in case that is relevant.
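The soft/hard split described in the runguard comment can be observed from the shell. The following is a minimal bash sketch, assuming Linux RLIMIT_CPU semantics (SIGXCPU when the soft limit is reached, SIGKILL at the hard limit); cpu_limited is a hypothetical helper, not part of runguard. It sets the soft limit to SOFT seconds and the hard limit to SOFT+EXTRA, mirroring lim.rlim_cur and lim.rlim_max, and reports how the command ended.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of runguard.
# Usage: cpu_limited SOFT EXTRA CMD [ARGS...]
cpu_limited() {
    local soft=$1 extra=$2 status
    shift 2
    (
        ulimit -c 0 || exit 125                      # no core dumps
        ulimit -t "$((soft + extra))" || exit 125    # sets soft and hard CPU limit
        ulimit -S -t "$soft" || exit 125             # then lower only the soft limit
        exec "$@"
    )
    status=$?
    if [ "$status" -gt 128 ]; then
        # Bash reports death by signal N as exit status 128+N.
        echo "killed by SIG$(kill -l "$((status - 128))")"
    else
        echo "exited with status $status"
    fi
}
```

A busy loop that keeps the default SIGXCPU action dies at the soft limit, while one that ignores SIGXCPU (here via trap "" XCPU) keeps running until SIGKILL at the hard limit. Telling these two cases apart is what lets runguard detect that the CPU-time limit was reached, and the EXTRA seconds decide how long the second kind of program may keep running.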
Ok, that's interesting to know, but indeed difficult to act upon if you don't have a way of reproducing it. The use of VMs might affect this, but I'm not sure. We've certainly seen that timings on VMs are not quite as consistent as on bare metal.
Thank you for your great DOMjudge.
Best regards, Pedro Pablo
It's good to hear that we have such long-time users :-)
Best, Jaap