Hi,
Since I noticed that we are doing very poorly on problems with many test cases, I have looked into which parts cause how much of the slowdown (compared to verifyproblem it's a 50x slowdown for the insane WF problem). The least invasive improvement we can make is a few changes to the judgehost API. I would like to make those changes before the next release, i.e. before NWERC, so that we don't have to change the API again in the release after that.
Currently the API requests for one judging look like:
API request POST judgehosts/next-judging/bar-0
*Judging submission s92 (endpoint default) (t1/p9/cpp), id j257...*
API request GET config data=name=script_timelimit
API request GET config data=name=script_memory_limit
API request GET config data=name=script_filesize_limit
API request GET config data=name=process_limit
API request GET config data=name=output_storage_limit
*^-- above requests could be grouped together already today, but that doesn't bring a big win (I'll do it anyway)*
Working directory: /home/sitowert/domjudge/output/judgings/bar-0/endpoint-default/c2-s92-j257
API request GET contests/2/submissions/92/source-code
API request PUT judgehosts/update-judging/bar-0/257 data=compile_success=1&output_compile=a25pZ2h0REsuY2M6IEluIGZ1bmN0aW9uICdpbnQgbWFpbigpJzoKa25pZ2h0REsuY2M...
executing chroot script: 'chroot-startstop.sh start'
API request GET config data=name=timelimit_overshoot
*^--- could also be grouped to the config requests above*
API request GET testcases/next-to-judge/257
Running testcase 1...
API request POST judgehosts/add-judging-run/bar-0/257 data=testcaseid=94&runresult=correct&runtime=0.001&output_run=MTIK&output_error=&output_system=Q29ycmVjdC...
Testcase 1 done, result: correct
API request GET testcases/next-to-judge/257
Running testcase 2...
API request POST judgehosts/add-judging-run/bar-0/257 data=testcaseid=95&runresult=correct&runtime=0.001&output_run=Ngo%3D&output_error=&output_system=Q29ycmVj...
Testcase 2 done, result: correct
API request GET testcases/next-to-judge/257
Running testcase 3...
API request GET testcases/95/file/input
API request GET testcases/95/file/output
*^--- this is only done for unknown/changed testdata*
*-----------------------------------------------------------------------------------------------------------*
In the future, I want the flow to look more like this:
API request POST judgehosts/next-judging/bar-0
*^--- this will additionally return all the md5sums for all inputs/outputs of the problem*
*Judging submission s92 (endpoint default) (t1/p9/cpp), id j257...*
API request GET config data=name=script_timelimit,script_memory_limit,script_filesize_limit...timelimit_overshoot
Working directory: /home/sitowert/domjudge/output/judgings/bar-0/endpoint-default/c2-s92-j257
API request GET contests/2/submissions/92/source-code
API request PUT judgehosts/update-judging/bar-0/257 data=compile_success=1&output_compile=a25pZ2h0REsuY2M6IEluIGZ1bmN0aW9uICdpbnQgbWFpbigpJzoKa25pZ2h0REsuY2M...
executing chroot script: 'chroot-startstop.sh start'
API request GET testcases/95/file/input
API request GET testcases/95/file/output
*^--- request all unknown/changed testdata*
Running testcase 1...
Testcase 1 done, result: correct
Running testcase 2...
Testcase 2 done, result: correct
Running testcase 3...
Testcase 3 done, result: wrong-answer
API request POST judgehosts/add-judging-run/bar-0/257 data=TBD
*^--- this is actually the result of three testcases*
API request GET testcases/next-to-judge/257
*^--- this is the check whether we should continue judging*
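To make the md5sum part concrete, here is a rough sketch (in Python for brevity, not the actual PHP judgedaemon code) of how the daemon could use the md5sums returned by next-judging to decide which testcase files it still needs to fetch. The field names and the cache layout are made up for illustration.

```python
import hashlib
import os

def files_to_fetch(judging, cache_dir):
    """Return (testcase_id, kind) pairs whose local copy is missing or stale.

    `judging` is assumed to be the next-judging response, extended with a
    "testcases" list carrying md5sum_input/md5sum_output per testcase
    (hypothetical field names).
    """
    missing = []
    for tc in judging["testcases"]:
        for kind in ("input", "output"):
            path = os.path.join(cache_dir, f"{tc['id']}.{kind}")
            expected = tc[f"md5sum_{kind}"]
            if not os.path.exists(path):
                missing.append((tc["id"], kind))
                continue
            with open(path, "rb") as f:
                if hashlib.md5(f.read()).hexdigest() != expected:
                    missing.append((tc["id"], kind))
    return missing
```

Only the files returned here would then be downloaded via GET testcases/.../file/input and .../file/output; everything else comes from the local cache.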
*-----------------------------------------------------------------------------------------------------------*
The main idea is to avoid the next-to-judge calls for as long as possible and to group the add-judging-run calls. The next-to-judge calls are not necessary as long as all previous test cases are correct, and for the most commonly used judging model (fail on first error) that is true for at least N-1 of N test cases. We still want to ping back from time to time, though, to a) signal progress and b) avoid posting too much data to the database at once.
As reasonable defaults I could imagine posting back at least every 10 s, and whenever we have accumulated more than twice the output limit, but that should be configurable.
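This is only a sketch of the flush policy I have in mind, not actual code (the judgedaemon is PHP); `post_batch` is a placeholder for whatever the batched add-judging-run call ends up looking like, and the limits are made-up values.

```python
import time

FLUSH_INTERVAL = 10            # seconds; should be configurable
OUTPUT_LIMIT = 2 * 1024 ** 2   # hypothetical output (storage) limit in bytes

pending = []                   # judging-run results not yet posted back
pending_bytes = 0              # increased by the caller as run output is added
last_flush = time.monotonic()

def maybe_flush(post_batch):
    """Post the accumulated results once the time or size threshold is reached."""
    global pending, pending_bytes, last_flush
    too_old = time.monotonic() - last_flush >= FLUSH_INTERVAL
    too_big = pending_bytes > 2 * OUTPUT_LIMIT
    if pending and (too_old or too_big):
        post_batch(pending)    # one add-judging-run call for the whole batch
        pending, pending_bytes = [], 0
        last_flush = time.monotonic()
```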
For the actual grouping/batching I can see two options: either extend the single POST add-judging-run call to accept multiple results, or follow something like https://developers.facebook.com/docs/graph-api/making-multiple-requests to do batched requests across the whole API.
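For the first option, the body could simply become a list of the parameters we post per run today. The exact shape below is only a guess (values extrapolated from the log above), nothing is decided yet:

```python
# Hypothetical batched body for POST judgehosts/add-judging-run/bar-0/257;
# each entry mirrors today's single-run parameters, only the list wrapper is new.
batch = [
    {"testcaseid": 94, "runresult": "correct", "runtime": 0.001,
     "output_run": "MTIK", "output_error": "", "output_system": "Q29ycmVjdC..."},
    {"testcaseid": 95, "runresult": "correct", "runtime": 0.001,
     "output_run": "Ngo=", "output_error": "", "output_system": "Q29ycmVj..."},
    {"testcaseid": 96, "runresult": "wrong-answer", "runtime": 0.002,
     "output_run": "...", "output_error": "", "output_system": "..."},
]
```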
For a problem with N test cases of which M are unknown to the judgehost, this approach brings us down from 9 + N*2 + M*2 API calls to 4 + M*2 + X/10 + 2 API calls (in the best case of a fully correct submission, with X seconds of total judging time and the 10 s flush interval from above).
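Just to put made-up numbers on that formula (say N = 100 test cases, M = 0 unknown test cases, X = 60 s of judging time with the 10 s flush interval):

```python
N, M, X = 100, 0, 60           # hypothetical values for illustration
old = 9 + N * 2 + M * 2        # = 209 API calls today
new = 4 + M * 2 + X / 10 + 2   # = 12 API calls with batching
```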
Any objections or thoughts?
Tobi
https://github.com/DOMjudge/domjudge/pull/444 contains an unpolished implementation prototype (which also still throws an exception at the end). It brings judging time down from 165 s to 65 s.
Hi Tobi,
On Wed, October 31, 2018 19:59, Tobias Werth wrote:
> The main idea is to avoid the next-to-judge calls for as long as possible and to group the add-judging-run calls. The next-to-judge calls are not necessary as long as all previous test cases are correct, and for the most commonly used judging model (fail on first error) that is true for at least N-1 of N test cases. We still want to ping back from time to time, though, to a) signal progress and b) avoid posting too much data to the database at once.
> As reasonable defaults I could imagine posting back at least every 10 s, and whenever we have accumulated more than twice the output limit, but that should be configurable.
Yes, that seems like a good approach. It does diminish a bit the experience of watching a judging "live" in the DJ interface; you now get updates less frequently when you watch the submission page of a submission that is being judged.
> For the actual grouping/batching I can see two options: either extend the single POST add-judging-run call to accept multiple results, or
I think I prefer this approach.
I was also wondering whether you'd want to move fetch_executable() out of the inner loop; I know it does not fire off API calls each time, but it seems unlikely, or even undesirable, for executables to change between test cases - so call it once before judging starts?
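Roughly what I mean, as a toy sketch (Python pseudocode with stub functions, not the actual daemon):

```python
def fetch_executable(name):                # stub for the real lookup/build step
    return f"/cache/executables/{name}/run"

def run_testcase(testcase, compare_exec):  # stub for the real testcase run
    print(f"running {testcase} with {compare_exec}")

def judge(compare_name, testcases):
    compare_exec = fetch_executable(compare_name)   # once, before the loop
    for tc in testcases:
        run_testcase(tc, compare_exec)              # no per-testcase fetch

judge("compare", ["tc1", "tc2", "tc3"])
```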
Also, I think we could save some time by not rebuilding the cURL connection for each and every request. I made a pull request about that here: https://github.com/DOMjudge/domjudge/pull/445
It's not really tested since MySQL on my real development instance is broken due to dependency hell, but maybe you can measure it with your test set. We should do it in any case, I guess.
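The PR itself changes the daemon's cURL handling; as a rough analogy only (Python with requests, not the actual code), the idea is to keep one handle/connection alive across API calls instead of setting it up for every request:

```python
import requests

# One Session reuses the underlying TCP/TLS connection across calls,
# instead of rebuilding it for every single API request.
session = requests.Session()

def api_get(base_url, path):
    # base_url and path are placeholders for the judgehost API endpoint
    resp = session.get(f"{base_url}/{path}", timeout=30)
    resp.raise_for_status()
    return resp.json()
```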
Cheers, Thijs