New subject: Apache/DOMjudge Tuning

9 Nov 2018


Probably I've broken the mailing list thread, but I wanted to reply to
you before it's too late.
...
> In the interest of avoiding the issue we had last year, we want to
> ensure we have plenty of web service resources available for DOMjudge
> this weekend.  I have installed DOMjudge 6.0.2 and it seems to be
> working.
> When I look at the current processes running (with the default
> configuration) I see five httpd processes and five php-fpm processes.
> From what I can tell, it looks like I should use php-fpm tuning
> parameters to greatly increase the number of php-fpm processes
> running.  (We have 16 GB of RAM, so I see no problem with doing so.)
> Does the DOMjudge team have any recommendations on which parameters
> to change to support 100 teams?  Am I looking in the right place to
> ensure that we don't run out of web service resources again?
100 teams should be pretty easy to support on hardware with those
specifications.
For PHP-FPM, there are a couple of relevant parameters to tune based
on your system. The first is pm = static. This disables scaling of FPM
workers, so there is always a fixed number of workers available. The
number of fixed workers is controlled by pm.max_children.
The domjudge-fpm.conf file that ships with DOMjudge sets both of
these and gives some guidance on how to choose the values, but I
think the comment might be a little out of date since the move to
Symfony, which requires a bit more memory.
When I ran DOMjudge from master a few weeks back, I seem to recall
100-150 MB per FPM worker on average, so base your pm.max_children
value on that. E.g. if you want to use 10 GB of your server memory,
you'd set it to around 70 (which is probably plenty for 100 teams).
You should also make sure that pm.max_requests is set to some
reasonably large non-zero value. This protects you from unknown memory
leaks by restarting FPM worker processes after a number of requests
(I use 5000).
I've never used Apache in front of PHP-FPM, but you may want to check
the MaxRequestWorkers, KeepAlive, and KeepAliveTimeout settings. If you
are on a local network or not using SSL, you can probably just disable
KeepAlive to ensure you don't run into the errors you encountered last
year.
The best thing you can do, though, is some load testing to see what
happens under load. You can probably accomplish this with just
ApacheBench (ab), but make sure you point it at a real page (i.e.
something that isn't just a 302 redirect or a static file). A good
candidate is the login page; run curl -I on it first to make sure it
returns 200 OK with some actual content. For 100 teams you're probably
only looking at about 10 requests/second at most: team pages refresh
every 30 seconds, which is about 3 requests/second, and adding some
spectators and people who open multiple tabs gets you to 10-ish. That
said, your server should be able to handle far more requests/second
than that.
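The back-of-the-envelope estimate above can be written out as a small Python sketch. It assumes one auto-refreshing page per team every 30 seconds; the 3x multiplier for spectators and extra tabs is a hypothetical guess chosen to land near the "10-ish" figure, not a measured value.

```python
# Rough request-rate estimate for the team-facing web UI.
# Assumptions: one page per team refreshing every 30 s; the multiplier
# for spectators and extra tabs is a guess, not a measurement.


def team_request_rate(teams: int, refresh_s: float = 30.0) -> float:
    """Requests per second from auto-refreshing team pages."""
    return teams / refresh_s


def estimated_peak_rate(teams: int, refresh_s: float = 30.0,
                        multiplier: float = 3.0) -> float:
    """Scale the base rate up to cover spectators and multiple tabs."""
    return team_request_rate(teams, refresh_s) * multiplier


if __name__ == "__main__":
    print(f"base: {team_request_rate(100):.2f} req/s")
    print(f"with spectators/tabs: {estimated_peak_rate(100):.1f} req/s")
```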
An ApacheBench command that simulates 150 concurrent users with
keepalive would look something like this:
ab -k -c 150 -n 15000 example.com/domjudge/login
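If you want to repeat that command at several concurrency levels, a small Python helper can build the argument lists. This is a sketch under assumptions: -n is scaled at 100 requests per concurrent user (which reproduces the 150/15000 pair above), the list of levels is hypothetical, and run_ab only works if ab is on your PATH. The sweep prints the commands rather than firing them, so you can run them one at a time while watching the server.

```python
import shlex
import subprocess


def ab_command(url: str, concurrency: int, requests_per_user: int = 100,
               keepalive: bool = True) -> list[str]:
    """Build an ab argv; -n scales with -c (150 users -> 15000 requests)."""
    argv = ["ab"]
    if keepalive:
        argv.append("-k")  # reuse connections, as in the command above
    argv += ["-c", str(concurrency),
             "-n", str(concurrency * requests_per_user),
             url]
    return argv


def run_ab(argv: list[str]) -> str:
    """Run ab and return its text report (requires ab on PATH)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Hypothetical sweep: print each command instead of running them all at once.
    for c in (50, 100, 150, 200, 300):
        print(shlex.join(ab_command("http://example.com/domjudge/login", c)))
```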
In the results, you'll want to pay attention to a few things: the 99%
and 100% rows of the percentile table at the bottom, and the max column
of the connection times table. You'd want to see something like 95% of
requests served in under a couple hundred ms, and 100% of requests
under 1-2 seconds or so. There is also a "Failed requests" count (in
case it happens to blow up), and you want that to be zero. Play with
the value of the -c argument to test more or fewer concurrent "users"
so you get a good idea of what your server can handle (I would keep
increasing it until you find the point at which it starts to fail).
You might also want to watch the memory usage of the php-fpm workers
and tune pm.max_children as needed.
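Checking those numbers by hand gets tedious during a concurrency sweep, so here is a minimal Python sketch that pulls them out of a saved ab report (e.g. ab ... > ab.txt). It assumes ab's usual text layout ("Failed requests:" line, percentile rows like "  95%     40"); the 200 ms and 2000 ms thresholds simply encode the rule of thumb above and are yours to adjust.

```python
import re
import sys


def parse_ab(report: str) -> dict:
    """Extract the failed-request count and latency percentiles (ms) from an ab report."""
    failed = re.search(r"^Failed requests:\s+(\d+)", report, re.MULTILINE)
    # Percentile rows look like "  95%     40" and " 100%    900 (longest request)".
    percentiles = {
        int(pct): int(ms)
        for pct, ms in re.findall(r"^[ \t]*(\d+)%[ \t]+(\d+)", report, re.MULTILINE)
    }
    return {"failed": int(failed.group(1)) if failed else None,
            "percentiles": percentiles}


def looks_healthy(stats: dict, p95_ms: int = 200, max_ms: int = 2000) -> bool:
    """Rule of thumb: no failures, 95% under ~200 ms, slowest request under ~2 s."""
    p = stats["percentiles"]
    return (stats["failed"] == 0
            and p.get(95, float("inf")) <= p95_ms
            and p.get(100, float("inf")) <= max_ms)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        stats = parse_ab(f.read())
    print(stats["percentiles"], "failed:", stats["failed"], "ok:", looks_healthy(stats))
```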
As an aside, I don't run Apache (I use nginx instead), but I configure
it to set caching headers for CSS/PNG/JPG/etc., as this reduces the
load on the server. Make sure you're all set with customizations
first, as it'll be difficult to change CSS/JS after the fact once
browsers have cached them. You can skip this step if you'd like; 100
teams is not that many, and a properly configured server should be
able to serve that many requests easily.
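Whichever web server you use, you can confirm that caching headers actually reach clients with a quick HEAD request. This Python sketch is a simplification under stated assumptions: it treats no-store/no-cache as "not cached" and does not check whether an Expires date lies in the future. Pass the URL of one of your own CSS or image assets on the command line; no particular DOMjudge asset path is assumed here.

```python
import re
import sys
import urllib.request


def is_cacheable(headers: dict) -> bool:
    """Rough check that a response lets browsers cache it (simplified)."""
    h = {k.lower(): v for k, v in headers.items()}
    cache_control = h.get("cache-control", "").lower()
    if "no-store" in cache_control or "no-cache" in cache_control:
        return False
    max_age = re.search(r"max-age=(\d+)", cache_control)
    if max_age:
        return int(max_age.group(1)) > 0
    # Fall back to the older Expires header (its date is not validated here).
    return "expires" in h


def fetch_headers(url: str) -> dict:
    """Send a HEAD request and return the response headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return dict(resp.headers.items())


if __name__ == "__main__" and len(sys.argv) > 1:
    print(is_cacheable(fetch_headers(sys.argv[1])))
```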
The judge hosts check in periodically, so make sure you account for
them in your load testing. More judge hosts means more requests
reaching the DOMjudge server, sometimes substantially more. E.g. for a
problem with ~100 test cases and a tiny runtime, you'll probably see
multiple requests per second just to judge one submission. So if you
have lots of judge hosts, that can negatively impact the performance
of your web server.
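To fold judgehost traffic into the load estimate, a rough Python model can help. It rests on an assumption, not on DOMjudge's documented behavior: each busy judgehost sends about one API request per test case it runs (the requests_per_testcase parameter is hypothetical; check your server logs for the real figure). The point it illustrates is that the rate grows as the per-test-case time shrinks, which is why a problem with many tiny test cases is costly.

```python
# Rough model of judgehost load on the DOMjudge server.
# Assumption: ~requests_per_testcase API calls per test case run;
# verify the real figure against your own web server logs.


def judgehost_rate(seconds_per_testcase: float,
                   requests_per_testcase: float = 1.0) -> float:
    """Requests/s from one judgehost running test cases back to back."""
    if seconds_per_testcase <= 0:
        raise ValueError("seconds_per_testcase must be positive")
    return requests_per_testcase / seconds_per_testcase


def total_rate(web_rate: float, busy_judgehosts: int, seconds_per_testcase: float,
               requests_per_testcase: float = 1.0) -> float:
    """Web UI traffic plus judgehost traffic, in requests/s."""
    return web_rate + busy_judgehosts * judgehost_rate(
        seconds_per_testcase, requests_per_testcase)


if __name__ == "__main__":
    # Hypothetical scenario: ~10 req/s of web traffic plus 4 judgehosts on a
    # problem whose test cases each take ~0.2 s including overhead.
    print(f"{total_rate(10, 4, 0.2):.1f} req/s")
```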
I hope that rambling is helpful in some way and provides you some
useful tuning options.
-Keith
