Re: Checktestdata as EDSL

11 Dec 2016

      Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
...
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, 
but from the moment I used it I was missing features and wondering why 
it wasn't implemented as an EDSL (Embedded Domain Specific Language). 
Now, many years later, I've finally found some time to create it. I've 
implemented a little library in Haskell, and as a sneak preview I've 
taken the interception problem from NCPC 2016 as referenced to in [1] 
and attached it. The first question is: where shall I put the sources to 
the library? Somewhere in the checktestdata repository (and if so, 
where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the
root. We can always move stuff later.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also
wrote a reimplementation of checktestdata in Haskell, and it was
similarly short, but also a preliminary version lacking good error
handling/reporting.
I think this can be useful, especially for specific problems where
either speed or extensibility of the syntax is needed. On the other
hand, I think that the requirement of having GHC around is quite heavy,
while a C++ compiler with boost is more standard, even on Windows I'd think?
Do you think it would be easy to automatically convert the original
checktestdata syntax scripts into this EDSL to then compile them? That
way, we could support both syntaxes and users would be able to choose
which method to use.
Jaap
...
Let me give a little more information on why I think an EDSL is a good 
idea here, and why I choose for Haskell (other than my background):

With an EDSL, you get so much for free from the underlying language,

such as (typed) variables, all sorts of abstraction mechanisms, compiler 
optimizations, etc. For example, some testdata script may consist of 
some repeated parts, which in an EDSL can simply be given a name to get 
rid of the duplication.

Compiler optimizations, which of course also exist for C++ in which

checktestdata is built now, don't only optimize the checktestdata 
runtime, but also the script itself since it is just a program and not 
some custom script anymore. Especially with a functional language such 
as Haskell this means definitions of primitives can be inlined and whole 
parts of the script may get fused to a very efficient version.

Maintainability: my library is now +- 200 LOC, including docs and

empty lines, and contains quite a lot of the checktestdata language 
already. Because Haskell has many well optimized parsing libraries, we 
get a lot for free, and only have to create a nice generalization for 
this particular use case.

Efficiency: the current checktestdata is just too slow, as mentioned

in [1]. As a reference, on my machine the checktestdata script for 
interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, 
and the attached version only 2s. And I haven't done anything to 
optimize specifically for this case, I've just translated the ctd script 
to the syntax of my EDSL.

With this, the script can do more than just checking the syntax. I.e.

when the problem specifies that it contains a connected graph, one can 
check the syntax, but also the connectedness of the graph, while still 
separating concerns, i.e. the syntax of the input can be separated from 
the semantic checks, but the semantic checks don't need to come in a 
separate program which also has to read the input again, like we should 
now. All can be combined naturally.

Create your own combinators: the primitives can be easily combined

into more complex combinators which you can then reuse. I.e. think of a 
'point2d' combinator that reads two space separated integers.

Haskell Platform needs to be available, this is easy on all major OS,

but is a quite large dependency. Then again, GCC and Boost are large as 
well, and are less Windows-friendly.

Slighly different syntax: although Haskell is quite liberal in its

syntax for EDSL's, users have to get used to particularities of the 
language/

Data generation is not easily possible, if not infeasible. There

should be some way of embedding the language to make this possible, and 
this is actually what I've been working on for the past few years when 
thinking about this problem, but since we mostly use it for checking 
data I went with a monadic embedding which is very easy for data checking.

Backwards compatibility: parsing existing scripts and running it with

the new backend is not going to be easy. I don't expect show stoppers, 
but it definitely won't be as efficient as writing scripts within the 
language because of variable bindings. I need to work on this a bit to 
be sure though.
That's it, of course my libary is not done yet (no good error messages, 
no regex yet), but it's a proof of concept that is already usable for 
simpler problems. I hope to convince you that this is the way to go, and 
to get some ideas to create the POC into a working tool. Also, 
suggestions on 'interesting use cases' are welcome, the interception is 
a nice one but I am sure there are many more problems for which the 
standard checktestdata just not works.
Yours truly,
Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3

DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

Re: Checktestdata as EDSL