Re: Checktestdata as EDSL

30 May 2017


      Op 18-5-2017 om 12:45 schreef Jaap Eldering:
...
On 18-05-17 00:36, Jaap Eldering wrote:
...
On 16-05-17 19:00, Jeroen Bransen wrote:
...
Op 12-5-2017 om 14:27 schreef Jaap Eldering:
...
On 12-05-17 12:13, Jaap Eldering wrote:
...
On 08-05-17 12:28, Jeroen Bransen wrote:
[...]
...
I tried to fix the Haskell code, see attached diff, but that didn't work
(at least for me).
Hmm... I guess I was fuzzing around too much. I tested this patch again
today, and it yields the expected result: the Haskell code makes a dot
match newlines as POSIX extended regex prescribes.
Running through the tests, it did uncover a later test that does fail
(in a minor way):
./checktestdata-haskell  tests/testprog29.err2 tests/testdata29.in
Testdata OK
but the minimum number of digits (-1) is out of range:
FLOATP(10,100,-1,3) NEWLINE
I'll commit changes to fix both. Jeroen, can you verify that my changes
are correct?
Well, it's Haskell code and it compiles, so it must be right... ;-) But
yes, I thought this was more complicated, but now that I look at your
changes this is quite straightforward. I'll try to add some examples soon.
Jeroen
...
Jaap
...
...
...
I'm also open to suggestions to change to a different regex standard.
I am confused. What standard are you using/implementing now? The docs 
don't seem to mention one, and I think that looking at the code 
previously I concluded it was POSIX-extended. I am fine with changing to 
a different one, but I'd suggest either PCRE, POSIX-basic or 
POSIX-extended. These three are all implemented in Boost, which I assume 
you can easily use as the code already depends on Boost. Anyway, it 
would be good to pick one and make it explicit in the docs.
So I was aiming for POSIX extended, but was inadvertently using C++
regex which defaults to ECMAscript.
I'd prefer to stick to the C++ STL regex library, but that has the
disadvantage I mentioned above. So that leaves me wondering what the
best choice is.
Jaap
...
...
...
...
[1]
http://stackoverflow.com/questions/399078/what-special-characters-must-be-es...
[2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
> On 13-12-16 12:27, Jeroen Bransen wrote:
>> Hi Jaap,
>>
>> Op 11-12-2016 om 11:42 schreef Jaap Eldering:
>>> Hi Jeroen,
>>>
>>> On 10-12-16 13:18, Jeroen Bransen wrote:
>>>> Hi all,
>>>>
>>>> Checktestdata is a nice DSL and has proved helpful in many situation,
>>>> but from the moment I used it I was missing features and wondering why
>>>> it wasn't implemented as an EDSL (Embedded Domain Specific Language).
>>>> Now, many years later, I've finally found some time to create it. I've
>>>> implemented a little library in Haskell, and as a sneak preview I've
>>>> taken the interception problem from NCPC 2016 as referenced to in [1]
>>>> and attached it. The first question is: where shall I put the sources to
>>>> the library? Somewhere in the checktestdata repository (and if so,
>>>> where), or maybe create a checktestdata2 repository?
>>> I'd say just put them in the checktestdata repository, simply in the
>>> root. We can always move stuff later.
>> Done, in a haskell_edsl subdirectory to avoid cluttering of the current
>> code a bit.
>>
>>> Btw, this reminds me of code that Gerben Stavenga once wrote: he also
>>> wrote a reimplementation of checktestdata in Haskell, and it was
>>> similarly short,
>> Would his implementation still be available somewhere? I'd be curious to
>> see it.
> I think I still have a copy lying around, but on my desktop that's in
> storage in the Netherlands... I'll send Gerben an email, but I doubt he
> still has it.
>
>>> but also a preliminary version lacking good error
>>> handling/reporting.
>> The version I've commited now does have proper error reporting, similar
>> to the original
>> checktestdata tool. Of course it may need some polishing here and there,
>> but the basis
>> is there and suffices for most cases I think.
>>> I think this can be useful, especially for specific problems where
>>> either speed or extensibility of the syntax is needed. On the other
>>> hand, I think that the requirement of having GHC around is quite heavy,
>>> while a C++ compiler with boost is more standard, even on Windows I'd think?
>> What is your definition of standard? In number of people that have them
>> installed: honestly no clue, could be that boost is far more popular. I
>> have only installed boost in order to build checktestdata, and this has
>> always taken me several hours. The long list of instructions [2] make it
>> clear that getting boost to work on Windows is far from trivial, while
>> Haskell platform has a native binary installer for all popular platforms
>> and works pretty much out of the box. Yes, it is a large package, but
>> disk space is not so much an issue nowadays, and as a Haskell programmer
>> (ouch, I start to sound like a Haskell evangelist) I  think it's worth
>> the extra bytes on disk, as I really believe this can improve the
>> checktestdata tool.
> Ok, I wasn't aware that Boost was so tricky to install on Windows, on
> Debian it's just 'apt-get install libboost-dev' ;-)
>
>> Btw, my main point is that I think we should create an EDSL. Because of
>> my background and because of its features I've chosen Haskell, but we
>> could have look at other languages. Python crossed my mind but I'm
>> afraid the syntax will be ugly and it won't be such a nice embedding as
>> I've created now.
> I like the idea of a EDSL and I think that Haskell is quite good to use
> in terms of syntax. I'm just wondering whether that benefit outweighs
> the disadvantage that now to run it, you actually have to compile your
> specific testcase description EDSL code instead of parsing the current
> script with a precompiled binary.
>
>>> Do you think it would be easy to automatically convert the original
>>> checktestdata syntax scripts into this EDSL to then compile them? That
>>> way, we could support both syntaxes and users would be able to choose
>>> which method to use.
>> I guess this is pretty as hard as parsing the old syntax directly. The
>> main challenge is the variable binding handling, your suggestion solves
>> that for simple variables, but for array-indices we still need to take
>> extra care. I'd also see this as a form of backwards compatibility, and
>> not as a choice for the user, as this will never fully use the
>> capabilities of the EDSL, and then one could as well just use the old
>> version. The whole point is that you can do more with this EDSL (and of
>> course speed improvement and easy maintainability are nice side-effects
>> of that). I'll think about this a bit more.
> My point is that I'd rather not dump the current syntax specification,
> so therefore I think backward compatibility is an important feature.
> Secondly, I think the current syntax is I think easier to start using
> than your EDSL, because then you need some knowledge of the underlying
> language.
>
> But it might be best to try to keep both around for a bit to compare
> their advantages.
>
> Jaap
>
>
>> [2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
>>
>>>> Let me give a little more information on why I think an EDSL is a good
>>>> idea here, and why I choose for Haskell (other than my background):
>>>>
>>>> + With an EDSL, you get so much for free from the underlying language,
>>>> such as (typed) variables, all sorts of abstraction mechanisms, compiler
>>>> optimizations, etc. For example, some testdata script may consist of
>>>> some repeated parts, which in an EDSL can simply be given a name to get
>>>> rid of the duplication.
>>>> + Compiler optimizations, which of course also exist for C++ in which
>>>> checktestdata is built now, don't only optimize the checktestdata
>>>> runtime, but also the script itself since it is just a program and not
>>>> some custom script anymore. Especially with a functional language such
>>>> as Haskell this means definitions of primitives can be inlined and whole
>>>> parts of the script may get fused to a very efficient version.
>>>> + Maintainability: my library is now +- 200 LOC, including docs and
>>>> empty lines, and contains quite a lot of the checktestdata language
>>>> already. Because Haskell has many well optimized parsing libraries, we
>>>> get a lot for free, and only have to create a nice generalization for
>>>> this particular use case.
>>>> + Efficiency: the current checktestdata is just too slow, as mentioned
>>>> in [1]. As a reference, on my machine the checktestdata script for
>>>> interception 09.max_ans.in takes 35.5s, the python script takes 2.2s,
>>>> and the attached version only 2s. And I haven't done anything to
>>>> optimize specifically for this case, I've just translated the ctd script
>>>> to the syntax of my EDSL.
>>>> + With this, the script can do more than just checking the syntax. I.e.
>>>> when the problem specifies that it contains a connected graph, one can
>>>> check the syntax, but also the connectedness of the graph, while still
>>>> separating concerns, i.e. the syntax of the input can be separated from
>>>> the semantic checks, but the semantic checks don't need to come in a
>>>> separate program which also has to read the input again, like we should
>>>> now. All can be combined naturally.
>>>> + Create your own combinators: the primitives can be easily combined
>>>> into more complex combinators which you can then reuse. I.e. think of a
>>>> 'point2d' combinator that reads two space separated integers.
>>>>
>>>> - Haskell Platform needs to be available, this is easy on all major OS,
>>>> but is a quite large dependency. Then again, GCC and Boost are large as
>>>> well, and are less Windows-friendly.
>>>> - Slighly different syntax: although Haskell is quite liberal in its
>>>> syntax for EDSL's, users have to get used to particularities of the
>>>> language/
>>>> - Data generation is not easily possible, if not infeasible. There
>>>> should be some way of embedding the language to make this possible, and
>>>> this is actually what I've been working on for the past few years when
>>>> thinking about this problem, but since we mostly use it for checking
>>>> data I went with a monadic embedding which is very easy for data checking.
>>>> - Backwards compatibility: parsing existing scripts and running it with
>>>> the new backend is not going to be easy. I don't expect show stoppers,
>>>> but it definitely won't be as efficient as writing scripts within the
>>>> language because of variable bindings. I need to work on this a bit to
>>>> be sure though.
>>>>
>>>> That's it, of course my libary is not done yet (no good error messages,
>>>> no regex yet), but it's a proof of concept that is already usable for
>>>> simpler problems. I hope to convince you that this is the way to go, and
>>>> to get some ideas to create the POC into a working tool. Also,
>>>> suggestions on 'interesting use cases' are welcome, the interception is
>>>> a nice one but I am sure there are many more problems for which the
>>>> standard checktestdata just not works.
>>>>
>>>> Yours truly,
>>>> Jeroen
>>>>
>>>> [1] https://github.com/DOMjudge/checktestdata/issues/3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> DOMjudge-devel mailing list
>>>> DOMjudge-devel@domjudge.org
>>>> https://www.domjudge.org/mailman/listinfo/domjudge-devel
>>>>
>>> _______________________________________________
>>> DOMjudge-devel mailing list
>>> DOMjudge-devel@domjudge.org
>>> https://www.domjudge.org/mailman/listinfo/domjudge-devel
>> _______________________________________________
>> DOMjudge-devel mailing list
>> DOMjudge-devel@domjudge.org
>> https://www.domjudge.org/mailman/listinfo/domjudge-devel
>>
> _______________________________________________
> DOMjudge-devel mailing list
> DOMjudge-devel@domjudge.org
> https://www.domjudge.org/mailman/listinfo/domjudge-devel
_______________________________________________
DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

DOMjudge-devel mailing list
DOMjudge-devel@domjudge.org
https://www.domjudge.org/mailman/listinfo/domjudge-devel

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

Re: Checktestdata as EDSL