Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
+ With an EDSL, you get so much for free from the underlying language, such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication. + Compiler optimizations, which of course also exist for C++ in which checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version. + Maintainability: my library is now +- 200 LOC, including docs and empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case. + Efficiency: the current checktestdata is just too slow, as mentioned in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL. + With this, the script can do more than just checking the syntax. I.e. when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally. + Create your own combinators: the primitives can be easily combined into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS, but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly. - Slighly different syntax: although Haskell is quite liberal in its syntax for EDSL's, users have to get used to particularities of the language/ - Data generation is not easily possible, if not infeasible. There should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking. - Backwards compatibility: parsing existing scripts and running it with the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short, but also a preliminary version lacking good error handling/reporting.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
Jaap
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
- With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication.
- Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version.
- Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case.
- Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL.
- With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally.
- Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering:
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short,
Would his implementation still be available somewhere? I'd be curious to see it.
but also a preliminary version lacking good error handling/reporting.
The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
Jaap
Jeroen
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
- With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication.
- Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version.
- Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case.
- Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL.
- With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally.
- Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
On 13-12-16 12:27, Jeroen Bransen wrote:
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering:
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short,
Would his implementation still be available somewhere? I'd be curious to see it.
I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
but also a preliminary version lacking good error handling/reporting.
The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
- With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication.
- Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version.
- Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case.
- Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL.
- With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally.
- Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
Hi all,
It's been a while, but I've finally competed the Haskell EDSL together with an executable that provides backwards compatibility with the custom language. It passes all tests, except one which I think is incorrect. One todo left is to provide examples in my actual EDSL, which I think is much more interesting than the backwards compatibility. However, some simple tests suggest that this implementation may also be more efficient than the 'old' version. Features that are missing are:
- The -w option to ignore whitespace differences. This should be quite trivial actually, I just haven't done this yet - The generation of data. This is harder, although a solution similar to the existing implementation is probably not too difficult. I still believe this EDSL could benefit good testcase generation though.
The test that fails is testprog25.in, which appears to contain an invalid (POSIX) regex. The '{' character is escaped, which is not allowed according to this [1] source (which is not reliable, but I couldn't find a better one). It seems likely that the fact that the current tests passes is due to a bug in some POSIX regex library, see [2]. I suppose that the escaping of '{' was a not on purpuse but just a mistake, can you confirm this Jaap?
Regards, Jeroen
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
On 13-12-16 12:27, Jeroen Bransen wrote:
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering:
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short,
Would his implementation still be available somewhere? I'd be curious to see it.
I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
but also a preliminary version lacking good error handling/reporting.
The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
- With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication.
- Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version.
- Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case.
- Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL.
- With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally.
- Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
Hi Jeroen,
Thanks for the work on this!
On 08-05-17 12:28, Jeroen Bransen wrote:
Hi all,
It's been a while, but I've finally competed the Haskell EDSL together with an executable that provides backwards compatibility with the custom language. It passes all tests, except one which I think is incorrect. One todo left is to provide examples in my actual EDSL, which I think is much more interesting than the backwards compatibility. However, some simple tests suggest that this implementation may also be more efficient than the 'old' version.
If it turns out to be faster, then this would be very nice! And some examples would also be welcome. Certainly to me, and I suppose to others who would be interested in using it, but are not experienced with Haskell.
Features that are missing are:
- The -w option to ignore whitespace differences. This should be quite
trivial actually, I just haven't done this yet
- The generation of data. This is harder, although a solution similar to
the existing implementation is probably not too difficult. I still believe this EDSL could benefit good testcase generation though.
I think both would be nice to have, but not crucial.
The test that fails is testprog25.in, which appears to contain an invalid (POSIX) regex. The '{' character is escaped, which is not allowed according to this [1] source (which is not reliable, but I couldn't find a better one). It seems likely that the fact that the current tests passes is due to a bug in some POSIX regex library, see [2]. I suppose that the escaping of '{' was a not on purpuse but just a mistake, can you confirm this Jaap?
I think instead that this is due to the escaping being interpreted twice, since also the '\\' actually is supposed to encode one literal backslash. I will have a look.
Best, Jaap
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
On 13-12-16 12:27, Jeroen Bransen wrote:
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering:
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short,
Would his implementation still be available somewhere? I'd be curious to see it.
I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
but also a preliminary version lacking good error handling/reporting.
The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
- With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication.
- Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version.
- Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case.
- Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL.
- With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally.
- Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
On 12-05-17 12:13, Jaap Eldering wrote:
On 08-05-17 12:28, Jeroen Bransen wrote:
The test that fails is testprog25.in, which appears to contain an invalid (POSIX) regex. The '{' character is escaped, which is not allowed according to this [1] source (which is not reliable, but I couldn't find a better one). It seems likely that the fact that the current tests passes is due to a bug in some POSIX regex library, see [2]. I suppose that the escaping of '{' was a not on purpuse but just a mistake, can you confirm this Jaap?
I think instead that this is due to the escaping being interpreted twice, since also the '\\' actually is supposed to encode one literal backslash. I will have a look.
Ok, I had a look and committed some fixes.
First, I found that the C++ implementation was implicitly using ECMAscript regexes, since that's the default in the STL. I fixed that to match the documentation. That does imply that a dot now matches newlines. I see no easy way to fix that. I also fixed the regex tests, and indeed the escapes inside character ranges had to be removed (both for the '.' and the '}'.
This did uncover two other issues:
The regex generator code in the C++ implementation does not treat '^' at the beginning of a character range as a special character, so it generates the wrong output on tests/testdata28.in.
The Haskell implementation fails on tests/testprog23.in with the message
!?$$^[a-z]\n ^ Error in line 2 character 23 EOF expected
It seems that the Haskell regex code does not match newlines on dots (which is consistent with the old documentation).
I'm also open to suggestions to change to a different regex standard.
Jaap
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
On 13-12-16 12:27, Jeroen Bransen wrote:
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering:
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote:
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation, but from the moment I used it I was missing features and wondering why it wasn't implemented as an EDSL (Embedded Domain Specific Language). Now, many years later, I've finally found some time to create it. I've implemented a little library in Haskell, and as a sneak preview I've taken the interception problem from NCPC 2016 as referenced to in [1] and attached it. The first question is: where shall I put the sources to the library? Somewhere in the checktestdata repository (and if so, where), or maybe create a checktestdata2 repository?
I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short,
Would his implementation still be available somewhere? I'd be curious to see it.
I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
but also a preliminary version lacking good error handling/reporting.
The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
Let me give a little more information on why I think an EDSL is a good idea here, and why I choose for Haskell (other than my background):
- With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler optimizations, etc. For example, some testdata script may consist of some repeated parts, which in an EDSL can simply be given a name to get rid of the duplication.
- Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata runtime, but also the script itself since it is just a program and not some custom script anymore. Especially with a functional language such as Haskell this means definitions of primitives can be inlined and whole parts of the script may get fused to a very efficient version.
- Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language already. Because Haskell has many well optimized parsing libraries, we get a lot for free, and only have to create a nice generalization for this particular use case.
- Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, and the attached version only 2s. And I haven't done anything to optimize specifically for this case, I've just translated the ctd script to the syntax of my EDSL.
- With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can check the syntax, but also the connectedness of the graph, while still separating concerns, i.e. the syntax of the input can be separated from the semantic checks, but the semantic checks don't need to come in a separate program which also has to read the input again, like we should now. All can be combined naturally.
- Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a 'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and this is actually what I've been working on for the past few years when thinking about this problem, but since we mostly use it for checking data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers, but it definitely won't be as efficient as writing scripts within the language because of variable bindings. I need to work on this a bit to be sure though.
That's it, of course my libary is not done yet (no good error messages, no regex yet), but it's a proof of concept that is already usable for simpler problems. I hope to convince you that this is the way to go, and to get some ideas to create the POC into a working tool. Also, suggestions on 'interesting use cases' are welcome, the interception is a nice one but I am sure there are many more problems for which the standard checktestdata just not works.
Yours truly, Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
Op 12-5-2017 om 14:27 schreef Jaap Eldering:
On 12-05-17 12:13, Jaap Eldering wrote:
On 08-05-17 12:28, Jeroen Bransen wrote:
The test that fails is testprog25.in, which appears to contain an invalid (POSIX) regex. The '{' character is escaped, which is not allowed according to this [1] source (which is not reliable, but I couldn't find a better one). It seems likely that the fact that the current tests passes is due to a bug in some POSIX regex library, see [2]. I suppose that the escaping of '{' was a not on purpuse but just a mistake, can you confirm this Jaap?
I think instead that this is due to the escaping being interpreted twice, since also the '\\' actually is supposed to encode one literal backslash. I will have a look.
Ok, I had a look and committed some fixes.
First, I found that the C++ implementation was implicitly using ECMAscript regexes, since that's the default in the STL. I fixed that to match the documentation. That does imply that a dot now matches newlines. I see no easy way to fix that. I also fixed the regex tests, and indeed the escapes inside character ranges had to be removed (both for the '.' and the '}'.
This did uncover two other issues:
The regex generator code in the C++ implementation does not treat '^' at the beginning of a character range as a special character, so it generates the wrong output on tests/testdata28.in.
The Haskell implementation fails on tests/testprog23.in with the message
!?$$^[a-z]\n ^ Error in line 2 character 23 EOF expected
It seems that the Haskell regex code does not match newlines on dots (which is consistent with the old documentation).
The Haskell code implements the POSIX Extended regular expressions, which indeed don't match a newline for the dot.
I'm also open to suggestions to change to a different regex standard.
I am confused. What standard are you using/implementing now? The docs don't seem to mention one, and I think that looking at the code previously I concluded it was POSIX-extended. I am fine with changing to a different one, but I'd suggest either PCRE, POSIX-basic or POSIX-extended. These three are all implemented in Boost, which I assume you can easily use as the code already depends on Boost. Anyway, it would be good to pick one and make it explicit in the docs.
Jeroen
Jaap
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
On 13-12-16 12:27, Jeroen Bransen wrote:
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering:
Hi Jeroen,
On 10-12-16 13:18, Jeroen Bransen wrote: > Hi all, > > Checktestdata is a nice DSL and has proved helpful in many situation, > but from the moment I used it I was missing features and wondering why > it wasn't implemented as an EDSL (Embedded Domain Specific Language). > Now, many years later, I've finally found some time to create it. I've > implemented a little library in Haskell, and as a sneak preview I've > taken the interception problem from NCPC 2016 as referenced to in [1] > and attached it. The first question is: where shall I put the sources to > the library? Somewhere in the checktestdata repository (and if so, > where), or maybe create a checktestdata2 repository? I'd say just put them in the checktestdata repository, simply in the root. We can always move stuff later.
Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
Btw, this reminds me of code that Gerben Stavenga once wrote: he also wrote a reimplementation of checktestdata in Haskell, and it was similarly short,
Would his implementation still be available somewhere? I'd be curious to see it.
I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
but also a preliminary version lacking good error handling/reporting.
The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think.
I think this can be useful, especially for specific problems where either speed or extensibility of the syntax is needed. On the other hand, I think that the requirement of having GHC around is quite heavy, while a C++ compiler with boost is more standard, even on Windows I'd think?
What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
Do you think it would be easy to automatically convert the original checktestdata syntax scripts into this EDSL to then compile them? That way, we could support both syntaxes and users would be able to choose which method to use.
I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
> Let me give a little more information on why I think an EDSL is a good > idea here, and why I choose for Haskell (other than my background): > > + With an EDSL, you get so much for free from the underlying language, > such as (typed) variables, all sorts of abstraction mechanisms, compiler > optimizations, etc. For example, some testdata script may consist of > some repeated parts, which in an EDSL can simply be given a name to get > rid of the duplication. > + Compiler optimizations, which of course also exist for C++ in which > checktestdata is built now, don't only optimize the checktestdata > runtime, but also the script itself since it is just a program and not > some custom script anymore. Especially with a functional language such > as Haskell this means definitions of primitives can be inlined and whole > parts of the script may get fused to a very efficient version. > + Maintainability: my library is now +- 200 LOC, including docs and > empty lines, and contains quite a lot of the checktestdata language > already. Because Haskell has many well optimized parsing libraries, we > get a lot for free, and only have to create a nice generalization for > this particular use case. > + Efficiency: the current checktestdata is just too slow, as mentioned > in [1]. As a reference, on my machine the checktestdata script for > interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, > and the attached version only 2s. And I haven't done anything to > optimize specifically for this case, I've just translated the ctd script > to the syntax of my EDSL. > + With this, the script can do more than just checking the syntax. I.e. > when the problem specifies that it contains a connected graph, one can > check the syntax, but also the connectedness of the graph, while still > separating concerns, i.e. the syntax of the input can be separated from > the semantic checks, but the semantic checks don't need to come in a > separate program which also has to read the input again, like we should > now. All can be combined naturally. > + Create your own combinators: the primitives can be easily combined > into more complex combinators which you can then reuse. I.e. think of a > 'point2d' combinator that reads two space separated integers. > > - Haskell Platform needs to be available, this is easy on all major OS, > but is a quite large dependency. Then again, GCC and Boost are large as > well, and are less Windows-friendly. > - Slighly different syntax: although Haskell is quite liberal in its > syntax for EDSL's, users have to get used to particularities of the > language/ > - Data generation is not easily possible, if not infeasible. There > should be some way of embedding the language to make this possible, and > this is actually what I've been working on for the past few years when > thinking about this problem, but since we mostly use it for checking > data I went with a monadic embedding which is very easy for data checking. > - Backwards compatibility: parsing existing scripts and running it with > the new backend is not going to be easy. I don't expect show stoppers, > but it definitely won't be as efficient as writing scripts within the > language because of variable bindings. I need to work on this a bit to > be sure though. > > That's it, of course my libary is not done yet (no good error messages, > no regex yet), but it's a proof of concept that is already usable for > simpler problems. I hope to convince you that this is the way to go, and > to get some ideas to create the POC into a working tool. Also, > suggestions on 'interesting use cases' are welcome, the interception is > a nice one but I am sure there are many more problems for which the > standard checktestdata just not works. > > Yours truly, > Jeroen > > [1] https://github.com/DOMjudge/checktestdata/issues/3 > > > > _______________________________________________ > DOMjudge-devel mailing list > DOMjudge-devel@domjudge.org > https://www.domjudge.org/mailman/listinfo/domjudge-devel > _______________________________________________ DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
On 16-05-17 19:00, Jeroen Bransen wrote:
Op 12-5-2017 om 14:27 schreef Jaap Eldering:
On 12-05-17 12:13, Jaap Eldering wrote:
On 08-05-17 12:28, Jeroen Bransen wrote:
The test that fails is testprog25.in, which appears to contain an invalid (POSIX) regex. The '{' character is escaped, which is not allowed according to this [1] source (which is not reliable, but I couldn't find a better one). It seems likely that the fact that the current tests passes is due to a bug in some POSIX regex library, see [2]. I suppose that the escaping of '{' was a not on purpuse but just a mistake, can you confirm this Jaap?
I think instead that this is due to the escaping being interpreted twice, since also the '\\' actually is supposed to encode one literal backslash. I will have a look.
Ok, I had a look and committed some fixes.
First, I found that the C++ implementation was implicitly using ECMAscript regexes, since that's the default in the STL. I fixed that to match the documentation. That does imply that a dot now matches newlines. I see no easy way to fix that. I also fixed the regex tests, and indeed the escapes inside character ranges had to be removed (both for the '.' and the '}'.
This did uncover two other issues:
The regex generator code in the C++ implementation does not treat '^' at the beginning of a character range as a special character, so it generates the wrong output on tests/testdata28.in.
The Haskell implementation fails on tests/testprog23.in with the message
!?$$^[a-z]\n ^ Error in line 2 character 23 EOF expected
It seems that the Haskell regex code does not match newlines on dots (which is consistent with the old documentation).
The Haskell code implements the POSIX Extended regular expressions, which indeed don't match a newline for the dot.
Ok, but my understanding of POSIX extended regex is that it *should* match newlines on a dot. Indeed, see
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
section 9.4.4:
A period ( '.' ), when used outside a bracket expression, is an ERE that shall match any character in the supported character set except NUL.
The annoying thing is that C++ (extended) regex does not have an option to turn off this newline matching. The Boost regex implementation does actually allow turning this off with the option 'match_not_dot_newline', see
http://www.boost.org/doc/libs/1_64_0/libs/regex/doc/html/boost_regex/syntax/...
In principle I think that *not* matching newlines is more convenient in practice, since if you really want, you can write '[.\n]' to do match them.
I tried to fix the Haskell code, see attached diff, but that didn't work (at least for me).
I'm also open to suggestions to change to a different regex standard.
I am confused. What standard are you using/implementing now? The docs don't seem to mention one, and I think that looking at the code previously I concluded it was POSIX-extended. I am fine with changing to a different one, but I'd suggest either PCRE, POSIX-basic or POSIX-extended. These three are all implemented in Boost, which I assume you can easily use as the code already depends on Boost. Anyway, it would be good to pick one and make it explicit in the docs.
So I was aiming for POSIX extended, but was inadvertently using C++ regex which defaults to ECMAscript.
I'd prefer to stick to the C++ STL regex library, but that has the disadvantage I mentioned above. So that leaves me wondering what the best choice is.
Jaap
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
On 13-12-16 12:27, Jeroen Bransen wrote:
Hi Jaap,
Op 11-12-2016 om 11:42 schreef Jaap Eldering: > Hi Jeroen, > > On 10-12-16 13:18, Jeroen Bransen wrote: >> Hi all, >> >> Checktestdata is a nice DSL and has proved helpful in many situation, >> but from the moment I used it I was missing features and wondering why >> it wasn't implemented as an EDSL (Embedded Domain Specific Language). >> Now, many years later, I've finally found some time to create it. I've >> implemented a little library in Haskell, and as a sneak preview I've >> taken the interception problem from NCPC 2016 as referenced to in [1] >> and attached it. The first question is: where shall I put the sources to >> the library? Somewhere in the checktestdata repository (and if so, >> where), or maybe create a checktestdata2 repository? > I'd say just put them in the checktestdata repository, simply in the > root. We can always move stuff later. Done, in a haskell_edsl subdirectory to avoid cluttering of the current code a bit.
> Btw, this reminds me of code that Gerben Stavenga once wrote: he also > wrote a reimplementation of checktestdata in Haskell, and it was > similarly short, Would his implementation still be available somewhere? I'd be curious to see it.
I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
> but also a preliminary version lacking good error > handling/reporting. The version I've commited now does have proper error reporting, similar to the original checktestdata tool. Of course it may need some polishing here and there, but the basis is there and suffices for most cases I think. > I think this can be useful, especially for specific problems where > either speed or extensibility of the syntax is needed. On the other > hand, I think that the requirement of having GHC around is quite heavy, > while a C++ compiler with boost is more standard, even on Windows I'd think? What is your definition of standard? In number of people that have them installed: honestly no clue, could be that boost is far more popular. I have only installed boost in order to build checktestdata, and this has always taken me several hours. The long list of instructions [2] make it clear that getting boost to work on Windows is far from trivial, while Haskell platform has a native binary installer for all popular platforms and works pretty much out of the box. Yes, it is a large package, but disk space is not so much an issue nowadays, and as a Haskell programmer (ouch, I start to sound like a Haskell evangelist) I think it's worth the extra bytes on disk, as I really believe this can improve the checktestdata tool.
Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
Btw, my main point is that I think we should create an EDSL. Because of my background and because of its features I've chosen Haskell, but we could have look at other languages. Python crossed my mind but I'm afraid the syntax will be ugly and it won't be such a nice embedding as I've created now.
I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
> Do you think it would be easy to automatically convert the original > checktestdata syntax scripts into this EDSL to then compile them? That > way, we could support both syntaxes and users would be able to choose > which method to use. I guess this is pretty as hard as parsing the old syntax directly. The main challenge is the variable binding handling, your suggestion solves that for simple variables, but for array-indices we still need to take extra care. I'd also see this as a form of backwards compatibility, and not as a choice for the user, as this will never fully use the capabilities of the EDSL, and then one could as well just use the old version. The whole point is that you can do more with this EDSL (and of course speed improvement and easy maintainability are nice side-effects of that). I'll think about this a bit more.
My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
[2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html
>> Let me give a little more information on why I think an EDSL is a good >> idea here, and why I choose for Haskell (other than my background): >> >> + With an EDSL, you get so much for free from the underlying language, >> such as (typed) variables, all sorts of abstraction mechanisms, compiler >> optimizations, etc. For example, some testdata script may consist of >> some repeated parts, which in an EDSL can simply be given a name to get >> rid of the duplication. >> + Compiler optimizations, which of course also exist for C++ in which >> checktestdata is built now, don't only optimize the checktestdata >> runtime, but also the script itself since it is just a program and not >> some custom script anymore. Especially with a functional language such >> as Haskell this means definitions of primitives can be inlined and whole >> parts of the script may get fused to a very efficient version. >> + Maintainability: my library is now +- 200 LOC, including docs and >> empty lines, and contains quite a lot of the checktestdata language >> already. Because Haskell has many well optimized parsing libraries, we >> get a lot for free, and only have to create a nice generalization for >> this particular use case. >> + Efficiency: the current checktestdata is just too slow, as mentioned >> in [1]. As a reference, on my machine the checktestdata script for >> interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, >> and the attached version only 2s. And I haven't done anything to >> optimize specifically for this case, I've just translated the ctd script >> to the syntax of my EDSL. >> + With this, the script can do more than just checking the syntax. I.e. >> when the problem specifies that it contains a connected graph, one can >> check the syntax, but also the connectedness of the graph, while still >> separating concerns, i.e. the syntax of the input can be separated from >> the semantic checks, but the semantic checks don't need to come in a >> separate program which also has to read the input again, like we should >> now. All can be combined naturally. >> + Create your own combinators: the primitives can be easily combined >> into more complex combinators which you can then reuse. I.e. think of a >> 'point2d' combinator that reads two space separated integers. >> >> - Haskell Platform needs to be available, this is easy on all major OS, >> but is a quite large dependency. Then again, GCC and Boost are large as >> well, and are less Windows-friendly. >> - Slighly different syntax: although Haskell is quite liberal in its >> syntax for EDSL's, users have to get used to particularities of the >> language/ >> - Data generation is not easily possible, if not infeasible. There >> should be some way of embedding the language to make this possible, and >> this is actually what I've been working on for the past few years when >> thinking about this problem, but since we mostly use it for checking >> data I went with a monadic embedding which is very easy for data checking. >> - Backwards compatibility: parsing existing scripts and running it with >> the new backend is not going to be easy. I don't expect show stoppers, >> but it definitely won't be as efficient as writing scripts within the >> language because of variable bindings. I need to work on this a bit to >> be sure though. >> >> That's it, of course my libary is not done yet (no good error messages, >> no regex yet), but it's a proof of concept that is already usable for >> simpler problems. I hope to convince you that this is the way to go, and >> to get some ideas to create the POC into a working tool. Also, >> suggestions on 'interesting use cases' are welcome, the interception is >> a nice one but I am sure there are many more problems for which the >> standard checktestdata just not works. >> >> Yours truly, >> Jeroen >> >> [1] https://github.com/DOMjudge/checktestdata/issues/3 >> >> >> >> _______________________________________________ >> DOMjudge-devel mailing list >> DOMjudge-devel@domjudge.org >> https://www.domjudge.org/mailman/listinfo/domjudge-devel >> > _______________________________________________ > DOMjudge-devel mailing list > DOMjudge-devel@domjudge.org > https://www.domjudge.org/mailman/listinfo/domjudge-devel _______________________________________________ DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
On 18-05-17 00:36, Jaap Eldering wrote:
On 16-05-17 19:00, Jeroen Bransen wrote:
Op 12-5-2017 om 14:27 schreef Jaap Eldering:
On 12-05-17 12:13, Jaap Eldering wrote:
On 08-05-17 12:28, Jeroen Bransen wrote:
[...]
I tried to fix the Haskell code, see attached diff, but that didn't work (at least for me).
Hmm... I guess I was fuzzing around too much. I tested this patch again today, and it yields the expected result: the Haskell code makes a dot match newlines as POSIX extended regex prescribes.
Running through the tests, it did uncover a later test that does fail (in a minor way):
./checktestdata-haskell tests/testprog29.err2 tests/testdata29.in Testdata OK
but the minimum number of digits (-1) is out of range:
FLOATP(10,100,-1,3) NEWLINE
I'll commit changes to fix both. Jeroen, can you verify that my changes are correct?
Jaap
I'm also open to suggestions to change to a different regex standard.
I am confused. What standard are you using/implementing now? The docs don't seem to mention one, and I think that looking at the code previously I concluded it was POSIX-extended. I am fine with changing to a different one, but I'd suggest either PCRE, POSIX-basic or POSIX-extended. These three are all implemented in Boost, which I assume you can easily use as the code already depends on Boost. Anyway, it would be good to pick one and make it explicit in the docs.
So I was aiming for POSIX extended, but was inadvertently using C++ regex which defaults to ECMAscript.
I'd prefer to stick to the C++ STL regex library, but that has the disadvantage I mentioned above. So that leaves me wondering what the best choice is.
Jaap
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering:
On 13-12-16 12:27, Jeroen Bransen wrote: > Hi Jaap, > > Op 11-12-2016 om 11:42 schreef Jaap Eldering: >> Hi Jeroen, >> >> On 10-12-16 13:18, Jeroen Bransen wrote: >>> Hi all, >>> >>> Checktestdata is a nice DSL and has proved helpful in many situation, >>> but from the moment I used it I was missing features and wondering why >>> it wasn't implemented as an EDSL (Embedded Domain Specific Language). >>> Now, many years later, I've finally found some time to create it. I've >>> implemented a little library in Haskell, and as a sneak preview I've >>> taken the interception problem from NCPC 2016 as referenced to in [1] >>> and attached it. The first question is: where shall I put the sources to >>> the library? Somewhere in the checktestdata repository (and if so, >>> where), or maybe create a checktestdata2 repository? >> I'd say just put them in the checktestdata repository, simply in the >> root. We can always move stuff later. > Done, in a haskell_edsl subdirectory to avoid cluttering of the current > code a bit. > >> Btw, this reminds me of code that Gerben Stavenga once wrote: he also >> wrote a reimplementation of checktestdata in Haskell, and it was >> similarly short, > Would his implementation still be available somewhere? I'd be curious to > see it. I think I still have a copy lying around, but on my desktop that's in storage in the Netherlands... I'll send Gerben an email, but I doubt he still has it.
>> but also a preliminary version lacking good error >> handling/reporting. > The version I've commited now does have proper error reporting, similar > to the original > checktestdata tool. Of course it may need some polishing here and there, > but the basis > is there and suffices for most cases I think. >> I think this can be useful, especially for specific problems where >> either speed or extensibility of the syntax is needed. On the other >> hand, I think that the requirement of having GHC around is quite heavy, >> while a C++ compiler with boost is more standard, even on Windows I'd think? > What is your definition of standard? In number of people that have them > installed: honestly no clue, could be that boost is far more popular. I > have only installed boost in order to build checktestdata, and this has > always taken me several hours. The long list of instructions [2] make it > clear that getting boost to work on Windows is far from trivial, while > Haskell platform has a native binary installer for all popular platforms > and works pretty much out of the box. Yes, it is a large package, but > disk space is not so much an issue nowadays, and as a Haskell programmer > (ouch, I start to sound like a Haskell evangelist) I think it's worth > the extra bytes on disk, as I really believe this can improve the > checktestdata tool. Ok, I wasn't aware that Boost was so tricky to install on Windows, on Debian it's just 'apt-get install libboost-dev' ;-)
> Btw, my main point is that I think we should create an EDSL. Because of > my background and because of its features I've chosen Haskell, but we > could have look at other languages. Python crossed my mind but I'm > afraid the syntax will be ugly and it won't be such a nice embedding as > I've created now. I like the idea of a EDSL and I think that Haskell is quite good to use in terms of syntax. I'm just wondering whether that benefit outweighs the disadvantage that now to run it, you actually have to compile your specific testcase description EDSL code instead of parsing the current script with a precompiled binary.
>> Do you think it would be easy to automatically convert the original >> checktestdata syntax scripts into this EDSL to then compile them? That >> way, we could support both syntaxes and users would be able to choose >> which method to use. > I guess this is pretty as hard as parsing the old syntax directly. The > main challenge is the variable binding handling, your suggestion solves > that for simple variables, but for array-indices we still need to take > extra care. I'd also see this as a form of backwards compatibility, and > not as a choice for the user, as this will never fully use the > capabilities of the EDSL, and then one could as well just use the old > version. The whole point is that you can do more with this EDSL (and of > course speed improvement and easy maintainability are nice side-effects > of that). I'll think about this a bit more. My point is that I'd rather not dump the current syntax specification, so therefore I think backward compatibility is an important feature. Secondly, I think the current syntax is I think easier to start using than your EDSL, because then you need some knowledge of the underlying language.
But it might be best to try to keep both around for a bit to compare their advantages.
Jaap
> [2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html > >>> Let me give a little more information on why I think an EDSL is a good >>> idea here, and why I choose for Haskell (other than my background): >>> >>> + With an EDSL, you get so much for free from the underlying language, >>> such as (typed) variables, all sorts of abstraction mechanisms, compiler >>> optimizations, etc. For example, some testdata script may consist of >>> some repeated parts, which in an EDSL can simply be given a name to get >>> rid of the duplication. >>> + Compiler optimizations, which of course also exist for C++ in which >>> checktestdata is built now, don't only optimize the checktestdata >>> runtime, but also the script itself since it is just a program and not >>> some custom script anymore. Especially with a functional language such >>> as Haskell this means definitions of primitives can be inlined and whole >>> parts of the script may get fused to a very efficient version. >>> + Maintainability: my library is now +- 200 LOC, including docs and >>> empty lines, and contains quite a lot of the checktestdata language >>> already. Because Haskell has many well optimized parsing libraries, we >>> get a lot for free, and only have to create a nice generalization for >>> this particular use case. >>> + Efficiency: the current checktestdata is just too slow, as mentioned >>> in [1]. As a reference, on my machine the checktestdata script for >>> interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, >>> and the attached version only 2s. And I haven't done anything to >>> optimize specifically for this case, I've just translated the ctd script >>> to the syntax of my EDSL. >>> + With this, the script can do more than just checking the syntax. I.e. >>> when the problem specifies that it contains a connected graph, one can >>> check the syntax, but also the connectedness of the graph, while still >>> separating concerns, i.e. the syntax of the input can be separated from >>> the semantic checks, but the semantic checks don't need to come in a >>> separate program which also has to read the input again, like we should >>> now. All can be combined naturally. >>> + Create your own combinators: the primitives can be easily combined >>> into more complex combinators which you can then reuse. I.e. think of a >>> 'point2d' combinator that reads two space separated integers. >>> >>> - Haskell Platform needs to be available, this is easy on all major OS, >>> but is a quite large dependency. Then again, GCC and Boost are large as >>> well, and are less Windows-friendly. >>> - Slighly different syntax: although Haskell is quite liberal in its >>> syntax for EDSL's, users have to get used to particularities of the >>> language/ >>> - Data generation is not easily possible, if not infeasible. There >>> should be some way of embedding the language to make this possible, and >>> this is actually what I've been working on for the past few years when >>> thinking about this problem, but since we mostly use it for checking >>> data I went with a monadic embedding which is very easy for data checking. >>> - Backwards compatibility: parsing existing scripts and running it with >>> the new backend is not going to be easy. I don't expect show stoppers, >>> but it definitely won't be as efficient as writing scripts within the >>> language because of variable bindings. I need to work on this a bit to >>> be sure though. >>> >>> That's it, of course my libary is not done yet (no good error messages, >>> no regex yet), but it's a proof of concept that is already usable for >>> simpler problems. I hope to convince you that this is the way to go, and >>> to get some ideas to create the POC into a working tool. Also, >>> suggestions on 'interesting use cases' are welcome, the interception is >>> a nice one but I am sure there are many more problems for which the >>> standard checktestdata just not works. >>> >>> Yours truly, >>> Jeroen >>> >>> [1] https://github.com/DOMjudge/checktestdata/issues/3 >>> >>> >>> >>> _______________________________________________ >>> DOMjudge-devel mailing list >>> DOMjudge-devel@domjudge.org >>> https://www.domjudge.org/mailman/listinfo/domjudge-devel >>> >> _______________________________________________ >> DOMjudge-devel mailing list >> DOMjudge-devel@domjudge.org >> https://www.domjudge.org/mailman/listinfo/domjudge-devel > _______________________________________________ > DOMjudge-devel mailing list > DOMjudge-devel@domjudge.org > https://www.domjudge.org/mailman/listinfo/domjudge-devel > _______________________________________________ DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
Op 18-5-2017 om 12:45 schreef Jaap Eldering:
On 18-05-17 00:36, Jaap Eldering wrote:
On 16-05-17 19:00, Jeroen Bransen wrote:
Op 12-5-2017 om 14:27 schreef Jaap Eldering:
On 12-05-17 12:13, Jaap Eldering wrote:
On 08-05-17 12:28, Jeroen Bransen wrote:
[...]
I tried to fix the Haskell code, see attached diff, but that didn't work (at least for me).
Hmm... I guess I was fuzzing around too much. I tested this patch again today, and it yields the expected result: the Haskell code makes a dot match newlines as POSIX extended regex prescribes.
Running through the tests, it did uncover a later test that does fail (in a minor way):
./checktestdata-haskell tests/testprog29.err2 tests/testdata29.in Testdata OK
but the minimum number of digits (-1) is out of range:
FLOATP(10,100,-1,3) NEWLINE
I'll commit changes to fix both. Jeroen, can you verify that my changes are correct?
Well, it's Haskell code and it compiles, so it must be right... ;-) But yes, I thought this was more complicated, but now that I look at your changes this is quite straightforward. I'll try to add some examples soon.
Jeroen
Jaap
I'm also open to suggestions to change to a different regex standard.
I am confused. What standard are you using/implementing now? The docs don't seem to mention one, and I think that looking at the code previously I concluded it was POSIX-extended. I am fine with changing to a different one, but I'd suggest either PCRE, POSIX-basic or POSIX-extended. These three are all implemented in Boost, which I assume you can easily use as the code already depends on Boost. Anyway, it would be good to pick one and make it explicit in the docs.
So I was aiming for POSIX extended, but was inadvertently using C++ regex which defaults to ECMAscript.
I'd prefer to stick to the C++ STL regex library, but that has the disadvantage I mentioned above. So that leaves me wondering what the best choice is.
Jaap
[1] http://stackoverflow.com/questions/399078/what-special-characters-must-be-es... [2] https://wiki.haskell.org/Regex_Posix
Op 13-12-2016 om 19:29 schreef Jaap Eldering: > On 13-12-16 12:27, Jeroen Bransen wrote: >> Hi Jaap, >> >> Op 11-12-2016 om 11:42 schreef Jaap Eldering: >>> Hi Jeroen, >>> >>> On 10-12-16 13:18, Jeroen Bransen wrote: >>>> Hi all, >>>> >>>> Checktestdata is a nice DSL and has proved helpful in many situation, >>>> but from the moment I used it I was missing features and wondering why >>>> it wasn't implemented as an EDSL (Embedded Domain Specific Language). >>>> Now, many years later, I've finally found some time to create it. I've >>>> implemented a little library in Haskell, and as a sneak preview I've >>>> taken the interception problem from NCPC 2016 as referenced to in [1] >>>> and attached it. The first question is: where shall I put the sources to >>>> the library? Somewhere in the checktestdata repository (and if so, >>>> where), or maybe create a checktestdata2 repository? >>> I'd say just put them in the checktestdata repository, simply in the >>> root. We can always move stuff later. >> Done, in a haskell_edsl subdirectory to avoid cluttering of the current >> code a bit. >> >>> Btw, this reminds me of code that Gerben Stavenga once wrote: he also >>> wrote a reimplementation of checktestdata in Haskell, and it was >>> similarly short, >> Would his implementation still be available somewhere? I'd be curious to >> see it. > I think I still have a copy lying around, but on my desktop that's in > storage in the Netherlands... I'll send Gerben an email, but I doubt he > still has it. > >>> but also a preliminary version lacking good error >>> handling/reporting. >> The version I've commited now does have proper error reporting, similar >> to the original >> checktestdata tool. Of course it may need some polishing here and there, >> but the basis >> is there and suffices for most cases I think. >>> I think this can be useful, especially for specific problems where >>> either speed or extensibility of the syntax is needed. On the other >>> hand, I think that the requirement of having GHC around is quite heavy, >>> while a C++ compiler with boost is more standard, even on Windows I'd think? >> What is your definition of standard? In number of people that have them >> installed: honestly no clue, could be that boost is far more popular. I >> have only installed boost in order to build checktestdata, and this has >> always taken me several hours. The long list of instructions [2] make it >> clear that getting boost to work on Windows is far from trivial, while >> Haskell platform has a native binary installer for all popular platforms >> and works pretty much out of the box. Yes, it is a large package, but >> disk space is not so much an issue nowadays, and as a Haskell programmer >> (ouch, I start to sound like a Haskell evangelist) I think it's worth >> the extra bytes on disk, as I really believe this can improve the >> checktestdata tool. > Ok, I wasn't aware that Boost was so tricky to install on Windows, on > Debian it's just 'apt-get install libboost-dev' ;-) > >> Btw, my main point is that I think we should create an EDSL. Because of >> my background and because of its features I've chosen Haskell, but we >> could have look at other languages. Python crossed my mind but I'm >> afraid the syntax will be ugly and it won't be such a nice embedding as >> I've created now. > I like the idea of a EDSL and I think that Haskell is quite good to use > in terms of syntax. I'm just wondering whether that benefit outweighs > the disadvantage that now to run it, you actually have to compile your > specific testcase description EDSL code instead of parsing the current > script with a precompiled binary. > >>> Do you think it would be easy to automatically convert the original >>> checktestdata syntax scripts into this EDSL to then compile them? That >>> way, we could support both syntaxes and users would be able to choose >>> which method to use. >> I guess this is pretty as hard as parsing the old syntax directly. The >> main challenge is the variable binding handling, your suggestion solves >> that for simple variables, but for array-indices we still need to take >> extra care. I'd also see this as a form of backwards compatibility, and >> not as a choice for the user, as this will never fully use the >> capabilities of the EDSL, and then one could as well just use the old >> version. The whole point is that you can do more with this EDSL (and of >> course speed improvement and easy maintainability are nice side-effects >> of that). I'll think about this a bit more. > My point is that I'd rather not dump the current syntax specification, > so therefore I think backward compatibility is an important feature. > Secondly, I think the current syntax is I think easier to start using > than your EDSL, because then you need some knowledge of the underlying > language. > > But it might be best to try to keep both around for a bit to compare > their advantages. > > Jaap > > >> [2] http://www.boost.org/doc/libs/1_62_0/more/getting_started/windows.html >> >>>> Let me give a little more information on why I think an EDSL is a good >>>> idea here, and why I choose for Haskell (other than my background): >>>> >>>> + With an EDSL, you get so much for free from the underlying language, >>>> such as (typed) variables, all sorts of abstraction mechanisms, compiler >>>> optimizations, etc. For example, some testdata script may consist of >>>> some repeated parts, which in an EDSL can simply be given a name to get >>>> rid of the duplication. >>>> + Compiler optimizations, which of course also exist for C++ in which >>>> checktestdata is built now, don't only optimize the checktestdata >>>> runtime, but also the script itself since it is just a program and not >>>> some custom script anymore. Especially with a functional language such >>>> as Haskell this means definitions of primitives can be inlined and whole >>>> parts of the script may get fused to a very efficient version. >>>> + Maintainability: my library is now +- 200 LOC, including docs and >>>> empty lines, and contains quite a lot of the checktestdata language >>>> already. Because Haskell has many well optimized parsing libraries, we >>>> get a lot for free, and only have to create a nice generalization for >>>> this particular use case. >>>> + Efficiency: the current checktestdata is just too slow, as mentioned >>>> in [1]. As a reference, on my machine the checktestdata script for >>>> interception 09.max_ans.in takes 35.5s, the python script takes 2.2s, >>>> and the attached version only 2s. And I haven't done anything to >>>> optimize specifically for this case, I've just translated the ctd script >>>> to the syntax of my EDSL. >>>> + With this, the script can do more than just checking the syntax. I.e. >>>> when the problem specifies that it contains a connected graph, one can >>>> check the syntax, but also the connectedness of the graph, while still >>>> separating concerns, i.e. the syntax of the input can be separated from >>>> the semantic checks, but the semantic checks don't need to come in a >>>> separate program which also has to read the input again, like we should >>>> now. All can be combined naturally. >>>> + Create your own combinators: the primitives can be easily combined >>>> into more complex combinators which you can then reuse. I.e. think of a >>>> 'point2d' combinator that reads two space separated integers. >>>> >>>> - Haskell Platform needs to be available, this is easy on all major OS, >>>> but is a quite large dependency. Then again, GCC and Boost are large as >>>> well, and are less Windows-friendly. >>>> - Slighly different syntax: although Haskell is quite liberal in its >>>> syntax for EDSL's, users have to get used to particularities of the >>>> language/ >>>> - Data generation is not easily possible, if not infeasible. There >>>> should be some way of embedding the language to make this possible, and >>>> this is actually what I've been working on for the past few years when >>>> thinking about this problem, but since we mostly use it for checking >>>> data I went with a monadic embedding which is very easy for data checking. >>>> - Backwards compatibility: parsing existing scripts and running it with >>>> the new backend is not going to be easy. I don't expect show stoppers, >>>> but it definitely won't be as efficient as writing scripts within the >>>> language because of variable bindings. I need to work on this a bit to >>>> be sure though. >>>> >>>> That's it, of course my libary is not done yet (no good error messages, >>>> no regex yet), but it's a proof of concept that is already usable for >>>> simpler problems. I hope to convince you that this is the way to go, and >>>> to get some ideas to create the POC into a working tool. Also, >>>> suggestions on 'interesting use cases' are welcome, the interception is >>>> a nice one but I am sure there are many more problems for which the >>>> standard checktestdata just not works. >>>> >>>> Yours truly, >>>> Jeroen >>>> >>>> [1] https://github.com/DOMjudge/checktestdata/issues/3 >>>> >>>> >>>> >>>> _______________________________________________ >>>> DOMjudge-devel mailing list >>>> DOMjudge-devel@domjudge.org >>>> https://www.domjudge.org/mailman/listinfo/domjudge-devel >>>> >>> _______________________________________________ >>> DOMjudge-devel mailing list >>> DOMjudge-devel@domjudge.org >>> https://www.domjudge.org/mailman/listinfo/domjudge-devel >> _______________________________________________ >> DOMjudge-devel mailing list >> DOMjudge-devel@domjudge.org >> https://www.domjudge.org/mailman/listinfo/domjudge-devel >> > _______________________________________________ > DOMjudge-devel mailing list > DOMjudge-devel@domjudge.org > https://www.domjudge.org/mailman/listinfo/domjudge-devel _______________________________________________ DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel
DOMjudge-devel mailing list DOMjudge-devel@domjudge.org https://www.domjudge.org/mailman/listinfo/domjudge-devel