[Ur] Regular expressions DSEL?

Adam Chlipala adamc at csail.mit.edu
Tue Feb 28 10:02:27 EST 2017


The easiest solution I found is to add this directive to test.urp: 
monoInline 10

Short rationale: the compiler uses a conservative program analysis to find 
opportunities to eliminate first-class functions.  Uses of the 
[transaction] monad are compiled internally into first-class functions, 
and all of them have to be eliminated, because the compiler for 
server-side code currently doesn't support first-class functions at 
runtime (hence "Anonymous function remains at code generation").  Extra 
inlining can reveal more structure to the program analysis, and it 
did so in this case.
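
To illustrate the rationale, here is a toy sketch in Python. It is not the 
Ur/Web compiler: the tuple-based term representation, the function names 
and the "bind" definition are all made up for illustration, and the size 
limit only loosely mirrors what monoInline controls. It shows how a lambda 
can survive simplification when a small definition is not inlined, and 
disappear once it is.

```python
# Toy model only: terms are tuples, NOT the Ur/Web compiler's representation.
#   ("lit", v) | ("var", x) | ("ref", f) | ("lam", x, body) | ("app", fn, arg)

def size(t):
    """Number of AST nodes in a term."""
    if t[0] == "lam":
        return 1 + size(t[2])
    if t[0] == "app":
        return 1 + size(t[1]) + size(t[2])
    return 1

def inline(t, defs, limit):
    """Replace references to top-level definitions of at most `limit` nodes."""
    if t[0] == "ref" and t[1] in defs and size(defs[t[1]]) <= limit:
        return defs[t[1]]
    if t[0] == "lam":
        return ("lam", t[1], inline(t[2], defs, limit))
    if t[0] == "app":
        return ("app", inline(t[1], defs, limit), inline(t[2], defs, limit))
    return t

def subst(t, name, value):
    """Substitute `value` for variable `name` (assumes bound names are distinct)."""
    if t[0] == "var":
        return value if t[1] == name else t
    if t[0] == "lam":
        if t[1] == name:  # the name is shadowed, stop here
            return t
        return ("lam", t[1], subst(t[2], name, value))
    if t[0] == "app":
        return ("app", subst(t[1], name, value), subst(t[2], name, value))
    return t

def simplify(t):
    """Beta-reduce wherever a lambda is applied directly to an argument."""
    if t[0] == "lam":
        return ("lam", t[1], simplify(t[2]))
    if t[0] == "app":
        fn, arg = simplify(t[1]), simplify(t[2])
        if fn[0] == "lam":
            return simplify(subst(fn[2], fn[1], arg))
        return ("app", fn, arg)
    return t

def count_lambdas(t):
    """How many anonymous functions remain in a term."""
    if t[0] == "lam":
        return 1 + count_lambdas(t[2])
    if t[0] == "app":
        return count_lambdas(t[1]) + count_lambdas(t[2])
    return 0

# Hypothetical monad-like bind: bind m k = k m (5 nodes).
defs = {"bind": ("lam", "m", ("lam", "k", ("app", ("var", "k"), ("var", "m"))))}
program = ("app", ("app", ("ref", "bind"), ("lit", 42)), ("lam", "x", ("var", "x")))

with_inlining = simplify(inline(program, defs, 10))    # bind fits the limit
without_inlining = simplify(inline(program, defs, 3))  # bind is too large
```

With the larger limit the program reduces to a plain literal; with the 
smaller one the continuation (fn x => x) is still there at the end, which 
is the analogue of the error quoted below.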

On 02/27/2017 11:22 AM, Artyom Shalkhakov wrote:
> Hello all,
>
> So I hit a road-block, the code:
>
> > fun groups (): transaction page =
> >  m <- return ((Tregex.match'
> >    (concat (literal "a") (capture [#X] (literal "b"))) "ab")
> >    : option {Whole:counted_substring, Groups:{X:counted_substring}});
> >  case m of
> >    None => return <xml>Failed: mismatch!</xml>
> >  | Some {Whole = whole, Groups = {X = {Start=s,Len=l}}} =>
> >    return <xml>Success? Whole match: {[whole.Start]} + {[whole.Len]}, group is {[s]} + {[l]}</xml>
>
> Gives this error:
>
> > .../test.ur:137:17: (to 139:7) Anonymous function remains at code generation
> > Function:
> > (fn _ : {} =>
> >   (case UNBOUND_1 of
> >     None => write("Failed: mismatch!") |
> >      Some {Whole = whole, Groups = {X = {Start = s, Len = l}}} =>
> >       (write("Success? Whole match: ");
> >        (FFI(Basis.htmlifyInt_w(whole.Start));
> >         (write(" + ");
> >          (FFI(Basis.htmlifyInt_w(whole.Len));
> >           (write(", group is ");
> >            (FFI(Basis.htmlifyInt_w(s));
> >             (write(" + "); FFI(Basis.htmlifyInt_w(l)))))))))))
>
> To reproduce, you'll have to build urweb-regex (I've put some 
> directions into README in my branch) and then do [cd tests && make all]
>
> What can be done to avoid this issue? I tried to put type annotations, 
> didn't work.
>
> 2017-02-25 9:30 GMT+06:00 Artyom Shalkhakov <artyom.shalkhakov at gmail.com>:
>
>     Hello all,
>
>     2017-02-23 21:55 GMT+06:00, Artyom Shalkhakov
>     <artyom.shalkhakov at gmail.com>:
>     > Hello Benjamin, Ziv,
>     >
>     > I wrote a very rough approximation to the idea that Ziv proposed,
>     > here's the code (currently, it does not do much):
>     >
>     > https://github.com/ashalkhakov/urweb-regex/tree/typed-regex
>     >
>     > At first I was thinking that it would be better to introduce named
>     > groups, but now I'm wondering how to handle positional groups, instead?
>     >
>     > Tangentially, I also noticed that JS does not handle capture groups
>     > mixed with repetition, e.g.
>     >
>     > 'xabxabxab'.match(/(xab){3}/) // or with /(xab)*/
>     >
>     > gives: ["xabxabxab", "xab"] (would it be worth it to try to rule out
>     > cases like this statically? probably not?)
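
For comparison, the same behaviour can be checked outside JS. A small 
sketch using Python's re module (chosen here only for illustration, it is 
not part of urweb-regex): a capture group under repetition keeps only the 
text of its last iteration.

```python
import re

# A repeated capture group reports only its final iteration.
m = re.match(r"(xab){3}", "xabxabxab")
whole = m.group(0)      # the entire match
groups = m.groups()     # one entry per capture group, not per iteration
last_span = m.span(1)   # position of the last iteration of group 1
```

So a typed API would expose a single substring for such a group regardless 
of how many times it repeated.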
>     >
>
>     I guess I figured it out. The idea is to keep track of an index for
>     every group, as well as the total count of groups in a regular
>     expression. Then, when constructing a capture:
>
>     > capture [#Name] E
>
>     the group indexes in E get incremented by 1, a new group is added:
>     {Name=0}, and the total count of groups is increased by 1.
>
>     Similarly, when concatenating two expressions in
>
>     > concat E1 E2
>
>     the group indexes in E2 will all get incremented by N, where N is the
>     count of groups of E1.
>
>     I think it should work.
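
The bookkeeping described above can be sketched at the value level. This 
is a Python model, not the Ur/Web code from the typed-regex branch: the 
Regex record, match_regex, and the use of Python's re as the matching 
engine are assumptions made for illustration, and duplicate group names 
are simply assumed not to occur. Indexes are 0-based as in the 
description, so group i corresponds to the engine's group i + 1.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Regex:
    pattern: str   # pattern text handed to the engine
    groups: dict   # group name -> 0-based group index
    count: int     # total number of capture groups

def literal(s):
    return Regex(re.escape(s), {}, 0)

def capture(name, e):
    # The new parenthesis opens before every group in e, so the groups of e
    # shift by 1 and the new group takes index 0.
    shifted = {n: i + 1 for n, i in e.groups.items()}
    shifted[name] = 0
    return Regex("(" + e.pattern + ")", shifted, e.count + 1)

def concat(e1, e2):
    # Groups of e2 come after all groups of e1, so they shift by e1.count.
    shifted = {n: i + e1.count for n, i in e2.groups.items()}
    return Regex(e1.pattern + e2.pattern, {**e1.groups, **shifted},
                 e1.count + e2.count)

def match_regex(e, s):
    """Return (start, length) for the whole match and each named group."""
    m = re.match(e.pattern, s)
    if m is None:
        return None
    def span(i):
        return (m.start(i), m.end(i) - m.start(i))
    return {"Whole": span(0),
            "Groups": {n: span(i + 1) for n, i in e.groups.items()}}

# The expression from the test case in this thread.
r = concat(literal("a"), capture("X", literal("b")))
result = match_regex(r, "ab")

# A nested case exercising both rules at once.
nested = capture("Outer", concat(capture("A", literal("x")),
                                 capture("B", literal("y"))))
```

For the nested case the rules give Outer = 0, A = 1, B = 2, which agrees 
with the usual left-parenthesis numbering of the pattern ((x)(y)).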
>
>     >
>     > 2017-02-21 21:21 GMT+06:00 Benjamin Barenblat <bbaren at mit.edu>:
>     >
>     >> On Mon, Feb 20, 2017 at 10:27 PM, Artyom Shalkhakov
>     >> <artyom.shalkhakov at gmail.com> wrote:
>     >> > Thank you for the pointer. I guess creating a new package that
>     >> > depends on urweb-regex is the way to go.
>     >>
>     >> I’m also happy to merge changes to urweb-regex. I think a richly-typed
>     >> API like the one you’re looking for would be quite valuable in the
>     >> regex library.
>     >>
>     >> _______________________________________________
>     >> Ur mailing list
>     >> Ur at impredicative.com
>     >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>     >>
>     >
>     >
>     >
>     > --
>     > Cheers,
>     > Artyom Shalkhakov
>     >
>
>
>     --
>     Cheers,
>     Artyom Shalkhakov
>
>
>
>
> -- 
> Cheers,
> Artyom Shalkhakov
>
