[Ur] Regular expressions DSEL?

Artyom Shalkhakov artyom.shalkhakov at gmail.com
Mon Feb 27 11:22:05 EST 2017


Hello all,

I've hit a roadblock with the following code:

> fun groups (): transaction page =
>  m <- return ((Tregex.match'
>    (concat (literal "a") (capture [#X] (literal "b"))) "ab")
>    : option {Whole:counted_substring, Groups:{X:counted_substring}});
>  case m of
>    None => return <xml>Failed: mismatch!</xml>
>  | Some {Whole = whole, Groups = {X = {Start=s,Len=l}}} =>
>    return <xml>Success? Whole match: {[whole.Start]} + {[whole.Len]}, group is {[s]} + {[l]}</xml>

Gives this error:

> .../test.ur:137:17: (to 139:7) Anonymous function remains at code generation
> Function:
> (fn _ : {} =>
>   (case UNBOUND_1 of
>     None => write("Failed: mismatch!") |
>      Some {Whole = whole, Groups = {X = {Start = s, Len = l}}} =>
>       (write("Success? Whole match: ");
>        (FFI(Basis.htmlifyInt_w(whole.Start));
>         (write(" + ");
>          (FFI(Basis.htmlifyInt_w(whole.Len));
>           (write(", group is ");
>            (FFI(Basis.htmlifyInt_w(s));
>             (write(" + "); FFI(Basis.htmlifyInt_w(l)))))))))))

To reproduce, you'll have to build urweb-regex (I've put some directions
in the README on my branch) and then run [cd tests && make all].

What can be done to avoid this issue? I tried adding type annotations, but
that didn't help.

2017-02-25 9:30 GMT+06:00 Artyom Shalkhakov <artyom.shalkhakov at gmail.com>:

> Hello all,
>
> 2017-02-23 21:55 GMT+06:00, Artyom Shalkhakov <artyom.shalkhakov at gmail.com>:
> > Hello Benjamin, Ziv,
> >
> > I wrote a very rough approximation of the idea that Ziv proposed. Here's
> > the code (currently, it does not do much):
> >
> > https://github.com/ashalkhakov/urweb-regex/tree/typed-regex
> >
> > At first I was thinking that it would be better to introduce named groups,
> > but now I'm wondering how to handle positional groups instead.
> >
> > Tangentially, I also noticed that JS does not keep every match when a
> > capture group is combined with repetition: only the last iteration is
> > reported, e.g.
> >
> > 'xabxabxab'.match(/(xab){3}/) // or with /(xab)*/
> >
> > gives: ["xabxabxab", "xab"] (would it be worth trying to rule out cases
> > like this statically? Probably not?)
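
A small TypeScript sketch of the behaviour described above, for anyone who wants to try it. The helper names (matchWithGroups, everyOccurrence) are made up for illustration, and the input uses a different middle letter in each repetition so it is visible which iteration the group keeps. Scanning with the global flag is only one possible way to recover every repetition; it is not something proposed in this thread.

```typescript
// Return the full match followed by each capture group, or null on mismatch.
function matchWithGroups(pattern: RegExp, input: string): string[] | null {
  const m = pattern.exec(input);
  return m === null ? null : m.slice(0);
}

// Collect every occurrence of a sub-pattern by scanning with the global flag.
// (Hypothetical helper: an alternative to relying on a repeated capture group.)
function everyOccurrence(subPattern: string, input: string): string[] {
  const found = input.match(new RegExp(subPattern, "g"));
  return found === null ? [] : found.slice(0);
}

const repeated = matchWithGroups(/(x.b){3}/, "xabxcbxdb");
// repeated is ["xabxcbxdb", "xdb"]: group 1 holds only the third iteration.

const each = everyOccurrence("x.b", "xabxcbxdb");
// each is ["xab", "xcb", "xdb"]

console.log(repeated, each);
```
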
> >
>
> I think I've figured it out. The idea is to keep track of an index for every
> group, as well as the total count of groups in a regular expression.
> Then, when constructing a capture:
>
> > capture [#Name] E
>
> the group indexes in E get incremented by 1, a new group is added:
> {Name=0}, and the total count of groups is increased by 1.
>
> Similarly, when concatenating two expressions in
>
> > concat E1 E2
>
> the group indexes in E2 will all get incremented by N, where N is the
> count of groups of E1.
>
> I think it should work.
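
The bookkeeping described above can be modelled at the value level with a rough TypeScript sketch. This is not the urweb-regex API: the Regex record and the functions literal, capture, concat, shiftIndexes and matchGroups are hypothetical stand-ins, and the indexes are ordinary runtime numbers rather than anything tracked in types. It also assumes group names are distinct, which the typed Ur/Web version would presumably enforce statically.

```typescript
// Hypothetical model: a pattern plus the group bookkeeping from the message.
interface Regex {
  source: string;                 // pattern text in JS regex syntax
  groups: Record<string, number>; // group name -> 0-based positional index
  count: number;                  // total number of capture groups
}

// Match the given text literally by escaping regex metacharacters.
function literal(text: string): Regex {
  return { source: text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), groups: {}, count: 0 };
}

// Add `by` to every group index.
function shiftIndexes(groups: Record<string, number>, by: number): Record<string, number> {
  const out: Record<string, number> = {};
  for (const name of Object.keys(groups)) out[name] = groups[name] + by;
  return out;
}

// capture [#Name] E: indexes in E move up by 1, Name gets index 0, count grows by 1.
function capture(name: string, e: Regex): Regex {
  return {
    source: `(${e.source})`,
    groups: { ...shiftIndexes(e.groups, 1), [name]: 0 },
    count: e.count + 1,
  };
}

// concat E1 E2: indexes in E2 move up by the number of groups in E1.
function concat(e1: Regex, e2: Regex): Regex {
  return {
    source: e1.source + e2.source,
    groups: { ...e1.groups, ...shiftIndexes(e2.groups, e1.count) },
    count: e1.count + e2.count,
  };
}

// Look up each named group by its positional index (JS numbers groups from 1).
function matchGroups(e: Regex, input: string): Record<string, string> | null {
  const m = new RegExp(e.source).exec(input);
  if (m === null) return null;
  const out: Record<string, string> = {};
  for (const name of Object.keys(e.groups)) out[name] = m[e.groups[name] + 1];
  return out;
}

// The expression from the top of this thread: concat (literal "a") (capture [#X] (literal "b")).
const demo = concat(literal("a"), capture("X", literal("b")));
// demo.source is "a(b)", demo.groups is { X: 0 }, demo.count is 1,
// and matchGroups(demo, "ab") yields { X: "b" }.
console.log(demo, matchGroups(demo, "ab"));
```

The index-0-for-the-new-group rule works because regex engines number groups by the position of their opening parenthesis, and the outer capture's parenthesis always comes before anything inside it.
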
>
> >
> > 2017-02-21 21:21 GMT+06:00 Benjamin Barenblat <bbaren at mit.edu>:
> >
> >> On Mon, Feb 20, 2017 at 10:27 PM, Artyom Shalkhakov
> >> <artyom.shalkhakov at gmail.com> wrote:
> >> > Thank you for the pointer. I guess creating a new package that depends
> >> > on urweb-regex is the way to go.
> >>
> >> I’m also happy to merge changes to urweb-regex. I think a richly-typed
> >> API like the one you’re looking for would be quite valuable in the
> >> regex library.
> >
> >
> >
> > --
> > Cheers,
> > Artyom Shalkhakov
> >



-- 
Cheers,
Artyom Shalkhakov