[Ur] Regular expressions DSEL?

Ziv Scully zivscully at gmail.com
Thu Mar 9 08:39:49 EST 2017


Looks good—glad it worked out!

My main question: what happens if the whole string is matched but a
particular subgroup isn't? This can happen with alt or star, for instance.
I couldn't tell from the types. I would expect a return type for match to
be something like `match r (option string)` instead of `option (match r
string)`. I don't know the underlying regex library, so perhaps there's a
good reason for this.

I also thought it may be useful if alt allowed the same group to be
captured on either side, but I realize that takes further fiddling with
subgroup indices and post-processing of the results (each group field name
would map to a list of indices instead of a single index), so it can wait
for version 2 :).


On Tue, Mar 7, 2017 at 02:11 Artyom Shalkhakov <artyom.shalkhakov at gmail.com>
wrote:

> Hello Ziv,
>
> I've implemented your idea [1], and it works. Looking for feedback.
>
> 2017-02-20 15:48 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
> > By parametrizing over a type-level record, you can encode the names of
> > matched subgroups in the type of the regular expression. The interface
> > probably looks something like:
> >
> > (* Abstract type of uncompiled regular expressions. *)
> > con t :: {Unit} (* names of captured subgroups *) -> Type
> >
> > val literal :
> >     string -> t []
> > val any : t []
> > val concat : r1 ::: {Unit} -> r2 ::: {Unit} -> [r1 ~ r2] =>
> >     t r1 -> t r2 -> t (r1 ++ r2)
> > val star : r ::: {Unit} ->
> >     t r -> t r
> > val alt : r ::: {Unit} -> s1 ::: {Unit} -> s2 ::: {Unit} -> [r ~ s1] =>
> [r ~
> > s2] => [s1 ~ s2] =>
> >     t (r ++ s1) -> t (r ++ s2) -> t (r ++ s1 ++ s2)
> > val capture : r ::: {Unit} ->
> >     nm :: Name -> [r ~ [nm]] => t r -> t (r ++ [nm])
> >
> > (* Abstract type of compiled regular expressions *)
> > con compiled :: {Unit} -> Type
> >
> > val compile : r ::: {Unit} ->
> >     t r -> compiled r
> >
> > Obviously this probably won't exactly work. For instance, if the
> underlying
> > library requires explicit names for captured groups, you'll need
> something
> > more like
> >
> > val capture : r ::: {Unit} -> nm :: Name -> [r ~ [nm]] =>
> >     {nm : string} -> t r -> t (r ++ [nm])
> >
> > because there's no way to extract a string from a type-level Name. In
> fact,
> > this way has another benefit: the internal representation of the type can
> > just be a string! All the tracking of subgroups can happen at the type
> > level. You may not even need compile.
> >
> > This interface assumes each subgroup may appear 0 or 1 times. You could
> > maybe keep track of the number of matches more precisely by using {Type}
> > instead of {Unit} and using a type class to implement type-level
> functions.
> >
> > con t : {Type} (* maps names of captured subgroups to type-level
> > representation of number of matched fields *) -> Type
> >
> > type one
> > type oneOrNone
> >
> > class meet : Type -> Type -> Type
> > val meet_one_oneOrNone : weaken one oneOrNone oneOrNone
> > val meet_oneOrNone_one : weaken oneOrNone one oneOrNone
> > val meet_oneOrNone_oneOrNone : weaken oneOrNone oneOrNone oneOrNone
> >
> > This makes the types more complicated, but I suspect (though I don't know
> > for sure) that Ur/Web will be able to infer the extra stuff in any
> concrete
> > (that is, monomorphic) use, so the user code will still look pretty.
> Here's
> > how some of the functions look with this. The place weaken gets used is
> alt.
> >
> > val star : r ::: {Type} ->
> >     t r -> t (map (fn _ => oneOrNo) r)
> >
> > val alt : r ::: {Type * Type * Type} -> s1 ::: {Type} -> s2 ::: {Type} ->
> >     [map fst3 r ~ s1] => [map snd3 r ~ s2] => [s1 ~ s2] =>
> >     $(map (fn (t1, t2, t3) => weaken t1 t2 t3) r)
> >     t (map fst3 r ++ s1) -> t (map snd3 r ++ s2)
> >     -> t (map thd3 r ++ map (fn _ => oneOrNo) (s1 ++ s2))
> >
> > You could extend the same sort of idea to types like oneOrMore and
> > noneOrMore—useful if the underlying library can return a list of matches
> > inside a + or *—by adding more cases to weaken.
> >
> > Here's a Haskell library that takes basically this approach:
> > https://github.com/sweirich/dth/tree/master/popl17/src
> >
> > Hope this helps!
> > Ziv
> >
> >
> > On Mon, Feb 20, 2017 at 12:00 AM, Artyom Shalkhakov
> > <artyom.shalkhakov at gmail.com> wrote:
> >>
> >> 2017-02-20 10:47 GMT+06:00 <fold at tuta.io>:
> >>>
> >>> Maybe just define another attribute in basis.urs?
> >>>
> >>> It would be much interesting to define a DSEL though. I have not seen
> any
> >>> attempts to create a DSEL for regular expressions.
> >>>
> >>
> >> Indeed! And it seems like it would be a nice exercise as well having
> some
> >> practical uses to it.
> >>
> >> I envision something along the lines of:
> >>
> >> datatype RegExp = ... (* constructors *)
> >>
> >> type t (* abstract, not exported outside the module *)
> >> val compile : RegExp -> t (* the trusted function *)
> >> val matches : string -> t -> bool (* check if passed-in string matches
> the
> >> expression *)
> >>
> >> And the implementation might use something unsafe (like a string, or an
> >> actual regexp object in JS). Then we could offload the interpretation of
> >> regexps to an off-the-shelf library. The [compile] function would print
> and
> >> escape the AST to a regexp string, and the Ur/Web compiler might
> actually
> >> optimize all of this away.
> >>
> >> If we are to take care of matching subgroups, that would require a more
> >> complicated type. I guess that should be doable as well, but I don't see
> >> immediately how to handle this.
> >>
> >>>
> >>>
> >>>
> >>> 20. Feb 2017 16:28 by artyom.shalkhakov at gmail.com:
> >>>
> >>>
> >>> Hello all,
> >>>
> >>> 'm doing form validation stuff, following examples on MDN. [1]
> >>>
> >>> It seems like Ur/Web doesn't handle regular expressions?
> >>>
> >>> I guess it should not be too diffcult to define a DSEL in Ur/Web (just
> >>> AST constructors and a few trusted functions: e.g. [compile : AST ->
> >>> regexp], where [regexp] is an abstract type, or maybe just a string
> :-)) and
> >>> have it work at least in the browser.
> >>>
> >>> Could somebody point me in the right direction here?
> >>>
> >>> --
> >>> Cheers,
> >>> Artyom Shalkhakov
> >>>
> >>> [1]
> >>>
> https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation
> >>>
> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Artyom Shalkhakov
> >>
> >> _______________________________________________
> >> Ur mailing list
> >> Ur at impredicative.com
> >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
> >>
> >
> >
> > _______________________________________________
> > Ur mailing list
> > Ur at impredicative.com
> > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
> >
>
> --
> Cheers,
> Artyom Shalkhakov
>
> [1] https://github.com/bbarenblat/urweb-regex/pull/1
>
> _______________________________________________
> Ur mailing list
> Ur at impredicative.com
> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.impredicative.com/pipermail/ur/attachments/20170309/190d3a4d/attachment.html>


More information about the Ur mailing list