[Ur] Regular expressions DSEL?

Artyom Shalkhakov artyom.shalkhakov at gmail.com
Thu Mar 9 09:16:29 EST 2017


2017-03-09 19:39 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
> Looks good—glad it worked out!
>
> My main question: what happens if the whole string is matched but a
> particular subgroup isn't? This can happen with alt or star, for instance. I
> couldn't tell from the types. I would expect a return type for match to be
> something like `match r (option string)` instead of `option (match r
> string)`. I don't know the underlying regex library, so perhaps there's a
> good reason for this.

I think a test is in order; I don't know what happens in that case (do
you have an example handy?). From JS RegExp object documentation, it
seems that subgroups are always present (but, indeed, the strings
might be nulls or something else entirely, this is JS, after all).
Good catch!

>
> I also thought it may be useful if alt allowed the same group to be captured
> on either side, but I realize that takes further fiddling with subgroup
> indices and post-processing of the results (each group field name would map
> to a list of indices instead of a single index), so it can wait for version
> 2 :).
>

I was a bit puzzled about the type you gave for [alt], it only dawned
on me later what you meant with it. :-)

I can see two questions here:

1. say we have an expression like [a(b)*c], if matched against
[abbbbbbc] in JS, it will only capture the *first* b -- I guess
Boost.Regex which is used on the server behaves the same (Benjamin,
could you help out on this?)

2. say we have an expression like [(a)|(b)], in JS the groups will be
1 and 2. PCRE and some other engines support a "group reset" feature,
so that we can indeed capture both alternatives with one group. This
is unsupported in JS, so I decided to leave it as is, for now.

>
> On Tue, Mar 7, 2017 at 02:11 Artyom Shalkhakov <artyom.shalkhakov at gmail.com>
> wrote:
>>
>> Hello Ziv,
>>
>> I've implemented your idea [1], and it works. Looking for feedback.
>>
>> 2017-02-20 15:48 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
>> > By parametrizing over a type-level record, you can encode the names of
>> > matched subgroups in the type of the regular expression. The interface
>> > probably looks something like:
>> >
>> > (* Abstract type of uncompiled regular expressions. *)
>> > con t :: {Unit} (* names of captured subgroups *) -> Type
>> >
>> > val literal :
>> >     string -> t []
>> > val any : t []
>> > val concat : r1 ::: {Unit} -> r2 ::: {Unit} -> [r1 ~ r2] =>
>> >     t r1 -> t r2 -> t (r1 ++ r2)
>> > val star : r ::: {Unit} ->
>> >     t r -> t r
>> > val alt : r ::: {Unit} -> s1 ::: {Unit} -> s2 ::: {Unit} -> [r ~ s1] =>
>> > [r ~
>> > s2] => [s1 ~ s2] =>
>> >     t (r ++ s1) -> t (r ++ s2) -> t (r ++ s1 ++ s2)
>> > val capture : r ::: {Unit} ->
>> >     nm :: Name -> [r ~ [nm]] => t r -> t (r ++ [nm])
>> >
>> > (* Abstract type of compiled regular expressions *)
>> > con compiled :: {Unit} -> Type
>> >
>> > val compile : r ::: {Unit} ->
>> >     t r -> compiled r
>> >
>> > Obviously this probably won't exactly work. For instance, if the
>> > underlying
>> > library requires explicit names for captured groups, you'll need
>> > something
>> > more like
>> >
>> > val capture : r ::: {Unit} -> nm :: Name -> [r ~ [nm]] =>
>> >     {nm : string} -> t r -> t (r ++ [nm])
>> >
>> > because there's no way to extract a string from a type-level Name. In
>> > fact,
>> > this way has another benefit: the internal representation of the type
>> > can
>> > just be a string! All the tracking of subgroups can happen at the type
>> > level. You may not even need compile.
>> >
>> > This interface assumes each subgroup may appear 0 or 1 times. You could
>> > maybe keep track of the number of matches more precisely by using {Type}
>> > instead of {Unit} and using a type class to implement type-level
>> > functions.
>> >
>> > con t : {Type} (* maps names of captured subgroups to type-level
>> > representation of number of matched fields *) -> Type
>> >
>> > type one
>> > type oneOrNone
>> >
>> > class meet : Type -> Type -> Type
>> > val meet_one_oneOrNone : weaken one oneOrNone oneOrNone
>> > val meet_oneOrNone_one : weaken oneOrNone one oneOrNone
>> > val meet_oneOrNone_oneOrNone : weaken oneOrNone oneOrNone oneOrNone
>> >
>> > This makes the types more complicated, but I suspect (though I don't
>> > know
>> > for sure) that Ur/Web will be able to infer the extra stuff in any
>> > concrete
>> > (that is, monomorphic) use, so the user code will still look pretty.
>> > Here's
>> > how some of the functions look with this. The place weaken gets used is
>> > alt.
>> >
>> > val star : r ::: {Type} ->
>> >     t r -> t (map (fn _ => oneOrNo) r)
>> >
>> > val alt : r ::: {Type * Type * Type} -> s1 ::: {Type} -> s2 ::: {Type}
>> > ->
>> >     [map fst3 r ~ s1] => [map snd3 r ~ s2] => [s1 ~ s2] =>
>> >     $(map (fn (t1, t2, t3) => weaken t1 t2 t3) r)
>> >     t (map fst3 r ++ s1) -> t (map snd3 r ++ s2)
>> >     -> t (map thd3 r ++ map (fn _ => oneOrNo) (s1 ++ s2))
>> >
>> > You could extend the same sort of idea to types like oneOrMore and
>> > noneOrMore—useful if the underlying library can return a list of matches
>> > inside a + or *—by adding more cases to weaken.
>> >
>> > Here's a Haskell library that takes basically this approach:
>> > https://github.com/sweirich/dth/tree/master/popl17/src
>> >
>> > Hope this helps!
>> > Ziv
>> >
>> >
>> > On Mon, Feb 20, 2017 at 12:00 AM, Artyom Shalkhakov
>> > <artyom.shalkhakov at gmail.com> wrote:
>> >>
>> >> 2017-02-20 10:47 GMT+06:00 <fold at tuta.io>:
>> >>>
>> >>> Maybe just define another attribute in basis.urs?
>> >>>
>> >>> It would be much interesting to define a DSEL though. I have not seen
>> >>> any
>> >>> attempts to create a DSEL for regular expressions.
>> >>>
>> >>
>> >> Indeed! And it seems like it would be a nice exercise as well having
>> >> some
>> >> practical uses to it.
>> >>
>> >> I envision something along the lines of:
>> >>
>> >> datatype RegExp = ... (* constructors *)
>> >>
>> >> type t (* abstract, not exported outside the module *)
>> >> val compile : RegExp -> t (* the trusted function *)
>> >> val matches : string -> t -> bool (* check if passed-in string matches
>> >> the
>> >> expression *)
>> >>
>> >> And the implementation might use something unsafe (like a string, or an
>> >> actual regexp object in JS). Then we could offload the interpretation
>> >> of
>> >> regexps to an off-the-shelf library. The [compile] function would print
>> >> and
>> >> escape the AST to a regexp string, and the Ur/Web compiler might
>> >> actually
>> >> optimize all of this away.
>> >>
>> >> If we are to take care of matching subgroups, that would require a more
>> >> complicated type. I guess that should be doable as well, but I don't
>> >> see
>> >> immediately how to handle this.
>> >>
>> >>>
>> >>>
>> >>>
>> >>> 20. Feb 2017 16:28 by artyom.shalkhakov at gmail.com:
>> >>>
>> >>>
>> >>> Hello all,
>> >>>
>> >>> 'm doing form validation stuff, following examples on MDN. [1]
>> >>>
>> >>> It seems like Ur/Web doesn't handle regular expressions?
>> >>>
>> >>> I guess it should not be too diffcult to define a DSEL in Ur/Web (just
>> >>> AST constructors and a few trusted functions: e.g. [compile : AST ->
>> >>> regexp], where [regexp] is an abstract type, or maybe just a string
>> >>> :-)) and
>> >>> have it work at least in the browser.
>> >>>
>> >>> Could somebody point me in the right direction here?
>> >>>
>> >>> --
>> >>> Cheers,
>> >>> Artyom Shalkhakov
>> >>>
>> >>> [1]
>> >>>
>> >>> https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Cheers,
>> >> Artyom Shalkhakov
>> >>
>> >> _______________________________________________
>> >> Ur mailing list
>> >> Ur at impredicative.com
>> >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>> >>
>> >
>> >
>> > _______________________________________________
>> > Ur mailing list
>> > Ur at impredicative.com
>> > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>> >
>>
>> --
>> Cheers,
>> Artyom Shalkhakov
>>
>> [1] https://github.com/bbarenblat/urweb-regex/pull/1
>>
>> _______________________________________________
>> Ur mailing list
>> Ur at impredicative.com
>> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>
>
> _______________________________________________
> Ur mailing list
> Ur at impredicative.com
> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>



-- 
Cheers,
Artyom Shalkhakov



More information about the Ur mailing list