[Ur] CMS like features ? unsafe XML - encodings?

Marc Weber marco-oweber at gmx.de
Wed Dec 15 13:28:07 EST 2010


Excerpts from Adam Chlipala's message of Wed Dec 15 15:35:32 +0100 2010:
> types), then simple code like this gets the job done.
Thanks

> > If we are at it: Does it make sense to encode the encoding of a string
> > somehow?

> Maybe so, but I'm woefully underinformed about encodings.  The last time 
> I looked into this, I think my conclusion was that sticking with UTF-8 
> could please everybody reasonably well.

Let me quote two lines from Gian's blog code:

                Body = {Nam = "Entry Body",
                          Show = (fn b => <xml>{[if strlen b > 25 then substring b 0 25 else b]}...</xml>),


I don't expect C's substring to be UTF-8 aware. In UTF-8 a single
character may be encoded as up to 4 bytes, so cutting the string at
byte 25 can split a character in the middle.
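
To illustrate the problem, here is a minimal sketch in Haskell using only
base. All names (encodeChar, truncateBytes, truncateUtf8, ...) are made up
for this example and are not part of Ur/Web or of any library. The sketch
assumes well-formed input. It compares a byte-wise cut, which is what a
C-style substring does, with a cut that backs off to the previous character
boundary:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8 (1 to 4 bytes).
-- Hypothetical helper, written out by hand to avoid extra packages.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

-- Byte-wise truncation: may cut a multi-byte character in half.
truncateBytes :: Int -> [Word8] -> [Word8]
truncateBytes = take

-- Continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Keep at most n bytes, dropping a trailing partial character.
truncateUtf8 :: Int -> [Word8] -> [Word8]
truncateUtf8 n bs
  | startsMidChar rest = dropPartial kept
  | otherwise          = kept
  where
    (kept, rest)        = splitAt n bs
    startsMidChar (b:_) = isContinuation b
    startsMidChar []    = False
    -- Remove trailing continuation bytes and the lead byte before them.
    dropPartial = reverse . drop 1 . dropWhile isContinuation . reverse
```

For the two-character string "aé" (three bytes in UTF-8), truncateBytes 2
keeps the lead byte of "é" without its continuation byte, while
truncateUtf8 2 keeps only the "a".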

For PDF files the encoding can also make a difference: viewers used to
ship with non-UTF-8 fonts, so .pdf files relying on them could be smaller.
That is no longer mandatory for the future, but it still seems to work.
It's a corner case though, so for now it is not important enough.


E.g. in Haskell you could use phantom types:

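  {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
  import Prelude hiding (concat)  -- our concat would clash with Prelude's

  -- empty types, used only as encoding tags
  data UTF8
  data ISOXX

  data Buffer a = Buffer String

  x :: Buffer UTF8
  x = Buffer "text"

  -- the encodings of both arguments determine the result encoding
  class ConcatStrs a b c | a b -> c where
    concat :: Buffer a -> Buffer b -> Buffer c

  instance ConcatStrs a a a where
    -- same encoding: trivial
    concat (Buffer s) (Buffer t) = Buffer (s ++ t)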

By not providing an instance for "ConcatStrs UTF8 ISOXX c" you disallow
concatenating buffers with different encodings.
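
If mixed-encoding concatenation is never needed at all, a simpler variant
might do without the type class. This is only a sketch of that alternative,
not what I proposed above: append and unBuffer are made-up names, and
sharing the tag variable enc between both arguments is what makes the
compiler reject mixed encodings:

```haskell
-- Empty types, used only as compile-time encoding tags.
data UTF8
data ISOXX

-- The tag enc never appears on the right-hand side (a phantom type).
newtype Buffer enc = Buffer String

-- Both arguments must carry the same tag, and so does the result.
append :: Buffer enc -> Buffer enc -> Buffer enc
append (Buffer a) (Buffer b) = Buffer (a ++ b)

-- Forget the tag again, e.g. right before output.
unBuffer :: Buffer enc -> String
unBuffer (Buffer s) = s

greeting :: Buffer UTF8
greeting = append (Buffer "Hello, ") (Buffer "world")

-- Rejected by the type checker, because UTF8 and ISOXX do not unify:
--   append (Buffer "a" :: Buffer UTF8) (Buffer "b" :: Buffer ISOXX)
```

The class-based version is still the one to pick if some mixed pairs should
be allowed later on, since each allowed pair can then get its own instance
doing the conversion.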

Marc Weber


