[Ur] An issue with Cyrillic characters

Artyom Shalkhakov artyom.shalkhakov at gmail.com
Fri Jul 5 00:46:53 EDT 2013


Hello list,

I'm trying to persist some strings with Cyrillic characters in them
into a Postgres 9.1 database. Here's my program:

table entry : {Id : int, Title: string}
  PRIMARY KEY Id
sequence entryS

fun new_handle r =
  id <- nextval entryS;
  dml (INSERT INTO entry (Id, Title) VALUES ({[id]}, {[r.Title]}));
  return <xml><body><p>OK</p></body></xml>

fun main (): transaction page =
  return <xml><body>
    <form>
      Title: <textbox {#Title}/>
      <submit action={new_handle}/>
    </form>
  </body></xml>

When I submit "текст" to Ur/Web, I get an error along these lines:

Fatal error: /home/user/proj/simple.ur:7:2-10:2: DML failed:
INSERT INTO uw_Simple_entry (uw_Id, uw_Title) VALUES (20::int8,
E'\377\377\377\377\377\377\377\377'::text)
ERROR:  invalid byte sequence for encoding "UTF8": 0xff
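PostgreSQL complains about 0xff because every \377 escape in that literal stands for the byte 0xFF, which never occurs in well-formed UTF-8. The real UTF-8 bytes of "текст" are different (for example, "т" is 0xD1 0x82). A minimal C sketch of that byte-level check follows. The function names are hypothetical, the check only rules out bytes that RFC 3629 forbids outright (0xC0, 0xC1 and 0xF5 through 0xFF), and it is not how PostgreSQL implements its validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if byte b can appear anywhere in well-formed
   UTF-8. RFC 3629 rules out 0xC0, 0xC1 and 0xF5 through 0xFF, so a byte
   0xFF (octal 377) is always rejected. */
bool utf8_byte_possible(unsigned char b) {
    return b != 0xC0 && b != 0xC1 && b < 0xF5;
}

/* Hypothetical helper: returns the index of the first byte in buf[0..len)
   that can never occur in UTF-8, or len if there is none. This is only a
   necessary condition; it does not check multi-byte sequence structure. */
size_t first_impossible_byte(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!utf8_byte_possible(buf[i])) {
            return i;
        }
    }
    return len;
}
```

The first two bytes of "текст" (0xD1 0x82) pass this check, while any 0xFF byte is flagged immediately.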

I've prepared a patch (attached; it is made against the tip revision).
The behaviour of sprintf/printf for characters with the high bit set is
unexpected on my system. For instance, the following program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
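  char c = (char)255;  /* a byte with the high bit set */

  /* c is passed to printf as an int (default argument promotion) */
  printf("%03o\n", c);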

  return 0;
}

prints "37777777777". If [c] is cast to [unsigned char], the program
prints "377", as expected. Could this have to do with the locale? FYI,
LANG is set to en_US.UTF-8 on my system.
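The "37777777777" output is consistent with plain char being signed on this platform (as with GCC on x86): (char)255 then typically holds -1, the default argument promotions pass it to printf as int -1, and %o, which expects an unsigned int, in practice prints its 32-bit pattern 0xFFFFFFFF, i.e. octal 37777777777. The locale should not matter here, since %o output does not depend on LANG. Below is a minimal C sketch of byte-wise octal escaping that casts through unsigned char first. escape_octal and escaped_capacity are hypothetical names; this is not the Ur/Web runtime code or the contents of the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: buffer size escape_octal needs for src
   (worst case 4 output chars per input byte, plus the NUL). */
size_t escaped_capacity(const char *src) {
    return 4 * strlen(src) + 1;
}

/* Hypothetical helper: copies src into dst, replacing every byte >= 0x80
   with a backslash and three octal digits (e.g. \321), the form accepted
   inside a PostgreSQL E'...' literal. It does not escape quotes or
   backslashes, so it is not a complete SQL escaper. dst must hold at
   least escaped_capacity(src) bytes. Returns dst. */
char *escape_octal(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    char *out = dst;
    for (const char *p = src; *p != '\0'; p++) {
        /* Convert through unsigned char so high bytes stay in 128..255.
           When plain char is signed, handing it to %o directly would
           sign-extend it: with a 32-bit int, (char)255 typically becomes
           -1 and prints as 37777777777. */
        unsigned char b = (unsigned char)*p;
        if (b >= 0x80) {
            out += sprintf(out, "\\%03o", (unsigned)b); /* always 4 chars */
        } else {
            *out++ = (char)b;
        }
    }
    *out = '\0';
    return dst;
}
```

Given the UTF-8 bytes of "текст", this produces \321\202\320\265\320\272\321\201\321\202, which in a UTF8 database decodes back to a valid byte sequence instead of a run of 0xFF bytes.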

--
Cheers,
Artyom Shalkhakov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tip.patch
Type: application/octet-stream
Size: 570 bytes
Desc: not available
URL: <http://www.impredicative.com/pipermail/ur/attachments/20130705/30864730/attachment.obj>
