[Ur] Ur/Web in production

Vladimir Shabanov vshabanoff at gmail.com
Sat Jan 18 10:54:38 EST 2014


2014/1/18 Marc Weber <marco-oweber at gmx.de>

> Maybe also "smarter caching" could work, eg only cache or keep in memory
> the info which feeds got read, and try to store them efficiently in
> memory (eg one bit per thread) - or try to pack by storing:
> read 1 to 5, 10 to 20 or such.
>

I'm doing precisely this. Here is the interval set:
https://github.com/bazqux/bazqux-urweb/blob/master/crawler/Lib/ReadSet.hs
and the PostsRead datatype uses this ReadSet:
https://github.com/bazqux/bazqux-urweb/blob/master/crawler/Gen.hs

It helps a lot to minimize writes. Other RSS readers usually have a
limitation on how long articles are kept unread; thanks to this read-state
compression, my reader doesn't have such a limitation.
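Roughly, the idea is a sorted list of read intervals. A minimal sketch
(hypothetical names, much simplified compared to the real ReadSet.hs):

-- A minimal interval-set sketch; the real ReadSet.hs is more
-- elaborate and more space-efficient.
module ReadSetSketch where

-- Sorted, non-overlapping, non-adjacent closed intervals:
-- "read 1 to 5, 10 to 20" is ReadSet [(1,5),(10,20)].
newtype ReadSet = ReadSet [(Int, Int)]
    deriving (Show, Eq)

empty :: ReadSet
empty = ReadSet []

-- | Is the post with this index marked as read?
member :: Int -> ReadSet -> Bool
member x (ReadSet is) = any (\(lo, hi) -> lo <= x && x <= hi) is

-- | Mark one post as read, gluing adjacent intervals together
-- so the representation stays compact.
insert :: Int -> ReadSet -> ReadSet
insert x (ReadSet is) = ReadSet (go is)
  where
    go [] = [(x, x)]
    go ivs@((lo, hi) : rest)
        | x + 1 < lo = (x, x) : ivs        -- strictly before, not adjacent
        | x > hi + 1 = (lo, hi) : go rest  -- strictly after, not adjacent
        | otherwise  = glue (min lo x) (max hi x) rest
    -- the extended interval may now touch the following one
    glue lo hi ((lo', hi') : rest)
        | hi + 1 >= lo' = (lo, max hi hi') : rest
    glue lo hi rest = (lo, hi) : rest

E.g. foldr insert empty ([1..5] ++ [10..20]) gives
ReadSet [(1,5),(10,20)] - exactly the "read 1 to 5, 10 to 20" packing.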

But there are still a lot of writes, especially in the feed fetcher. It
fetches about 8M new messages every day and performs about 11M feed
fetches, and each fetch itself means several writes (update the queue,
update the posts/comments lists, update the feed info). Perhaps it could
work with Postgres on a single machine with an SSD RAID, but I chose Riak
when I found that not much relational stuff was left and I wanted easier
scaling and operation.
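(For scale: 11M fetches per day is roughly 130 fetches per second around
the clock, so at several writes per fetch that's on the order of a few
hundred key-value writes per second, sustained, before any user-facing
writes.)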

> Sometimes there is a lot you can do this way - just some thoughts
> without knowing all details.
>

I'm caching a lot. All post lists are cached in Haskell memory, and the
newsfeed for folders/all items is merged right in memory (and filtered
using the ReadSet above). That way it works really fast and I don't need
any table joins or data denormalization.
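The merge itself is conceptually simple. A simplified sketch (made-up
names, not the actual code, reusing the ReadSet sketch above): per-feed
post lists are kept sorted newest-first, so a folder view is a k-way
merge filtered by each feed's ReadSet.

import Data.Ord (Down (..), comparing)

data Post = Post { postTime :: Int, postIndex :: Int }

-- All unread posts of a folder, newest first.
folderUnread :: [([Post], ReadSet)] -> [Post]
folderUnread feeds =
    mergeBy (comparing (Down . postTime))
        [ filter (\p -> not (postIndex p `member` rs)) ps
        | (ps, rs) <- feeds ]

-- k-way merge of lists that are already sorted by cmp
mergeBy :: (a -> a -> Ordering) -> [[a]] -> [a]
mergeBy cmp = foldr merge2 []
  where
    merge2 xs [] = xs
    merge2 [] ys = ys
    merge2 (x : xs) (y : ys)
        | cmp x y == GT = y : merge2 (x : xs) ys
        | otherwise     = x : merge2 xs (y : ys)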