Mar 1, 2010
3 weird errors

(addressed to the arc forum at http://arclanguage.org/item?id=11355)

My confidence in readwarp's software stack (EC2, ubuntu jaunty, mzscheme 4.1.3/4.2.4, arc 3.1) has been taking a hit in recent weeks, with several hard-to-diagnose and hard-to-reproduce bugs. I don't know what to provide code-wise, but I thought I'd post descriptions of a couple of the phenomena to see if y'all have run into any of them and have any lessons to share.

A. queues

Details: http://arclanguage.org/item?id=11347

arc.arc has the following comment near enq/deq:

  ; Despite call to atomic, once had some sign this wasn't thread-safe.
  ; Keep an eye on it.

I think I've seen this in readwarp.

arc.arc implements queues as a three-tuple: a list of elements, a pointer to the last element, and an integer length. Several times a day, a queue in my server gets corrupted. Enq'ing into a 1-element queue sometimes seems to cause the tail pointer to be replaced with a copy. As a result, elements that were enq'd (onto the tail) can't be deq'd (from the list's head).
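
To make the symptom concrete, here is a minimal sketch that fakes the corruption by hand. It assumes arc 3.1's stock queue, enq, deq, qlist and qlen from arc.arc. The copy step is only my stand-in for whatever actually goes wrong on the server; it illustrates the end state and is not a reproduction of the bug.

```arc
; Sketch only: simulate the suspected corruption manually.
(= q (queue))
(enq 1 q)                       ; head list and tail pointer share one cons

; Assumption: replacing the tail pointer with a copy mimics the observed state.
(= (cadr q) (copy (cadr q)))

(enq 2 q)                       ; 2 is attached to the copy, not to the head list

(prn (qlist q))                 ; head list should still hold only the first element
(prn (qlen q))                  ; the length counter has been bumped anyway
(prn (deq q))                   ; the first element comes out as usual
(prn (deq q))                   ; the second element is unreachable from the head
```

If a queue really ends up in this state, its length and its contents disagree, which matches what I see: elements go in but never come out.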

B. DOM corruption

This one's a doozy. I've had 3 reports in the last 2 days that button text that should look like this:

has now been caught looking like this:

I couldn't believe my eyes at first, but I have screenshots and full html to confirm it.

The code for the buttons is simplicity itself, and it hasn't changed in weeks:

  (def buttons(user sname doc)
    (tag (div class "buttons")
      (button user sname doc 1 "skip" "not interesting")
      (button user sname doc 2 "next" "more like this")
      (button user sname doc 4 "love" "more from this site")
      (clear)))

  (def button(user sname doc n cls tooltip)
    (tag:input type "button" class (+ cls " button") value tooltip onclick ..))

So each button's text is statically bound to a string literal.

C. Segfaults with regexp operations

Running the following scheme code often causes a segmentation fault (first download this file):

  (require (lib "mzlib/pregexp"))

  ; Repeatedly run regexp-replace over a list of words.
  (let ((words (call-with-input-file "ztmp.words2"
                 (lambda(f) (read f)))))
    (letrec ((fn (lambda()
                   (print "iter") (newline)
                   (map (lambda(x)
                          ; You can replace b with any letter. The input doesn't
                          ; even have any b's.
                          (regexp-replace (pregexp "b.*") x ""))
                        words)
                   (fn))))
      (fn)))

I've isolated this test case down from my arc code. This segmentation fault happens on readwarp at least once a day, occasionally up to 6 times.


Can y'all think of any explanation for these? I'm still inclined to suspect a bug in my code, but that hypothesis is wearing thin. In spite of the PG comment, I don't think a race condition can cause these effects. But perhaps references to objects are getting mixed up, perhaps during GC. Copied in the first case, and exchanged in the second. And the regexp thing is perhaps a separate bug.

Could y'all try reproducing bugs A and C, and let me know the results? I'm especially interested in hearing from people trying them out on some VPS like slicehost, linode or EC2. Is it possible that's at the root of the problem?


© mmxi ak