Eu não poderia estar mais errado (I couldn't be more wrong)

Posted 1 month, 3 weeks ago at 14:20. 4 comments

Two weeks ago, the Litmus servers were upgraded from RHEL4 to RHEL5. The dry run on the staging install went off without incident, but soon after we upgraded the production server, we started receiving complaints about blank pages being served. A quick check of the error logs showed that the blank pages corresponded to httpd child process segfaults:

[Sat Jun 14 13:39:18 2008] [notice] child pid 31570 exit signal Segmentation fault (11)
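
For anyone chasing the same symptom, here is a minimal sketch of how those log entries could be tallied to see how often the children are dying. It assumes the Apache 2.x error-log format shown in the line above. The function names (find_segfaults, segfaults_per_hour) are hypothetical, and every sample line except the first is made up for illustration.

```python
import re
from collections import Counter

# Matches Apache 2.x error-log lines of the form quoted in the post:
# [Sat Jun 14 13:39:18 2008] [notice] child pid 31570 exit signal Segmentation fault (11)
SEGFAULT_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[notice\] "
    r"child pid (?P<pid>\d+) exit signal Segmentation fault \(11\)"
)


def find_segfaults(lines):
    """Return a list of (timestamp, pid) pairs, one per child-segfault line."""
    hits = []
    for line in lines:
        match = SEGFAULT_RE.match(line.strip())
        if match:
            hits.append((match.group("timestamp"), int(match.group("pid"))))
    return hits


def segfaults_per_hour(hits):
    """Count segfaults per clock hour, keyed like 'Jun 14 13:00'."""
    counts = Counter()
    for timestamp, _pid in hits:
        # Timestamp layout: weekday, month, day, HH:MM:SS, year
        _weekday, month, day, clock, _year = timestamp.split()
        counts[f"{month} {day} {clock[:2]}:00"] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        # First line is from the post; the other two are invented examples.
        "[Sat Jun 14 13:39:18 2008] [notice] child pid 31570 exit signal Segmentation fault (11)",
        "[Sat Jun 14 13:40:02 2008] [error] File does not exist: /var/www/favicon.ico",
        "[Sat Jun 14 14:05:51 2008] [notice] child pid 31602 exit signal Segmentation fault (11)",
    ]
    for hour, count in segfaults_per_hour(find_segfaults(sample)).items():
        print(hour, count)
```

Grouping by hour is an arbitrary choice; the point is only to make it easy to compare the crash rate before and after a change such as a module downgrade.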

Normally, this issue would have landed squarely on Mozilla IT’s plate, but they were a little busy last week. As you can see in the bug, I tried just about everything to diagnose (and then fix) the segfaults. I eventually caught a core dump from one of the failures and tracked the problem to mysql_ping() as called by the DBD::mysql Perl module:

#0 0x057d8e5b in mysql_ping () from /usr/lib/mysql/libmysqlclient.so.15
#1 0x0082a8c1 in XS_DBD__mysql__db_ping (my_perl=0x8504ba0, cv=0x8a3bff0) at mysql.xs:516

The full trace is in the bug.

That gave me a little more to work with. After some searching, I eventually found this article in Portuguese that Google was able to trans-massacre-late just enough for it to implicate the most recent version of the DBD::mysql module (4.007). This is the version that ships with RHEL5 (or can be installed via yum install at any rate). It works just fine with a local database like in our staging setup, but repeatedly segfaults for us when connecting to our remote database pool in production.

As suggested in the Portuguese article, downgrading the version of DBD::mysql to 4.006 fixed the problem. We haven’t had another segfault since. My hope is that this will help the next non-Portuguese person that goes looking for this information.

Current Tunes: Richard Durand and Paul Van Dyk - Essential Mix - 2008-06-14 | Filed under Litmus, Mozilla, Software |

4 Replies

  1. As if I don’t have enough trouble understanding this stuff, now you’re doing it in Portuguese????

    Sounds like you’ve been busy!

  2. Hi, thanks for your mention of my article. I am short on time to have a blog in English, I’m sorry you had to Google transmassacralate it.

    As of today there have been 2 more confirmations of this bug both on FreeBSD and Solaris 10 and the bug is still open at MySQL (http://bugs.mysql.com/bug.php?id=36810&thanks=3&notify=67)

    Best wishes from down under,
    Jose Fonseca

  3. @Ze: oh, no worries. I’m just glad I *did* find your article and didn’t dismiss it outright because it wasn’t in English! Transmassacrelate or no, it got me far enough to fix the problem, and that’s what I care about. Cheers.

  4. Hi Coop, I’m certainly happy the Portuguese wasn’t a barrier; it was indeed cool that you translated it. Glad you got the problem solved in the end. Cheers!

