Workers reaped with SIGABRT

Discussion:

Workers reaped with SIGABRT - how to debug?

Henrik Nyh

2014-04-15 08:00:31 UTC

We get errors like this one a few times a day:

Apr 13 12:16:31 app1 unicorn.log: E, [2014-04-13T12:16:31.302011
#17269] ERROR -- : reaped #<Process::Status: pid 17300 SIGABRT (signal
6)> worker=2

We use Unicorn 4.8.2, Ruby 2.1.1 and a Ruby on Rails app.

It doesn't seem to happen at any obvious time, like during or just
after deploys.

We were previously on Ruby 1.9.3 with Unicorn 4.8.0. Then we had
almost the same issue but with SIGIOT, I believe. Then we upgraded
Ruby to 2.1.1. I believe that's when it changed to SIGABRT. Then we
upgraded Unicorn to 4.8.2 with no improvement.

We're not sure how to debug this - any suggestions on either what the
problem could be, or how to debug it?
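
A side note on the two signal names: on Linux, SIGIOT is a legacy alias for SIGABRT, and both are signal 6. The change of name after the Ruby upgrade therefore probably reflects how the status is printed, not a different failure. A minimal sketch to check this on a given host, using only `Signal.list` from Ruby core (whether "IOT" is listed at all depends on the platform):

```ruby
# Look up the numbers behind the two signal names mentioned above.
# Signal.list maps signal names (without the SIG prefix) to numbers.
abrt = Signal.list["ABRT"]
iot  = Signal.list["IOT"] # may be nil on platforms without SIGIOT

puts "SIGABRT = #{abrt.inspect}"
puts "SIGIOT  = #{iot.inspect}"
puts "SIGIOT and SIGABRT are the same signal here" if iot == abrt
```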
_______________________________________________
Unicorn mailing list - mongrel-***@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

Eric Wong

2014-04-15 08:43:27 UTC

Post by Henrik Nyh
Apr 13 12:16:31 app1 unicorn.log: E, [2014-04-13T12:16:31.302011
#17269] ERROR -- : reaped #<Process::Status: pid 17300 SIGABRT (signal
6)> worker=2
We use Unicorn 4.8.2, Ruby 2.1.1 and a Ruby on Rails app.
It doesn't seem to happen at any obvious time, like during or just
after deploys.
We were previously on Ruby 1.9.3 with Unicorn 4.8.0. Then we had
almost the same issue but with SIGIOT, I believe. Then we upgraded
Ruby to 2.1.1. I believe that's when it changed to SIGABRT. Then we
upgraded Unicorn to 4.8.2 with no improvement.
We're not sure how to debug this - any suggestions on either what the
problem could be, or how to debug it?

This may be a bug in a C extension RubyGem or even Ruby itself.
Do you get core dumps + backtraces or any other error messages in the logs?
Which OS/distribution is this?

Since you see SIGABRT/SIGIOT and not SIGSEGV, you might be crashing
inside the SIGSEGV handler of Ruby itself.

Can you also try --enable-debug-env when you ./configure Ruby?
Thanks in advance for any more info you can provide!

(Btw, please keep everybody Cc:-ed since the mailing list is going
in that direction, and rubyforge.org isn't very reliable in its
final days. New ML announcement in a few days).
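
On the core dump question, a process only gets a core file if its core size limit allows one. The sketch below is not from this thread and is only one possible approach: it raises the soft limit of the current process to its hard limit. The helper name `allow_core_dumps` is made up for illustration, and calling it from unicorn's `after_fork` hook is an assumption about where it would fit.

```ruby
# Raise the soft core-file size limit of the current process to its hard
# limit, so the kernel is allowed to write a core file on abort.
# Returns the resulting [soft, hard] pair.
def allow_core_dumps
  _soft, hard = Process.getrlimit(Process::RLIMIT_CORE)
  Process.setrlimit(Process::RLIMIT_CORE, hard, hard)
  Process.getrlimit(Process::RLIMIT_CORE)
end

# Hypothetical use inside a unicorn config file:
#   after_fork { |server, worker| allow_core_dumps }
soft, hard = allow_core_dumps
puts "core limit: soft=#{soft} hard=#{hard}"
```

If the hard limit is 0, this changes nothing and no core file is written. Where core files end up also depends on system settings such as kernel.core_pattern.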

Henrik Nyh

2014-04-15 09:05:45 UTC

Post by Eric Wong
This may be a bug in a C extension RubyGem or even Ruby itself.
Do you get core dumps + backtraces or any other error messages in the logs?

You'd expect this to be in the Rails app's production.log, right? We
added that to our logging service this morning - we'll check it the
next time this error happens. Maybe we could also try to dig it up
from archived logs.

Post by Eric Wong
Which OS/distribution is this?

Ubuntu 12.04.1 LTS

Post by Eric Wong
Since you see SIGABRT/SIGIOT and not SIGSEGV, you might be crashing
inside the SIGSEGV handler of Ruby itself.
Can you also try --enable-debug-env when you ./configure Ruby?

I guess that would output more details to the Rails production.log in
the event of a crash? Will look into that.

Post by Eric Wong
Thanks in advance for any more info you can provide!

Thanks so much for the help! Much appreciated.

Aaron Suggs

2014-04-15 10:34:26 UTC

Henrik, we had the same problem after upgrading to Ruby 2.1.1. My
coworker Tieg Zaharia tracked it down to this bug with
BigDecimal#to_d:

https://bugs.ruby-lang.org/issues/9657

He discusses a workaround (using BigDecimal coercion instead of #to_d).

Worked for us.

-Aaron
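
A rough sketch of what such a workaround might look like, assuming "BigDecimal coercion" means building the value with `Kernel#BigDecimal` from a string and never calling `#to_d`. The helper name `to_decimal` is hypothetical; the linked bug report is the authority on the actual workaround.

```ruby
require "bigdecimal"

# Hypothetical helper: build a BigDecimal through Kernel#BigDecimal and a
# String, without calling #to_d on the value.
def to_decimal(value)
  return value if value.is_a?(BigDecimal)
  BigDecimal(value.to_s)
end

puts to_decimal(1.5).to_s
puts to_decimal("2.25").to_s
puts to_decimal(3).to_s
```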

Henrik Nyh

2014-04-16 07:08:06 UTC

Post by Aaron Suggs
Henrik, we had the same problem after upgrading to Ruby 2.1.1. My
coworker Tieg Zaharia tracked it down to this bug with BigDecimal#to_d:
https://bugs.ruby-lang.org/issues/9657
He discusses a workaround (using BigDecimal coercion instead of #to_d).
Worked for us.

Thank you! We'll look into that. We've definitely seen that error in CI.

I couldn't find any segfaults in our production logs by the way, but
I'm still not sure exactly where they would go. Have looked in our
production.log (at the default prod log level) and the unicorn.log.

(Sidenote: I think you have to CC everyone, not just reply to the
list, or it only goes to the web archives.)

Henrik Nyh

2014-04-28 09:25:19 UTC

I think we have solved this issue.

It was simply that monit did a "kill -6" (SIGABRT) when the process
used too much memory, so we bumped that limit for now. D'oh. We've yet
to research why it used that much memory.
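
For anyone hitting the same log line: the status of a reaped child looks the same whether the process aborted itself or was killed from outside. A minimal sketch that reproduces the external case with plain Ruby (no unicorn or monit involved; the sleeping child is a stand-in for a worker):

```ruby
# Fork a child that just sleeps, then abort it from the outside,
# the way an external monitor running "kill -6 <pid>" would.
pid = fork { sleep }        # child blocks until it is signalled
Process.kill("ABRT", pid)   # equivalent to `kill -6 <pid>`
_, status = Process.wait2(pid)

puts status.inspect         # exact format varies by Ruby version
puts "signaled: #{status.signaled?}, termsig: #{status.termsig}"
```

Nothing in the status identifies the sender, so checking external monitors such as monit early can save time.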

__
http://bogomips.org/unicorn-public/ - unicorn-***@bogomips.org
please quote as little as necessary when replying

Eric Wong

2014-04-28 10:41:26 UTC

Post by Henrik Nyh
It was simply that monit did a "kill -6" (SIGABRT) when the process
used too much memory, so we bumped that limit for now. D'oh. We've yet
to research why it used that much memory.

Thanks for the followup. Unfortunately the RGenGC in Ruby 2.1.x uses
more memory than 2.0 did (but GC is faster :).

Sam wrote an article about it here:
http://samsaffron.com/archive/2014/04/08/ruby-2-1-garbage-collection-ready-for-production

We (ruby-core devs) will try to reduce memory for 2.2 without
performance regressions.