This describes how to run worker threads in Rails under WEBrick. Note that it does not describe how to run multiple Rails worker threads (those are unavailable until Rails 2.2). Instead, it describes how to keep a Rails thread running while still processing other requests simultaneously, where those other requests are NOT Rails requests (for example, requests for static content).
First, some background: WEBrick is a single-threaded server. If a request from a browser comes in for dynamic content (e.g. a …rhtml page), a single Mutex lock is acquired and held while the request is being processed. This makes the server code much simpler and keeps database access single-threaded and easy.
Unfortunately, there is a bug in WEBrick where the Mutex is acquired incorrectly. If a static content request (e.g. an image read from disk) is processed while a dynamic page is being processed, the Mutex can get released. Another dynamic content request can then go ahead, and chaos ensues (ouch). This often shows up as a "Lost connection to MySQL server during query" error.
There is a bugfix, with a patch, at bug 3433. It gets the begin…ensure…end block right. The fix should be in the next release of Rails (or WEBrick?). (This entry was written on Feb 5, 2006.)
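For reference, here is the general shape of a lock/ensure block that cannot release a lock it never acquired. This is a generic sketch written for current Ruby. It is not the bug 3433 patch itself, and `REQUEST_LOCK` and `handle_dynamic_request` are hypothetical names. The key point is that the lock is taken *before* `begin`. As a result, the `ensure` clause only runs once the lock is actually held.

```ruby
# Generic sketch of the lock/ensure pattern (hypothetical names; not the
# actual bug 3433 patch).
REQUEST_LOCK = Mutex.new

# Acquire before entering begin, so ensure only ever releases a lock that
# this thread actually holds.
def handle_dynamic_request
  REQUEST_LOCK.lock
  begin
    yield
  ensure
    REQUEST_LOCK.unlock
  end
end

# Mutex#synchronize performs the same lock/begin/ensure/unlock sequence.
def handle_dynamic_request_short(&block)
  REQUEST_LOCK.synchronize(&block)
end

# Usage: many "dynamic requests" do a read-modify-write on shared state.
counter = 0
update = lambda do
  value = counter
  Thread.pass            # invite a context switch in the middle of the update
  counter = value + 1
end

threads = 10.times.map do |i|
  Thread.new do
    100.times do
      if i.even?
        handle_dynamic_request(&update)
      else
        handle_dynamic_request_short(&update)
      end
    end
  end
end
threads.each(&:join)
puts counter # prints 1000
```

Both helpers share the same mutex, so no update is lost no matter how the threads interleave.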
So let's assume you have fixed that.
Now suppose you simply start a new thread:
Thread.new do
  while true
    # ...some code here that accesses the database...
    # ...calculate PI to N more digits...
    # ...solve world hunger (tell no one)...
    sleep(0.5)
  end
end
When this thread accesses the database, it can (and eventually will) conflict with a request that is also accessing the database. The same database connection ends up being used by both threads, and you get the infamous "Lost connection to MySQL server during query" error. (Ouch again.)
Database bug described here
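The conflict can be reproduced without Rails or MySQL. In this sketch, `FakeConnection` is a hypothetical stand-in for a connection that, like a MySQL connection, can only run one query at a time. A "worker" thread and a "request" share it with no lock between them. The `Thread.pass until conn.busy?` line forces the unlucky timing that only happens occasionally in a real server.

```ruby
# Hypothetical stand-in for a database connection that can only run one
# query at a time (real drivers fail in messier ways).
class FakeConnection
  class ProtocolError < StandardError; end

  def initialize
    @busy = false
  end

  def busy?
    @busy
  end

  def query(sql)
    raise ProtocolError, "connection already in use (#{sql})" if @busy
    @busy = true
    begin
      sleep 0.5            # pretend to wait for the database server
      "result of #{sql}"
    ensure
      @busy = false
    end
  end
end

conn = FakeConnection.new  # one connection shared by every thread

worker = Thread.new { conn.query("SELECT * FROM simulations") }
Thread.pass until conn.busy? # wait until the worker is in the middle of its query

# A "web request" now uses the same connection without taking any lock.
error =
  begin
    conn.query("SELECT * FROM users")
    nil
  rescue FakeConnection::ProtocolError => e
    e
  end
worker.join
puts error.message # prints "connection already in use (SELECT * FROM users)"
```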
So, in the file ruby/gems/1.8/gems/rails-0.14.3/lib/webrick_server.rb there is a Mutex called REQUEST_MUTEX that is used to single-thread the requests.
Your worker thread needs to acquire this lock and release it afterwards. That way it can access the database without trashing the server.
Thread.new do
  while true
    # Acquire before begin, so the ensure below only releases a lock we hold
    DispatchServlet::REQUEST_MUTEX.lock unless ActionController::Base.allow_concurrency
    begin
      # ...some thread code here that accesses the database...
      # ...calculate PI to N more digits...
      # ...solve world hunger (tell no one)...
    ensure
      unless ActionController::Base.allow_concurrency
        DispatchServlet::REQUEST_MUTEX.unlock if DispatchServlet::REQUEST_MUTEX.locked?
      end
    end
    sleep(0.5) # sleep outside the lock so requests can be served in between
  end
end
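The snippet above only runs inside the old Rails/WEBrick stack, so here is a standalone sketch of the same idea that runs on current Ruby. Two stand-ins are assumed. `REQUEST_MUTEX` is a plain top-level `Mutex` playing the part of `DispatchServlet::REQUEST_MUTEX`. `FakeConnection` is a hypothetical one-query-at-a-time connection. The sketch also uses `Mutex#synchronize` in place of the explicit lock/ensure/unlock pair. Because the worker and the simulated requests all go through the same mutex, none of them ever finds the connection busy.

```ruby
# Hypothetical stand-in: a connection that can only run one query at a time.
class FakeConnection
  class ProtocolError < StandardError; end

  def initialize
    @busy = false
  end

  def query(sql)
    raise ProtocolError, "connection already in use" if @busy
    @busy = true
    begin
      sleep 0.001          # pretend to wait for the database server
      "result of #{sql}"
    ensure
      @busy = false
    end
  end
end

REQUEST_MUTEX = Mutex.new  # stand-in for DispatchServlet::REQUEST_MUTEX
CONN = FakeConnection.new  # one connection shared by every thread
errors = Queue.new         # Queue is thread-safe, so threads can report into it
done = false

# Background worker: holds the request lock only while it uses the database.
worker = Thread.new do
  until done
    begin
      REQUEST_MUTEX.synchronize { CONN.query("background work") }
    rescue FakeConnection::ProtocolError => e
      errors << e
    end
    sleep 0.005            # like sleep(0.5) above, shortened for the demo
  end
end

# Simulated dynamic requests, serialized through the same lock.
requests = 5.times.map do |i|
  Thread.new do
    20.times do
      begin
        REQUEST_MUTEX.synchronize { CONN.query("request #{i}") }
      rescue FakeConnection::ProtocolError => e
        errors << e
      end
    end
  end
end

requests.each(&:join)
done = true
worker.join
puts errors.size # prints 0
```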
You might be able to simplify the above code or make it more object-oriented. (OK, I'm lazy, so shoot me.) Also, in my situation I don't need all those unless ActionController::Base.allow_concurrency checks, since I know I am not running in multithreaded mode. (I think multithreaded mode is for servers that don't use the database at all and can benefit from processing multiple requests at once.)
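One possible object-oriented version is sketched below. The `BackgroundWorker` class is hypothetical and not part of Rails. It takes the lock to hold around each iteration, or `nil` to run unlocked (the `allow_concurrency` case). It also takes a sleep interval and the work itself as a block, and it adds a `stop` method so the loop can end cleanly instead of running `while true` forever.

```ruby
# Hypothetical wrapper class; not part of Rails.
class BackgroundWorker
  # lock:     mutex held around each iteration, or nil to run without locking
  # interval: seconds to sleep between iterations (outside the lock)
  def initialize(lock:, interval: 0.5, &work)
    @lock = lock
    @interval = interval
    @work = work
    @running = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        if @lock
          @lock.synchronize { @work.call }
        else
          @work.call
        end
        sleep @interval
      end
    end
    self
  end

  # Ask the loop to finish and wait for the current iteration to complete.
  def stop
    @running = false
    @thread&.join
  end
end

# Usage with a stand-in lock. In the Rails setup above this would be
# DispatchServlet::REQUEST_MUTEX, or nil when allow_concurrency is on.
lock = Mutex.new
ticks = 0
worker = BackgroundWorker.new(lock: lock, interval: 0.01) { ticks += 1 }.start
sleep 0.1
worker.stop
puts ticks
```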
In my application I have to run lots of background simulation threads and this works.
Note on Thread Liveness
Threads suck in Ruby. They are not native threads but green threads scheduled inside the Ruby interpreter. If any thread tries to open a socket and the destination is not there, the connection attempt takes a while to time out. During this wait, the whole Ruby interpreter is frozen.
So only open sockets in Rails if you are fairly sure the destination will accept the connection immediately.
I had this problem and solved it by running the socket code (telnet) in a completely separate Ruby instance and stuffing the data into the database.
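Here is a minimal sketch of the "separate Ruby instance" approach, with one substitution. The author's child process writes its results to the database, while this sketch passes them back over a pipe to keep it self-contained. `CHILD_SCRIPT` and `run_in_separate_ruby` are hypothetical names, and the child does no real network work. `RbConfig.ruby` is the path of the running interpreter.

```ruby
require "rbconfig"

# Hypothetical child script. In the setup described above it would run the
# slow socket/telnet code and store its results in the database; here it
# just prints a line for the parent to read.
CHILD_SCRIPT = <<~RUBY
  # ... slow network work would go here ...
  puts "reading=42"
RUBY

# Run the script in a separate Ruby interpreter. A hung connection in the
# child can only delay the parent thread that waits in IO#read; it cannot
# freeze the parent interpreter.
def run_in_separate_ruby(script)
  IO.popen([RbConfig.ruby, "-e", script], &:read)
end

output = run_in_separate_ruby(CHILD_SCRIPT)
puts output # prints "reading=42"
```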
Bug?
This turns out to be the most relevant post I can find so far, either online or in Rails/Ruby books. I have run into an annoying and tricky threading issue in WEBrick that doesn't occur when testing under script/console. I spawn a thread in an instance method of an ActiveRecord model, and that thread calls Gmailer.connect(username, password) (http://rubyforge.org/projects/gmailutils), which makes a Net::HTTP connection to verify the credentials. Methods of objects in the thread block disappear immediately after the call! Isn't this wackily strange and alarming?
Have you seen anything like this, or do you have any insight into why it might happen?
BUG
I’m having the same issue. A whole bunch of my ActiveRecord subclasses are losing methods when I fork off a thread…
Not a BUG
I had the same problem with methods going missing after spawning a worker thread from a controller action.
I found the problem stemmed from the way the Rails dispatcher destroys ActiveRecord::Base subclasses after performing the requested action (see Dispatcher.reset_after_dispatch).
This behaviour is controlled by the config.cache_classes option in the environment-specific configuration file (development.rb, for example).
Here is a description of what happened in my case:
Just a second later, all would have been fine: the model class would no longer exist, and Ruby would be forced to recreate the class fresh with all its custom methods. That explains why this behaviour can be intermittent. A spawned thread has to be accessing the model class at the exact moment its methods are being removed.
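The race can be reproduced outside Rails. This is a simplified, hypothetical stand-in: the `Person` class plays the model. An explicit `remove_method` plays the part of what the dispatcher does at the end of a request when classes are not cached (the real `Dispatcher.reset_after_dispatch` is more involved). Two queues force the unlucky interleaving. The spawned thread calls the method, the "dispatcher" strips it, and the thread's next call fails.

```ruby
# Hypothetical model class standing in for an ActiveRecord subclass.
class Person
  def greeting
    "hello"
  end
end

person = Person.new
results = []
called = Queue.new   # spawned thread -> "dispatcher": first call finished
reset  = Queue.new   # "dispatcher" -> spawned thread: methods removed

spawned = Thread.new do
  results << person.greeting         # works: the method still exists
  called << true
  reset.pop                          # wait until the end-of-request reset
  begin
    results << person.greeting
  rescue NoMethodError => e
    results << e.class               # the method has vanished mid-thread
  end
end

called.pop
# "End of request": strip the model's custom methods, as an unloading
# dispatcher might when classes are not cached.
Person.send(:remove_method, :greeting)
reset << true
spawned.join
p results # prints ["hello", NoMethodError]
```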
To me, the easiest way to overcome the problem is to cache the classes, as Rails does by default in production.
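The change is one line in the environment file. This fragment assumes the stock development.rb layout, where `config` is the Rails configuration object. Setting the option to true keeps model classes loaded between requests, so they are not unloaded underneath a running background thread.

```ruby
# config/environments/development.rb
# Keep classes loaded between requests (the production default), so models
# used by background threads are not unloaded at the end of each request.
config.cache_classes = true
```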
Obviously the problem with this approach is that code changes are not recognised until the server is restarted.
I have only been suffering this problem for a day so far so I have not really thought about a better workaround.
Corrections/Tips from SPO
I believe a number of the comments made above are wrong:
1) WEBrick is not single-threaded. If it were, it would not need a Mutex. It also wouldn't be possible to get that infamous database error without multiple threads.
2) Rails is single-threaded by default. While it is possible to set the allow_concurrency flag, Rails seems to have a lot of problems when you do. A safer alternative is to override DispatchServlet::service() (in webrick_server.rb). The code sample below shows how I hijack URIs starting with "/thread/" and pass those requests to my own class. ThreadHandler::process() is given a socket and a CGI object, and it must use plain Ruby code (not Rails) to accomplish its task. If your task needs a database connection, it must create its own (see the warning in #3 below). As a side note, it is just as easy to do this with Mongrel.
require "webrick_server" # Rails loads this later, but I need it now
require "stringio"       # StringIO wraps the request body below
$stdout.puts "Overriding WEBrick's DispatchServlet"
# Expose the raw client socket so the handler can write to it directly
class WEBrick::HTTPRequest
  attr_reader :socket
end
class DispatchServlet
  alias_method :origService, :service
  def service(request, response)
    if request.request_uri.path =~ /^\/thread\//
      # Hand /thread/... requests to our own (non-Rails) handler
      ThreadHandler.process(request.socket, CGI.new("query", create_env_table(request, nil), StringIO.new(request.body || "")))
      request.socket.close # Is there a cleaner way to do this?
    else
      origService(request, response)
    end
  end
end
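The override relies on reopening a class and chaining to the original method with `alias_method`. Here is a standalone sketch of that technique. `Servlet` and `ThreadHandler` are hypothetical stand-ins, not WEBrick's or the author's classes, and `service` takes only a path string. The sketch anchors the pattern with `\A` rather than `^` so it matches only at the very start of the path.

```ruby
# Hypothetical stand-ins: Servlet plays DispatchServlet, ThreadHandler plays
# the custom non-Rails handler.
class Servlet
  def service(path)
    "rails handled #{path}"
  end
end

module ThreadHandler
  def self.process(path)
    "thread handler got #{path}"
  end
end

# Reopen the class, keep the original method under a new name, and route
# matching paths elsewhere.
class Servlet
  alias_method :orig_service, :service

  def service(path)
    if path =~ %r{\A/thread/}
      ThreadHandler.process(path)
    else
      orig_service(path)
    end
  end
end

servlet = Servlet.new
puts servlet.service("/thread/sim/1") # prints "thread handler got /thread/sim/1"
puts servlet.service("/people/list")  # prints "rails handled /people/list"
```

On current Ruby, `Module#prepend` with a call to `super` is an alternative to `alias_method` chaining that avoids the extra method name.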
3) While Ruby threads do suck, network connection attempts do not lock all threads. Ruby makes basic I/O calls in a non-blocking way to avoid locking all threads while waiting. Unfortunately, anything more complex like a database call usually will lock all Ruby threads (it should not lock if it’s a pure Ruby driver that uses simple networking calls to talk to the database). This means if you want to use the extra thread to run a long database query, you’d better think about launching it in a separate process.
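A quick way to check the claim about plain I/O is shown below. One thread blocks reading a pipe that has no data yet, and the main thread keeps working in the meantime. The sketch runs on current Ruby, whose thread implementation differs from the 1.8 green threads discussed on this page. It uses a pipe in place of a socket or database driver, so it says nothing about C-extension drivers.

```ruby
# One thread blocks on I/O while the main thread keeps computing.
reader_io, writer_io = IO.pipe

reader = Thread.new { reader_io.gets }      # blocks until a line arrives
Thread.pass until reader.status == "sleep"  # wait until it is blocked in gets

work_done = 0
100_000.times { work_done += 1 }            # main thread is not held up
still_waiting = reader.alive?               # reader is still blocked here

writer_io.puts "data from the network"
line = reader.value                         # Thread#value joins and returns gets' result
puts still_waiting # prints true
puts work_done     # prints 100000
```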