Ruby on Rails
HowToUsePolymorphicColumns

The problem

One of the great things about Ruby is that it is truly polymorphic. One of the great things about Rails is that it works so well with Ruby. So Rails should be the logical choice if you want to write a web application dealing with polymorphic data, right?

Suppose, for example, you want to have a table with two PayloadColumns, one containing a Ruby expression (text) and the other containing the results of evaluating that expression.

What you might think was a solution, and the problems with it

Rails provides a serialize directive (in ActiveRecord::Base) that uses something like YAML to store objects of any type in a text field. This looks like it should be exactly what we want. But it’s not.

The decision of what objects to convert to_yaml (rather than store natively) is relegated to the connection object, while the unpacking is done in the base class. This results in a very inconsistent (and possibly SQL-flavor dependent) behaviour. Quick tests saving and restoring various objects showed:

My dream solution, and a nightmare

Trying to think out what I wanted, I made the following list of desiderata:

I also made up a table of how I’d expect various values to be stored:

Value       Stored as
"milk"      "milk"
:eggs       :eggs
7           7
"7"         "7"
9..5        9..5

...and at some point I dimly began to realize that I was writing the same thing in both columns. This crystallized into what might (for some people) be a very elegant solution: the storage format is simply a text string which, if evaluated, would give you the object you want.

[5, "cat", :monkey]
Student.find(45327)
Teacher.find_by_name('H. Knocks')
...and so forth

This could easily be implemented by adding a "to_source" method to all objects (defaulting to something that returns "YAML::load(#{self.to_yaml.inspect})" for cases where we don’t override it with something sweeter). For decoding the data, we just have to "eval" it.
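As a rough sketch of that idea in plain Ruby (no Rails involved): the method name to_source comes from the paragraph above, but the choice of which classes to override, and the decision to rely on inspect for them, are assumptions made for illustration.

```ruby
require 'yaml'

class Object
  # Fallback: an expression that rebuilds the object from its YAML dump.
  def to_source
    "YAML.load(#{to_yaml.inspect})"
  end
end

# Assumption: for ordinary values of these classes (not NaN or Infinity),
# inspect already returns valid Ruby source, so it serves as the "sweeter" form.
[String, Symbol, Integer, Float, Range, NilClass, TrueClass, FalseClass].each do |klass|
  klass.class_eval do
    def to_source
      inspect
    end
  end
end

class Array
  # Build the literal from the source form of each element.
  def to_source
    "[" + map { |item| item.to_source }.join(", ") + "]"
  end
end

encoded = [5, "cat", :monkey].to_source
decoded = eval(encoded)   # evaluates whatever text it is given
```

Here the string "7" and the integer 7 produce different source text, so the type survives the round trip.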

The problem with this method is that if anyone managed to inject something funny into my database, they would have a way to execute arbitrary Ruby code in my Rails application. While I can immediately think of three reasons why this wouldn’t be a problem, none of them is compelling enough to make the little voice stop chanting “you’ll be sorry”, so I moved on.

My working solution (and the problems with it)

The problem of course isn’t with the storage format, it’s with blindly evaling things. So if we explicitly list the formats we are using, we can (for a little more work) have much more security.
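To make "explicitly list the formats" concrete, here is a minimal sketch in plain Ruby, independent of ActiveRecord. The module name ListedFormats and its encode/decode methods are hypothetical, and the set of supported types is an assumption chosen to keep the example short.

```ruby
# Hypothetical module for illustration; these names are not part of Rails.
module ListedFormats
  # Encode a value as text. Only the listed types are accepted.
  def self.encode(value)
    case value
    when String  then '"' + value + '"'
    when Symbol  then ":" + value.to_s
    when Integer then value.to_s
    when Float   then value.to_s
    else raise ArgumentError, "no format listed for #{value.class}"
    end
  end

  # Decode text by matching it against the listed formats.
  # Nothing is evaluated; unknown text is rejected.
  def self.decode(text)
    case text
    when /\A"(.*)"\z/m         then $1
    when /\A:([A-Za-z_]\w*)\z/ then $1.to_sym
    when /\A-?\d+\z/           then text.to_i
    when /\A-?\d+\.\d+\z/      then text.to_f
    else raise ArgumentError, "unrecognized format: #{text.inspect}"
    end
  end
end
```

The sketch is deliberately narrow: floats printed with an exponent and symbols containing unusual characters are not handled, and text matching no listed format raises instead of being executed.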

There is a module called ActiveRecord::Wrapping, the header of which says it is:

A plugin framework for wrapping attribute values before they go in and unwrapping them after they go out of the database. This was intended primarily for YAML wrapping of arrays and hashes, but this behavior is now native in the Base class. So for now this framework is laying dormant until a need pops up.

It looked ideal for what I wanted; a little code-walking suggested that all I needed to do to use it was implement a wrapper class that implemented the wrap(attribute) and unwrap(attribute) functions I wanted:


require 'active_record/wrappings'
module ActiveRecord
  module Wrappings #:nodoc:
    class AccurateWrapper < AbstractWrapper #:nodoc:
      # Encode a value as text before it is saved.
      def wrap(attribute)
        case attribute
        when String             then '"' + attribute + '"'
        when Numeric            then attribute.to_s
        when ActiveRecord::Base then "#{attribute.class}:#{attribute.id}"
        else                         attribute.to_yaml
        end
      end

      # Decode stored text back into a value.
      def unwrap(attribute)
        case attribute
        when /\A\((.*)\)\z/  then $1    # parenthesized text: strip the parentheses
        when /\A"(.*)"\z/m   then $1    # quoted string, may span several lines
        when /\A\d+\.\d+\z/  then attribute.to_f
        when /\A\d+\z/       then attribute.to_i
        when /\A([a-zA-Z_]+):(\d+)\z/
          # Class:id reference; look the class up by name rather than eval it
          Object.const_get($1).find($2.to_i)
        else
          begin
            YAML::load(attribute)
          rescue Object
            # Not parseable as YAML: hand back the raw value
            attribute
          end
        end
      end
    end

    module ClassMethods #:nodoc:
      # Wraps the attribute in polymorphic encoding
      def polymorphic_fields(*attributes)
        wrap_with(AccurateWrapper, attributes)
      end
    end
  end
end

Using which, I could write:

class Example < ActiveRecord::Base
  include ActiveRecord::Wrappings
  has_and_belongs_to_many :lessons
  polymorphic_fields :result
end

Of course, it didn’t work right off the bat. It appears the Wrappings module has not been kept up to date, since nothing uses it. So (for my version of ActiveRecord, 1.11.1) I had to fix a few things:

My patches


--- /usr/local/lib/ruby/gems/1.8/gems/activerecord-1.11.1-mqr/lib/active_record/wrappings.rb    2005-03-29 20:41:03.000000000 -0600
+++ wrappings.rb    2005-08-11 20:25:02.000000000 -0600
@@ -5,9 +5,11 @@
   module Wrappings #:nodoc:
     module ClassMethods #:nodoc:
       def wrap_with(wrapper, *attributes)
-        [ attributes ].flat.each { |attribute| wrapper.wrap(attribute) }
+        [ attributes ].flatten.each { |attribute| wrapper.wrap(attribute,binding) }
       end
     end
+      def after_find
+          end

     def self.append_features(base)
       super
@@ -16,8 +18,9 @@

     class AbstractWrapper #:nodoc:
       def self.wrap(attribute, record_binding) #:nodoc:
-        %w( before_save after_save after_initialize ).each do |callback|
+        %w( before_save after_save after_initialize after_find ).each do |callback|
           eval "#{callback} #{name}.new('#{attribute}')", record_binding
         end
       end

@@ -47,6 +52,7 @@

       alias_method :before_save, :save_wrapped_attribute #:nodoc:
       alias_method :after_save, :load_wrapped_attribute #:nodoc:
+      alias_method :after_find,  :load_wrapped_attribute #:nodoc:
       alias_method :after_initialize, :after_save #:nodoc:

       # Overwrite to implement the logic that'll take the regular attribute and wrap it.

This isn’t perfect (I need to handle dates better, and reals with an exponent, etc.) but it gives me a good base on which to build. For example, I don’t handle collections very well yet: an array of ActiveRecord objects, for example, will still be passed to YAML and consequently come back as dead clones. It should be easy to see how it could be extended as needed if any of these cases matter.

It also imposes a slight overhead due to the use of the after_find callback, unnoticeable in my application but possibly objectionable in a much larger application.

It has the strong advantage that the only footprint in my application is closely associated with the declaration of the polymorphic columns, making it easy to change to a better solution if anyone posts one.

Your solution…?

I have created a webcrawler using Ruby and ActiveRecord, and in it I store parsed webpages, since I am not really interested in the raw HTML after having parsed it.

To store and load this, I started using YAML, and realized that what I got back was b0rken.

Then I tried Marshal, and realized that REXML sometimes uses singletons, which breaks Marshal.

So I ended up using something like:


  # Assumes require 'base64' and require 'rexml/document' elsewhere in the file.
  #
  # Before we save, let the content_serialized field be defined
  # by either a serialized version of @content or, if that doesn't work
  # (due to it containing a singleton; REXML does that sometimes?),
  # save it as pure XML instead.
  #
  def before_save
    begin
      self[:content_serialized] = Base64.encode64(Marshal.dump(@content))
    rescue StandardError => e
      warn("error dumping #{url}: " + e.inspect + ", dumping pure XML instead")
      # Write the XML into a local string first, then assign it
      xml = ""
      @content.write(xml)
      self[:content_serialized] = xml
    end
  end

  #
  # After we have found a page, unserialize @content from content_serialized.
  # If that breaks, assume that before_save decided to store pure XML instead,
  # and so load it as a new REXML::Document.
  #
  def after_find
    begin
      @content = Marshal.load(Base64.decode64(self[:content_serialized]))
    rescue StandardError => e
      warn("error loading #{url}, loading as pure XML instead")
      @content = REXML::Document.new(self[:content_serialized])
    end
  end

Perhaps this could be extended to a more generic case.

//MartinKihlgren, martin a t troja.ath.cx
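
One way the more generic case might look, as a sketch only: the module name FallbackSerializer, the "marshal:" prefix, and the block-based fallback are all assumptions, not part of the code above. Instead of trying Marshal.load and rescuing, it tags marshalled data with a prefix so the loader knows which form it is reading. The pack("m") call performs the same Base64 encoding as Base64.encode64.

```ruby
# Hypothetical helper: try Marshal first, otherwise use a caller-supplied text form.
module FallbackSerializer
  PREFIX = "marshal:"

  # Returns tagged Base64 of the Marshal dump, or the block's result
  # when the object cannot be marshalled (for example, it has singleton methods).
  def self.dump(object)
    PREFIX + [Marshal.dump(object)].pack("m")
  rescue TypeError
    yield object
  end

  # Reverses dump: tagged text goes through Marshal, anything else
  # is handed to the block to be rebuilt from its text form.
  def self.load(text)
    if text.start_with?(PREFIX)
      Marshal.load(text[PREFIX.length..-1].unpack("m").first)
    else
      yield text
    end
  end
end
```

For the crawler above, the dump block would write the REXML document to a string and the load block would call REXML::Document.new on the text. The fallback text must never start with the prefix, which holds for XML.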