Exploring Enumerators in Ruby

I’m fiddling with a bit of Ruby code for my silly Ioughta gem. In this module, the each_resolved_pair method takes a structure of “pairs” (of symbols and lambdas) and resolves them (by calling each lambda with an incrementing value), returning the resolved pairs.

Here’s my first version of the method, lightly edited for the sake of this writeup:

def each_resolved_pair(data)
    iota = 0
    data.each_slice(2).map do |nom, lam|
        val, iota = lam.call(iota), iota.succ
        next if nom == :_
        [nom, val]
    end.compact
end

It’s a little inefficient (#compact creates a copy of the structure instead of modifying it in place [1]), but it gets the job done. Here’s a sample call site:

lam = ->(i) { i ** 2 }
pairs = [
    :_, lam,
    :a, lam,
    :b, lam,
    :c, lam
]

each_resolved_pair(pairs).to_h # => {:a=>1, :b=>4, :c=>9}

After coding a bit more, elsewhere in the module, I want to come back and modify this method so it can either yield to a block or return an array. Here’s my updated version:

def each_resolved_pair(data)
    iota, resolved_pairs = 0, []
    data.each_slice(2) do |nom, lam|
        val, iota = lam.call(iota), iota.succ
        next if nom == :_
        if block_given?
            yield nom, val
        else
            resolved_pairs << [nom, val]
        end
    end

    return resolved_pairs unless block_given?
end

Yuck. It’s hard to avoid building up a resolved_pairs array like this without looping twice or duplicating some of the logic. On the plus side, the inefficient #compact call is gone, as well as the #map, and it now lets me call it like this while preserving the original behavior:

each_resolved_pair(pairs) do |key, value|
    const_set(key, value)
end

Still, I think I can do better: why not have this method return some kind of Enumerable collection, and let the caller decide what to do with it, e.g. #each with a block or #to_h?

I’ve recently been playing around with Enumerators, which are similar to generators in Python. They allow a block of code to “yield” values back to the caller as needed, i.e. lazily, and in general behave just like any other Enumerable collection. (Note that the meaning of the word “yield” here is different than the typical usage a Rubyist might expect.) I’ll try wrapping my code in an Enumerator:

def each_resolved_pair(data)
    Enumerator.new do |yielder|
        iota = 0
        data.each_slice(2) do |nom, lam|
            val, iota = lam.call(iota), iota.succ
            next if nom == :_
            yielder << [nom, val]
        end
    end
end

Et voilà! It works, and I got rid of the clumsy resolved_pairs array from the previous version. Now I can do this:

each_resolved_pair(pairs).to_h

Or this:

each_resolved_pair(pairs).each do |key, value|
    const_set(key, value)
end

And the results are exactly what one might expect [2].
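To make the laziness concrete, here is a small sketch built on the Enumerator version above. The calls log and the logging lambda are hypothetical additions for illustration only; they record which iota values were actually resolved.

```ruby
def each_resolved_pair(data)
    Enumerator.new do |yielder|
        iota = 0
        data.each_slice(2) do |nom, lam|
            val, iota = lam.call(iota), iota.succ
            next if nom == :_
            yielder << [nom, val]
        end
    end
end

calls = [] # hypothetical log of the iota values passed to the lambda
lam = ->(i) { calls << i; i ** 2 }
pairs = [:_, lam, :a, lam, :b, lam, :c, lam]

enum = each_resolved_pair(pairs)
calls      # => [] (nothing has been resolved yet)

enum.first # => [:a, 1]
calls      # => [0, 1] (iteration stopped after the first yielded pair)
```

Creating the enumerator does no work at all; the lambdas only run once a caller starts consuming values, and only for as long as the caller keeps asking.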

However, there’s still something a little fishy about my method. It’s named “each…” and it returns an Enumerator… that wraps #each_slice… and I have to call #each on it to pass my block. It certainly feels like there’s at least one too many steps, doesn’t it? Isn’t #each_slice with a yielding block already… enumerating and yielding? Let’s test that theory:

def each_resolved_pair(data)
    iota = 0
    data.each_slice(2) do |nom, lam|
        val, iota = lam.call(iota), iota.succ
        next if nom == :_
        yield nom, val
    end
end

each_resolved_pair(pairs) do |key, value|
    puts "#{key.inspect} => #{value}"
end

# Output:
# :a => 1
# :b => 4
# :c => 9

Holy smokes, it works! And this is the simplest implementation yet. How about #to_h?

each_resolved_pair(pairs).to_h
# => in `block in each_resolved_pair': no block given (yield) (LocalJumpError)

Darn! My method contains a yield, so the block is mandatory. I’m so close I can taste it!
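The failure is easy to reproduce in isolation. In this minimal sketch, shout and shout_safely are made-up method names used only for illustration:

```ruby
# A method that yields unconditionally requires a block.
def shout
    yield "hi"
end

message =
    begin
        shout # called without a block
    rescue LocalJumpError => e
        e.message
    end
message # => "no block given (yield)"

# Guarding with block_given? avoids the error.
def shout_safely
    return :no_block unless block_given?
    yield "hi"
end

shout_safely                 # => :no_block
shout_safely { |s| s.upcase } # => "HI"
```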

It looks like I need to tell Ruby to wrap my method body in an Enumerator, but only if a block is not given. How can I easily do so without going through any more violent contortions? This enumerator should call the original method with a block that yields, in the generator sense, the result of each iteration. I want to avoid complex conditional logic, and I want to avoid an additional wrapper method such as the following:

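def each_resolved_pair_wrapper(data)
    Enumerator.new do |yielder|
        # The method yields two values, so the block must accept both;
        # a single |pair| parameter would only receive the name.
        each_resolved_pair(data) do |nom, val|
            yielder << [nom, val]
        end
    end
end

def each_resolved_pair(data)
    return each_resolved_pair_wrapper(data) unless block_given?
    # ... rest of the method body as in the previous version ...
end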

Believe it or not, Ruby has some built-in “magic” that does exactly this: Object#enum_for (or #to_enum). It wraps your method in an Enumerator, just as the now-unnecessary each_resolved_pair_wrapper method does above, and the original method simply returns said enumerator whenever necessary (i.e. when a block is not given). Here’s the final version, at least as far as this writeup is concerned:

def each_resolved_pair(data)
    return enum_for(:each_resolved_pair, data) unless block_given?

    iota = 0
    data.each_slice(2) do |nom, lam|
        val, iota = lam.call(iota), iota.succ
        next if nom == :_
        yield nom, val
    end
end

each_resolved_pair(pairs) do |key, value|
    const_set(key, value) # yay!
end

each_resolved_pair(pairs).to_h # yay!

It’s hard to believe that such an elegant and powerful solution can be so easily implemented. My application logic—such as it is—is completely unencumbered; it simply yields pairs, as the method name suggests, and Ruby makes our callers happy by taking care of the rest. Values are produced lazily and not retained in memory unless explicitly captured by the caller.
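As a rough sketch of what a caller gets back in the block-less case, the snippet below repeats the final method and the sample data so it runs on its own. The calls to #next and #select are just illustrations of what any Enumerator offers, not something the gem itself is known to do.

```ruby
def each_resolved_pair(data)
    return enum_for(:each_resolved_pair, data) unless block_given?

    iota = 0
    data.each_slice(2) do |nom, lam|
        val, iota = lam.call(iota), iota.succ
        next if nom == :_
        yield nom, val
    end
end

lam = ->(i) { i ** 2 }
pairs = [:_, lam, :a, lam, :b, lam, :c, lam]

enum = each_resolved_pair(pairs)
enum.class # => Enumerator

# External iteration: each yielded (nom, val) arrives packed in an array.
enum.next  # => [:a, 1]
enum.next  # => [:b, 4]

# Any Enumerable method can be chained before materializing the result.
each_resolved_pair(pairs).select { |_nom, val| val > 1 }.to_h
# => {:b=>4, :c=>9}
```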

This approach is consistent with Ruby’s standard library and recommended when your method can return a collection that is prohibitively large [3], or when you need your method to either return a collection or yield to a block when given. In fact, I would go so far as to say that if your method takes an enumerable and returns an enumerable, you should consider yielding to a block and returning an enumerator when no block is given. I hope this pattern comes in handy! I know it did for me.
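For comparison, here are a few core methods that follow the same convention, returning an Enumerator when called without a block. The variable names are arbitrary, and the last line is only a sketch of how Enumerable#lazy copes with a sequence too large to build up front.

```ruby
# Called without a block, many core iterators hand back an Enumerator.
slices = [1, 2, 3, 4].each_slice(2)
slices.class # => Enumerator
slices.to_a  # => [[1, 2], [3, 4]]

# Enumerators chain with helpers such as #with_index.
"abc".each_char.with_index.to_a # => [["a", 0], ["b", 1], ["c", 2]]

# Enumerable#lazy defers work, so even an endless range is safe to filter.
(1..Float::INFINITY).lazy.select(&:even?).first(3) # => [2, 4, 6]
```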


  1. I could tweak it to use #tap(&:compact!) to address this minor inefficiency, but that’s not what I’m after today. 
  2. Astute readers might note that both of these invocations would have worked with my original implementation of the method, which returned an enumerable array. The only difference is that now the results are being generated lazily, instead of all at once and up front. Since it’s not my final solution nor really my point, I hope you’ll permit me some “poetic” license! 
  3. Enumerable#lazy may also be an option. 
