Upcoming and OnDemand Webinars View full list

“You’re Killin’ Me Smalls!”: Are Small Objects Good?

Josh Justice

In my last post, we dove into the question of whether small methods are really a good thing. We looked at some code that fellow nerd Nathan had rewritten to consist of only a few long but clear methods. We followed a few blocks of code I extracted into smaller methods, and we found that method extractions could be good or bad depending on whether they formed a good abstraction.

But method extractions weren’t the only refactoring I performed. I also extracted a few types — specifically, two modules and a class. How did these extractions turn out? One criterion Nathan proposed for evaluating code is that “context switches should be few in number and short in distance.” Switching between types is a much greater “distance” than switching between methods in the same type. Usually that switch requires navigating to another file, and even in the rare case that the types are in the same file, it requires moving to a more distant part of the module hierarchy. When is extracting a type worth this extra effort to navigate?

Rending the Renderer

Let’s start back at the beginning, the primary method of the BenchReport module:

  def self.build(report_date_range)
    iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)

    users = User.active.preload(:services)
    user_schedules = user_schedules_for(users, period: report_date_range)
    service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)

    create_json(iteration_date_ranges, service_availabilities)
  end

As I was reading through the methods that build calls, the first one that seemed like it might belong in a separate module was create_json. The first four lines of build all relate to setting up data that will be used for the report. The create_json line is different: it renders the report instead.

There’s also a difference in its implementation. Here it is:

  def self.create_json(iteration_date_ranges, service_availabilities)
    {
      report: {
        iteration_entries: iteration_date_ranges.map { |iteration_date_range|
          {
            start_date: iteration_date_range.start_date,
            end_date: iteration_date_range.end_date,
            service_entries: service_availabilities.keys.sort.map {|service_name|
              availabilities = service_availabilities[service_name]
              iteration_vals = availabilities.fetch(iteration_date_range){ blank_avail_val }
              user_availabilities = iteration_vals.fetch(:availabilities)
              user_ids = iteration_vals.fetch(:user_ids).sort
              {
                service: {name: service_name},
                user_availability: user_availabilities.reduce(0, &:+),
                user_ids: user_ids,
                calculation: calculations_from(user_availabilities),
                links: {
                  calendar: capacity_interval_calendar_path(user_ids: user_ids, start_date: iteration_date_range.start_date, end_date: iteration_date_range.end_date)
                },
              }
            }
          }
        },
      }
    }
  end

Of the three methods create_json calls, two of them are called only by create_json. So there are really two groups of methods in the module: the data generation methods, and the rendering methods. This would suggest that rendering is a separate responsibility that could be extracted.

We can easily perform this extraction by moving the create_json method (and the methods it calls) into a new Renderer module:

module BenchReport
  module Renderer
    def self.create_json(iteration_date_ranges, service_availabilities)
      {
        report: {
          iteration_entries: iteration_date_ranges.map { |iteration_date_range|
            {
              start_date: iteration_date_range.start_date,
              end_date: iteration_date_range.end_date,
              service_entries: service_availabilities.keys.sort.map {|service_name|
                availabilities = service_availabilities[service_name]
                iteration_vals = availabilities.fetch(iteration_date_range){ blank_avail_val }
                user_availabilities = iteration_vals.fetch(:availabilities)
                user_ids = iteration_vals.fetch(:user_ids).sort
                {
                  service: {name: service_name},
                  user_availability: user_availabilities.reduce(0, &:+),
                  user_ids: user_ids,
                  calculation: calculations_from(user_availabilities),
                  links: {
                    calendar: capacity_interval_calendar_path(user_ids: user_ids, start_date: iteration_date_range.start_date, end_date: iteration_date_range.end_date)
                  },
                }
              }
            }
          },
        }
      }
    end

    # ...called methods trimmed...
  end
end

Calling create_json at its new location requires only a trivial change in the build method:

   def self.build(report_date_range)
     iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)

     users = User.active.preload(:services)
     user_schedules = user_schedules_for(users, period: report_date_range)
     service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)

-    create_json(iteration_date_ranges, service_availabilities)
+    Renderer.create_json(iteration_date_ranges, service_availabilities)
   end

One Good Module Deserves Another

I liked how the Renderer extraction isolated responsibilites. Nathan agreed, but seeing a call to another module in the middle of the build method made him wonder if he needed to look through the rest of the module for other calls to external modules.

As we talked through it, we realized that the build method (and the BenchReport as a whole) was really operating at two different levels of abstraction. It performed several low-level operations to gather data, but then it made a high-level call to Renderer to render it. To get the levels of abstraction equal, we decided to extract the rest of the BenchReport module into a new module as well, leaving only a slimmed-down build method in BenchReport:

 module BenchReport
   def self.build(report_date_range)
-    iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)
-
-    users = User.active.preload(:services)
-    user_schedules = user_schedules_for(users, period: report_date_range)
-    service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
+    iteration_date_ranges, service_availabilities = Calculator.calculate(report_date_range)
     Renderer.create_json(iteration_date_ranges, service_availabilities)
   end
-
-  def self.calculate_iteration_date_ranges(report_date_range)
-  # and the rest of the methods
end

Here’s the main method of the new Calculator module. Note that whereas the last line of build was a call to Renderer, we now create an array to return instead:

module BenchReport
  module Calculator
    MINIMUM_DISPLAYABLE_AVAILABILITY = 50

    def self.calculate(report_date_range)
      iteration_date_ranges = calculate_iteration_date_ranges(report_date_range)

      users = User.active.preload(:services)
      user_schedules = user_schedules_for(users, period: report_date_range)
      service_availabilities = calculate_service_availabilities(users, user_schedules, iteration_date_ranges)

      [iteration_date_ranges, service_availabilities]
    end

    # ...
  end
end

With this change, all BenchReport does is delegate to two other modules at the same level of abstraction: Calculator to generate the data and Renderer to render it. This seems quite small, but Nathan and I both like this separation of responsibilities.

Hide or Seek?

In addition to these two module extractions, I also extracted a class related to the calculate_service_availabilities method we focused on in my last post. Here’s that method again (with a few changes from last time due to unrelated minor refactorings):

  def self.calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
    service_availabilities = {}
    users.each do |user|
      service = primary_service_name(user)
      next if primary_service.nil?
      iteration_date_ranges.each do |iteration_date_range|
        user_availability = user_availability(user, user_schedules, iteration_date_range)
        if displayable?(user_availability)
          service_availabilities[primary_service] ||= {}
          service_availabilities[primary_service][iteration_date_range] ||= blank_avail_val
          service_availabilities[primary_service][iteration_date_range][:availabilities] << user_availability / 100.0
          service_availabilities[primary_service][iteration_date_range][:user_ids] << user.id
        end
      end
    end
    service_availabilities
  end

If you recall, the nested loops and conditionals bothered me, and I tried to extract some methods, but it didn’t work out well. What else could I try?

Well, other than the loops and conditionals, this method’s other responsibility is to initialize and populate the nested hash structure. When I realized that, it occurred to me that this behavior could be moved into an object that wrapped the hash. This would allow us to express what we’re accomplishing (adding a service availability) without exposing how we’re doing it (building up a particular hash structure).

I decided to name this class ServiceAvailabilities. Here it is:

module BenchReport
  class ServiceAvailabilities
    MINIMUM_DISPLAYABLE_AVAILABILITY = 50

    def initialize
      @service_availabilities = {}
    end

    def add(user, availability, service, iteration_date_range)
      return unless displayable?(availability)
      service_availabilities[service] ||= {}
      service_availabilities[service][iteration_date_range] ||= blank_avail_val
      service_availabilities[service][iteration_date_range][:availabilities] << availability / 100.0
      service_availabilities[service][iteration_date_range][:user_ids] << user.id
    end

    def to_h
      service_availabilities
    end

    private

    attr_reader :service_availabilities

    def displayable?(user_availability)
      user_availability >= MINIMUM_DISPLAYABLE_AVAILABILITY
    end

    def blank_avail_val
      {availabilities: [], user_ids: []}
    end
  end
end

Note all the knowledge we were able to move out of Calculator:

  • The minimum amount of availability we include in report
  • The fact that the data is stored in a three-level nested hash
  • The keys for the hash
  • The default values for hash items
  • The format of fields to store

That’s a lot of knowledge that calculate_service_availabilities doesn’t need to have anymore. Here are the changes to that method:

   def self.calculate_service_availabilities(users, user_schedules, iteration_date_ranges)
-    service_availabilities = {}
+    service_availabilities = ServiceAvailabilities.new
     users.each do |user|
       service = primary_service_name(user)
       next if primary_service.nil?
       iteration_date_ranges.each do |iteration_date_range|
         user_availability = user_availability(user, user_schedules, iteration_date_range)
-        if displayable?(user_availability)
-          service_availabilities[primary_service] ||= {}
-          service_availabilities[primary_service][iteration_date_range] ||= blank_avail_val
-          service_availabilities[primary_service][iteration_date_range][:availabilities] << user_availability / 100.0
-          service_availabilities[primary_service][iteration_date_range][:user_ids] << user.id
-        end
+        service_availabilities.add(user, user_availability, service, iteration_date_range)
       end
     end
-    service_availabilities
+    service_availabilities.to_h
   end

That’s pretty small; does calculate_service_availabilities still know enough to pull its weight? It knows:

  • How to retrieve all combinations of users and iterations
  • How to retrieve a user’s primary service and availability for an iteration
  • When to skip a user
  • What message to send to a ServiceAvailabilities object to store this data

So there’s still a good amount here. We’ve split out about half of its responsibilities into another object.

(Note: the fact that the ServiceAvailabilities abstraction is thrown away after this method, rather than continuing to be used during rendering, is a weakness. If we had decided to keep this refactoring, that’s something we would want to explore further.)

The Disagreement

I really liked this extraction, but Nathan didn’t prefer it. Part of this is probably because Nathan was the only one of us who experienced the previous version of the code, where back-and-forth message passing was genuinely difficult to follow. But I think there are some more interesting reasons for our differences too.

Nathan has been experimenting with stateless functions and bare data structures, and he saw this BenchReport as a great fit for that approach. To Nathan, building up a hash is the whole purpose of Calculator, which is then rendered as JSON and used to draw a graph. The core of what he liked about his initial design was that the hash structure was so visible in the calculate_service_availabilities method. He wanted to minimize the context switches required to understand the makeup of the data structure and the steps that populated it with data.

I can understand that view, but lately I’ve been doing an experiment of my own, influenced by authors like Kent Beck, Rebecca Wirfs-Brock, and Uncle Bob Martin. They argue for the advantages of smaller objects that know less than the procedure but more than the behaviorless data constructs. These advantages include hiding implementation, separating out responsibilities, and facilitating reuse and recombination. This emphasis on proactively creating small objects comes from the early, idealistic days of object-oriented design, when the benefits of small objects were more well-understood than the costs. These days, a more moderate recommendation (from authors like Sandi Metz) is to leave code that doesn’t change procedural, but when requirements come in that force you to change it, apply OO design principles to separate out the parts that change for different reasons (the responsibilities).

So, in the end, I look at ServiceAvailabilities as an experiment in what a smarter data structure could look like — an experiment that shouldn’t go to production yet. Even so, I think Nathan and I have different thresholds for deciding when the benefit of extracting ServiceAvailabilities outweighs the cost…and that’s okay! Refactoring means that whatever decision we make now, we can always reverse it if necessary when we have more information. Plus, all our conversations about this refactoring have helped me understand class extractions much more deeply, and that makes it all worthwhile.

Counting the Cost

So, when is it worth it to extract a class or module? The thought process is similar to deciding when to extract a method: extract when you have a good abstraction, one that hides details you don’t need to consider at that point in the code. However, because the cost of context switching between types is so much higher than between methods, the value you get from abstracting away those details also needs to be greater to be worth it. The exact tipping point between the cost and the benefit depends on your wiring and that of your team.

Does your code have long procedures that might make more sense if details were split out? Or are you chasing after details in multiple files that might be better inlined? Do you have behaviorless data constructs that could benefit from adding some intelligence? Or are your classes just hoops to jump through to get to your data? What would it look like to switch your approach on any of the above? Give it a try, see how you feel about it, and then discuss it with someone on your team. That’s the heart of refactoring.

Not Happy with Your Current App, or Digital Product?

Submit your event

Let's Discuss Your Project

Let's Discuss Your Project