Refactoring SQL Strings

February 27th, 2008

Our apps have a lot of custom sql. A lot of the time it’s easier just to write exactly the sql we want instead of messing around with ActiveRecord. But any time two programming languages run in to each other things can get out of hand, so there are lots of opportunities for refactoring.

Before:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
    NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
        SELECT count(*) FROM listed_items 
        WHERE active = 1 
        AND person_id = ?
        AND ( (completed_on IS NULL 
        AND complete = 0 
        AND give_up = 0 
        AND posted_date > ? 
        AND posted_date < ?)
        OR (completed_on > FROM_UNIXTIME(?) 
        AND completed_on < FROM_UNIXTIME(?) ) )
    SQL

    STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
         SELECT * FROM listed_items
         WHERE active = 1
         AND person_id = ?
         AND ( (completed_on IS NULL
         AND complete = 0
         AND give_up = 0 
         AND posted_date > ?
         AND posted_date < ?)
         OR (completed_on > FROM_UNIXTIME(?)
         AND completed_on < FROM_UNIXTIME(?) ) )
         ORDER BY updated_date DESC
         LIMIT 0, 100
    SQL

    def num_completed_or_started_items_during_range(start_date, stop_date)
        ListedItem.count_by_sql [NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, self.id, start_date, stop_date, start_date, stop_date]
    end

    def completed_or_started_items_during_range(start_date, stop_date, offset=0, limit=20)
        items = ListedItem.find_by_sql [STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, self.id, start_date, stop_date, start_date, stop_date]    
        item_to_sort = Hash.new
        items.each do |item| 
            item_to_sort[item.id] = item.completed_on ? Time.at(item.completed_on).to_i : item.posted_date
        end
        items.sort{|a,b| item_to_sort[a.id] <=> item_to_sort[b.id]}.reverse[offset .. (offset + limit - 1)]
    end

What’s wrong with this?

First of all, there are no tests. So first step is to write some. If I were just refactoring, I’d write tests that pass, but the reason I’m in this code is that there’s a bug (long story short, the bug is caused by the fact that a listed_item can have a non-null completed_on but have completed = 0). So I write a failing one that catches the bug too.

The STARTED_OR_COMPLETED_ITEMS_DURING_RANGE and NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE string constants are almost identical. So when I start changing the strings I realize that I’m going to mess things up if I leave them duplicated like that (even if I do the right thing, change both strings, it’ll be wrong for the moment where I’m moving the cursor 10 lines down), so I stop and factor out the where clause.

The next thing is that I don’t like the way the parameters for the SQL string work. Parameters are repeated: if you look at completed_or_started_items_during_range, the start_date and stop_date parameters are both in there twice. Also, it’s annoying and error prone to keep paging between the string and the methods (I pasted the excerpt together, so it doesn’t look like it here, but this file is arranged so all of the constants are together at the top of the class definition, so there are several pages separating these constants and methods) to remember what the parameters mean. Rails lets me use named bind variables so I can give them names like a real programming language.

Ok, now that I have > :start and < :end instead of > ? and < ?, I’m more comfortable changing that to use the SQL between operator, and it looks a lot nicer (and one line shorter). I realize that between doesn’t quite have the semantics of < and >, but that’s fine with me in this case.

It’s a good thing it’s shorter because I’m going to take a break from refactoring and fix the bug, which involves adding a line. Ok, finally the tests pass.

We’re doing sorting and limiting with ruby. Sometimes it’s faster to do the sorting in ruby because the database is doing something dumb, so I’ll have to keep track of the timings. And sometimes it looks nicer in Ruby if it’s complicated code. But in this case the Ruby code is 5 lines of really dense code (use of the ternary operator and the compact {} form of blocks makes it look like you’re trying to hide something). After staring at it for a while, it’s just doing a coalesce and a normal limit with offset, so doing it in SQL looks nicer to me.

After:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
    WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL
        WHERE active = 1 
        AND person_id = :person_id
        AND ( (completed_on IS NULL 
               AND complete = 0 
               AND give_up = 0 
               AND posted_date BETWEEN :start AND :end)
           OR (completed_on BETWEEN FROM_UNIXTIME(:start) AND FROM_UNIXTIME(:end)
               AND complete = 1 ) )
    SQL

    NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
        SELECT count(*) FROM listed_items
        #{WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE}
    SQL

    STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
        SELECT * FROM listed_items
        #{WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE}
        ORDER BY COALESCE(completed_on, FROM_UNIXTIME(posted_date)) DESC
        LIMIT :offset, :limit
    SQL

    def num_completed_or_started_items_during_range(start_date, stop_date)
        ListedItem.count_by_sql [NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE,
                                 {:person_id => self.id,
                                     :start => start_date,
                                     :end => stop_date}]
    end

    def completed_or_started_items_during_range(start_date, stop_date, offset=0, limit=20)
        ListedItem.find_by_sql [STARTED_OR_COMPLETED_ITEMS_DURING_RANGE,
                                {:person_id => self.id,
                                    :start => start_date,
                                    :end => stop_date,
                                    :offset => offset,
                                    :limit => limit}]
    end

Pop quiz: how do you get the month (as a name, not a number) of a Time in Ruby (without looking at the Time#strftime documentation)?

All I remembered was that it was some meaningless looking format string. Here’s my first guess:
1
2
irb(main):001:0> Time.now.strftime("%a")
=> "Tue"
Oops, it’s actually:
1
2
irb(main):002:0> Time.now.strftime("%b")
=> "Feb"

That doesn’t look very ruby-like. Aren’t % format strings for C programmers (whoever wrote the code I’m currently refactoring is probably a recovering C programmer. They even use sprintf…)?

How about:
1
2
 irb(main):003:0> Time.now.short_month_name
=> "Feb"

I’m sure others have done this already, but I couldn’t find it so I made my own Rails plugin, and stuck it here: http://svn.laurelfan.com/decorated_time/, mostly to see if I could set up an svn repository on dreamhost. The stuff that actually does the work is a total of about 4 lines of code since Ruby nicely lets me reopen the Time class at will.

(speaking of the day of week, I noticed that 1.9 added monday?, tuesday?, etc methods)