Bugs and Features

April 21st, 2008

Too bad this only works in unit tests:

Playing with graphs

April 7th, 2008

I’ve always liked playing with graphs and representing data. So when we decided to implement a version of Buster’s Morale-O-Meter for the new 43 Things profile page, I started with gruff, but I wanted something a little more stylized and less techy looking, so I took out all of the axes and hacked it to use numbers as the points. I kind of like how it turned out:

(This isn’t the final version, and the feature isn’t live on 43 Things yet, but it will be “soon”...)

Refactoring SQL Strings

February 27th, 2008

Our apps have a lot of custom sql. A lot of the time it’s easier just to write exactly the sql we want instead of messing around with ActiveRecord. But any time two programming languages run into each other things can get out of hand, so there are lots of opportunities for refactoring.

Before:

    NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
        SELECT count(*) FROM listed_items 
        WHERE active = 1 
        AND person_id = ?
        AND ( (completed_on IS NULL 
        AND complete = 0 
        AND give_up = 0 
        AND posted_date > ? 
        AND posted_date < ?)
        OR (completed_on > FROM_UNIXTIME(?) 
        AND completed_on < FROM_UNIXTIME(?) ) )
    SQL

    STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
         SELECT * FROM listed_items
         WHERE active = 1
         AND person_id = ?
         AND ( (completed_on IS NULL
         AND complete = 0
         AND give_up = 0 
         AND posted_date > ?
         AND posted_date < ?)
         OR (completed_on > FROM_UNIXTIME(?)
         AND completed_on < FROM_UNIXTIME(?) ) )
         ORDER BY updated_date DESC
         LIMIT 0, 100
    SQL

    def num_completed_or_started_items_during_range(start_date, stop_date)
        ListedItem.count_by_sql [NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, self.id, start_date, stop_date, start_date, stop_date]
    end

    def completed_or_started_items_during_range(start_date, stop_date, offset=0, limit=20)
        items = ListedItem.find_by_sql [STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, self.id, start_date, stop_date, start_date, stop_date]    
        item_to_sort = Hash.new
        items.each do |item| 
            item_to_sort[item.id] = item.completed_on ? Time.at(item.completed_on).to_i : item.posted_date
        end
        items.sort{|a,b| item_to_sort[a.id] <=> item_to_sort[b.id]}.reverse[offset .. (offset + limit - 1)]
    end

What’s wrong with this?

First of all, there are no tests. So the first step is to write some. If I were just refactoring, I’d write tests that pass, but the reason I’m in this code is that there’s a bug (long story short, the bug is caused by the fact that a listed_item can have a non-null completed_on but have complete = 0). So I write a failing one that catches the bug too.

The STARTED_OR_COMPLETED_ITEMS_DURING_RANGE and NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE string constants are almost identical. So when I start changing the strings I realize that I’m going to mess things up if I leave them duplicated like that (even if I do the right thing, change both strings, it’ll be wrong for the moment where I’m moving the cursor 10 lines down), so I stop and factor out the where clause.

The next thing is that I don’t like the way the parameters for the SQL string work. Parameters are repeated: if you look at completed_or_started_items_during_range, the start_date and stop_date parameters are both in there twice. Also, it’s annoying and error prone to keep paging between the string and the methods to remember what the parameters mean. (I pasted the excerpt together, so it doesn’t look like it here, but in this file all of the constants sit together at the top of the class definition, several pages away from these methods.) Rails lets me use named bind variables so I can give them names like a real programming language.

Ok, now that I have > :start and < :end instead of > ? and < ?, I’m more comfortable changing that to use the SQL between operator, and it looks a lot nicer (and one line shorter). I realize that between doesn’t quite have the semantics of < and > (it includes both endpoints), but that’s fine with me in this case.
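
To see where the two forms disagree, here is a rough plain-Ruby analogy. It is only a sketch: the helper names and sample timestamps are made up, and Ruby comparisons are standing in for the SQL operators.

```ruby
# Plain-Ruby analogy for the two SQL range checks; not code from the app.
# strictly_inside? mirrors `col > :start AND col < :end`,
# sql_between?     mirrors `col BETWEEN :start AND :end`.

def strictly_inside?(value, start, stop)
  value > start && value < stop
end

def sql_between?(value, start, stop)
  # Comparable#between? is inclusive at both ends, like SQL BETWEEN.
  value.between?(start, stop)
end

range_start = Time.utc(2008, 2, 1).to_i  # sample unix timestamps
range_stop  = Time.utc(2008, 3, 1).to_i

# An item posted exactly at the start of the range:
puts strictly_inside?(range_start, range_start, range_stop)  # false
puts sql_between?(range_start, range_start, range_stop)      # true
```

The only rows that change sides are the ones sitting exactly on a boundary timestamp.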

It’s a good thing it’s shorter because I’m going to take a break from refactoring and fix the bug, which involves adding a line. Ok, finally the tests pass.

We’re doing sorting and limiting with ruby. Sometimes it’s faster to do the sorting in ruby because the database is doing something dumb, so I’ll have to keep track of the timings. And sometimes it looks nicer in Ruby if it’s complicated code. But in this case the Ruby code is 5 lines of really dense code (use of the ternary operator and the compact {} form of blocks makes it look like you’re trying to hide something). After staring at it for a while, it’s just doing a coalesce and a normal limit with offset, so doing it in SQL looks nicer to me.
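
For comparison, here is roughly what the Ruby side boils down to once you read it as a coalesce plus offset and limit. This is a sketch rather than the app’s code: Item is a stand-in Struct, page_by_latest_activity is a made-up name, and both dates are assumed to be plain unix timestamps (in the real table completed_on is a datetime).

```ruby
# Hypothetical stand-in for a ListedItem row.
Item = Struct.new(:id, :completed_on, :posted_date)

def page_by_latest_activity(items, offset = 0, limit = 20)
  # `completed_on || posted_date` is the Ruby spelling of
  # COALESCE(completed_on, posted_date); negating the key sorts newest first.
  sorted = items.sort_by { |item| -(item.completed_on || item.posted_date) }
  # Array#[] with (start, length) plays the role of LIMIT offset, limit.
  sorted[offset, limit] || []
end

items = [
  Item.new(1, nil, 100),  # still in progress, posted at t=100
  Item.new(2, 300, 50),   # completed at t=300
  Item.new(3, nil, 200)   # still in progress, posted at t=200
]

p page_by_latest_activity(items, 0, 2).map(&:id)  # [2, 3]
p page_by_latest_activity(items, 2, 2).map(&:id)  # [1]
```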

After:

    WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL
        WHERE active = 1 
        AND person_id = :person_id
        AND ( (completed_on IS NULL 
               AND complete = 0 
               AND give_up = 0 
               AND posted_date BETWEEN :start AND :end)
           OR (completed_on BETWEEN FROM_UNIXTIME(:start) AND FROM_UNIXTIME(:end)
               AND complete = 1 ) )
    SQL

    NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
        SELECT count(*) FROM listed_items
        #{WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE}
    SQL

    STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql
        SELECT * FROM listed_items
        #{WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE}
        ORDER BY COALESCE(completed_on, FROM_UNIXTIME(posted_date)) DESC
        LIMIT :offset, :limit
    SQL

    def num_completed_or_started_items_during_range(start_date, stop_date)
        ListedItem.count_by_sql [NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE,
                                 {:person_id => self.id,
                                     :start => start_date,
                                     :end => stop_date}]
    end

    def completed_or_started_items_during_range(start_date, stop_date, offset=0, limit=20)
        ListedItem.find_by_sql [STARTED_OR_COMPLETED_ITEMS_DURING_RANGE,
                                {:person_id => self.id,
                                    :start => start_date,
                                    :end => stop_date,
                                    :offset => offset,
                                    :limit => limit}]
    end

Pop quiz: how do you get the month (as a name, not a number) of a Time in Ruby (without looking at the Time#strftime documentation)?

All I remembered was that it was some meaningless looking format string. Here’s my first guess:
irb(main):001:0> Time.now.strftime("%a")
=> "Tue"
Oops, it’s actually:
irb(main):002:0> Time.now.strftime("%b")
=> "Feb"

That doesn’t look very ruby-like. Aren’t % format strings for C programmers (whoever wrote the code I’m currently refactoring is probably a recovering C programmer. They even use sprintf…)?

How about:
 irb(main):003:0> Time.now.short_month_name
=> "Feb"

I’m sure others have done this already, but I couldn’t find it so I made my own Rails plugin, and stuck it here: http://svn.laurelfan.com/decorated_time/, mostly to see if I could set up an svn repository on dreamhost. The stuff that actually does the work is a total of about 4 lines of code since Ruby nicely lets me reopen the Time class at will.
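
The plugin’s source isn’t reproduced here, but a minimal sketch of the idea might look like the following. short_month_name is the method from the irb session above; month_name is an extra method assumed purely for illustration.

```ruby
# Reopen the built-in Time class and wrap the strftime format codes
# in readable method names. A sketch, not the plugin's actual code.
class Time
  def short_month_name
    strftime("%b")  # abbreviated month name, e.g. "Feb"
  end

  def month_name
    strftime("%B")  # full month name, e.g. "February"
  end
end

puts Time.utc(2008, 2, 26).short_month_name  # Feb
puts Time.utc(2008, 2, 26).month_name        # February
```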

(speaking of the day of week, I noticed that 1.9 added monday?, tuesday?, etc methods)

Fun with Emacs

February 19th, 2008

One good part of doing a big refactoring is that I get to have fun with emacs.

I found a great step by step tutorial with all the details. But basically, first you get in to dired (directory editing), then mark the files you want to search through, and then do dired-do-query-replace-regexp.

So for example, to replace @params with params in all controllers:

  • M-x find-dired
    • Run find in directory: app/controllers
    • Run find (with args): -name '*.rb'
  • %m
    • Mark files (regexp): .
  • M-x dired-do-query-replace-regexp
    • Query replace in marked files (regexp): @params
    • Query replace @params by: params

Dired is fun stuff like emacs’s other “everything is a buffer!!” things (just like unix’s “everything is a file!!”). I’m not quite emacs-hacker enough to use it as a shell for extended periods of time though.

Reraising Exceptions in Ruby

February 14th, 2008

Don’t do this:
    rescue Exception => e
      # other stuff
      raise "#{e.class}: #{e.message}"
Do this:
    rescue Exception => e
      # other stuff
      raise

raise without any arguments will reraise the current exception, complete with class, message, and stack trace.
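
Here is a small self-contained comparison of the two styles. It is a sketch: PaymentError and the method names are made up for the example.

```ruby
class PaymentError < StandardError; end

def charge_card
  raise PaymentError, "card declined"
end

def wrap_in_string
  charge_card
rescue StandardError => e
  # Builds a brand new RuntimeError: the original class and backtrace are lost.
  raise "#{e.class}: #{e.message}"
end

def reraise_as_is
  charge_card
rescue StandardError
  # Bare raise re-raises the exception currently being handled.
  raise
end

# Run a block and hand back whatever exception it raised.
def capture_error
  yield
  nil
rescue StandardError => e
  e
end

lost = capture_error { wrap_in_string }
kept = capture_error { reraise_as_is }

puts lost.class            # RuntimeError
puts lost.message          # PaymentError: card declined
puts kept.class            # PaymentError
puts kept.backtrace.first  # still points at the raise inside charge_card
```

With the first style, anything upstream that does rescue PaymentError never sees the error, because all that is left of the class is a string inside the message.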

See also Programming Ruby on exceptions.

Stupid SEO Tricks

January 17th, 2008

We found a site that looked like it was stealing our content. Then we went there, and it was exactly the same site! What are they doing? Are they crawling the site and sucking up all of our pages? Proxying us?

But wait:
 > dig d****list.com
d****list.com.          1736    IN      A       209.61.175.237
That IP address looks familiar!
 > dig 43things.com  
43things.com.           86400   IN      A       209.61.175.237

Why would someone do that? Is it some SEO trick to steal our google rank for their domain?

Josh noticed that the domain was registered to someone in China (we don’t think this is a good person trying to get us around the Great Firewall—a test tool we tried showed that 43things.com isn’t blocked). So we decided to cause them some trouble:

    RewriteCond %{HTTP_HOST} .*d****list.com$
    RewriteRule ^/(.*) http://en.wikipedia.org/wiki/Tiananmen_Square_protests_of_1989 [R]

Ruby Memory Usage

January 15th, 2008

Tracking down a memory leak? Here’s a way to find the memory usage of the current process on Ruby:


memory_usage = `ps -o rss= -p #{Process.pid}`.to_i # in kilobytes 

-o rss= asks ps to print only the RSS (Resident Set Size, or physical memory used). You could also use vsz/vsize (virtual memory). The hanging = sign sets the header text to a blank string so you don’t have to filter out the header line.

-p #{Process.pid} limits the ps to only show the current process.

The backticks are kind of hacky, but this cuts down on the piping and grepping. It works on all of the unixes I’ve tried (Linux, FreeBSD, OSX), but of course ps is notoriously nonstandardized.

MySQL Performance Tuning Links

January 10th, 2008

Here are some links to things mentioned at the MySQL Performance Tuning class that I’ve been attending this past week.

Day 1 Morning: MySQL website, basics, upsell for more training classes and certifications
Day 1 Afternoon: architecture overview
  • Documentation for the innodb transaction model
    • multiple versions of uncommitted rows are stored on disk
  • SHOW TABLE STATUS shows statistics like table size, number of rows (it’s actually an approximation)
Day 2 Morning: data types, benchmarking, logs, admin tools, indexes
  • slow query log and mysqldumpslow (which summarizes the information in the slow query log)
  • PROCEDURE ANALYSE, which tries to figure out the optimal data type for the columns of a table
  • Benchmarking tools: mysqlslap (which comes with 5.1 and might not work with 4.1) and Super Smack
  • mysqlreport, a more human readable version of SHOW STATUS
  • mytop, a top-view of SHOW PROCESSLIST
  • innotop monitors all kinds of innodb stuff
  • Maatkit contains essential command-line tools for MySQL, such as table checksums, a query profiler, and a visual EXPLAIN tool. It provides missing features such as checking whether slaves have the same data as the master.
Day 2 Afternoon: statement tuning, query cache
Day 3 Morning: server configuration
Day 3 Afternoon: MyISAM
Day 4 Morning: InnoDB, Transactions, Locking
Day 4 Afternoon: Other Storage Engines: Memory, Blackhole, CSV, Falcon
  • Memory is a table that’s stored completely in memory
  • Blackhole is like /dev/null
  • CSV: yes, a Comma Separated Value file
  • Falcon: the 6.0 storage engine that will solve all of our perf problems
  • Example for writing your own storage engine

Mocking Facebook

December 10th, 2007

Any sort of interaction with external services is a serious hassle when trying to do TDD or even Development with Some Tests In It. External services could include anything from web services like facebook to our own drb ferret service (the real purists would include your database in this, but one step at a time…). The worst thing to do is to actually interact with the service in the test—it makes tests slow, dependent on a network connection, potentially messes up production data, etc.

The solution is mock objects. For a long time I did mock objects the wrong way, by inheriting from the real object and essentially reimplementing the functionality. This is pretty painful, so I didn’t do it much. Fortunately, really good mocking frameworks are everywhere now. Ruby even has two of them—mocha and flexmock.


    def test_facebook_publish_complete_goal
      @mock_fbsession = valid_facebook_session
      
      person = people(:rob_cooper)
      person.facebook_session = @mock_fbsession

      @mock_team_member = flexmock
      @mock_team_member.should_receive("goal.name").and_return("write a facebook app")
      @mock_team_member.should_receive("goal_is_complete?").and_return(true)
      @mock_team_member.should_receive("give_up?").and_return(false)

      @mock_fbsession.should_receive(:feed_publishActionOfUser).
        with(:title => 'has completed the goal: write a facebook app').once
      
      person.facebook_publish_goal_activity(@mock_team_member)
    end

First some preliminaries

First I need to set up a mock FacebookSession (I use a pseudomock wrapped around a real FacebookWebSession because I’m lazy and don’t want to mock the session_id and session_key accessors). The RFacebook library obviously wasn’t implemented TDD, so there are a few ways that it’s a bit messy to test.


    def valid_facebook_session
      mock_session =  flexmock(RFacebook::FacebookWebSession.new("test", "test"))
      mock_session.should_receive(:is_valid?).and_return(true)
      mock_session.should_receive(:is_ready?).and_return(true)

      return mock_session
    end

Next, I’m making a mock team member (a TeamMember is a model that represents a Person doing a Goal), but this time I use a plain mock that doesn’t reference the “real” object at all. And check out this line:

      @mock_team_member.should_receive("goal.name").and_return("write a facebook app")

This call is actually making a chain of mock objects, so I can call @mock_team_member.goal.name without having to explicitly create a mock object for the goal.

Here’s the guts

The first line sets up the expectation—kind of like an assert.


      @mock_fbsession.should_receive(:feed_publishActionOfUser).
        with(:title => 'has completed the goal: write a facebook app').once
      
      person.facebook_publish_goal_activity(@mock_team_member)

The second line calls the method we want to test. The expectation (that the method feed_publishActionOfUser will be called once with the specified arguments) will magically be evaluated when the test is over.
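
To take some of the magic out of that, here is a hand-rolled toy version of the idea in plain Ruby. It is not how flexmock is implemented, and TinyMock, expect and verify! are made-up names. The point is only that an expectation is recorded up front, calls are logged as they happen, and the comparison is made at the end (the step flexmock takes care of for you when the test finishes).

```ruby
# Toy mock object: record expectations, log calls, compare at the end.
class TinyMock
  def initialize
    @expected = {}                          # method name => [args, times]
    @calls = Hash.new { |h, k| h[k] = [] }  # method name => list of arg lists
  end

  # Declare that `name` should be called with `args` exactly `times` times.
  def expect(name, args, times: 1)
    @expected[name] = [args, times]
    self
  end

  # Any expected method can be called on the mock; the call is just logged.
  def method_missing(name, *args)
    return super unless @expected.key?(name)
    @calls[name] << args
    nil
  end

  def respond_to_missing?(name, include_private = false)
    @expected.key?(name) || super
  end

  # Raise unless every expectation was met exactly.
  def verify!
    @expected.each do |name, (args, times)|
      matching = @calls[name].count { |actual| actual == args }
      unless matching == times
        raise "expected #{name}(#{args.inspect}) #{times} time(s), got #{matching}"
      end
    end
    true
  end
end

title = { :title => 'has completed the goal: write a facebook app' }

session = TinyMock.new
session.expect(:feed_publishActionOfUser, [title])

# The code under test would make this call:
session.feed_publishActionOfUser(title)

# A test framework would run this step at the end of the test:
session.verify!
```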

MySQL High Availability Links

November 14th, 2007

I just got back from a 2 day mysql training on High Availability in Portland. It was a well taught class with a good interactive setup. Getting to spend 4 days in Portland was great too! Here are a few links to stuff mentioned in class:

  • Performance: The class didn’t really cover performance, but there were lots of questions about it. I guess HA is only interesting if you care about performance.
  • Replication: We use a very simple master/slave setup, and don’t really use the slave as much as we could. There are lots of other options for replication, like an arbitrary number of slaves, and even an arbitrary number of masters set up in a circle.
    • mysqlproxy, a nice looking way to do load balancing and distributing reads over multiple read slaves without going config crazy at the application level
    • a replicated table on a slave can have a different storage engine than on the master— even the Black Hole storage engine
    • Google replication patches
  • Cluster: It’s really cool stuff and surprisingly easy to set up, but unfortunately it doesn’t really match what our application does. All the arbitration and split brain stuff reminds me of class though!
    • example config file
    • almost all of the data has to fit in memory, so it’s useful to calculate the size of a mysql cluster with ndb_size.pl
  • Disk Stuff: We do backups with mysqldump. If you’re not familiar with this, it turns a database into a text stream of sql statements. As you would expect it is a bit slow.
    • DRBD is a linux kernel module that sits in front of a block device and synchronously writes to a block device on another machine
    • a lot of people do backups via LVM

FBML validation

October 25th, 2007

Facebook has made up this thing called FBML, which is basically HTML with a few tags added, a few tags subtracted, and some random rules. So I decided to have some fun with Hpricot and make an assertion to put in the relevant view test cases:


  # acceptable html tags from http://wiki.developers.facebook.com/index.php/FBML
  ACCEPTABLE_HTML_TAGS = %w{ a abbr acronym address b bdo big blockquote br
    caption center cite code dd del dfn div dl dt em fieldset font form h1 h2 h3 h4 h5
    h6 hr i img input ins kbd label legend li ol optgroup option p pre q s samp script
    select small span strike strong style sub sup table tbody td textarea tfoot th thead
    tr tt u ul var}

  def assert_all_tags_valid(doc)
    doc.search('*') do |element|
      if element.is_a?(Hpricot::Elem) # ignore comments, text, etc
        if !ACCEPTABLE_HTML_TAGS.include?(element.name) and
            !element.name.starts_with?("fb:") # accept anything that looks like a custom fb: tag
          assert false, "#{element.name} is not a valid fbml tag"
        end
      end
    end
  end

  def assert_all_img_src_absolute(doc)
    doc.search('img') do |img|
      src = img.attributes['src']
      if src.nil?
        assert false, "<img> tag missing src attribute"
      end
      if !src.starts_with?("http")
        assert false, "<img> tag src attribute (#{src}) is not absolute"
      end
    end
  end
  
  def assert_valid_fbml(fbml = nil)
    if fbml.nil?
      fbml = @response.body
    end

    doc = Hpricot(fbml)
    assert_all_tags_valid(doc)
    assert_all_img_src_absolute(doc)
  end