Quick and dirty Rails profiling
May 13th, 2010
If you have the log level verbose enough, rails spits out the timings for every little thing it does, but it inconveniently puts it in parentheses in the middle of the line so you can’t read it. So I wrote some really simple code to pull it out and stick it at the beginning of the line for sorting purposes:
1 2 3 4 5 6 7 |
#!/usr/bin/env ruby $stdin.each do |line| if line =~ /\(([0-9\.]+)\)/ puts "#{$1} #{line}" end end |
You can also use awk but I used ruby because I know ruby and don’t know awk. You could also one-line it with -pe but this way I can remember it for next time. Told you this was quick and dirty.
Let’s call that script “script/log_timing_debug”, and run it kind of like:1 2 3 |
> tail -f /var/log/rails_app.log > /tmp/railslog # hit a page from your browser, then ^C this > cat /tmp/railslog | ./script/log_timing_debug | sort -n |
The stuff that takes the longest will sort to the bottom of the output.
When we ran this on master (the same database and configuration as production, but isn’t in the real production rotation), we found a bunch of “SHOW FIELDS FROM” queries. First we thought “hmm, why is this taking so long?”, and then “wait, what? why is this even being run?”
Oh.
# The production environment is meant for finished, "live" apps.
# Code is not reloaded between requests
config.cache_classes = false
Oops.
Database specific types in Rails migrations
January 26th, 2010
There are charts of what Rails migration types (:string, :integer, :boolean, ...) map to mysql types (varchar(255), int(11), tinyint(1)...), but sometimes you want something not on the chart. For example, for string types, you only get varchar and text, but sometimes you want a char. In mysql, accessing tables with fixed length rows is faster than tables with variable length rows (of course the tradeoff is that if your string is actually variable length you can waste space.
It’s surprisingly simple. If Rails doesn’t understand the column type, it’ll pass it straight through to the database. So if you want a char instead of varchar, just replace:
t.column :token, :string
With:
t.column :token, "char(255)"
(Of course, this may or may not make your migrations non-portable to another database).
Disabling the Rails query cache
December 1st, 2009
Yes, Rails has a query cache (in addition to the one your database probably has).
I discovered this when trying to debug a problem with a query that should have been giving me random rows from the database. The unit test worked fine, so I didn’t bother testing for randomness in the controller test, since it’s annoying to test for randomness. But of course it didn’t work in real life. It’s kind of ironic to have a bug that the results are not unexpected enough.
Hey, let’s look at the logs:
Goal Load (0.000656) SELECT SQL_NO_CACHE g.* FROM goals g ORDER BY RAND() LIMIT 100
CACHE (0.000000) SELECT SQL_NO_CACHE g.* FROM goals g ORDER BY RAND() LIMIT 100
CACHE (0.000000) SELECT SQL_NO_CACHE g.* FROM goals g ORDER BY RAND() LIMIT 100
Hmm. So it turns out that Rails/ActiveRecord has its own query cache. It invisibly does what you want, unless you are doing a query that returns random results more than once in a request. The documentation is a bit thin, but it’s easy enough to disable temporarily:
1 2 3 4 5 |
def self.random_goals(limit = 100) uncached do # disable rails query cache Goal.find_by_sql [RANDOM_GOALS_SQL, limit] end end |
A year is 365.25 days in ruby
November 6th, 2009
1 2 3 4 |
>> 1.year.to_i / 1.day.to_i => 365 >> 1.year.to_f / 1.day.to_f => 365.25 |
Which makes quick and dirty calculations like age = (Time.now - birthdate) / 1.year slightly more accurate than one might expect.
How not to spam everyone when testing email sending
January 3rd, 2009
How not to spam everyone when testing email sending (assuming I don’t want to just turn off email sending).
mysql> update users set email = concat(email, '.xx') where email not like 'laurel%';
Unreasonably satisfying
August 7th, 2008
Before:

After:

Bugs and Features
April 21st, 2008
Too bad this only works in unit tests:

Playing with graphs
April 7th, 2008
I’ve always liked playing with graphs and representing data, so when we we decided to implement a version of Buster’s Morale-O-Meter for the new 43 Things profile page
I started with gruff, but I wanted something a little more stylized and less techy looking, so I took out all of the axes and hacked it to use numbers as the points. I kind of like how it turned out:

(This isn’t the final version, and the feature isn’t live on 43 Things yet, but it will be “soon”...)
Refactoring SQL Strings
February 27th, 2008
Our apps have a lot of custom sql. A lot of the time it’s easier just to write exactly the sql we want instead of messing around with ActiveRecord. But any time two programming languages run in to each other things can get out of hand, so there are lots of opportunities for refactoring.
Before:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql SELECT count(*) FROM listed_items WHERE active = 1 AND person_id = ? AND ( (completed_on IS NULL AND complete = 0 AND give_up = 0 AND posted_date > ? AND posted_date < ?) OR (completed_on > FROM_UNIXTIME(?) AND completed_on < FROM_UNIXTIME(?) ) ) SQL STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql SELECT * FROM listed_items WHERE active = 1 AND person_id = ? AND ( (completed_on IS NULL AND complete = 0 AND give_up = 0 AND posted_date > ? AND posted_date < ?) OR (completed_on > FROM_UNIXTIME(?) AND completed_on < FROM_UNIXTIME(?) ) ) ORDER BY updated_date DESC LIMIT 0, 100 SQL def num_completed_or_started_items_during_range(start_date, stop_date) ListedItem.count_by_sql [NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, self.id, start_date, stop_date, start_date, stop_date] end def completed_or_started_items_during_range(start_date, stop_date, offset=0, limit=20) items = ListedItem.find_by_sql [STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, self.id, start_date, stop_date, start_date, stop_date] item_to_sort = Hash.new items.each do |item| item_to_sort[item.id] = item.completed_on ? Time.at(item.completed_on).to_i : item.posted_date end items.sort{|a,b| item_to_sort[a.id] <=> item_to_sort[b.id]}.reverse[offset .. (offset + limit - 1)] end |
What’s wrong with this?
First of all, there are no tests. So first step is to write some. If I were just refactoring, I’d write tests that pass, but the reason I’m in this code is that there’s a bug (long story short, the bug is caused by the fact that a listed_item can have a non-null completed_on but have completed = 0). So I write a failing one that catches the bug too.
The STARTED_OR_COMPLETED_ITEMS_DURING_RANGE and
NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE string constants are
almost identical. So when I start changing the strings I realize that
I’m going to mess things up if I leave them duplicated like that (even
if I do the right thing, change both strings, it’ll be wrong for
the moment where I’m moving the cursor 10 lines down), so I stop and
factor out the where clause.
The next thing is that I don’t like the way the parameters for the SQL
string work. Parameters are repeated: if you look at
completed_or_started_items_during_range, the start_date and
stop_date parameters are both in there twice. Also, it’s annoying
and error prone to keep paging between the string and the methods (I
pasted the excerpt together, so it doesn’t look like it here, but this
file is arranged so all of the constants are together at the top of
the class definition, so there are several pages separating these
constants and methods) to remember what the parameters mean. Rails
lets me use named
bind variables
so I can give them names like a real programming language.
Ok, now that I have > :start and < :end instead of > ? and < ?, I’m more comfortable changing that to use the SQL between
operator,
and it looks a lot nicer (and one line shorter). I realize that
between doesn’t quite have the semantics of < and >, but that’s fine
with me in this case.
It’s a good thing it’s shorter because I’m going to take a break from refactoring and fix the bug, which involves adding a line. Ok, finally the tests pass.
We’re doing sorting and limiting with ruby. Sometimes it’s faster to do the sorting in ruby because the database is doing something dumb, so I’ll have to keep track of the timings. And sometimes it looks nicer in Ruby if it’s complicated code. But in this case the Ruby code is 5 lines of really dense code (use of the ternary operator and the compact {} form of blocks makes it look like you’re trying to hide something). After staring at it for a while, it’s just doing a coalesce and a normal limit with offset, so doing it in SQL looks nicer to me.
After:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL WHERE active = 1 AND person_id = :person_id AND ( (completed_on IS NULL AND complete = 0 AND give_up = 0 AND posted_date BETWEEN :start AND :end) OR (completed_on BETWEEN FROM_UNIXTIME(:start) AND FROM_UNIXTIME(:end) AND complete = 1 ) ) SQL NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql SELECT count(*) FROM listed_items #{WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE} SQL STARTED_OR_COMPLETED_ITEMS_DURING_RANGE = <<-SQL.prettify_sql SELECT * FROM listed_items #{WHERE_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE} ORDER BY COALESCE(completed_on, FROM_UNIXTIME(posted_date)) DESC LIMIT :offset, :limit SQL def num_completed_or_started_items_during_range(start_date, stop_date) ListedItem.count_by_sql [NUM_STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, {:person_id => self.id, :start => start_date, :end => stop_date}] end def completed_or_started_items_during_range(start_date, stop_date, offset=0, limit=20) ListedItem.find_by_sql [STARTED_OR_COMPLETED_ITEMS_DURING_RANGE, {:person_id => self.id, :start => start_date, :end => stop_date, :offset => offset, :limit => limit}] end |
Getting the weekday name and month name from a Time
February 26th, 2008
Pop quiz: how do you get the month (as a name, not a number) of a Time in Ruby (without looking at the Time#strftime documentation)?
All I remembered was that it was some meaningless looking format string. Here’s my first guess:1 2 |
irb(main):001:0> Time.now.strftime("%a") => "Tue" |
1 2 |
irb(main):002:0> Time.now.strftime("%b") => "Feb" |
That doesn’t look very ruby-like. Aren’t % format strings for C programmers (whoever wrote the code I’m currently refactoring is probably a recovering C programmer. They even use sprintf…)?
How about:1 2 |
irb(main):003:0> Time.now.short_month_name => "Feb" |
I’m sure others have done this already, but I couldn’t find it so I made my own Rails plugin, and stuck it here: http://svn.laurelfan.com/decorated_time/, mostly to see if I could set up an svn repository on dreamhost. The stuff that actually does the work is a total of about 4 lines of code since Ruby nicely lets me reopen the Time class at will.
(speaking of the day of week, I noticed that 1.9 added monday?, tuesday?, etc methods)
Fun with Emacs
February 19th, 2008
One good part of doing a big refactoring is that I get to have fun with emacs.
I found a great step by step tutorial with all the details. But basically, first you get in to dired (directory editing), then mark the files you want to search through, and then do dired-do-query-replace-regexp.
So for example, to replace @params with params in all controllers:
M-x find-dired- Run find in directory:
app/controllers - Run find (with args):
-name '*.rb'
- Run find in directory:
%m- Mark files (regexp):
.
- Mark files (regexp):
M-x dired-do-query-replace-regexp- Query replace in marked files (regexp):
@params - Query replace @params by:
params
- Query replace in marked files (regexp):
Dired is fun stuff like emacs’s other “everything is a buffer!!” things (just like unix’s “everything is a file!!”). I’m not quite emacs-hacker enough to use it as a shell for extended periods of time though.
Reraising Exceptions in Ruby
February 14th, 2008
1 2 3 |
rescue Exception => e # other stuff raise "#{e.class}: #{e.message}" |
1 2 3 |
rescue Exception => e # other stuff raise |
raise without any arguments will reraise the current exception, complete with class, message, and stack trace.
See also Programming Ruby on exceptions.
Stupid SEO Tricks
January 17th, 2008
We found a site that looked like it was stealing our content. Then we go there, and it’s exactly the same site! What are they doing? Are they crawling the site and sucking up all of our pages? Proxying us?
But wait:> dig d****list.com d****list.com. 1736 IN A 209.61.175.237That IP address looks familiar!
> dig 43things.com 43things.com. 86400 IN A 209.61.175.237
Why would someone do that? Is it some SEO trick to steal our google rank for their domain?
Josh noticed that the domain was registered to someone in China (we don’t think this is a good person trying to get us around the Great Firewall—a test tool we tried showed that 43things.com isn’t blocked). So we decided to cause them some trouble:
RewriteCond %{HTTP_HOST} .*d****list.com$
RewriteRule ^/(.*) http://en.wikipedia.org/wiki/Tiananmen_Square_protests_of_1989 [R]
Ruby Memory Usage
January 15th, 2008
Tracking down a memory leak? Here’s a way to find the memory usage of the current process on Ruby:
memory_usage = `ps -o rss= -p #{Process.pid}`.to_i # in kilobytes |
-o rss= asks ps to print only the RSS (Resident Set Size, or physical memory used). You could also use vsz/vsize (virtual memory). The hanging = sign sets the header text to a blank string so you don’t have to filter out the header line.
-p #{Process.pid} limits the ps to only show the current process.
The backticks are kind of hacky, but this cuts down on the piping and grepping. It works on all of the unixes I’ve tried (Linux, FreeBSD, OSX), but of course ps is notoriously nonstandardized.
MySQL Performance Tuning Links
January 10th, 2008
Here are some links to things mentioned at the MySQL Performance Tuning class that I’ve been attending this past week.
Day 1 Morning: MySQL website, basics, upsell for more training classes and certifications- Documentation pages for all options and variables and the system variables page (which indicates which ones are dynamic).
- Documentation for the innodb transaction model
- multiple versions of uncommitted rows are stored on disk
- SHOW TABLE STATUS shows statistics like table size, number of rows (it’s actually an approximation)
- slow query log and mysqldumpslow (which summarizes the information in the slow query log)
- PROCEDURE ANALYSE, which tries to figure out the optimal data type for the columns of a table
- Benchmarking tools: mysqlslap, which comes with 5.1 and might not work with 4.1 and Super Smack
- mysqlreport, a more human readable version of SHOW STATUS
- mytop, a top-view of SHOW PROCESSLIST
- innotop monitors all kinds of innodb stuff
- Maatkit contains essential command-line tools for MySQL, such as table checksums, a query profiler, and a visual EXPLAIN tool. It provides missing features such as checking whether slaves have the same data as the master.
- EXPLAIN EXTENDED and then SHOW WARNINGS will show you the actual SQL being executed (which can be different if the optimizer wants to rewrite it).
- Query cache documentation
- table_cache setting (aka table_definition_cache and table_open_cache in 5.1) how many file descriptors are used to open table files.
- max_connections
- open_files_limit
- thread_cache_size
- MyISAM storage engine
- spatial index for efficiently searching geographical data
- MyISAM key cache for caching MyISAM indexes
- MERGE storage engine concatenates (like UNION ALL not UNION) more than one MyISAM table with the same schema
- InnoDB storage engine
- Configuring InnoDB to use a separate file per table (instead of the default of one shared ibdata file for all tables)
- Partitioning (new in 5.1), partitions a table over multiple files.
- Advisory locks are semaphores implemented in MySQL (not InnoDB specific). These don’t affect table locks.
- Next key/gap locks means that when a locking SELECT on a range of indexed values is done, the table is also locked for INSERTs within that range (and the gap before that range!).
- doublewrite setting
- innodb_flush_log_at_trx_commit
- Disk I/O
- InnoDB status and monitors that print information about the InnoDB internal state
- More about SHOW INNODB STATUS