A few weeks ago I came across a blog post from Aaron Francis in which he talks about creating efficient pagination using deferred joins, a technique he remembered reading about in High Performance MySQL: Proven Strategies for Operating at Scale.

The idea is that without deferred joins, pagination queries can hurt response time. Pagination is done using an OFFSET to skip over a number of records; however, even though those results are skipped, the database must still fetch them. In other words, we read data from disk and immediately discard it. This inefficiency is what causes pagination performance to degrade as you paginate deeper into the result set.

I wanted to run a test. I have a query used by an application to paginate a table with roughly 700K records. The SQL statement the client application generates to paginate over those records looks something like this.

DECLARE @PageNumber AS INT
DECLARE @RowsOfPage AS INT

SET @PageNumber = 1000
SET @RowsOfPage = 100

SELECT * FROM Customers c
ORDER BY Id 
OFFSET (@PageNumber - 1) * @RowsOfPage ROWS
FETCH NEXT @RowsOfPage ROWS ONLY

Note that for simplicity, I have hard-coded the page number and page size directly in the query; normally these are parameters that come from the app. Now, according to the Client Statistics tool in SSMS, the client processing time for this query, averaged over 10 iterations, was 2393.0000 ms. The total execution time averaged 3287.0000 ms, and the time spent waiting on the server to reply averaged 894.0000 ms.

The same query modified to use a deferred join looks something like this.

DECLARE @PageNumber AS INT
DECLARE @RowsOfPage AS INT

SET @PageNumber = 1000
SET @RowsOfPage = 100

SELECT c.* FROM Customers c
    INNER JOIN (
        SELECT Id FROM Customers
        ORDER BY Id OFFSET (@PageNumber - 1) * @RowsOfPage ROWS FETCH NEXT @RowsOfPage ROWS ONLY
    ) AS t ON c.Id = t.Id
ORDER BY c.Id

I ran the query above against the same table of roughly 700K records and got some rather impressive results. The client processing time averaged 981.8000 ms over 10 iterations, down from the original 2393.0000 ms. The total execution time averaged 1402.8000 ms, down from 3287.0000 ms, and the wait time on server reply averaged 421.0000 ms, down from 894.0000 ms.

Those are some rather serious improvements. I’m thinking I will need to update the client app to use the updated query with deferred joins. It should improve the overall performance of the application, which is always a win in my book.
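If you want to convince yourself that the two forms return the same page, the pattern is easy to reproduce outside SQL Server. Here is a minimal sketch in Python using an in-memory SQLite database (the table name, column names, and row counts are illustrative, not my real data; SQLite uses LIMIT/OFFSET instead of T-SQL's OFFSET/FETCH):

```python
import sqlite3

# Hypothetical stand-in for the Customers table; sizes are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Customers (Id, Name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 10_001)],
)

page_number, rows_per_page = 50, 100
offset = (page_number - 1) * rows_per_page

# Plain OFFSET pagination: the engine walks and discards the skipped rows.
plain = conn.execute(
    "SELECT * FROM Customers ORDER BY Id LIMIT ? OFFSET ?",
    (rows_per_page, offset),
).fetchall()

# Deferred join: skip over the narrow Id column only, then join back
# to fetch the full rows for just the page we want.
deferred = conn.execute(
    """
    SELECT c.* FROM Customers c
        INNER JOIN (
            SELECT Id FROM Customers
            ORDER BY Id LIMIT ? OFFSET ?
        ) AS t ON c.Id = t.Id
    ORDER BY c.Id
    """,
    (rows_per_page, offset),
).fetchall()

assert plain == deferred  # both forms return the exact same page
```

The win comes from the inner query touching only the ordered Id column while skipping rows, so the wide row data is fetched only for the 100 rows that actually make up the page.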

I highly recommend reading Aaron’s original blog post and checking out the book High Performance MySQL: Proven Strategies for Operating at Scale to learn more about deferred joins.