Top N Queries by Group

Top N Queries by Group

May 14, 2020 by Robert Gravelle

A Top N query is one that fetches the top records, ordered by some value, in descending order. Typically, these are accomplished using the TOP or LIMIT clause. Problem is, Top N result sets are limited to the highest values in the table, without any grouping. The GROUP BY clause can help with that, but it is limited to the single top result for each group. If you want the top 5 per category, GROUP BY won’t help by itself. That doesn’t mean it can’t be done. In fact, in today’s blog, we’ll learn exactly how to construct a Top N query by group.

To gain a better understanding of a the Top N Query, let’s compose one that selects the top 5 films with the longest running times from the Sakila Sample Database. If you aren’t familiar with the Sakila database, it’s a MySQL database that contains a number of tables, views, and queries related to a fictional video rental store. Tables include actor, film, customer, rentals, etc.

top_n (77K)

The GROUP BY clause applies an aggregate function to one or more fields so that the data relates to the groupings that you specify. It’s a step forward in terms of grouping results, but GROUP BY still has a couple of limitations:

  • it only provides the very first result (i.e. row) per group and ignores others,
  • the columns are limited to those included in the grouping criteria and aggregated field(s). All other columns are not accessible.

This query uses GROUP BY to show the longest running film for each rating:

group_by (47K)

Notice that we can’t include the film title title, because it is not part of either the GROUP BY or aggregated field(s).

The term “window” in Windows Functions refers to the set of rows on which the function operates because the function uses values from the rows in a window to calculate the returned values. The set of rows within the window are aggregated into a single value.

To use a window function in a query, you have to define the window using the OVER() clause. It does 2 things:

  1. Defines window partitions to form groups of rows, via the PARTITION BY clause.
  2. Orders rows within a partition, via the ORDER BY clause.

A query can include multiple window functions with the same or different window definitions.

Our query uses the ROW_NUMBER() window function. It assigns a sequential integer number to each row in the query’s inner window result set. We can use that value to limit the results for each rating to the top 5. That’s done by ordering the length in descending order.

row_number (116K)

In today’s blog we learned how to construct a query that fetches the top 5 rows per category in Navicat Premium. Version 15 adds over 100 enhancements and includes several new features to give you more ways that ever to build, manage, and maintain your databases than ever before!