Skip Navigation LinksHome > Categories > Code from a Category

How to Aggregating Correlated Sub-Queries



User Name: codelecturer
Name: Mike Chauhan
Contact Me: www.datawebcoder.com/ContactUs.aspx
Home Page: www.datawebcoder.com
6 years of total IT experience including programming, application development, System Maintenance . 3 years of Experienced and expertise in .Net Framework VB, Visual Basic.Net, ASP.Net 2.0, 3.5 and AD... [More]
Viewed Times: 1657
Add Date: 01/06/2014
In this article we will learn about how to aggregating correlated sub queries in Sql Server.
Every now and then you may attempt to calculate an aggregate function -- such as SUM() -- on a correlated subquery, only to encounter the following error:

Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

This can be a tricky situation to resolve depending on how your query is structured and how complicated it is, but here's a couple of ideas to help you out.

Summing up a correlated SUM()

Let's start by looking at a typical example. Suppose you have written the following SELECT:

Notice that we have used a correlated sub-query to return the total Order Amount per Company. Working from that, let's say that you instead would like to the return the total Order Amount by Region. Typically, you would remove Company from the SELECT list, add GROUP BY Region to the end, and wrap what you'd like to total in in a SUM() aggregate function:

Unfortunately, that is not a valid SQL statement, since you are trying to directly SUM() a correlated sub-query (which also happens to have a SUM() in it already). This is not allowed in SQL Server; if you try to execute that, you'll be greeted with the error mentioned above.

There are two ways to fix this:

Option 1: Use a dervied table

Perhaps the easiest solution is to simply wrap your original SELECT in a derived table, and then select from that derived table and do the grouping in the outer select:

This is logically equivalent to what we tried to do earlier, except it is now a valid SQL statement and it will return the correct results without an error. This is a very quick and easy way to solve this situation in general, though it is not always the optimal solution in terms of performance and readability.

Option 2: Rewrite your SELECT without a Correlated Subquery

Often the best solution is to rewrite your SELECT without using a correlated sub-query at all. In our example, we could write the original as:

Once the SELECT is re-written, we can now easily adjust that to return only totals by Region simply by removing Company.Name from the SELECT and GROUP BY clauses:

This is usually a good way to solve the issue because the end result is very efficient and well-structured, so it is easier to understand and maintain.

Nesting Mixed Aggregate Functions

The previous example was fairly easy to rewrite because all we were using was a SUM() aggregate; it is quite easy just to adjust the grouping to get the totals that you need. However, what happens if you'd like to calculate the SUM() of the MAX() Order Amount per company? That is, let's say you have this:

That returns each Company along with their Region and their top Order Amount. Now, let's say we'd like to summarize that by Region, so that we are returning the SUM() of each customer's maximum Order Amount per Region. We can try to remove Company.Name from the GROUP BY and the SELECT, and add a SUM() around the MAX() expression like this:
...
but that will result in an error, since an aggregate function cannot be contained within another aggregate function. It also doesn't make much logical sense, anyway: We have a MAX() function nested within the SUM() function, but we never indicate anywhere what the MAX() is for.

To solve this, just wrap the original SELECT in a derived table, GROUP BY Region and calculate the SUM() of the MaxOrderAmount:

Now it is clear that we are first summarizing the Orders by Company, and then summarizing those results by Region. The derived table allows us to break down the problem into smaller, simpler parts and to calculate one aggregate function within another.
A More Complicated Situation

Another example that I've seen recently is replacing a hard-coded list of values in an IN() clause with values selected from a table. For example, suppose you'd like to return a grand total of all items in a table along with a subtotal of just a select few; you can use a SUM(CASE ...) expression like this:

Now, what if that list of Codes actually comes from another table? That is, instead of hard-coding
when Code in ('a','b','d') we'd like it to be when Code in (select Code from SubTotalCodes) If we try to make that change in our previous SQL statement:

we get that dreaded error once again, because we are trying to apply a SUM() on an expression containing a sub-query. This one seems a bit more tricky to solve!

We can handle this by removing the SUM() aggregate functions completely, wrapping that in a derived table, and then totaling it up:

We also could rewrite the previous statement using JOINS instead of an IN() clause like this:

left outer join SubTotalCodes on SubTotalCodes.Code = SomeTable.Code We'd have to be sure the primary key of SubTotalCodes is on the "Code" column, otherwise the JOIN may cause duplicate values to be aggregated in our results.

Conclusion

Resolving this situation can sometimes be tricky, but it can always be done. If you rewrite your query using standard JOINS and/or derived tables, it will not only fix the error, but it will also result in a clearer and more structured SELECT statement.

Happy Programming!!!

Post a Comment

Name: (Optional)
Email: (Optional, you can get an email if somebody replys your comments)*
Email me if somebody respons my comment below:
Details**:
Enter Text
as Below:
(case insensitive, if hard to read, click the "get a new one" button)
 
    
* Your email address will not be shared with any third parties for any reason.
** Maximum 1000 charactors.