Hello, and welcome back to another post!
In my last post, I showed you the basics of SQL Server queries, now in this post let’s complement that with more functionality. We’re going to add aggregations and groups to our queries.
SQL Server has many built-in functions that we can use to manipulate data. Scalar valued functions, table valued functions, type conversion functions, others to work with dates and times, offset functions, character functions, aggregate functions, to format and concatenate strings, also for splitting strings, system functions and more.
I will be writing of several of those in future posts, but for now we’re going to focus on aggregate functions.
WHAT IS AN AGGREGATE FUNCTION?
Basically, an aggregate function is a function that can be applied to a set of rows and get a single value as a result. For example SQL Server supports aggregate functions like: SUM()
, COUNT()
, MIN()
, MAX()
, AVG()
and some others.
Something worth mentioning is that aggregate functions ignore NULL
, with the exception of COUNT(*)
which means, it’s going to count all the rows in our table.
Let’s see some examples of how these aggregate functions are used, and the first one is to show you what I meant about ignoring NULL
values.
COUNT()
SELECT COUNT(*) AS CountAllRows , COUNT(ShipRegion) AS CountWithoutNulls FROM Orders; CountAllRows CountWithoutNulls ------------ ----------------- 830 323
I included the result right below of the query. If you notice, the first column “CountAllRows” is doing a COUNT(*), which is returning the 830 rows in that table. For the second row, I chose a column with NULL
values in it, and if you see, we’re getting only 323 rows back. That means the other 507 rows have NULL
s.
Alright, moving on to the next one. We’re going to see how SUM works.
SUM()
SELECT SUM(UnitPrice) AS SumTotalUnitPrice FROM Products; SumTotalUnitPrice --------------------- 2222.71
What’s going on in here? So, I mentioned that an aggregate function returns a single value when applied to a set of data. So, here since we told it to SUM()
the UnitPrice column, we got back the total sum of every value in that column, we didn’t ask for any filtering or categorization or anything else. Later, I will show you a bit more complex example. Move on.
AVG()
If you can tell, this function it’s the abbreviation of Average, so the column we pass into that aggregate will return an average.
SELECT AVG(UnitPrice) AS AverageUnitPrice FROM Products; AverageUnitPrice --------------------- 28.8663
As I promised, what the function returned was the average of the UnitPrice column. It means that it went grabbing all the values in the column, adding them up one by one, and divided them by the total number of elements in the column. You know, an average. Next.
MIN() and MAX()
I bet you know what they do by now, right? Let’s go to the example.
SELECT MIN(UnitPrice) AS CheapestProduct , MAX(UnitPrice) AS MostExpensiveProduct FROM Products; CheapestProduct MostExpensiveProduct --------------------- --------------------- 2.00 263.50
By looking at the results, you might have a clue of what these functions do. If you haven’t, the MIN()
function went to the UnitPrice column and grabbed the lowest value, and MAX()
went on and grabbed the highest value it could find. Easy, right?
Aggregate functions are very useful when we are analyzing data, the thing is that we need to know what we are analyzing. In our previous examples, we didn’t show any other columns or values, we just saw a single result and we don’t know how to interpret that alone.
We need to group the results in order to be able to discover more about our data. For example, what are the cheapest and most expensive products we have? How many units of each product we have on inventory?
GROUPING
Grouping is exactly that… when we have just one column and an aggregation, it will take that column and group together the values that are equal. In our examples: ProductName, Customer, Orders and so on. When we have more than one column, and we want to apply an aggregation to those, it will produce a group for each combination of values.
I will show you that with some examples.
SELECT CategoryID , COUNT(*) AS CountOfProducts FROM Products GROUP BY CategoryID; CategoryID CountOfProducts ----------- --------------- 1 12 2 12 3 13 4 10 5 7 6 6 7 5 8 12
Let’s analyze the query above. I wanted to know how many product were in each category, so I included CategoryID and the COUNT(*)
. In grouped queries, the aggregates are applied to the group.
Now, because of this, we have to include the columns in the SELECT
list in the GROUP BY
clause. Let’s check a few more examples.
SELECT ShipCountry , COUNT(*) AS CountOfShipments FROM Orders GROUP BY ShipCountry ORDER BY COUNT(*) DESC; ShipCountry CountOfShipments --------------- ---------------- USA 122 Germany 122 Brazil 83 France 77 UK 56 Venezuela 46 Austria 40 Sweden 37 Canada 30 Mexico 28
In this query, we have the same pattern. We wanted to know which country we ship the most. If you see I included the ORDER BY COUNT(*) DESC
at the end, to sort the results in descending order so we can see who has more shipments.
We know which country we ship the most, but in that country which city is the one with more shipments? We can include the ShipCity column, in here it will depend on how we want to see our data sorted. If we keep ORDER BY COUNT(*) DESC
, we will see the city with most shipment, but it won’t be the US and Germany like in our last example.
SELECT ShipCountry , ShipCity , COUNT(*) AS CountOfShipments FROM Orders GROUP BY ShipCountry, ShipCity ORDER BY COUNT(*) DESC; ShipCountry ShipCity CountOfShipments --------------- --------------- ---------------- Brazil Rio de Janeiro 34 UK London 33 Brazil Sao Paulo 31 USA Boise 31 Austria Graz 30 Germany Cunewalde 28 Mexico México D.F. 28 Ireland Cork 19 Sweden Bräcke 19 USA Albuquerque 18 Sweden Luleå 18
I’m pasting only the first rows of the result set. If you see, the city with the highest amount of shipments is Rio de Janeiro in Brazil. If you see the US is now in a different place because of the sort we gave to the query.
FILTERING GROUPS
There would be times where we are looking for something specific within groups. For example in our last query, let’s say we want to show only Countries and Cities with more than 20 shipments. You might think that we can include a WHERE
clause to the query, but remember that we have an aggregation to a group. How can we do that?
SELECT ShipCountry , ShipCity , COUNT(*) AS CountOfShipments FROM Orders WHERE COUNT(*) >= 20 GROUP BY ShipCountry, ShipCity; Msg 147, Level 15, State 1, Line 80 An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
As you can see, if we try to filter with a WHERE
clause, we get an error message saying that we can’t do that because we have an aggregate, it also says something else about HAVING
.
The HAVING
clause is use exactly for the task we want to achieve, to filter groups.
SELECT ShipCountry , ShipCity , COUNT(*) AS CountOfShipments FROM Orders GROUP BY ShipCountry, ShipCity HAVING COUNT(*) >= 20; ShipCountry ShipCity CountOfShipments --------------- --------------- ---------------- USA Boise 31 Germany Cunewalde 28 Austria Graz 30 UK London 33 Mexico México D.F. 28 Brazil Rio de Janeiro 34 Brazil Sao Paulo 31
There you have it. Cities with more than 20 shipments. Analyzing the query we can see that we just changed WHERE
to HAVING
and the condition is the same, but there’s something else. If we want to filter groups, we first need to define our groups with GROUP BY
and then we can filter them out with HAVING
clause.
Well, that’s it for now. I hope you liked the post, don’t forget that you can find the code in my GitHub if you want to follow along, also if you don’t have a copy of Northwind database, you can get it here.
Thanks for reading, see you soon. Cheers! 🍻
Discover more from Ben Rodríguez
Subscribe to get the latest posts sent to your email.