T-SQL/MSBI Knowledge Share: Group By/Distinct/Count

Group BY

Group BY is used to summarize the table data. As the name implies it will group the set of similar data in a column. Grouping may be based on only one column or multiple columns. Below examples illustrate this:

CREATE TABLE dbo.Employee

(

EmpID VARCHAR(5),

EmpName VARCHAR(40),

EmpDept VARCHAR(10),

EmpLoc VARCHAR(40),

EmpSal MONEY

)

INSERT INTO dbo.Employee VALUES('E001','AAA','HR','Delhi',20000)

INSERT INTO dbo.Employee VALUES('E002','BBB','IT','Bangalore',30000)

INSERT INTO dbo.Employee VALUES('E003','CCC','IT','Delhi',15000)

INSERT INTO dbo.Employee VALUES('E004','AAA','HR','Bangalore',10000)

INSERT INTO dbo.Employee VALUES('E005','GGG','FIN','Hydrabad',25000)

INSERT INTO dbo.Employee VALUES('E006','YYY','FIN','Hydrabad',26000)

INSERT INTO dbo.Employee VALUES('E007','SSS','IT','Goa',40000)

INSERT INTO dbo.Employee VALUES('E008','WWW','IT','HP',26000)

INSERT INTO dbo.Employee VALUES('E009','WWW','FIN','HP',80000)

INSERT INTO dbo.Employee VALUES('E010','QQQ','FIN','Gujrath',13000)

SELECT * FROM dbo.Employee

SELECT * FROM dbo.Employee ORDER BY EmpDept

In the above fig we can see that EmpDept has a set of similar values as highlighted in read box : FIN,HR,IT.

Single Column Grouping :

Below query returns Number of Employees in each department.

SELECT EmpDept,COUNT(*) AS NoOfEmp

FROM dbo.Employee

GROUP BY EmpDept

ORDER BY EmpDept

Below query returns Number of Employees in each department with the following condition:

A dept should have more than 2 employees, only those records should be displayed.

So HR dept will not be returned in the output as it has 2 employees.

SELECT EmpDept,COUNT(*) AS NoOfEmp

FROM dbo.Employee

GROUP BY EmpDept

HAVING COUNT(*)>2

ORDER BY EmpDept

Working:

Below query returns the Number of Employees in each dept with the following condition:

A dept should have more than 2 employees, only those records should be displayed.
Except FIN dept all other dept should be displayed.

SELECT EmpDept,COUNT(*) AS NoOfEmp

FROM dbo.Employee

WHERE EmpDept<>'FIN'

GROUP BY EmpDept

HAVING COUNT(*)>2

ORDER BY EmpDept

Working:

Multiple Column Grouping :

Below query is the example for multiple column grouping which will return, in a dept how many employees are from same places. In below example we can see, from FIN dept from Hydrabad there are 2 employees and rest all have 1 employee.

SELECT EmpDept,EmpLoc,COUNT(*) AS NoOfEmp

FROM dbo.Employee

GROUP BY EmpDept,EmpLoc

ORDER BY EmpDept

Working:

Other Examples :

Similarly instead of count we can use aggregate functions like MIN(), MAX(), AVG(), SUM() etc.

Below queries returns Minimum and Maximum salary in each department

SELECT EmpDept,MIN(EmpSal) AS MinSal

FROM dbo.Employee

GROUP BY EmpDept

SELECT EmpDept,MAX(EmpSal) AS MaxSal

FROM dbo.Employee

GROUP BY EmpDept

Below query returns average salary in each department.

SELECT EmpDept,AVG(EmpSal) AS AvgSal

FROM dbo.Employee

GROUP BY EmpDept

Below query returns total salary paid in each department.

SELECT EmpDept,SUM(EmpSal) AS SumSal

FROM dbo.Employee

GROUP BY EmpDept

HAVING and WHERE

People say that HAVING cannot exists without GROUP BY, but this is wrong. HAVING can exists without GROUP BY. When HAVING alone is used its compulsory to use aggregate function. In the absence of GROUP BY HAVING will group entire table as one and applies aggregate function.

WHERE is also a filter condition and HAVING is also a filter condition. WHERE applies filter on single record, HAVING applies filter on grouped data. In where clause we cannot use aggregate functions but in having clause we can use aggregate function.

Having works like Where clause with out Group By Clause.

Below query will return the table count if HAVING condition is satisfied or else no output is displayed. Here since HAVING exists without GROUP BY entire table is grouped as one . Since Avg of EmpSal in the entire table is 28500 ((20000.00 + 30000.00 + 15000.00 + 10000.00 + 25000.00 + 26000.00 + 40000.00 + 26000.00 + 80000.00 + 13000.00)/10) and in the Having clause it is mentioned that if this avg is greater than 28000 ,output 10 is displayed which is the count of table records. It is something like, if HAVING condition is true it will return the select statement output. Similarly you can find the one for average also.

SELECT COUNT(*)

FROM dbo.Employee

HAVING AVG(EmpSal)>28000

SELECT AVG(EmpSal)

FROM dbo.Employee

HAVING AVG(EmpSal)>28000

Distinct/Count

Distinct will remove the duplicate records in the output as shown in below example:

SELECT DISTINCT EmpDept FROM dbo.Employee

Below query first it will take the count of EmpDept values and then applies distinct on the result obtained. First it will return 10 and DISTINCT 10 is always 10.

SELECT DISTINCT COUNT(EmpDept) FROM dbo.Employee

Below query first it will take the distinct values and then applies count on the result obtained. First it will return the distinct EmpDept which is 3 records IT,FIN,HR then COUNT of obtained result is 3.

SELECT COUNT(DISTINCT EmpDept) FROM dbo.Employee

Lets see DISTINCT and COUNT with NULL's and BLANK's in a column

UPDATE dbo.Employee

SET EmpLoc=''

WHERE EmpID='E006'

UPDATE dbo.Employee

SET EmpLoc=NULL

WHERE EmpID='E007'

SELECT DISTINCT EmpLoc FROM dbo.Employee

SELECT COUNT(EmpLoc) FROM dbo.Employee

SELECT COUNT(*) FROM dbo.Employee

SELECT COUNT(EmpLoc) AS EmpLocCnt,COUNT(EmpSal) EmpSalCnt FROM dbo.Employee

DISTINCT will consider both NULL as well as BLANK values. NULL is nothing whereas BLANK is a string of zero characters.
COUNT will not consider NULL values but it considers BLANK values.
“*” represents all the columns in the table. So when COUNT(*) is specified the count is taken on group of columns. So irrespective of one column EmpLoc it will consider all columns and displays count as 12.

CREATE TABLE dbo.TestCnt

(

Col1 varchar(max),

Col2 varchar(max),

Col3 varchar(max),

Col4 varchar(max)

)

insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)

select * from dbo.TestCnt

select COUNT(*) from dbo.TestCnt -----> 7

select COUNT(1) from dbo.TestCnt -----> 7

select COUNT(Col1) from dbo.TestCnt -----> 0

select COUNT(Col2) from dbo.TestCnt -----> 0

select COUNT(Col1,Col2) from dbo.TestCnt -----> Error

select COUNT(Col1),COUNT(Col2) from dbo.TestCnt -----> 0,0

select COUNT(1),COUNT(Col2) from dbo.TestCnt -----> 7,0

select COUNT(1),COUNT(*) from dbo.TestCnt -----> 7,7

select COUNT(*),COUNT(*) from dbo.TestCnt -----> 7,7

CREATE TABLE dbo.TestCnt1

(

Col1 varchar(max),

Col2 varchar(max),

Col3 varchar(max),

Col4 varchar(max)

)

insert into dbo.TestCnt1 values('',NULL,NULL,NULL)

insert into dbo.TestCnt1 values(NULL,'',NULL,NULL)

insert into dbo.TestCnt1 values(NULL,NULL,NULL,NULL)

insert into dbo.TestCnt1 values(NULL,'',NULL,NULL)

select * from dbo.TestCnt1

select COUNT(*) from dbo.TestCnt1 -----> 7

select COUNT(1) from dbo.TestCnt1 -----> 7

select COUNT(Col1) from dbo.TestCnt1 -----> 1

select COUNT(Col2) from dbo.TestCnt1 -----> 2

select COUNT(Col1,Col2) from dbo.TestCnt1 -----> Error

select COUNT(Col1),COUNT(Col2) from dbo.TestCnt1 -----> 1,2

select COUNT(1),COUNT(Col2) from dbo.TestCnt1 -----> 7,2

select COUNT(1),COUNT(*) from dbo.TestCnt1 -----> 7,7

select COUNT(*),COUNT(*) from dbo.TestCnt1 -----> 7,7

T-SQL/MSBI Knowledge Share

T-SQL/MSBI Knowledge Share Videos

Group By/Distinct/Count

No comments: