T-SQL/MSBI Knowledge Share Videos

Group By/Distinct/Count

Group BY

Group BY is used to summarize the table data. As the name implies it will group the set of similar data in a column. Grouping may be based on only one column or multiple columns. Below examples illustrate this:

CREATE TABLE dbo.Employee
(
EmpID VARCHAR(5),
EmpName VARCHAR(40),
EmpDept VARCHAR(10),
EmpLoc VARCHAR(40),
EmpSal MONEY
)

INSERT INTO dbo.Employee VALUES('E001','AAA','HR','Delhi',20000)
INSERT INTO dbo.Employee VALUES('E002','BBB','IT','Bangalore',30000)
INSERT INTO dbo.Employee VALUES('E003','CCC','IT','Delhi',15000)
INSERT INTO dbo.Employee VALUES('E004','AAA','HR','Bangalore',10000)
INSERT INTO dbo.Employee VALUES('E005','GGG','FIN','Hydrabad',25000)
INSERT INTO dbo.Employee VALUES('E006','YYY','FIN','Hydrabad',26000)
INSERT INTO dbo.Employee VALUES('E007','SSS','IT','Goa',40000)
INSERT INTO dbo.Employee VALUES('E008','WWW','IT','HP',26000)
INSERT INTO dbo.Employee VALUES('E009','WWW','FIN','HP',80000)
INSERT INTO dbo.Employee VALUES('E010','QQQ','FIN','Gujrath',13000)

SELECT * FROM dbo.Employee

SELECT * FROM dbo.Employee ORDER BY EmpDept















In the above fig we can see that EmpDept has a set of similar values as highlighted in read box : FIN,HR,IT.

Single Column Grouping :

Below query returns Number of Employees in each department.

SELECT EmpDept,COUNT(*) AS NoOfEmp
FROM dbo.Employee
GROUP BY EmpDept
ORDER BY EmpDept

Below query returns Number of Employees in each department with the following condition:
  • A dept should have more than 2  employees, only those records should be displayed. 
So HR dept will not be returned in the output as it has 2 employees. 

SELECT EmpDept,COUNT(*) AS NoOfEmp
FROM dbo.Employee
GROUP BY EmpDept
HAVING COUNT(*)>2
ORDER BY EmpDept








Working:












Below query returns the Number of Employees in each dept with the following condition:
  • A dept should have more than 2  employees, only those records should be displayed. 
  • Except FIN dept all other dept should be displayed.
SELECT EmpDept,COUNT(*) AS NoOfEmp
FROM dbo.Employee
WHERE EmpDept<>'FIN'
GROUP BY EmpDept
HAVING COUNT(*)>2
ORDER BY EmpDept







Working:














Multiple Column Grouping :

Below query is the example for multiple column grouping which will return, in a dept how many employees are from same places. In below example we can see, from FIN dept from Hydrabad there are 2 employees and rest all have 1 employee.

SELECT EmpDept,EmpLoc,COUNT(*) AS NoOfEmp
FROM dbo.Employee
GROUP BY EmpDept,EmpLoc
ORDER BY EmpDept

Working:

Other Examples :

Similarly instead of count we can use aggregate functions like MIN(), MAX(), AVG(), SUM() etc.
Below queries returns Minimum and Maximum salary in each department

SELECT EmpDept,MIN(EmpSal) AS MinSal
FROM dbo.Employee
GROUP BY EmpDept




















SELECT EmpDept,MAX(EmpSal) AS MaxSal
FROM dbo.Employee
GROUP BY EmpDept

Below query returns average salary in each department.

SELECT EmpDept,AVG(EmpSal) AS AvgSal
FROM dbo.Employee
GROUP BY EmpDept

Below query returns total salary paid in each department.

SELECT EmpDept,SUM(EmpSal) AS SumSal
FROM dbo.Employee
GROUP BY EmpDept

HAVING and WHERE

People say that HAVING cannot exists without GROUP BY, but this is wrong. HAVING can exists without GROUP BY. When HAVING alone is used its compulsory to use aggregate function. In the absence of GROUP BY HAVING will group entire table as one and applies aggregate function.

WHERE is also a filter condition and HAVING is also a filter condition. WHERE applies filter on single record, HAVING applies filter  on grouped data. In where clause we cannot use aggregate functions but in having clause we can use aggregate function.

Having works like Where clause with out Group By Clause.

Below query will return the table count if HAVING condition is satisfied or else no output is displayed. Here since HAVING exists without GROUP BY entire table is grouped as one . Since Avg of EmpSal in the entire table is 28500 ((20000.00 + 30000.00 + 15000.00 + 10000.00 + 25000.00 + 26000.00 + 40000.00 + 26000.00 + 80000.00 + 13000.00)/10) and in the Having clause it is mentioned that if this avg is greater than 28000 ,output 10 is displayed which is the count of table records. It is something like, if HAVING condition is true it will return the select statement output. Similarly you can find the one for average also.

SELECT COUNT(*)
FROM dbo.Employee
HAVING AVG(EmpSal)>28000
SELECT AVG(EmpSal)
FROM dbo.Employee
HAVING AVG(EmpSal)>28000


Distinct/Count

Distinct will remove the duplicate records in the output as shown in below example:

SELECT DISTINCT EmpDept FROM dbo.Employee
Below query first it will take the count of EmpDept values and then applies distinct on the result obtained. First it will return 10 and DISTINCT 10 is always 10.

SELECT DISTINCT COUNT(EmpDept) FROM dbo.Employee
Below query first it will take the distinct values and then applies count on the result obtained. First it will return the distinct EmpDept which is 3 records IT,FIN,HR then COUNT of obtained result is 3.

SELECT COUNT(DISTINCT EmpDept) FROM dbo.Employee
Lets see DISTINCT and COUNT with NULL's and BLANK's in a column
UPDATE dbo.Employee
SET EmpLoc=''
WHERE EmpID='E006'

UPDATE dbo.Employee
SET EmpLoc=NULL
WHERE EmpID='E007'

SELECT DISTINCT EmpLoc FROM dbo.Employee














SELECT COUNT(EmpLoc) FROM dbo.Employee
SELECT COUNT(*) FROM dbo.Employee
SELECT COUNT(EmpLoc) AS EmpLocCnt,COUNT(EmpSal) EmpSalCnt FROM dbo.Employee

  • DISTINCT will consider both NULL as well as BLANK values. NULL is nothing whereas BLANK is a string of zero characters.
  • COUNT will not consider NULL values but it considers BLANK values.
  • *” represents all the columns in the table. So when COUNT(*) is specified the count is taken on group of columns. So irrespective of one column EmpLoc it will consider all columns and displays count as 12.
CREATE TABLE dbo.TestCnt
(
Col1 varchar(max),
Col2 varchar(max),
Col3 varchar(max),
Col4 varchar(max)
)

insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt values(NULL,NULL,NULL,NULL)

select * from dbo.TestCnt













select COUNT(*) from dbo.TestCnt     -----> 7
select COUNT(1) from dbo.TestCnt     -----> 7
select COUNT(Col1) from dbo.TestCnt  -----> 0
select COUNT(Col2) from dbo.TestCnt  -----> 0
select COUNT(Col1,Col2) from dbo.TestCnt  -----> Error
select COUNT(Col1),COUNT(Col2) from dbo.TestCnt  -----> 0,0
select COUNT(1),COUNT(Col2) from dbo.TestCnt  -----> 7,0
select COUNT(1),COUNT(*) from dbo.TestCnt  -----> 7,7
select COUNT(*),COUNT(*) from dbo.TestCnt  -----> 7,7

CREATE TABLE dbo.TestCnt1
(
Col1 varchar(max),
Col2 varchar(max),
Col3 varchar(max),
Col4 varchar(max)
)

insert into dbo.TestCnt1 values('',NULL,NULL,NULL)
insert into dbo.TestCnt1 values(NULL,'',NULL,NULL)
insert into dbo.TestCnt1 values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt1 values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt1 values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt1 values(NULL,NULL,NULL,NULL)
insert into dbo.TestCnt1 values(NULL,'',NULL,NULL)

select * from dbo.TestCnt1
  











select COUNT(*) from dbo.TestCnt1     -----> 7
select COUNT(1) from dbo.TestCnt1     -----> 7
select COUNT(Col1) from dbo.TestCnt1  -----> 1
select COUNT(Col2) from dbo.TestCnt1  -----> 2
select COUNT(Col1,Col2) from dbo.TestCnt1  -----> Error
select COUNT(Col1),COUNT(Col2) from dbo.TestCnt1  -----> 1,2
select COUNT(1),COUNT(Col2) from dbo.TestCnt1  -----> 7,2
select COUNT(1),COUNT(*) from dbo.TestCnt1  -----> 7,7
select COUNT(*),COUNT(*) from dbo.TestCnt1  -----> 7,7

No comments: