Nov
13
2008
0

SQL Joins and Applies – Understanding OUTER JOIN

This is part of a four part post series on SQL Joins and Applies.
If you want to follow the examples given, you can get the T-SQL to create the tables, function and test data from my earlier post SQL Joins and Applies – Test Data.

The other posts in this series are:
SQL Joins and Applies – Understanding INNER JOIN
SQL Joins and Applies – Understanding CROSS JOIN
SQL Joins and Applies – Understanding APPLY

Outer Joins are joins that include rows even if they do not have related rows in the joined table. There are three types of outer join, each one specifies if you will always see the rows from the LEFT, RIGHT or both (FULL) tables.

From Books Online (For an outer Join):

A join that includes rows even if they do not have related rows in the joined table.

LEFT OUTER JOIN
From Books Online:

All rows from the first-named table (the “left” table, which appears leftmost in the JOIN clause) are included. Unmatched rows in the right table do not appear.

SELECT		tsi.sourceValue [sourceValue_left],
		tsi.sysID [sysID_left],
		tss.sysID [sysID_Right],
		tss.sourceValue [sourceValue_right]
FROM		dbo.tblSourceIntegers tsi
LEFT OUTER JOIN	dbo.tblSourceStrings tss
	ON	tsi.sysID = tss.sysID;

Results:

sourceValue_left sysID_left  sysID_Right sourceValue_right
---------------- ----------- ----------- -----------------
100              1           NULL        NULL
200              2           2           Apple
NULL             NULL        NULL        NULL
400              4           NULL        NULL
500              5           5           Dog

(5 row(s) affected)

As above, all of the values in the left table were returned.

RIGHT OUTER JOIN
From Books Online:

All rows in the second-named table (the “right” table, which appears rightmost in the JOIN clause) are included. Unmatched rows in the left table are not included.

SELECT			tsi.sourceValue [sourceValue_left],
			tsi.sysID [sysID_left],
			tss.sysID [sysID_Right],
			tss.sourceValue [sourceValue_right]
FROM			dbo.tblSourceIntegers tsi
RIGHT OUTER JOIN	dbo.tblSourceStrings tss
	ON		tsi.sysID = tss.sysID;

Results:

sourceValue_left sysID_left  sysID_Right sourceValue_right
---------------- ----------- ----------- -----------------
200              2           2           Apple
NULL             NULL        3           Banana
NULL             NULL        NULL        NULL
500              5           5           Dog
NULL             NULL        6           Egg

(5 row(s) affected)

As above, all of the values in the right table were returned.

FULL OUTER JOIN
From Books Online:

All rows in all joined tables are included, whether they are matched or not.

SELECT		tsi.sourceValue [sourceValue_left],
		tsi.sysID [sysID_left],
		tss.sysID [sysID_Right],
		tss.sourceValue [sourceValue_right]
FROM		dbo.tblSourceIntegers tsi
FULL OUTER JOIN	dbo.tblSourceStrings tss
	ON	tsi.sysID = tss.sysID;

Results:

sourceValue_left sysID_left  sysID_Right sourceValue_right
---------------- ----------- ----------- -----------------
100              1           NULL        NULL
200              2           2           Apple
NULL             NULL        NULL        NULL
400              4           NULL        NULL
500              5           5           Dog
NULL             NULL        3           Banana
NULL             NULL        NULL        NULL
NULL             NULL        6           Egg

(8 row(s) affected)

Looking at the above result, all of the values in the left table were returned first, followed by the unmatched values from the right table.

Written by John Burns in: SQL Tips, Tips |
Nov
12
2008
0

SQL Joins and Applies – Understanding CROSS JOIN

This is part of a four part post series on SQL Joins and Applies.
If you want to follow the examples given, you can get the T-SQL to create the tables, function and test data from my earlier post SQL Joins and Applies – Test Data.

The other posts in this series are:
SQL Joins and Applies – Understanding INNER JOIN
SQL Joins and Applies – Understanding OUTER JOIN
SQL Joins and Applies – Understanding APPLY

Cross join is probably the least useful join in a relational database.

From Books Online:

A join whose result set includes one row for each possible pairing of rows from the two tables.

Warning:
Cross Join can use a lot of memory when applied to large tables. Every row in the left table is repeated for every row of the right table (Unless you use WHERE to filter the results)

Here is a sample cross join:

SELECT		tsi.sourceValue [sourceValue_left],
		tsi.sysID [sysID_left],
		tss.sysID [sysID_Right],
		tss.sourceValue [sourceValue_right]
FROM		dbo.tblSourceIntegers tsi
CROSS JOIN	dbo.tblSourceStrings tss;

And here is the large result set:

sourceValue_left sysID_left  sysID_Right sourceValue_right
---------------- ----------- ----------- -----------------
100              1           2           Apple
200              2           2           Apple
NULL             NULL        2           Apple
400              4           2           Apple
500              5           2           Apple
100              1           3           Banana
200              2           3           Banana
NULL             NULL        3           Banana
400              4           3           Banana
500              5           3           Banana
100              1           NULL        NULL
200              2           NULL        NULL
NULL             NULL        NULL        NULL
400              4           NULL        NULL
500              5           NULL        NULL
100              1           5           Dog
200              2           5           Dog
NULL             NULL        5           Dog
400              4           5           Dog
500              5           5           Dog
100              1           6           Egg
200              2           6           Egg
NULL             NULL        6           Egg
400              4           6           Egg
500              5           6           Egg

(25 row(s) affected)

Note:
The following two queries are identical both in their result set AND in execution plan.

SELECT		tsi.sourceValue [sourceValue_left],
		tsi.sysID [sysID_left],
		tss.sysID [sysID_Right],
		tss.sourceValue [sourceValue_right]
FROM		dbo.tblSourceIntegers tsi
INNER JOIN	dbo.tblSourceStrings tss
	ON	tsi.sysID = tss.sysID;
 
SELECT		tsi.sourceValue [sourceValue_left],
		tsi.sysID [sysID_left],
		tss.sysID [sysID_Right],
		tss.sourceValue [sourceValue_right]
FROM		dbo.tblSourceIntegers tsi
CROSS JOIN	dbo.tblSourceStrings tss
WHERE		tsi.sysID = tss.sysID;
Written by John Burns in: SQL Tips, Tips |
Nov
11
2008
0

SQL Joins and Applies – Understanding INNER JOIN

Understanding INNER JOIN.

The other posts in this series are:
SQL Joins and Applies – Understanding CROSS JOIN
SQL Joins and Applies – Understanding OUTER JOIN
SQL Joins and Applies – Understanding APPLY

This is part of a four part post series on SQL Joins and Applies.
If you want to follow the examples given, you can get the T-SQL to create the tables, function and test data from my earlier post SQL Joins and Applies – Test Data.

Inner join is the most useful join to know in a relational database.

From Books Online:

A join that displays only the rows that have a match in both joined tables.

Lets jump straight into some SQL.

SELECT		tsi.sourceValue [sourceValue_left],
		tsi.sysID [sysID_left],
		tss.sysID [sysID_Right],
		tss.sourceValue [sourceValue_right]
FROM		dbo.tblSourceIntegers tsi
INNER JOIN	dbo.tblSourceStrings tss
	ON	tsi.sysID = tss.sysID;

The result of this is:

sourceValue_left sysID_left  sysID_Right sourceValue_right
---------------- ----------- ----------- -----------------
200              2           2           Apple
500              5           5           Dog

(2 row(s) affected)

Note that the use of a comparison
In the above example, the comparison operator is comparing the sysID from both tables.
Results are only returned if the sysID’s match. Remember that nothing (not even another NULL) will evaluate to true when compared with another NULL.

A comparison operator is mandatory in an inner join.

Note: The use of “ON 1=1″ is identical to a CROSS JOIN

Written by John Burns in: SQL Tips, Tips |
Nov
10
2008
0

SQL Joins and Applies – Test Data

I’m writing a four part series on Joins and Applies in SQL.

Below is the T-SQL to create the two tables and a function used to explain SQL Joins and Applies.

--CREATE A TABLE VALUED FUNCTION
--FUNCTION IS ONLY USED FOR CROSS AND OUTER APPLY
--CREATE A TABLE VALUED FUNCTION
--FUNCTION IS ONLY USED FOR CROSS AND OUTER APPLY
CREATE FUNCTION [dbo].[udfGet5Rows](@inputInteger INT)
RETURNS @t TABLE ([calculatedValue] INT) AS
BEGIN
	IF @inputInteger IS NOT NULL
	BEGIN
		INSERT INTO @t([calculatedValue])
		SELECT @inputInteger		UNION ALL
		SELECT @inputInteger + 1	UNION ALL
		SELECT NULL			UNION ALL
		SELECT @inputInteger + 3	UNION ALL
		SELECT @inputInteger + 4
 	END
 
	RETURN
END;
GO
--CREATE TWO TABLES
CREATE TABLE [dbo].[tblSourceIntegers]
(
sysID INT NULL,
sourceValue INT NULL
);
GO
 
CREATE TABLE [dbo].[tblSourceStrings]
(
sysID INT NULL,
sourceValue VARCHAR(16) NULL
);
GO
 
--INSERT VALUES INTO THE TABLES
INSERT INTO tblSourceIntegers(sysID,sourceValue)
	SELECT 1,	100		UNION ALL
	SELECT 2,	200		UNION ALL
	SELECT NULL,	NULL		UNION ALL
	SELECT 4,	400		UNION ALL
	SELECT 5,	500;
GO
 
INSERT INTO tblSourceStrings(sysID,sourceValue)
	SELECT 2,	'Apple'		UNION ALL
	SELECT 3,	'Banana'	UNION ALL
	SELECT NULL,	NULL		UNION ALL
	SELECT 5,	'Dog'		UNION ALL
	SELECT 6,	'Egg';
GO

Test your results with:

SELECT * FROM dbo.udfGet5Rows(5);
SELECT * FROM dbo.tblSourceIntegers;
SELECT * FROM tblSourceStrings;

Your results should be:

calculatedValue
---------------
5
6
NULL
8
9

(5 row(s) affected)

sysID       sourceValue
----------- -----------
1           100
2           200
NULL        NULL
4           400
5           500

(5 row(s) affected)

sysID       sourceValue
----------- ----------------
2           Apple
3           Banana
NULL        NULL
5           Dog
6           Egg

(5 row(s) affected)
Written by John Burns in: SQL Tips, Tips |
Nov
06
2008
0

High Performance Bulk Inserts into SQL – Considerations

Did you know that simply setting the Database Recovery Model to BULK_LOGGED does not guarantee minimal logging when inserting?

At the very least, for performance:

  • Recovery Model should be set to BULK_LOGGED.
  • Destination table should not be replicated.
  • Table must be empty if indexes are on it.
  • TABLOCK should be used (Where applicable).
  • Import should be done during times of low DB load or during specified downtime.

The following table describes logging that will happen during an insert when the Recovery Model is BULK_LOGGED.

  Non Clustered Index Clustered Index
  Yes No Yes No
Table Has Data Index Minimal Full Minimal
Table Empty Minimal Minimal Minimal Minimal

Remember also that if you are inserting in batches, only the first batch is into an empty table, the other inserts are into a table with data. The best solution is to always insert into an empty table with no indexes with the BULK_LOGGED Recovery model set.

Checking the recovery model is as easy as

SELECT DATABASEPROPERTYEX('[databasename]','Recovery');

If you are using SQL Server Enterprise Edition, remember that you can rebuild indexes ONLINE. This means that indexes can be rebuilt without locking the table. This means users can still perform searches (naturally without the indexes) while the indexes are rebuilding.

Written by John Burns in: SQL Tips, Tips |
Nov
04
2008
0

Covering Indexes in SQL 2005 – Results

After a few queries regarding my recent post on Covering Indexes, I decided to do a couple of tests to show you the results.

Below are the actual execution plan results using the table structure and SQL query from the earlier post. The table had three rows of data.

Results WITHOUT a covering index

Results WITHOUT a covering index

Results WITH a covering index

Results WITH a covering index

As you can (hopefully) see, the covering index resulted in a simpler execution plan.

If we compare the two subtree costs of 0.0032831 (Covering index) and 0.0065704 (No covering index) we get a performance gain of 2.001 or almost bang on 200% improvement.

I hope this helps

Written by John Burns in: SQL Tips, Tips |
Oct
31
2008
0

Low CPU Load Delays with SQL

Use WAITFOR to pause scripts while using minimal CPU overhead.

WAITFOR TIME '09:15'; --Wait until 9:15am
 
WAITFOR DELAY '00:00:05' --Wait for 5 seconds

It is NOT advisable to introduce delays into scripts while transactions are open. 

Reference: http://msdn.microsoft.com/en-us/library/ms187331.aspx

Written by John Burns in: SQL Tips, Tips |
Oct
28
2008
0

Covering Indexes in SQL 2005

For some strange reason, covering indexes are not used or understood as much as they should.

To quote Microsoft… “A Covering Index is a nonclustered index built upon all of the columns required to satisfy a SQL Server query, both in the selection criteria and in the WHERE clause”.

Lets look at it from a technical perspective.

We have a table:

CREATE TABLE dbo.testTable (
sysID		INT 		IDENTITY(1,1)	PRIMARY KEY,
firstName 	VARCHAR(64)	NOT NULL,
lastName 	VARCHAR(64)	NOT NULL,
employeeID 	SMALLINT	NOT NULL	UNIQUE NONCLUSTERED
)

As you can see above, for some other (we’ll assume good) reason we’ve put a primary key and hence a clustered index on the sysID. We’ve also put a unique nonclustered constraint on employeeID.

Consider the following query:

SELECT	firstName
FROM	testTable
WHERE	employeeID = 12345

When this query is executed, the unique nonclustered index is searched for the employeeID value of 12345. Once found, the nonclustered index will return a pointer to the actual row of the table. At this point, the row is looked up and the firstName is retrieved.

With a covering index, all of the required data is included in the index. In this simple case, we could just create an ascending index on employeeID and firstName to create a covering index, but lets assume that the total amount of data in the index is more that 900 bytes per record (the limit for a index entry) but we need more than this limit. In this case we can use the INCLUDE clause.

We can use the following sql (drop the table first if it exists):

CREATE TABLE dbo.testTable (
sysID		INT 		IDENTITY(1,1)	PRIMARY KEY,
firstName 	VARCHAR(64)	NOT NULL,
lastName 	VARCHAR(64)	NOT NULL,
employeeID 	SMALLINT	NOT NULL
);
GO
 
CREATE UNIQUE NONCLUSTERED INDEX
	IX_dbo_testTable_employeeID_INC_firstName
ON		dbo.testTable(employeeID ASC)
INCLUDE	(firstName);
GO

Now, when the select statement earlier in this post is executed, all the data required to complete the query is “covered” in the index.

A covering index saves a table join to get the results. You can easily expect a 200% performance increase on most queries.

Written by John Burns in: SQL Tips, Tips |