SQL Server and BI Tools: Basic SQL Server Questions

Explain DML, DDL, DCL and TCL statements with examples?

DML: DML stands for Data Manipulation Language. DML is used to retrieve, store, modify, delete, insert and update data in database.
Examples of DML statements: SELECT, UPDATE, INSERT, DELETE statements.

DDL: DDL stands for Data Definition Language. DDL is used to create and modify the structure of database objects.

Examples: CREATE, ALTER, DROP statements.

DCL: DCL stands for Data Control Language. DCL is used to create roles, grant and revoke permissions, establish referential integrity etc.
Examples: GRANT, REVOKE statements

TCL: TCL stands for Transactional Control Language. TCL is used to manage transactions within a database.
Examples: COMMIT, ROLLBACK statements

What is the difference between Drop, Delete and Truncate statements in SQL Server?

Drop, Delete and Truncate - All operations can be rolled back.

All the statements (Delete, Truncate and Drop) are logged operations, but the amount of information that is logged varies. Delete statement logs an entry in the transaction log for each deleted row, where as Truncate Table logs only the Page deallocations.Hence, truncate is a little faster than Delete.

You can have a where clause in Delete statement where as Truncate statement cannot have a where clause. Truncate will delete all the rows in a Table, but the structure of the table remains. Drop would delete all the rows including the structure of the Table.

Please refer to the screen shot below for the differences summary snapshot between Drop, Delete and Truncate statements in SQL Server.

What is Cascading referential integrity constraint?
Cascading referential integrity constraints allow you to define the actions Microsoft SQL Server should take when a user attempts to delete or update a key to which an existing foreign keys point.

You can instruct SQL Server to do the following:

1. No Action: This is the default behaviour. No Action specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, an error is raised and the DELETE or UPDATE is rolled back.

2. Cascade: Specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are also deleted or updated.

3. Set NULL: Specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are set to NULL.

4. Set Default: Specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are set to default values.

Difference between primary key and unique key in SQL Server?
1. A table can have only one primary key. On the other hand a table can have more than one unique key.
2. Primary key column does not accept any null values, where as a unique key column accept one null value.

What are the 2 types of Temporary Tables in SQL Server?
1. Local Temporary Tables
2. Global Temporary Tables

What is the difference between Local and Global Temporary Tables?
Local Temporary Tables:
1. Prefixed with a single pound sign (#).
2. Local temporary tables are visible to that session of SQL Server which has created it.
3. Local temporary tables are automatically dropped, when the session that created the temporary tables is closed.

Global Temporary Tables:
1. Prefixed with two pound signs (##).
2. Global temporary tables are visible to all the SQL server sessions.
3. Global temporary tables are also automatically dropped, when the session that created the temporary tables is closed.

Can you create foreign key constraints on temporary tables?
No

Do you have to manually delete temporary tables?
No, temporary tables are automatically dropped, when the session that created the temporary tables is closed. But if you maintain a persistent connection or if connection pooling is enabled, then it is better to explicitly drop the temporary tables you have created.
However, It is generally considered a good coding practice to explicitly drop every temporary table you create.

In which database, the temporary tables get created?
TEMPDB database.

What is the difference between a Temporary Table and a Table Variable?
Or
When do you use Table Variable over a Temporary Table and vice versa?

1. Table variable is created in the memory where as a temporary table is created in the TempDB. But, if there is a memory pressure, the pages belonging to a table variable may be pushed out to tempdb.

2. Table variables cannot be involved in transactions, logging or locking. This makes table variable faster than a temporary table.

3. You can pass table variable as parameter to functions and stored procedures, where as you cannot do the same with temporary table.

4. A temporary table can have indexes, whereas a table variable can only have a primary index. If speed is an issue Table variables can be faster, but if there are a lot of records, or there is a need to search the temporary table based on a clustered index, then a Temporary Table would be better. If you have less than 100 rows generally use a table variable. Otherwise use a temporary table. This is because SQL Server won't create statistics on table variables.

What is the use of an Index in SQL Server?
Relational databases like SQL Server use indexes to find data quickly when a query is processed. Creating the proper index can drastically increase the performance of an application.

What is a table scan?
or
What is the impact of table scan on performance?

When a SQL Server has no index to use for searching, the result is similar to the reader who looks at every page in a book to find a word. The SQL engine needs to visit every row in a table. In database terminology we call this behavior a table scan, or just scan. A full table scan of a very large table can adversely affect the performance. Creating proper indexes will allow the database to quickly narrow in on the rows to satisfy the query, and avoid scanning every row in the table.

What is the system stored procedure that can be used to list all the indexes that are created for a specific table?
sp_helpindex is the system stored procedure that can be used to list all the indexes that are created for a specific table.

For example, to list all the indexes on table tblCustomers, you can use the following command.
EXEC sp_helpindex tblCustomers

What is the purpose of query optimizer in SQL Server?
An important feature of SQL Server is a component known as the query optimizer. The query optimizer's job is to find the fastest and least resource intensive means of executing incoming queries. An important part of this job is selecting the best index or indexes to perform the task.

What is the first thing you will check for, if the query below is performing very slow?
SELECT * FROM tblProducts ORDER BY UnitPrice ASC

Check if there is an Index created on the UntiPrice column used in the ORDER BY clause. An index on the UnitPrice column can help the above query to find data very quickly.When we ask for a sorted data, the database will try to find an index and avoid sorting the results during execution of the query. We control sorting of a data by specifying a field, or fields, in an ORDER BY clause, with the sort order as ASC (ascending) or DESC (descending).

With no index, the database will scan the tblProducts table and sort the rows to process the query. However, if there is an index, it can provide the database with a presorted list of prices. The database can simply scan the index from the first entry to the last entry and retrieve the rows in sorted order.

The same index works equally well with the following query, simply by scanning the index in reverse.
SELECT * FROM tblProducts ORDER BY UnitPrice DESC

What is the significance of an Index on the column used in the GROUP BY clause?
Creating an Index on the column, that is used in the GROUP BY clause, can greatly improve the perofrmance. We use a GROUP BY clause to group records and aggregate values, for example, counting the number of products with the same UnitPrice. To process a query with a GROUP BY clause, the database will often sort the results on the columns included in the GROUP BY.

The following query counts the number of products at each price by grouping together records with the same UnitPrice value.
SELECT UnitPrice, Count(*) FROM tblProducts GROUP BY UnitPrice

The database can use the index (Index on UNITPRICE column) to retrieve the prices in order. Since matching prices appear in consecutive index entries, the database is able to count the number of products at each price quickly. Indexing a field used in a GROUP BY clause can often speed up a query.

What is the role of an Index in maintaining a Unique column in table?
Columns requiring unique values (such as primary key columns) must have a unique index applied. There are several methods available to create a unique index.
1. Marking a column as a primary key will automatically create a unique index on the column.
2. We can also create a unique index by checking the Create UNIQUE checkbox when creating the index graphically.
3. We can also create a unique index using SQL with the following command:
CREATE UNIQUE INDEX IDX_ProductName On Products (ProductName)

The above SQL command will not allow any duplicate values in the ProductName column, and an index is the best tool for the database to use to enforce this rule. Each time an application adds or modifies a row in the table, the database needs to search all existing records to ensure none of values in the new data duplicate existing values.

What are the disadvantages of an Index?
There are 2 disadvantages of an Index
1. Increased Disk Space
2. Insert, Update and Delete statements could be slow. In short, all DML statements could be slow.

Disk Space: Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users.

Insert, Update and Delete statements could be slow: Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. Indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, providing too many indexes to update can actually hurt the performance of data modifications. This leads to a delicate balancing act when tuning the database for performance.

What are the 2 types of Indexes in SQL Server?
1. Clustered Index
2. Non Clustered Index

How many Clustered and Non Clustered Indexes can you have per table?
Clustered Index - Only one Clustered Index per table. A clustered index contains all of the data for a table in the index, sorted by the index key. Phone Book is an example for Clustered Index.
Non Clustered Index - You can have multiple Non Clustered Indexes per table. Index at the back of a book is an example for Non Clustered Index.

Which Index is faster, Clustered or Non Clustered Index?
Clustered Index is slightly faster than Non Clustered Index. This is because, when a Non Clustered Index is used there is an extra look up from the Non Clustered Index to the table, to fetch the actual rows.

When is it usually better to create a unique nonclustered index on the primary key column?
Sometimes it is better to use a unique nonclustered index on the primary key column, and place the clustered index on a column used by more queries. For example, if the majority of searches are for the price of a product instead of the primary key of a product, the clustered index could be more effective if used on the price field.

What is a Composite Index in SQL Server?
or
What is the advantage of using a Composite Index in SQL Server?
or
What is Covering Query?
A composite index is an index on two or more columns. Both clustered and nonclustered indexes can be composite indexes. If all of the information for a query can be retrieved from an Index. A clustered index, if selected for use by the query optimizer, always covers a query, since it contains all of the data in a table. By creating a composite indexes, we can have covering queries.

What is a Trigger in SQL Server?
A Trigger is a database object that is attached to a table. In many aspects it is similar to a stored procedure. As a matter of fact, triggers are often referred to as a "special kind of stored procedure." The main difference between a trigger and a stored procedure is that the former is attached to a table and is only fired when an INSERT, UPDATE or DELETE occurs.

What are the two types of Triggers in SQL Server?
1. After Triggers : Fired after Insert, Update and Delete operations on a table.
2. Instead of Triggers: Fired instead of Insert, Update and Delete operations on a table.

What are the special tables used by Triggers in SQL Server?
Triggers make use of two special tables called inserted and deleted. The inserted table contains the data referenced in an INSERT before it is actually committed to the database. The deleted table contains the data in the underlying table referenced in a DELETE before it is actually removed from the database. When an UPDATE is issued both tables are used. More specifically, the new data referenced in the UPDATE statement is contained in inserted table and the data that is being updated is contained in deleted table.

What is the difference between a User Defined Function (UDF) and a Stored Procedure (SP) in SQL Server

1. Stored Procedure support deffered name resolution where as functions do not support deffered name resolution.

2. User Defined Function can be used in a select statement where as you cannot use a stored procedure in a select statement.

3. UDF's cannot return Image, Text where as a StoredProcedure can return any datatype.

4. In general User Defined Functions are used for computations where as Stored Procedures are used for performing business logic.

5. UDF should return a value where as Stored Procedure need not.

6. User Defined Functions accept lesser number of input parameters than Stored Procedures. UDF can have upto 1023 input parameters where as a Stored Procedure can have upto 21000 input parameters.

7. Temporary Tables can not be used in a UDF where as a StoredProcedure can use Temporary Tables.

8. UDF can not Execute Dynamic SQL where as a Stored Procedure can execute Dynamic SQL.

9. User Defined Function does not support error handling where as Stored Procedure support error handling. RAISEERROR or @@ERROR are not allowed in UDFs.

What is a View in SQL Server?
You can think of a view either as a compiled sql query or a virtual table. As a view represents a virtual table, it does not physically store any data. When you query a view, you actually retrieve the data from the underlying base tables.

What are the advantages of using views?
Or
When do you usually use views?
1. Views can be used to implement row level and column level security.
2. Simplify the database schema to the users. You can create a view based on multiple tables which join columns from all these multiple tables so that they look like a single table.
3. Views can be used to present aggregated and summarized data.

Can you create a view based on other views?
Yes, you can create a view based on other views. Usually we create views based on tables, but it also possible to create views based on views.

Can you update views?
Yes, views can be updated. However, updating a view that is based on multiple tables, may not update the underlying tables correctly. To correctly update a view that is based on multiple tables you can make use INSTEAD OF triggers in SQL Server.

What are indexed views?
Or
What are materialized views?

A view is a virtual table, it does not contain any physical data. A view is nothing more than compiled SQL query. Every time, we issue a select query against a view, we actually get the data from the underlying base tables and not from the view, as the view itself does not contain any data.

When you create an index on a view, the data gets physically stored in the view. So, when we issue a select query against an indexed view, the data is retrieved from the index without having to go to the underlying table, which will make the select statement to work slightly faster. However, the disadvantage is, INSERT, UPDATE and DELETE operations will become a little slow, because every time you insert or delete a row from the underlying table, the view index needs to be updated. Inshort, DML operations will have negative impact on performance.

Oracle refers to indexed views as materialized views.

Only the views created with schema binding, can have an Index. Simply adding WITH SCHEMABINDING to the end of the CREATE VIEW statement will accomplish this. However, the effect is that any changes to the underlying tables which will impact the view are not allowed. Since the indexed view is stored physically, any schema changes would impact the schema of the stored results set. Therefore, SQL Server requires that schema binding be used to prevent the view's schema (and therefore the underlying tables) from changing.

The first index for a view must be a UNIQUE CLUSTERED INDEX, after which, it's possible to create non-clustered indexes against the view.

Indexed Views are heavily used in data warehouses and reporting databases that are not highly transactional.

What are the limitations of a View?
1. You cannot pass parameters to a view.
2. Rules and Defaults cannot be associated with views.
3. The ORDER BY clause is invalid in views unless TOP or FOR XML is also specified.
4. Views cannot be based on temporary tables.

What are the different types of joins available in sql server?
There are 3 different types of joins available in sql server, and they are
1. Cross Join
2. Inner Join or Join
3. Outer Join

Outer Join is again divided into 3 types as shown below.
1. Left Outer Join or Left Join
2. Right Outer Join or Right Join
3. Full Outer Join or Full Join

You might have heard about self join, but self join is not a different type of join. A self join means joining a table with itself. We can have an inner self join or outer self join.

Key Points to remember about CROSS JOIN.
1. A cross join produces the Cartesian product of the tables involved in the join.This mean every row in the Left Table is joined to every row in the Right Table. Candidate is LEFT Table and Company is RIGHT Table.
2. In real time scenarios we rarley use CROSS JOIN. Most often we use either INNER JOIN or LEFT OUTER JOIN.
3. CROSS JOIN does not have an ON clause with a Join Condition. All the other JOINS use ON clause with a Join Condition.
4. Using an ON clause on a CROSS JOIN would generate a syntax error.

Index Scan:
Index Scan scans each and every record in the index. Table Scan is where the table is processed row by row from beginning to end. If the index is a clustered index then an index scan is really a table scan. Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Hence, a scan is an efficient strategy only if the table is small.

Index Seek:
Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.

Advantages of stored procedures
1. Better Performance : Stored Procedures are compiled and their execution plan is cached and used again, when the same SP is executed again. Although adhoc queries also create and reuse plan, the plan is reused only when the query is textual match and the datatypes are matching with the previous call. Any change in the datatype or you have an extra space in the query then, a new plan is created.

2. Better Security : Applications making use of dynamically built adhoc sql queries are highly susceptible to sql injection attacks, where as Stored Procedures can avoid SQL injection attacks completely.

3. Reduced Network Traffic: Stored procedures can reduce network traffic to a very great extent when compared with adhoc sql queries. With stored procedures, you only need to send the name of the procedure between client and server. Imagine the amount of network bandwith that can be saved especially if the stored procedure contains 1000 to 2000 lines of SQL.

4. Better Maintainance and Reusability: Stored procedures can be used any where in the application. It is easier to maintain a stored procedure that is used on several pages as the modfifcations just need to be changed at one place where the stored procedure is defined. On the other hand, maintaining an adhoc sql query that's used on several pages is tedious and error prone, as we have to make modifications on each and every page.

SQL Server and BI Tools

Friday, October 30, 2015

Basic SQL Server Questions

Visited: