Friday, May 13, 2016

Dynamic SQL

Dynamic SQL in a stored procedure is a single Transact-SQL statement, or a set of statements, built at run time, stored in a string variable, and executed using a SQL command. There are several ways of implementing this in SQL Server.

"When to Use Dynamic SQL?" We can't definitely say that a Static SQL will meet all our programming needs. A Dynamic SQL is needed when we need to retrieve a set of records based on different search parameters. Say for example - An employee search screen or a general purpose report which needs to execute a different SELECT statement based on a different WHERE clause.

NOTE: Most importantly, a dynamic SQL query held in a variable is not parsed, compiled, or checked for errors until it is executed.

sp_executesql Vs EXECUTE Command
A dynamically built Transact-SQL statement can be executed using the EXECUTE command or the sp_executesql statement.

sp_executesql is more efficient and faster in execution, and it also supports parameter substitution. If we use the EXECUTE command to execute the SQL string, all the parameters must be converted to character types and concatenated into the query before execution. sp_executesql provides a better way of implementing this: it allows us to substitute a value for any parameter specified in the SQL string. Before getting into the actual example, let me differentiate these two commands with a simple one: selecting a record from the employee table using the ID in the WHERE clause.

The basic syntax for using EXECUTE command:

EXECUTE(@SQLStatement)

e.g. 
/* Using EXECUTE Command */
/* Build and Execute a Transact-SQL String with a single parameter 
 value Using EXECUTE Command */

/* Variable Declaration */
DECLARE @EmpID AS SMALLINT
DECLARE @SQLQuery AS NVARCHAR(500)
/* set the parameter value */
SET @EmpID = 1001
/* Build Transact-SQL String with parameter value */
SET @SQLQuery = 'SELECT * FROM tblEmployees WHERE EmployeeID = ' + 
CAST(@EmpID AS NVARCHAR(10))
/* Execute Transact-SQL String */

EXECUTE(@SQLQuery)


The basic syntax for using sp_executesql:

sp_executesql [@SQLStatement],[@ParameterDefinitionList],[@ParameterValueList]

e.g.

/* Variable Declaration */
DECLARE @EmpName AS NVARCHAR(50)
DECLARE @SQLQuery AS NVARCHAR(500)

/* Build and Execute a Transact-SQL String with a single parameter 
value Using sp_executesql Command */
SET @EmpName = 'John' 
SET @SQLQuery = 'SELECT * FROM tblEmployees 
WHERE EmployeeName LIKE '''+ '%' + @EmpName + '%' + '''' 
EXECUTE sp_executesql @SQLQuery

/* Variable Declaration */
DECLARE @EmpID AS NVARCHAR(50)
DECLARE @SQLQuery AS NVARCHAR(500)

/* Build and Execute a Transact-SQL String with a single 
parameter value Using sp_executesql Command */
SET @EmpID = '1001,1003' 
SET @SQLQuery = 'SELECT * FROM tblEmployees 
WHERE EmployeeID IN(' + @EmpID + ')'
EXECUTE sp_executesql @SQLQuery

/* Variable Declaration */
DECLARE @OrderBy AS NVARCHAR(50)
DECLARE @SQLQuery AS NVARCHAR(500)

/* Build and Execute a Transact-SQL String with a single parameter 
value Using sp_executesql Command */
SET @OrderBy = 'Department' 
SET @SQLQuery = 'SELECT * FROM tblEmployees Order By ' + @OrderBy

EXECUTE sp_executesql @SQLQuery
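
Note that all three examples above still concatenate the values into the SQL string. The real advantage of sp_executesql appears only when you use its parameter list: the query text stays constant (so the cached plan can be reused) and no CAST to a character type is needed. A minimal sketch of the parameterized form, reusing the tblEmployees table from the examples above:

/* Build and Execute a Transact-SQL String with a true
   substituted parameter using sp_executesql */

/* Variable Declaration */
DECLARE @EmpID AS SMALLINT
DECLARE @SQLQuery AS NVARCHAR(500)
DECLARE @ParameterDefinition AS NVARCHAR(100)

/* Build the Transact-SQL String with a parameter placeholder */
SET @SQLQuery = N'SELECT * FROM tblEmployees WHERE EmployeeID = @EmpIDParam'

/* Define the parameter, then pass the value separately;
   no string concatenation and no CAST required */
SET @ParameterDefinition = N'@EmpIDParam SMALLINT'
SET @EmpID = 1001

EXECUTE sp_executesql @SQLQuery, @ParameterDefinition, @EmpIDParam = @EmpID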

Lock Escalations

Lock Escalation is an optimization technique used by SQL Server to control the number of locks held in its Lock Manager. Let's start with a description of the so-called Lock Hierarchy in SQL Server, because that hierarchy is the reason the concept of Lock Escalation exists in a relational database like SQL Server.

Lock Hierarchy

Database-->Table-->Page-->Row

The lock hierarchy starts at the database level and goes down to the row level. You always have a Shared Lock (S) at the database level itself: as long as your query is connected to the database, this Shared Lock prevents the database from being dropped or from having a backup restored over it. Underneath the database level, you take locks on the table, on the pages, and on the records when you are performing an operation.

When you execute a SELECT statement, you take an Intent Shared Lock (IS) at the table and page level and a Shared Lock (S) on the record itself. When you perform a data modification statement (INSERT, UPDATE, DELETE), you take an Intent Exclusive or Intent Update Lock (IX or IU) at the table and page level and an Exclusive or Update Lock (X or U) on the changed records. SQL Server always acquires locks from top to bottom to prevent so-called race conditions, where multiple threads try to acquire locks concurrently within the locking hierarchy.

Lock Escalations

As soon as you have more than 5,000 locks on one level of the locking hierarchy, SQL Server escalates those many fine-granularity locks into a single coarse-granularity lock. By default SQL Server *always* escalates to the table level.

After the escalation, you have only one big lock on the table itself. In the case of a DELETE operation, that is one Exclusive Lock (X) at the table level. This hurts the concurrency of your database very badly: holding an Exclusive Lock at the table level means no other session can access that table any more, and every other query will simply block. When you run your SELECT statement in the Repeatable Read isolation level, you also hold your Shared Locks until the end of the transaction, which means you get a Lock Escalation as soon as you have read more than 5,000 rows. The result is one Shared Lock on the table itself! Your table is temporarily read-only, because every other data modification on that table will be blocked.
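
You can watch an escalation happen through the sys.dm_tran_locks DMV. A minimal sketch, assuming a hypothetical dbo.Customers table with well over 5,000 rows:

/* Delete enough rows in one statement to cross the threshold */
BEGIN TRANSACTION
DELETE TOP (6000) FROM dbo.Customers

/* Inspect the locks held by this session; after the escalation you
   see a single OBJECT-level X lock instead of thousands of KEY and
   PAGE locks */
SELECT resource_type, request_mode, request_status, COUNT(*) AS LockCount
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode, request_status

ROLLBACK TRANSACTION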

There is also a misconception that SQL Server escalates from the row level to the page level, and finally to the table level. Wrong! Such a code path doesn't exist in SQL Server. By default, SQL Server *always* escalates directly to the table level; an escalation policy to the page level simply doesn't exist. If your table is partitioned (Enterprise Edition only!), you can configure an escalation to the partition level. But you have to test your data access pattern very carefully here, because a Lock Escalation to the partition level can cause deadlocks. That is why this option is not enabled by default.

Since SQL Server 2008 you can also control how SQL Server performs the Lock Escalation – through the ALTER TABLE statement and the property LOCK_ESCALATION. There are 3 different options available:

ALTER TABLE dbo.TableName SET (LOCK_ESCALATION = TABLE)  -- or AUTO, or DISABLE


TABLE
AUTO
DISABLE

The default option is TABLE, which means SQL Server *always* escalates to the table level, even when the table is partitioned. If your table is partitioned and you want partition-level Lock Escalation (because you have tested your data access pattern and it does not cause deadlocks), you can change the option to AUTO. AUTO performs the escalation to the partition level if the table is partitioned, and otherwise to the table level. With the option DISABLE you can completely turn off Lock Escalation for that specific table. But disabling Lock Escalation is rarely the best option, because the Lock Manager of SQL Server can then consume a huge amount of memory if you are not very careful with your queries and your indexing strategy.

Conclusion


Lock Escalation in SQL Server is mainly a nightmare. How do you delete more than 5,000 rows from a table without running into Lock Escalations? You can disable Lock Escalation temporarily, but you have to be very careful with that. Another option (the one I suggest) is to run your DELETE/UPDATE statements in a loop as separate transactions: DELETE/UPDATE fewer than 5,000 rows per iteration, so that you prevent Lock Escalations. As a very nice side effect, your one huge transaction is split into many smaller ones, which also helps with the Auto Growth issues you may have with your transaction log.
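
A minimal sketch of that loop pattern, assuming a hypothetical dbo.Customers table and a hypothetical LastOrderDate filter column; each DELETE runs as its own autocommit transaction and stays safely below the 5,000-lock threshold:

/* Delete in small batches so no single transaction ever holds
   5,000+ locks and triggers a Lock Escalation */
WHILE 1 = 1
BEGIN
    DELETE TOP (4000)
    FROM dbo.Customers
    WHERE LastOrderDate < '20100101'

    /* Stop when the last batch deleted nothing */
    IF @@ROWCOUNT = 0
        BREAK
END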

SQL Server Important Topics

What is a Stored Procedure?



A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database.

e.g. sp_helpdb, sp_renamedb, sp_depends etc.
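
Those are built-in system procedures. A minimal sketch of a user-defined procedure with an input parameter, reusing the tblEmployees table from the Dynamic SQL section above:

/* Create once on the server; every client then calls the same
   procedure with its own parameter value */
CREATE PROCEDURE dbo.usp_GetEmployeeByID
    @EmpID SMALLINT
AS
BEGIN
    SET NOCOUNT ON
    SELECT * FROM tblEmployees WHERE EmployeeID = @EmpID
END
GO

EXECUTE dbo.usp_GetEmployeeByID @EmpID = 1001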


What is a Trigger?



A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed directly; the DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers are similar to stored procedures in that both consist of procedural logic stored at the database level. Stored procedures, however, are not event-driven and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a call to the procedure, while triggers are implicitly executed. In addition, triggers can also execute stored procedures.

Nested Trigger: A trigger can also contain INSERT, UPDATE and DELETE logic within itself; so when the trigger is fired because of data modification, it can also cause another data modification, thereby firing another trigger. A trigger that contains data modification logic within itself is called a nested trigger.

What are the Different Types of Triggers?

There are two types of triggers:

1) DML Trigger

There are two types of DML Triggers:

1. Instead Of Trigger
Instead Of Triggers are fired in place of the triggering action, such as an insert, update, or delete.

2. After Trigger
After Triggers execute following the triggering action, such as an insert, update, or delete.

2) DDL Trigger

This type of trigger is fired in response to DDL events such as DROP TABLE, CREATE TABLE, ALTER TABLE, or login events. DDL Triggers are always After Triggers.
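
A minimal sketch of an After (DML) Trigger, assuming a hypothetical tblEmployeeAudit table for the audit rows:

/* Fires automatically after every INSERT on tblEmployees */
CREATE TRIGGER trg_Employees_AfterInsert
ON tblEmployees
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON
    /* 'inserted' is the special table holding the new rows */
    INSERT INTO tblEmployeeAudit (EmployeeID, ChangedOn)
    SELECT EmployeeID, GETDATE()
    FROM inserted
END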


What is a View?



A simple view can be thought of as a subset of a table. It can be used for retrieving data, as well as for updating or deleting rows. Rows updated or deleted through the view are updated or deleted in the table the view was created from. It should also be noted that as data in the original table changes, so does the data in the view, since a view is just a way to look at part of the original table. The results of using a view are not stored permanently in the database. The data accessed through a view is constructed at query time using a standard T-SQL SELECT command and can come from one or many different base tables, or even from other views.
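
A minimal sketch, assuming hypothetical Department and IsActive columns on tblEmployees:

/* The view stores only the SELECT definition, not the data */
CREATE VIEW dbo.vwActiveEmployees
AS
SELECT EmployeeID, EmployeeName, Department
FROM tblEmployees
WHERE IsActive = 1
GO

/* Queried like a table; the underlying SELECT runs each time */
SELECT * FROM dbo.vwActiveEmployees WHERE Department = 'Sales'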


What is an Index?



An index is a physical structure containing pointers to the data. Indexes are created on an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. Users cannot see the indexes; they are just used to speed up queries. Effective indexes are one of the best ways to improve performance in a database application. A table scan happens when no index is available to help a query: SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables they have a severe impact on performance.
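
For example, a minimal sketch of an index that lets searches on EmployeeName avoid a table scan:

/* Without this index, WHERE EmployeeName = ... scans every row */
CREATE NONCLUSTERED INDEX IX_tblEmployees_EmployeeName
ON tblEmployees (EmployeeName)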


What is a Linked Server?



A linked server is a concept in SQL Server by which we can register another SQL Server instance and query databases on both servers using T-SQL statements. With a linked server, you can create very clean, easy-to-follow SQL statements that allow remote data to be retrieved, joined and combined with local data. The system stored procedures sp_addlinkedserver and sp_addlinkedsrvlogin are used to add a new linked server.
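
A minimal sketch, assuming a hypothetical remote instance named REMOTESRV and a hypothetical SalesDB database on it (with @srvproduct = 'SQL Server', no provider details are needed):

/* Register the remote SQL Server instance */
EXEC sp_addlinkedserver
    @server = N'REMOTESRV',
    @srvproduct = N'SQL Server'

/* Map the current login to the remote server */
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'REMOTESRV',
    @useself = 'TRUE'

/* Query the remote data with a four-part name */
SELECT * FROM REMOTESRV.SalesDB.dbo.tblEmployees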


What is a Cursor?

A cursor is a database object used by applications to manipulate data in a set on a row-by-row basis, instead of the typical SQL commands that operate on all the rows in the set at one time.

In order to work with a cursor, we need to perform some steps in the following order:

Declare cursor
Open cursor
Fetch row from the cursor
Process fetched row
Close cursor
Deallocate cursor
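
A minimal sketch that walks tblEmployees row by row, following those six steps:

DECLARE @Name NVARCHAR(50)

/* 1. Declare cursor */
DECLARE EmpCursor CURSOR FOR
    SELECT EmployeeName FROM tblEmployees

/* 2. Open cursor */
OPEN EmpCursor

/* 3. Fetch row from the cursor */
FETCH NEXT FROM EmpCursor INTO @Name

WHILE @@FETCH_STATUS = 0
BEGIN
    /* 4. Process fetched row */
    PRINT @Name
    FETCH NEXT FROM EmpCursor INTO @Name
END

/* 5. Close cursor */
CLOSE EmpCursor

/* 6. Deallocate cursor */
DEALLOCATE EmpCursor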


What is the Difference between a Function and a Stored Procedure?


A UDF can be used in SQL statements anywhere in the WHERE/HAVING/SELECT section, whereas a stored procedure cannot. UDFs that return tables can be treated as another rowset and used in JOINs with other tables. Inline UDFs can be thought of as views that take parameters and can be used in JOINs and other rowset operations.
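
A minimal sketch of an inline table-valued UDF used in a JOIN, assuming a hypothetical Department column on tblEmployees and a hypothetical tblSalaries table:

/* An inline TVF behaves like a view that takes a parameter */
CREATE FUNCTION dbo.fnEmployeesByDept (@Dept NVARCHAR(50))
RETURNS TABLE
AS
RETURN
    SELECT EmployeeID, EmployeeName
    FROM tblEmployees
    WHERE Department = @Dept
GO

/* The function's rowset joins to other tables like any rowset;
   a stored procedure could not be used this way */
SELECT e.EmployeeName, s.Salary
FROM dbo.fnEmployeesByDept(N'Sales') AS e
JOIN tblSalaries AS s ON s.EmployeeID = e.EmployeeID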


What is a Subquery? Explain the Properties of a Subquery.


Subqueries are often referred to as sub-selects, as they allow a SELECT statement to be executed within the body of another SQL statement. A subquery is written by enclosing it in a set of parentheses. Subqueries are generally used to return a single row as an atomic value, although they may be used to compare values against multiple rows with the IN keyword.

A subquery is a SELECT statement that is nested within another T-SQL statement. A subquery's SELECT statement, if executed independently of the statement in which it is nested, will return a result set; this means a subquery can stand alone and does not depend on the statement in which it is nested. A subquery can return any number of values and can appear in the column list of a SELECT statement and in the FROM, GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL statement. A subquery can also be used as a parameter to a function call. Basically, a subquery can be used anywhere an expression can be used.
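
Two minimal sketches, assuming hypothetical Salary and DepartmentID columns and a hypothetical tblDepartments table: a scalar subquery used as an atomic value, and a multi-row subquery compared with IN:

/* Scalar subquery: returns a single atomic value */
SELECT EmployeeName
FROM tblEmployees
WHERE Salary > (SELECT AVG(Salary) FROM tblEmployees)

/* Multi-row subquery: compared against many rows with IN */
SELECT EmployeeName
FROM tblEmployees
WHERE DepartmentID IN (SELECT DepartmentID
                       FROM tblDepartments
                       WHERE Location = 'New York')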


What are the Differences between a Clustered and a Non-clustered Index?


A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore, the table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.

A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows. 

What are the Different Index Configurations a Table can have?

A table can have one of the following index configurations:

No indexes
A clustered index
A clustered index and many non-clustered indexes
A non-clustered index
Many non-clustered indexes


What are Different Types of Collation Sensitivity?


Case sensitivity – A and a, B and b, etc.

Accent sensitivity – a and á, o and ó, etc.

Kana sensitivity – When Japanese kana characters (Hiragana and Katakana) are treated differently, it is called Kana sensitive.

Width sensitivity – When a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, it is width sensitive.


What is the Difference between DELETE and TRUNCATE Commands?


The DELETE command removes rows from a table based on the condition we provide in a WHERE clause. TRUNCATE removes all the rows from a table, and there will be no data in the table after we run the command.

TRUNCATE

TRUNCATE is faster and uses fewer system and transaction log resources than DELETE. (Read all the points below)
TRUNCATE removes the data by deallocating the data pages used to store the table’s data, and only the page deallocations are recorded in the transaction log.
TRUNCATE removes all the rows from a table, but the table structure, its columns, constraints, indexes and so on remain. The counter used by an identity column is reset to the seed value.
You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint.
TRUNCATE can be rolled back only when it is executed inside an explicit BEGIN ... END TRANSACTION block; outside a transaction it cannot be rolled back.
TRUNCATE is a DDL command.

DELETE

DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.
DELETE does not reset the identity counter of the table.
DELETE can be used with or without a WHERE clause.
DELETE activates triggers if they are defined on the table.
DELETE can be rolled back.
DELETE is a DML command.
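
A quick demonstration of the rollback point above: TRUNCATE executed inside an explicit transaction can be rolled back.

BEGIN TRANSACTION
    TRUNCATE TABLE tblEmployees
    SELECT COUNT(*) FROM tblEmployees   /* returns 0 inside the transaction */
ROLLBACK TRANSACTION

/* The page deallocations were undone; all rows are back */
SELECT COUNT(*) FROM tblEmployees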


What are Different Types of Locks?


Shared Locks: 
Used for operations that do not change or update data (read-only operations), such as a SELECT statement.
Update Locks: 
Used on resources that can be updated. It prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later.
Exclusive Locks: 
Used for data-modification operations, such as INSERT, UPDATE, or DELETE. It ensures that multiple updates cannot be made to the same resource at the same time.
Intent Locks: 
Used to establish a lock hierarchy. The types of intent locks are as follows: intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).
Schema Locks: 
Used when an operation dependent on the schema of a table is executing. The types of schema locks are schema modification (Sch-M) and schema stability (Sch-S).
Bulk Update Locks: 
Used when bulk-copying data into a table and the TABLOCK hint is specified.

What are Pessimistic Lock and Optimistic Lock?


Optimistic locking is a strategy where you read a record, take note of a version number, and check that the version hasn’t changed before you write the record back. If the record is dirty (i.e., it has a different version from yours), you abort the transaction and the user can restart it.

Pessimistic locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking, but it requires you to be careful with your application design to avoid deadlocks.
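
A minimal sketch of the optimistic pattern in T-SQL, assuming a hypothetical rowversion column named RowVer on tblEmployees:

/* Read the row together with its current version */
DECLARE @OriginalVersion BINARY(8)
SELECT @OriginalVersion = RowVer
FROM tblEmployees
WHERE EmployeeID = 1001

/* ... the user edits the data for a while ... */

/* Write back only if nobody changed the row in the meantime */
UPDATE tblEmployees
SET EmployeeName = 'John Smith'
WHERE EmployeeID = 1001
  AND RowVer = @OriginalVersion

IF @@ROWCOUNT = 0
    PRINT 'Conflict: another session modified the row. Restart the edit.'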

What is the ‘FILLFACTOR’?


A “FILLFACTOR” is one of the important arguments that can be used while creating an index.

According to MSDN, FILLFACTOR specifies a percentage that indicates how much the Database Engine should fill each index page during index creation or rebuild. Fill-factor is always an integer value from 1 to 100. The fill-factor option is designed for improving index performance and data storage. By setting the fill-factor value, you specify the percentage of space on each page to be filled with data, reserving free space on each page for future table growth.

Specifying a fill-factor value of 70 would imply that 30 percent of each page will be left empty, providing space for index expansion as data is added to the underlying table. The fill-factor setting applies only when the index is created or rebuilt.
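
For example, a minimal sketch that leaves 30 percent of every leaf page free, assuming a hypothetical Department column:

/* Pages are filled to 70 percent, leaving room for future inserts */
CREATE NONCLUSTERED INDEX IX_tblEmployees_Department
ON tblEmployees (Department)
WITH (FILLFACTOR = 70)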

What are Points to Remember while Using the FILLFACTOR Argument?

If fill-factor is set to 100 or 0, the Database Engine fills pages to their capacity while creating indexes.
The server-wide default FILLFACTOR is set to 0.
To modify the server-wide default value, use the sp_configure system stored procedure.
To view the fill-factor value of one or more indexes, use sys.indexes.
To modify or set the fill-factor value for individual indexes, use CREATE INDEX or ALTER INDEX statements.
Creating a clustered index with a FILLFACTOR < 100 may significantly increase the amount of space the data occupies because the Database Engine physically reallocates the data while building the clustered index.


What is a Covered index?


It is an index that can satisfy a query using its index keys alone, without needing to touch the data pages.

It means that when a query is fired, SQL Server doesn’t need to go to the table to retrieve the rows; it can produce the results directly from the index, because the index covers all the columns used in the query.
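
A minimal sketch, assuming a hypothetical Department column; the INCLUDE column is stored in the index leaf pages, so the query below is fully covered:

/* This query can be answered entirely from the index */
CREATE NONCLUSTERED INDEX IX_tblEmployees_Dept_Covering
ON tblEmployees (Department)
INCLUDE (EmployeeName)

SELECT EmployeeName, Department
FROM tblEmployees
WHERE Department = 'Sales'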

When I Delete Data from a Table, does SQL Server Reduce the Size of that Table?

When data is deleted from a table, SQL Server does not reduce the size of the table right away; instead, it marks the affected pages as free pages that still belong to the table. When new data is inserted, it is put into those pages first, and once they are filled up, SQL Server allocates new pages. If you wait for some time, a background process deallocates the free pages, finally reducing the table size.

Software Project Cost Estimation Methods


1. Pure Expert Judgement
2. Historical Data
3. Wideband Delphi
4. Source Lines of Code
5. Function Point
6. Use Case
7. Story Point
8. Monte Carlo
9. COCOMO I, II


Inmon vs. Kimball

When it comes to designing a data warehouse for your business, the two most commonly discussed methods are the approaches introduced by Bill Inmon and Ralph Kimball. Debates on which one is better and more effective have gone on for years, but a clear-cut answer has never been reached, as both philosophies have their own advantages and differentiating factors, and enterprises continue to use either of them.

Bill Inmon’s enterprise data warehouse approach (the top-down design): A normalized data model is designed first. Then the dimensional data marts, which contain the data required for specific business processes or specific departments, are created from the data warehouse.
Ralph Kimball’s dimensional design approach (the bottom-up design): The data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.

Inmon’s top-down approach

Inmon defines the data warehouse in the following terms:

Subject-oriented: The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together

Time-variant: The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time

Non-volatile: Data in the data warehouse is never over-written or deleted -- once committed, the data is static, read-only, and retained for future reporting

Integrated: The database contains data from most or all of an organization's operational applications, and this data is made consistent

Inmon defines the data warehouse as a centralized repository for the entire enterprise. The data warehouse stores the ‘atomic’ data at the lowest level of detail, and dimensional data marts are created only after the complete data warehouse has been created. The data warehouse is thus at the center of the Corporate Information Factory (CIF), which provides a logical framework for delivering business intelligence.

Kimball’s bottom-up approach

Keeping in mind the most important business aspects or departments, data marts are created first. These provide a thin view into the organizational data, and as and when required they can be combined into a larger data warehouse. Kimball defines the data warehouse as “a copy of transaction data specifically structured for query and analysis”.
Kimball’s data warehousing architecture is also known as the Data Warehouse Bus (BUS). Dimensional modeling focuses on ease of end-user accessibility and provides a high level of performance to the data warehouse.