Saturday, October 31, 2015

Data Warehouse Questions

Q: Define Data Warehouse? 
A: Data warehouse is Subject Oriented, Integrated, Time-Variant and Nonvolatile collection of data that support management's decision making process. 

Q: What does the subject oriented data warehouse signifies? 
A: Subject oriented signifies that the data warehouse stores the information around a particular subject such as product, customer, sales etc. 

Q: List any five applications of Data Warehouse? 
A: Some applications include Financial services, Banking Services, Customer goods, Retail Sectors, Controlled Manufacturing. 

Q: What does OLAP and OLTP stand for? 
A: OLAP is acronym of Online Analytical Processing and OLAP is acronym of Online Transactional Processing 

Q: What is the very basic difference between data warehouse and Operational Databases? 
A: Data warehouse contains the historical information that is made available for analysis of the business whereas the Operational database contains the current information that is required to run the business. 

Q: List the Schema that Data Warehouse System implements ? 
A: Data Warehouse can implement Star Schema, Snowflake Schema or the Fact Constellation Schema 

Q: What is Data Warehousing? 
A: Data Warehousing is the process of constructing and using the data warehouse. 

Q: List the process that are involved in Data Warehousing? 
A: Data Warehousing involves data cleaning, data integration and data consolidations. 

Q: List the functions of data warehouse tools and utilities? 
A: The functions performed by Data warehouse tool and utilities are Data Extraction, Data Cleaning, Data Transformation, Data Loading and Refreshing. 

Q: What do you mean by Data Extraction?
A: Data Extraction means gathering the data from multiple heterogeneous sources. 

Q: Define Metadata? 
A: Metadata is simply defined as data about data. In other words we can say that metadata is the summarized data that lead us to the detailed data. 

Q: What does MetaData Respiratory contains? 
A: Metadata respiratory contains Definition of data warehouse, Business Metadata, Operational Metadata, Data for mapping from operational environment to data warehouse and the Alorithms for summarization 

Q: How does a Data Cube help?
A: Data cube help us to represent the data in multiple dimensions. The data cube is defined by dimensions and facts. 

Q: Define Dimension? 
A: The dimensions are the entities with respect to which an enterprise keeps the records. 

Q: Explain Data mart? 
A: Data mart contains the subset of organisation-wide data. This subset of data is valuable to specific group of an organisation. in other words we can say that data mart contains only that data which is specific to a particular group. 

Q: What is Virtual Warehouse? 
A: The view over a operational data warehouse is known as virtual warehouse. 

Q: List the phases involved in Data warehouse delivery Process? 
A: The stages are IT strategy, Education, Business Case Analysis, technical Blueprint, Build the version, History Load, Ad hoc query,Requirement Evolution, Automation, Extending Scope. 

Q: Explain Load Manager? 
A: This Component performs the operations required to extract and load process. The size and complexity of load manager varies between specific solutions from data warehouse to data warehouse. 

Q: Define the function of Load Manager? 
A: Extract the data from source system.Fast Load the extracted data into temporary data store.Perform simple transformations into structure similar to the one in the data warehouse. 

Q: Explain Warehouse Manager? 
A: Warehouse manager is responsible for the warehouse management process.The warehouse manager consist of third party system software, C programs and shell scripts.The size and complexity of warehouse manager varies between specific solutions.


Q: Define functions of Warehouse Manager? 
A: The Warehouse Manager performs consistency and referential integrity checks, Creates the indexes, business views, partition views against the base data, transforms and merge the source data into the temporary store into the published data warehouse, Backup the data in the data warehouse and archives the data that has reached the end of its captured life. 

Q: What is Summary Information? 
A: Summary Information is the area in data warehouse where the predefined aggregations are kept. 

Q: What does the Query Manager responsible for? 
A: Query Manager is responsible for directing the queries to the suitable tables. 

Q: List the types of OLAP server? 
A: There are four types of OLAP Server namely Relational OLAP, Multidimensional OLAP, Hybrid OLAP, and Specialized SQL Servers 

Q: Which one is faster Multidimensional OLAP or Relational OLAP? 
A: Multidimensional OLAP is faster than the Relational OLAP. 

Q: List the functions performed by OLAP? 
A: The functions such as roll-up, drill-down, slice, dice, and pivot are performed by OLAP. 

Q: How many dimensions are selected in Slice operation? 
A: Only one dimension is selected for the slice operation. 

Q: How many dimensions are selected in dice operation? 
A: For dice operation two or more dimensions are selected for a given cube. 

Q: How many fact tables are there in Star Schema? 
A: There is only one fact table in Star Schema. 

Q: What is Normalization? 
A: The normalization split up the data into additional tables. 

Q: Out of Star Schema and Snowflake Schema, the dimension table is normalised? 
A: The snowflake schema uses the concept of normalization. 

Q: What is the benefit of Normalization? 
A: Normalization helps to reduce the data redundancy. 

Q: Which language is used for defining Schema Definition 
A: Data Mining Query Language (DMQL) id used for Schema Definition. 

Q: What language is the base of DMQL 
A: DMQL is based on Structured Query Language (SQL) 

Q: What are the reasons for partitioning? 
A: Partitioning is done for various reasons such as easy management, to assist backup recovery, to enhance performance.

Q: What kind of costs is involved in Data Marting? 
A: Data Marting involves Hardware & Software cost, Network access cost and Time cost.

Friday, October 30, 2015

What is Kanban?

Scrum and Kanban are two terms that are often (incorrectly) used interchangeably or thought to be two sides of the same coin. In reality, there are significant differences between these two Agile methodologies. Understanding these differences is key to choosing the path that will work best for your environment. In a nutshell, what is Scrum? Without getting too detailed, Scrum is a tool used to organize work into small, manageable pieces that can be completed by a cross-functional team within a prescribed time period (called a sprint, generally 2-4 weeks long). To plan, organize, administer, and optimize this process, Scrum relies on at least three prescribed roles: the Product Owner (responsible for initial planning, prioritizing, and communication with the rest of the company), the Scrum Master (responsible for overseeing the process during each sprint), and the Team Members (responsible to carry out the purpose of each sprint, such as producing software code.) Another common tool used by scrum teams is the Scrum Board – a visual representation of the work flow, broken down into manageable chunks called “stories”, with each story moved along the board from the “backlog” (the to-do list), into work-in-progress (WIP), and on to completion.

Both Scrum and Kanban allow for large and complex tasks to be broken down and completed efficiently. Both place a high value on continual improvement, optimization of the work and the process. And both share the very similar focus on a highly visible work flow that keeps all team members in the loop on WIP and what’s to come.


Kanban is also a tool used to organize work for the sake of efficiency. Like Scrum, Kanban encourages work to be broken down into manageable chunks and uses a Kanban Board (very similar to the Scrum Board) to visualize that work as it progresses through the work flow. Where Scrum limits the amount of time allowed to accomplish a particular amount of work (by means of sprints), Kanban limits the amount of work allowed in any one condition (only so many tasks can be ongoing, only so many can be on the to-do list.)

Kanban is a method for managing the creation of products with an emphasis on continual delivery while not overburdening the development team. Like scrum, Kanban is a process designed to help teams work together more effectively. 

Kanban is based on 3 basic principles:

Visualize what you do today (workflow): seeing all the items in context of each other can be very informative
Limit the amount of work in progress (WIP): this helps balance the flow-based approach so teams don€™t start and commit to too much work at once
Enhance flow: when something is finished, the next highest thing from the backlog is pulled into play

Kanban promotes continuous collaboration and encourages active, ongoing learning and improving by defining the best possible team workflow.

How is Kanban different from Scrum?

Both Kanban and Scrum focus on releasing software early and often. Both require highly-collaborative and self-managed teams. There are, however, differences between the approaches:


Kanban                                                                              Scrum
No prescribed roles                                                           Pre-defined roles of Scrum master, Product owner and team member
Continuous Delivery                                                         Timeboxed sprints
Work is pulled through the system (single piece flow)    Work is pulled  through the system in batches (the sprint backlog)
Changes can be made at any time                                     No changes allowed mid-sprint
Cycle time                                                                         Velocity
More appropriate in operational environments with a high degree of variability in priority  More appropriate in situations where work can be prioritized in batches that can be left alone


Organization, culture and team dynamics often determine which method is the best fit.

Benefits of Kanban:


Kanban and Scrum at their core are summarized by the premise: Stop Starting, Start Finishing. The entire team€™s focus is on €˜getting to done€™ for the tasks in progress.

Benefits:
Shorter cycle times can deliver features faster.
Responsiveness to Change:
When priorities change very frequently, Kanban is ideal.
Balancing demand against throughput guarantees that most the customer-centric features are always being worked.
Requires fewer organization / room set-up changes to get started
Reducing waste and removing activities that don€™t add value to the team/department/organization
Rapid feedback loops improve the chances of more motivated, empowered and higher-performing team members

Kanban and VersionOne:

VersionOne supports multiple teams using multiple agile methodologies. Teams use VersionOne to track the status of their items, setting WIP limits for each process stage to help them identify, manage, and mitigate constraints with the goal of optimizing the team€™s flow.

Teams interested in Kanban benefit from VersionOne€™s customizable Kanban board and collaboration capabilities.

If your Kanban teams are:
distributed
using multiple development methodologies across the entire €˜larger€™ team (Scrum, XP, DSDM)
seeking to automate tracking your projects
gaining insight from roll-up reporting via multiple teams and projects
searching to improve delivery performance.


HOW ARE SCRUM AND KANBAN DIFFERENT?


SCHEDULING, ITERATION, AND CADENCE

Scrum processes place heavy emphasis on schedule. The scrum team is provided with a prioritized list of story points that need to be completed to deliver a shippable product. The team must decide how many of the points they feel can be completed within one sprint. Anything outside the scope they commit to must wait for the next sprint. Optimally, an efficient scrum team will quickly learn their capabilities over the course of several sprints and their estimates will improve and be optimized as time goes on. Then, every two weeks (or however long their sprint is) the team produces a shippable product, carries out a retrospective to discuss optimizing the process, and moves into the next sprint. This iterative process is designed to allow for accurate estimations of work flow and effective management of multiple projects. On a Kanban team, there are no required time boxes or iterations. While the Kanban method is iterative in nature, the continual improvement is expected to occur in an evolutionary fashion as work is continually completed. The limitations placed on various conditions in the work flow will be regulated early in a team’s (or organization’s) use of Kanban until an optimal set of limits is arrived at to keep the flow steady and efficient.

ROLES AND RESPONSIBILITIES

On scrum teams, there are at least three roles that must be assigned in order to effectively process the work: the Product Owner, Scrum Master, and Team Members. Each role has its own set of responsibilities, and they must work together to achieve an orderly and efficient balance. The scrum team itself also must be cross-functional, which is to say that one team must have all the resources necessary to complete the entire sprint’s work. Under Kanban, no set roles are prescribed. Practically speaking, it makes sense for someone to serve as a project manager or supervisor, especially for larger more complex Kanban projects, but the roles should theoretically evolve with the needs of the project and the organization. A Kanban team is not required to be cross-functional since the Kanban work flow is intended to be used by any and all teams involved in the project. Therefore, a team of specialists and a separate team of generalists may be working on different aspects of the same Kanban project from the same board, and that’s ok.

THE BOARD ITSELF

While very similar, the Scrum Board and Kanban Board are different animals. On a Scrum board, the columns are labeled to reflect periods in the work flow beginning with the sprint backlog and ending with whatever fulfills the team’s definition of done. All the stories added to the board at the beginning of each sprint should be found in the final column at the end of that sprint or the sprint was unsuccessful. After the sprint retrospective, the board is cleared and prepped for the next sprint. On a Kanban board, the columns are likewise labeled to show work flow states, but with one vital difference: they also publish the maximum number of stories allowed in each column at any one time. This enforces the team-determined limitations Kanban prescribes for each condition. Since each column has a limited number of allowed stories and there are no required time boxes (such as sprint length), there is no reason to reset the Kanban board as work progresses. It will continue to flow for as long as the project continues, with new stories being added as the need arises, and completed stories being re-evaluated should it be necessary.

WHICH IS BETTER FOR YOUR NEEDS?

There’s really no way to answer that question for you in this article. Both Scrum and Kanban are powerful, proven process tools that can vastly improve your project management. The best option is to become familiar with both of them and experiment with various aspects of both in your production environment. Creating a hybrid of both is perfectly acceptable if that works best for you

Basic SQL Server Questions

Explain DML, DDL, DCL and TCL statements with examples?

DML: DML stands for Data Manipulation Language. DML is used to retrieve, store, modify, delete, insert and update data in database.
Examples of DML statements: SELECT, UPDATE, INSERT, DELETE statements.

DDL: DDL stands for Data Definition Language. DDL is used to create and modify the structure of database objects.

Examples: CREATE, ALTER, DROP statements.

DCL: DCL stands for Data Control Language. DCL is used to create roles, grant and revoke permissions, establish referential integrity etc.
Examples: GRANT, REVOKE statements

TCL: TCL stands for Transactional Control Language. TCL is used to manage transactions within a database.
Examples: COMMIT, ROLLBACK statements


What is the difference between Drop, Delete and Truncate statements in SQL Server?

Drop, Delete and Truncate - All operations can be rolled back.

All the statements (Delete, Truncate and Drop) are logged operations, but the amount of information that is logged varies. Delete statement logs an entry in the transaction log for each deleted row, where as Truncate Table logs only the Page deallocations.Hence, truncate is a little faster than Delete.

You can have a where clause in Delete statement where as Truncate statement cannot have a where clause. Truncate will delete all the rows in a Table, but the structure of the table remains. Drop would delete all the rows including the structure of the Table.

Please refer to the screen shot below for the differences summary snapshot between Drop, Delete and Truncate statements in SQL Server.





What is Cascading referential integrity constraint?
Cascading referential integrity constraints allow you to define the actions Microsoft SQL Server should take when a user attempts to delete or update a key to which an existing foreign keys point.

You can instruct SQL Server to do the following:

1. No Action: This is the default behaviour. No Action specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, an error is raised and the DELETE or UPDATE is rolled back.

2. Cascade: Specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are also deleted or updated.

3. Set NULL: Specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are set to NULL.

4. Set Default: Specifies that if an attempt is made to delete or update a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are set to default values.

Difference between primary key and unique key in SQL Server?
1. A table can have only one primary key. On the other hand a table can have more than one unique key.
2. Primary key column does not accept any null values, where as a unique key column accept one null value.


What are the 2 types of Temporary Tables in SQL Server?
1. Local Temporary Tables
2. Global Temporary Tables


What is the difference between Local and Global Temporary Tables?

Local Temporary Tables:
1. Prefixed with a single pound sign (#).
2. Local temporary tables are visible to that session of SQL Server which has created it.
3. Local temporary tables are automatically dropped, when the session that created the temporary tables is closed.


Global Temporary Tables:
1. Prefixed with two pound signs (##).
2. Global temporary tables are visible to all the SQL server sessions.
3. Global temporary tables are also automatically dropped, when the session that created the temporary tables is closed.

Can you create foreign key constraints on temporary tables?
No

Do you have to manually delete temporary tables?
No, temporary tables are automatically dropped, when the session that created the temporary tables is closed. But if you maintain a persistent connection or if connection pooling is enabled, then it is better to explicitly drop the temporary tables you have created.
However, It is generally considered a good coding practice to explicitly drop every temporary table you create.


In which database, the temporary tables get created?
TEMPDB database.


What is the difference between a Temporary Table and a Table Variable?
Or
When do you use Table Variable over a Temporary Table and vice versa?



1. Table variable is created in the memory where as a temporary table is created in the TempDB. But, if there is a memory pressure, the pages belonging to a table variable may be pushed out to tempdb.

2. Table variables cannot be involved in transactions, logging or locking. This makes table variable faster than a temporary table.

3. You can pass table variable as parameter to functions and stored procedures, where as you cannot do the same with temporary table.

4. A temporary table can have indexes, whereas a table variable can only have a primary index. If speed is an issue Table variables can be faster, but if there are a lot of records, or there is a need to search the temporary table based on a clustered index, then a Temporary Table would be better. If you have less than 100 rows generally use a table variable. Otherwise use a temporary table. This is because SQL Server won't create statistics on table variables.


What is the use of an Index in SQL Server?
Relational databases like SQL Server use indexes to find data quickly when a query is processed. Creating the proper index can drastically increase the performance of an application.


What is a table scan?
or
What is the impact of table scan on performance?

When a SQL Server has no index to use for searching, the result is similar to the reader who looks at every page in a book to find a word. The SQL engine needs to visit every row in a table. In database terminology we call this behavior a table scan, or just scan. A full table scan of a very large table can adversely affect the performance. Creating proper indexes will allow the database to quickly narrow in on the rows to satisfy the query, and avoid scanning every row in the table.


What is the system stored procedure that can be used to list all the indexes that are created for a specific table?
sp_helpindex is the system stored procedure that can be used to list all the indexes that are created for a specific table.

For example, to list all the indexes on table tblCustomers, you can use the following command.
EXEC sp_helpindex tblCustomers


What is the purpose of query optimizer in SQL Server?
An important feature of SQL Server is a component known as the query optimizer. The query optimizer's job is to find the fastest and least resource intensive means of executing incoming queries. An important part of this job is selecting the best index or indexes to perform the task.


What is the first thing you will check for, if the query below is performing very slow?

SELECT * FROM tblProducts ORDER BY UnitPrice ASC

Check if there is an Index created on the UntiPrice column used in the ORDER BY clause. An index on the UnitPrice column can help the above query to find data very quickly.When we ask for a sorted data, the database will try to find an index and avoid sorting the results during execution of the query. We control sorting of a data by specifying a field, or fields, in an ORDER BY clause, with the sort order as ASC (ascending) or DESC (descending).

With no index, the database will scan the tblProducts table and sort the rows to process the query. However, if there is an index, it can provide the database with a presorted list of prices. The database can simply scan the index from the first entry to the last entry and retrieve the rows in sorted order.

The same index works equally well with the following query, simply by scanning the index in reverse.
SELECT * FROM tblProducts ORDER BY UnitPrice DESC


What is the significance of an Index on the column used in the GROUP BY clause?
Creating an Index on the column, that is used in the GROUP BY clause, can greatly improve the perofrmance. We use a GROUP BY clause to group records and aggregate values, for example, counting the number of products with the same UnitPrice. To process a query with a GROUP BY clause, the database will often sort the results on the columns included in the GROUP BY.

The following query counts the number of products at each price by grouping together records with the same UnitPrice value.
SELECT UnitPrice, Count(*) FROM tblProducts GROUP BY UnitPrice

The database can use the index (Index on UNITPRICE column) to retrieve the prices in order. Since matching prices appear in consecutive index entries, the database is able to count the number of products at each price quickly. Indexing a field used in a GROUP BY clause can often speed up a query.


What is the role of an Index in maintaining a Unique column in table?
Columns requiring unique values (such as primary key columns) must have a unique index applied. There are several methods available to create a unique index.
1. Marking a column as a primary key will automatically create a unique index on the column.
2. We can also create a unique index by checking the Create UNIQUE checkbox when creating the index graphically.
3. We can also create a unique index using SQL with the following command:
CREATE UNIQUE INDEX IDX_ProductName On Products (ProductName)

The above SQL command will not allow any duplicate values in the ProductName column, and an index is the best tool for the database to use to enforce this rule. Each time an application adds or modifies a row in the table, the database needs to search all existing records to ensure none of values in the new data duplicate existing values.


What are the disadvantages of an Index?

There are 2 disadvantages of an Index
1. Increased Disk Space
2. Insert, Update and Delete statements could be slow. In short, all DML statements could be slow.


Disk Space: Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users.

Insert, Update and Delete statements could be slow: Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. Indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, providing too many indexes to update can actually hurt the performance of data modifications. This leads to a delicate balancing act when tuning the database for performance.


What are the 2 types of Indexes in SQL Server?

1. Clustered Index
2. Non Clustered Index


How many Clustered and Non Clustered Indexes can you have per table?
Clustered Index - Only one Clustered Index per table. A clustered index contains all of the data for a table in the index, sorted by the index key. Phone Book is an example for Clustered Index.
Non Clustered Index - You can have multiple Non Clustered Indexes per table. Index at the back of a book is an example for Non Clustered Index.


Which Index is faster, Clustered or Non Clustered Index?

Clustered Index is slightly faster than Non Clustered Index. This is because, when a Non Clustered Index is used there is an extra look up from the Non Clustered Index to the table, to fetch the actual rows.


When is it usually better to create a unique nonclustered index on the primary key column?

Sometimes it is better to use a unique nonclustered index on the primary key column, and place the clustered index on a column used by more queries. For example, if the majority of searches are for the price of a product instead of the primary key of a product, the clustered index could be more effective if used on the price field.


What is a Composite Index in SQL Server?
or
What is the advantage of using a Composite Index in SQL Server?
or
What is Covering Query?

A composite index is an index on two or more columns. Both clustered and nonclustered indexes can be composite indexes. If all of the information for a query can be retrieved from an Index. A clustered index, if selected for use by the query optimizer, always covers a query, since it contains all of the data in a table. By creating a composite indexes, we can have covering queries.


What is a Trigger in SQL Server?
A Trigger is a database object that is attached to a table. In many aspects it is similar to a stored procedure. As a matter of fact, triggers are often referred to as a "special kind of stored procedure." The main difference between a trigger and a stored procedure is that the former is attached to a table and is only fired when an INSERT, UPDATE or DELETE occurs.

What are the two types of Triggers in SQL Server?
1. After Triggers : Fired after Insert, Update and Delete operations on a table.
2. Instead of Triggers: Fired instead of Insert, Update and Delete operations on a table.


What are the special tables used by Triggers in SQL Server?
Triggers make use of two special tables called inserted and deleted. The inserted table contains the data referenced in an INSERT before it is actually committed to the database. The deleted table contains the data in the underlying table referenced in a DELETE before it is actually removed from the database. When an UPDATE is issued both tables are used. More specifically, the new data referenced in the UPDATE statement is contained in inserted table and the data that is being updated is contained in deleted table.


What is the difference between a User Defined Function (UDF) and a Stored Procedure (SP) in SQL Server

1. Stored Procedure support deffered name resolution where as functions do not support deffered name resolution.

2. User Defined Function can be used in a select statement where as you cannot use a stored procedure in a select statement.

3. UDF's cannot return Image, Text where as a StoredProcedure can return any datatype.

4. In general User Defined Functions are used for computations where as Stored Procedures are used for performing business logic.

5. UDF should return a value where as Stored Procedure need not.

6. User Defined Functions accept lesser number of input parameters than Stored Procedures. UDF can have upto 1023 input parameters where as a Stored Procedure can have upto 21000 input parameters.

7. Temporary Tables can not be used in a UDF where as a StoredProcedure can use Temporary Tables.

8. UDF can not Execute Dynamic SQL where as a Stored Procedure can execute Dynamic SQL.

9. User Defined Function does not support error handling where as Stored Procedure support error handling. RAISEERROR or @@ERROR are not allowed in UDFs.


What is a View in SQL Server?
You can think of a view either as a compiled sql query or a virtual table. As a view represents a virtual table, it does not physically store any data. When you query a view, you actually retrieve the data from the underlying base tables.

What are the advantages of using views?
Or
When do you usually use views?

1. Views can be used to implement row level and column level security.
2. Simplify the database schema to the users. You can create a view based on multiple tables which join columns from all these multiple tables so that they look like a single table.
3. Views can be used to present aggregated and summarized data.


Can you create a view based on other views?
Yes, you can create a view based on other views. Usually we create views based on tables, but it also possible to create views based on views.

Can you update views?
Yes, views can be updated. However, updating a view that is based on multiple tables, may not update the underlying tables correctly. To correctly update a view that is based on multiple tables you can make use INSTEAD OF triggers in SQL Server.


What are indexed views?
Or
What are materialized views?


A view is a virtual table, it does not contain any physical data. A view is nothing more than compiled SQL query. Every time, we issue a select query against a view, we actually get the data from the underlying base tables and not from the view, as the view itself does not contain any data.

When you create an index on a view, the data gets physically stored in the view. So, when we issue a select query against an indexed view, the data is retrieved from the index without having to go to the underlying table, which will make the select statement to work slightly faster. However, the disadvantage is, INSERT, UPDATE and DELETE operations will become a little slow, because every time you insert or delete a row from the underlying table, the view index needs to be updated. Inshort, DML operations will have negative impact on performance.

Oracle refers to indexed views as materialized views.

Only the views created with schema binding, can have an Index. Simply adding WITH SCHEMABINDING to the end of the CREATE VIEW statement will accomplish this. However, the effect is that any changes to the underlying tables which will impact the view are not allowed. Since the indexed view is stored physically, any schema changes would impact the schema of the stored results set. Therefore, SQL Server requires that schema binding be used to prevent the view's schema (and therefore the underlying tables) from changing.

The first index for a view must be a UNIQUE CLUSTERED INDEX, after which, it's possible to create non-clustered indexes against the view.

Indexed Views are heavily used in data warehouses and reporting databases that are not highly transactional.

What are the limitations of a View?
1. You cannot pass parameters to a view.
2. Rules and Defaults cannot be associated with views.
3. The ORDER BY clause is invalid in views unless TOP or FOR XML is also specified.
4. Views cannot be based on temporary tables.


What are the different types of joins available in sql server?
There are 3 different types of joins available in sql server, and they are
1. Cross Join
2. Inner Join or Join
3. Outer Join

Outer Join is again divided into 3 types as shown below.
1. Left Outer Join or Left Join
2. Right Outer Join or Right Join
3. Full Outer Join or Full Join

You might have heard about self join, but self join is not a different type of join. A self join means joining a table with itself. We can have an inner self join or outer self join.


Key Points to remember about CROSS JOIN.
1. A cross join produces the Cartesian product of the tables involved in the join.This mean every row in the Left Table is joined to every row in the Right Table. Candidate is LEFT Table and Company is RIGHT Table.
2. In real time scenarios we rarley use CROSS JOIN. Most often we use either INNER JOIN or LEFT OUTER JOIN.
3. CROSS JOIN does not have an ON clause with a Join Condition. All the other JOINS use ON clause with a Join Condition.
4. Using an ON clause on a CROSS JOIN would generate a syntax error.


Index Scan:
Index Scan scans each and every record in the index. Table Scan is where the table is processed row by row from beginning to end. If the index is a clustered index then an index scan is really a table scan. Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Hence, a scan is an efficient strategy only if the table is small.

Index Seek:
Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.


Advantages of stored procedures
1. Better Performance : Stored Procedures are compiled and their execution plan is cached and used again, when the same SP is executed again. Although adhoc queries also create and reuse plan, the plan is reused only when the query is textual match and the datatypes are matching with the previous call. Any change in the datatype or you have an extra space in the query then, a new plan is created.

2. Better Security : Applications making use of dynamically built adhoc sql queries are highly susceptible to sql injection attacks, where as Stored Procedures can avoid SQL injection attacks completely.

3. Reduced Network Traffic: Stored procedures can reduce network traffic to a very great extent when compared with adhoc sql queries. With stored procedures, you only need to send the name of the procedure between client and server. Imagine the amount of network bandwith that can be saved especially if the stored procedure contains 1000 to 2000 lines of SQL.

4. Better Maintainance and Reusability: Stored procedures can be used any where in the application. It is easier to maintain a stored procedure that is used on several pages as the modfifcations just need to be changed at one place where the stored procedure is defined. On the other hand, maintaining an adhoc sql query that's used on several pages is tedious and error prone, as we have to make modifications on each and every page.

Thursday, October 29, 2015

FIRST_VALUE and LAST_VALUE functions in SQL Server

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)

Select Name, Gender, Salary
FIRST_VALUE(NAME) Over (Order by Salary) as FirstValue
From Employees

Select Name, Gender, Salary
FIRST_VALUE(NAME) Over (PARTITION BY Gender Order by Salary) as FirstValue
From Employees


Select Name, Gender, Salary
LAST_VALUE(NAME) Over (Order by Salary Rows Between UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as FirstValue
From Employees

Select Name, Gender, Salary
LAST_VALUE(NAME) Over (PARTITION BY Gender Order by Salary Order by Salary Rows Between UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as LastValue
From Employees

Lead and Lag functions in SQL Server 2012

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)

Select Name, Gender, Salary
LEAD(Salary) Over (Order by Salary) as LEADS,
LAG(Salary) Over (Order by Salary) as LAGS
From Employees

Select Name, Gender, Salary
LEAD(Salary,2,-1) Over (Order by Salary) as LEADS,
LAG(Salary,1,-1) Over (Order by Salary) as LAGS
From Employees


Select Name, Gender, Salary
LEAD(Salary,2,-1) Over (PARTITION BY Gender Order by Salary) as LEADS,
LAG(Salary,1,-1) Over ( PARTITION BY Gender Order by Salary) as LAGS
From Employees

NTILE function in SQL Server

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)

Select Name, Gender, Salary
NTILE (2) Over (Order by Salary) as NTiles
From Employees

Select Name, Gender, Salary
NTILE (3) Over (Order by Salary) as NTiles
From Employees


--- If the No of rows are less then NTILE parameter value

Select Name, Gender, Salary
NTILE (12) Over (Order by Salary) as NTiles
From Employees

-- Use of Partition

Select Name, Gender, Salary
NTILE (3) Over (Partition by Gender Order by Salary) as NTiles
From Employees

Rank and Dense Rank in SQL Server

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)

With Result AS
(
Select Salary, RANK() Over (Order by Salary Desc) as SalaryRank From employees
)
Select Top 1 Salary from Result Where SalaryRank = 2


With Result AS
(
Select Salary, DENSE_RANK() Over (Order by Salary Desc) as SalaryRank From employees
)
Select Top 1 Salary from Result Where SalaryRank = 2


With Result AS
(
Select Gender, Salary, DENSE_RANK() Over (Pratition by Gender Order by Salary Desc) as SalaryRank From employees
)
Select Top 1 Salary from Result Where SalaryRank = 3 and Gender ='Female'

The Use of Over Function

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)

Select  Name, Salary, Employee.Gender,Genders.GenderTotal, Genders.AvgSal,Genders.MinSal,Genders.MaxSal
From Employees
Inner Join
(Select Gender,Count(*) as GenderTotal, AVG(Salary) as AvgSal, Min(Salary) as MinSalary, Max(Salary) as MaxSal
From Employees
Group By Gender) as Genders
On Genders.Gender = Employee.Gender

--Now use of Over Function

Select Name, Salary, Gender,
Count(Gender) OVER (Partition by Gender) as GenderTotal,
AVG(Salary) OVER (Partition by Gender) as AvgSal,
MIN(Salary) OVER (Partition by Gender) as MinSal,
MAX(Salary) OVER (Partition by Gender) as MaxSal
From Employees

Row_NUMBER () Function

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)

Select Name, Gender,Salary, 
Row_NUMBER () OVER (Order by Gender) as RowNo
From Employees

-- so here all the rows will have rownumber start with 1 till end

Select Name, Gender,Salary, 
Row_NUMBER () OVER (Partition by Gender Order by Gender) as RowNo
From Employees

-- so here when the partition will change then the Rownumber will reset and start with 1 

How to Delete the duplicate Records

Create table Employees
(
     ID int,
     FirstName nvarchar(50),
     LastName nvarchar(50),
     Gender nvarchar(50),
     Salary int
)
GO

Insert into Employees values (1, 'Mark', 'Hastings', 'Male', 60000)
Insert into Employees values (1, 'Mark', 'Hastings', 'Male', 60000)
Insert into Employees values (1, 'Mark', 'Hastings', 'Male', 60000)
Insert into Employees values (2, 'Mary', 'Lambeth', 'Female', 30000)
Insert into Employees values (2, 'Mary', 'Lambeth', 'Female', 30000)
Insert into Employees values (3, 'Ben', 'Hoskins', 'Male', 70000)
Insert into Employees values (3, 'Ben', 'Hoskins', 'Male', 70000)
Insert into Employees values (3, 'Ben', 'Hoskins', 'Male', 70000)


With EmployeeCTE
(
Select * , Row_Number () Over (Partition By ID Order by ID) as RowNumber
From Employees
)
Delete From EmployeeCTE where RowNumber>1




Cube and Rollup Functions in the SQL

CREATE TABLE [dbo].[Employees1]
(
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NULL,
[Dept] [nvarchar](50) NULL,
[Gender] [nvarchar](50) NULL,
[Country] [nvarchar](50) NULL,
[Salary] [float] NULL,
)

Insert into Employees Values(1,'John','IT','Male','UK',5000)
Insert into Employees Values(2,'Mary','HR','Female','India',3000)
Insert into Employees Values(3,'Todd','IT','Male','UK',3500)
Insert into Employees Values(4,'Pam','HR','Female','India',4000)
Insert into Employees Values(5,'Tatcher','Payroll','Male','USA',2000)
Insert into Employees Values(6,'Sunil','IT','Male','USA',1400)
Insert into Employees Values(7,'Hari','Payroll','Male','UK',2500)
Insert into Employees Values(8,'Sunitha','HR','Female','India',4000)
Insert into Employees Values(9,'Sneha','IT','Female','India',3000)
Insert into Employees Values(10,'Ruby','Payroll','Male','UK',4600)


Select Country, State, District, City, Sum(Sales) as TotalSales
From Sales
Group By Cube (Country, State, District, City) -- Works on maximum numbers of combinations


Select Country, State, District, City, Sum(Sales) as TotalSales
From Sales
Group By ROLLUP(Country, State, District, City)-- works based on the Hirerchy



Select Country, State, District, City, Sum(Sales) as TotalSales
From Sales
Group By Cube (Country)


Select Country, Sum(Sales) as TotalSales
From Sales
Group By ROLLUP(Country)

Cube Function in the SQL Server

Select State,City, Count(Acct_NUM) as TotalAccounts
From CustomerProfile
Group By Cube (State,City)

Select Country,Gender, Sum(Salary) as TotalSalary
From Employees
Group By Cube (Country,Gender)

Select Country,Gender, Sum(Salary) as TotalSalary
From Employees
Group By  Country,Gender With Cube


-------Group By, Country and Gender  then By Country then Gender and then all---------


Select Country,Gender, Sum(Salary) as TotalSalary
From Employees
Group By 
        Grouping SETS
         (
             (Country,Gender),
             (Country),
             (Gender),
             ()
        )

-- Now the Alternative of the above Query

Select Country,Gender, Sum(Salary) as TotalSalary
From Employees
Group By Country,Gender

UNION ALL

Select Country,NULL, Sum(Salary) as TotalSalary
From Employees
Group By Country

UNION ALL

Select NULL,Gender, Sum(Salary) as TotalSalary
From Employees
Group By Gender

UNION ALL

Select NULL,NULL, Sum(Salary) as TotalSalary
From Employees

SQL SERVER Objects modified in last one month


SELECT * FROM sys.objects
WHERE TYPE = 'U' AND DATEDIFF(D,modify_date, GETDATE()) < 30

SELECT * FROM sys.objects
WHERE TYPE = 'P' AND DATEDIFF(D,modify_date, GETDATE()) < 30

SELECT * FROM sys.objects
WHERE TYPE = 'F' AND DATEDIFF(D,modify_date, GETDATE()) < 30

SELECT * FROM sys.objects
WHERE TYPE = 'V' AND DATEDIFF(D,modify_date, GETDATE()) < 30

Wednesday, October 28, 2015

sys.sysobjects (Transact-SQL)

Select * from INFORMATION_SCHEMA

Select * from Sys.Tables

Select * from Sys.procedures

Select * from Sys.views

Select * from Sys.columns

Select * from Sys.configurations

Select * from Sys.objects where TYPE ='D'


AF = Aggregate function (CLR)
C = CHECK constraint
D = Default or DEFAULT constraint
F = FOREIGN KEY constraint
L = Log
FN = Scalar function
FS = Assembly (CLR) scalar-function
FT = Assembly (CLR) table-valued function
IF = In-lined table-function
IT = Internal table
P = Stored procedure
PC = Assembly (CLR) stored-procedure
PK = PRIMARY KEY constraint (type is K)
RF = Replication filter stored procedure
S = System table
SN = Synonym
SQ = Service queue
TA = Assembly (CLR) DML trigger
TF = Table function
TR = SQL DML Trigger
TT = Table type
U = User table
UQ = UNIQUE constraint (type is K)
V = View
X = Extended stored procedure

Monday, October 19, 2015

Backup Terminologies



Complete backup - In complete backup the entire database is backed up at the same time. This backup includes all the database files, control files and journal files. 

Partial backup - Partial backup is not the complete backup of database. Partial backup are very useful in large databases because they allow a strategy whereby various parts of the database are backed up in a round robin fashion on day by day basis, so that the whole database is backed up effectively once a week.

Cold backup - Cold backup is taken while the database is completely shut down. In multiinstance environment all the instances should be shut down.

Hot backup - The hot backup is take when the database engine is up and running. Hot backup requirements that need to be considered vary from RDBMS to RDBMS. Hot backups are extremely useful.

Online backup - It is same as the hot backup.

Friday, October 16, 2015

Agile Scrum


Definition of Scrum


Scrum (n): A framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value.

Scrum is:
  • Lightweight
  • Simple to understand
  • Difficult to master

Scrum is a process framework that has been used to manage complex product development since the early 1990s. Scrum is not a process or a technique for building products; rather, it is a framework within which you can employ various processes and techniques. Scrum makes clear the relative efficacy of your product management and development practices so that you can improve.

The Scrum framework consists of Scrum Teams and their associated roles, events, artifacts, and rules. Each component within the framework serves a specific purpose and is essential to Scrum’s success and usage.

The rules of Scrum bind together the events, roles, and artifacts, governing the relationships and interaction between them. The rules of Scrum are described throughout the body of this document.

Scrum Theory
Scrum is founded on empirical process control theory, or empiricism. Empiricism asserts that knowledge comes from experience and making decisions based on what is known. Scrum employs an iterative, incremental approach to optimize predictability and control risk. Three pillars uphold every implementation of empirical process control: transparency, inspection, and adaptation.

Transparency
Significant aspects of the process must be visible to those responsible for the outcome. Transparency requires those aspects be defined by a common standard so observers share a common understanding of what is being seen.

For example:
A common language referring to the process must be shared by all participants; and,
Those performing the work and those accepting the work product must share a common definition of “Done

Inspection

Scrum users must frequently inspect Scrum artifacts and progress toward a Sprint Goal to detect undesirable variances. Their inspection should not be so frequent that inspection gets in the way of the work. Inspections are most beneficial when diligently performed by skilled inspectors at the point of work.

Adaptation

If an inspector determines that one or more aspects of a process deviate outside acceptable limits, and that the resulting product will be unacceptable, the process or the material being processed must be adjusted. An adjustment must be made as soon as possible to minimize further deviation.

Scrum prescribes four formal events for inspection and adaptation, as described in the Scrum Events section of this document:

  • Sprint Planning
  • Daily Scrum
  • Sprint Review
  • Sprint Retrospective
  • The Scrum Team

The Scrum Team consists of a Product Owner, the Development Team, and a Scrum Master. Scrum Teams are self-organizing and cross-functional. Self-organizing teams choose how best to accomplish their work, rather than being directed by others outside the team. Cross-functional teams have all competencies needed to accomplish the work without depending on others not part of the team. The team model in Scrum is designed to optimize flexibility, creativity, and productivity.

Scrum Teams deliver products iteratively and incrementally, maximizing opportunities for feedback. Incremental deliveries of “Done” product ensure a potentially useful version of working product is always available.

The Product Owner

The Product Owner is responsible for maximizing the value of the product and the work of the Development Team. How this is done may vary widely across organizations, Scrum Teams, and individuals.

  • The Product Owner is the sole person responsible for managing the Product Backlog. Product Backlog management includes:
  • Clearly expressing Product Backlog items;
  • Ordering the items in the Product Backlog to best achieve goals and missions;
  • Optimizing the value of the work the Development Team performs;
  • Ensuring that the Product Backlog is visible, transparent, and clear to all, and shows what the Scrum Team will work on next; and,
  • Ensuring the Development Team understands items in the Product Backlog to the level needed.
  • The Product Owner may do the above work, or have the Development Team do it. However, the Product Owner remains accountable.
  • The Product Owner is one person, not a committee. The Product Owner may represent the desires of a committee in the Product Backlog, but those wanting to change a Product Backlog item’s priority must address the Product Owner.

For the Product Owner to succeed, the entire organization must respect his or her decisions. The Product Owner’s decisions are visible in the content and ordering of the Product Backlog. No one is allowed to tell the Development Team to work from a different set of requirements, and the Development Team isn’t allowed to act on what anyone else says.

The Development Team


The Development Team consists of professionals who do the work of delivering a potentially releasable Increment of “Done” product at the end of each Sprint. Only members of the Development Team create the Increment.

Development Teams are structured and empowered by the organization to organize and manage their own work. The resulting synergy optimizes the Development Team’s overall efficiency and effectiveness.

Development Teams have the following characteristics:

They are self-organizing. No one (not even the Scrum Master) tells the Development Team how to turn Product Backlog into Increments of potentially releasable functionality;
Development Teams are cross-functional, with all of the skills as a team necessary to create a product Increment;

Scrum recognizes no titles for Development Team members other than Developer, regardless of the work being performed by the person; there are no exceptions to this rule;
Scrum recognizes no sub-teams in the Development Team, regardless of particular domains that need to be addressed like testing or business analysis; there are no exceptions to this rule; and,
Individual Development Team members may have specialized skills and areas of focus, but accountability belongs to the Development Team as a whole.

Development Team Size

Optimal Development Team size is small enough to remain nimble and large enough to complete significant work within a Sprint. Fewer than three Development Team members decrease interaction and results in smaller productivity gains. Smaller Development Teams may encounter skill constraints during the Sprint, causing the Development Team to be unable to deliver a potentially releasable Increment. Having more than nine members requires too much coordination. Large Development Teams generate too much complexity for an empirical process to manage. The Product Owner and Scrum Master roles are not included in this count unless they are also executing the work of the Sprint Backlog.

The Scrum Master

The Scrum Master is responsible for ensuring Scrum is understood and enacted. Scrum Masters do this by ensuring that the Scrum Team adheres to Scrum theory, practices, and rules.

The Scrum Master is a servant-leader for the Scrum Team. The Scrum Master helps those outside the Scrum Team understand which of their interactions with the Scrum Team are helpful and which aren’t. The Scrum Master helps everyone change these interactions to maximize the value created by the Scrum Team.

Scrum Master Service to the Product Owner

The Scrum Master serves the Product Owner in several ways, including:
  • Finding techniques for effective Product Backlog management;
  • Helping the Scrum Team understand the need for clear and concise Product Backlog items;
  • Understanding product planning in an empirical environment;
  • Ensuring the Product Owner knows how to arrange the Product Backlog to maximize value;
  • Understanding and practicing agility; and,
  • Facilitating Scrum events as requested or needed.
  • Scrum Master Service to the Development Team
  • The Scrum Master serves the Development Team in several ways, including:
  • Coaching the Development Team in self-organization and cross-functionality;
  • Helping the Development Team to create high-value products;
  • Removing impediments to the Development Team’s progress;
  • Facilitating Scrum events as requested or needed; and,
  • Coaching the Development Team in organizational environments in which Scrum is not yet fully adopted and understood.

Scrum Master Service to the Organization


The Scrum Master serves the organization in several ways, including:
  • Leading and coaching the organization in its Scrum adoption;
  • Planning Scrum implementations within the organization;
  • Helping employees and stakeholders understand and enact Scrum and empirical product development;
  • Causing change that increases the productivity of the Scrum Team; and,
  • Working with other Scrum Masters to increase the effectiveness of the application of Scrum in the organization.
Scrum Events

Prescribed events are used in Scrum to create regularity and to minimize the need for meetings not defined in Scrum. All events are time-boxed events, such that every event has a maximum duration. Once a Sprint begins, its duration is fixed and cannot be shortened or lengthened. The remaining events may end whenever the purpose of the event is achieved, ensuring an appropriate amount of time is spent without allowing waste in the process.

Other than the Sprint itself, which is a container for all other events, each event in Scrum is a formal opportunity to inspect and adapt something. These events are specifically designed to enable critical transparency and inspection. Failure to include any of these events results in reduced transparency and is a lost opportunity to inspect and adapt.

The Sprint

The heart of Scrum is a Sprint, a time-box of one month or less during which a “Done”, useable, and potentially releasable product Increment is created. Sprints best have consistent durations throughout a development effort. A new Sprint starts immediately after the conclusion of the previous Sprint.

Sprints contain and consist of the Sprint Planning, Daily Scrums, the development work, the Sprint Review, and the Sprint Retrospective.

During the Sprint:

No changes are made that would endanger the Sprint Goal;

Quality goals do not decrease; and,


Scope may be clarified and re-negotiated between the Product Owner and Development Team as more is learned.

Each Sprint may be considered a project with no more than a one-month horizon. Like projects, Sprints are used to accomplish something. Each Sprint has a definition of what is to be built, a design and flexible plan that will guide building it, the work, and the resultant product.

Sprints are limited to one calendar month. When a Sprint’s horizon is too long the definition of what is being built may change, complexity may rise, and risk may increase. Sprints enable predictability by ensuring inspection and adaptation of progress toward a Sprint Goal at least every calendar month. Sprints also limit risk to one calendar month of cost.
Cancelling a Sprint

A Sprint can be cancelled before the Sprint time-box is over. Only the Product Owner has the authority to cancel the Sprint, although he or she may do so under influence from the stakeholders, the Development Team, or the Scrum Master.

A Sprint would be cancelled if the Sprint Goal becomes obsolete. This might occur if the company changes direction or if market or technology conditions change. In general, a Sprint should be cancelled if it no longer makes sense given the circumstances. But, due to the short duration of Sprints, cancellation rarely makes sense.

When a Sprint is cancelled, any completed and “Done” Product Backlog items are reviewed. If part of the work is potentially releasable, the Product Owner typically accepts it. All incomplete Product Backlog Items are re-estimated and put back on the Product Backlog. The work done on them depreciates quickly and must be frequently re-estimated.

Sprint cancellations consume resources, since everyone has to regroup in another Sprint Planning to start another Sprint. Sprint cancellations are often traumatic to the Scrum Team, and are very uncommon.

Sprint Planning

The work to be performed in the Sprint is planned at the Sprint Planning. This plan is created by the collaborative work of the entire Scrum Team.

Sprint Planning is time-boxed to a maximum of eight hours for a one-month Sprint. For shorter Sprints, the event is usually shorter. The Scrum Master ensures that the event takes place and that attendants understand its purpose. The Scrum Master teaches the Scrum Team to keep it within the time-box.

Sprint Planning answers the following:

What can be delivered in the Increment resulting from the upcoming Sprint?

How will the work needed to deliver the Increment be achieved?

Topic One: What can be done this Sprint?

The Development Team works to forecast the functionality that will be developed during the Sprint. The Product Owner discusses the objective that the Sprint should achieve and the Product Backlog items that, if completed in the Sprint, would achieve the Sprint Goal. The entire Scrum Team collaborates on understanding the work of the Sprint.

The input to this meeting is the Product Backlog, the latest product Increment, projected capacity of the Development Team during the Sprint, and past performance of the Development Team. The number of items selected from the Product Backlog for the Sprint is solely up to the Development Team. Only the Development Team can assess what it can accomplish over the upcoming Sprint.

After the Development Team forecasts the Product Backlog items it will deliver in the Sprint, the Scrum Team crafts a Sprint Goal. The Sprint Goal is an objective that will be met within the Sprint through the implementation of the Product Backlog, and it provides guidance to the Development Team on why it is building the Increment.

Topic Two: how will the chosen work get done?

Having set the Sprint Goal and selected the Product Backlog items for the Sprint, the Development Team decides how it will build this functionality into a “Done” product Increment during the Sprint. The Product Backlog items selected for this Sprint plus the plan for delivering them is called the Sprint Backlog.

The Development Team usually starts by designing the system and the work needed to convert the Product Backlog into a working product Increment. Work may be of varying size, or estimated effort. However, enough work is planned during Sprint Planning for the Development Team to forecast what it believes it can do in the upcoming Sprint. Work planned for the first days of the Sprint by the Development Team is decomposed by the end of this meeting, often to units of one day or less. The Development Team self-organizes to undertake the work in the Sprint Backlog, both during Sprint Planning and as needed throughout the Sprint.

The Product Owner can help to clarify the selected Product Backlog items and make trade-offs. If the Development Team determines it has too much or too little work, it may renegotiate the selected Product Backlog items with the Product Owner. The Development Team may also invite other people to attend in order to provide technical or domain advice.

By the end of the Sprint Planning, the Development Team should be able to explain to the Product Owner and Scrum Master how it intends to work as a self-organizing team to accomplish the Sprint Goal and create the anticipated Increment.

Sprint Goal


The Sprint Goal is an objective set for the Sprint that can be met through the implementation of Product Backlog. It provides guidance to the Development Team on why it is building the Increment. It is created during the Sprint Planning meeting. The Sprint Goal gives the Development Team some flexibility regarding the functionality implemented within the Sprint. The selected Product Backlog items deliver one coherent function, which can be the Sprint Goal. The Sprint Goal can be any other coherence that causes the Development Team to work together rather than on separate initiatives.

As the Development Team works, it keeps the Sprint Goal in mind. In order to satisfy the Sprint Goal, it implements the functionality and technology. If the work turns out to be different than the Development Team expected, they collaborate with the Product Owner to negotiate the scope of Sprint Backlog within the Sprint.

Daily Scrum

The Daily Scrum is a 15-minute time-boxed event for the Development Team to synchronize activities and create a plan for the next 24 hours. This is done by inspecting the work since the last Daily Scrum and forecasting the work that could be done before the next one. The Daily Scrum is held at the same time and place each day to reduce complexity. During the meeting, the Development Team members explain:

  • What did I do yesterday that helped the Development Team meet the Sprint Goal?
  • What will I do today to help the Development Team meet the Sprint Goal?
  • Do I see any impediment that prevents me or the Development Team from meeting the Sprint Goal?

The Development Team uses the Daily Scrum to inspect progress toward the Sprint Goal and to inspect how progress is trending toward completing the work in the Sprint Backlog. The Daily Scrum optimizes the probability that the Development Team will meet the Sprint Goal. Every day, the Development Team should understand how it intends to work together as a self-organizing team to accomplish the Sprint Goal and create the anticipated Increment by the end of the Sprint. The Development Team or team members often meet immediately after the Daily Scrum for detailed discussions, or to adapt, or replan, the rest of the Sprint’s work.

The Scrum Master ensures that the Development Team has the meeting, but the Development Team is responsible for conducting the Daily Scrum. The Scrum Master teaches the Development Team to keep the Daily Scrum within the 15-minute time-box.

The Scrum Master enforces the rule that only Development Team members participate in the Daily Scrum.

Daily Scrums improve communications, eliminate other meetings, identify impediments to development for removal, highlight and promote quick decision-making, and improve the Development Team’s level of knowledge. This is a key inspect and adapt meeting.

Sprint Review

A Sprint Review is held at the end of the Sprint to inspect the Increment and adapt the Product Backlog if needed. During the Sprint Review, the Scrum Team and stakeholders collaborate about what was done in the Sprint. Based on that and any changes to the Product Backlog during the Sprint, attendees collaborate on the next things that could be done to optimize value. This is an informal meeting, not a status meeting, and the presentation of the Increment is intended to elicit feedback and foster collaboration.

This is a four-hour time-boxed meeting for one-month Sprints. For shorter Sprints, the event is usually shorter. The Scrum Master ensures that the event takes place and that attendants understand its purpose. The Scrum Master teaches all to keep it within the time-box.

The Sprint Review includes the following elements:

Attendees include the Scrum Team and key stakeholders invited by the Product Owner;

The Product Owner explains what Product Backlog items have been “Done” and what has not been “Done”;

The Development Team discusses what went well during the Sprint, what problems it ran into, and how those problems were solved;

The Development Team demonstrates the work that it has “Done” and answers questions about the Increment;

The Product Owner discusses the Product Backlog as it stands. He or she projects likely completion dates based on progress to date (if needed);

The entire group collaborates on what to do next, so that the Sprint Review provides valuable input to subsequent Sprint Planning;

Review of how the marketplace or potential use of the product might have changed what is the most valuable thing to do next; and,

Review of the timeline, budget, potential capabilities, and marketplace for the next anticipated release of the product.

The result of the Sprint Review is a revised Product Backlog that defines the probable Product Backlog items for the next Sprint. The Product Backlog may also be adjusted overall to meet new opportunities.

Sprint Retrospective

The Sprint Retrospective is an opportunity for the Scrum Team to inspect itself and create a plan for improvements to be enacted during the next Sprint.

The Sprint Retrospective occurs after the Sprint Review and prior to the next Sprint Planning. This is a three-hour time-boxed meeting for one-month Sprints. For shorter Sprints, the event is usually shorter. The Scrum Master ensures that the event takes place and that attendants understand its purpose. The Scrum Master teaches all to keep it within the time-box. The Scrum Master participates as a peer team member in the meeting from the accountability over the Scrum process.

The purpose of the Sprint Retrospective is to:

Inspect how the last Sprint went with regards to people, relationships, process, and tools;

Identify and order the major items that went well and potential improvements; and,

Create a plan for implementing improvements to the way the Scrum Team does its work.


The Scrum Master encourages the Scrum Team to improve, within the Scrum process framework, its development process and practices to make it more effective and enjoyable for the next Sprint. During each Sprint Retrospective, the Scrum Team plans ways to increase product quality by adapting the definition of “Done” as appropriate.

By the end of the Sprint Retrospective, the Scrum Team should have identified improvements that it will implement in the next Sprint. Implementing these improvements in the next Sprint is the adaptation to the inspection of the Scrum Team itself. Although improvements may be implemented at any time, the Sprint Retrospective provides a formal opportunity to focus on inspection and adaptation.

Scrum Artifacts

Scrum’s artifacts represent work or value to provide transparency and opportunities for inspection and adaptation. Artifacts defined by Scrum are specifically designed to maximize transparency of key information so that everybody has the same understanding of the artifact.

Product Backlog


The Product Backlog is an ordered list of everything that might be needed in the product and is the single source of requirements for any changes to be made to the product. The Product Owner is responsible for the Product Backlog, including its content, availability, and ordering.

A Product Backlog is never complete. The earliest development of it only lays out the initially known and best-understood requirements. The Product Backlog evolves as the product and the environment in which it will be used evolves. The Product Backlog is dynamic; it constantly changes to identify what the product needs to be appropriate, competitive, and useful. As long as a product exists, its Product Backlog also exists.

The Product Backlog lists all features, functions, requirements, enhancements, and fixes that constitute the changes to be made to the product in future releases. Product Backlog items have the attributes of a description, order, estimate and value.

As a product is used and gains value, and the marketplace provides feedback, the Product Backlog becomes a larger and more exhaustive list. Requirements never stop changing, so a Product Backlog is a living artifact. Changes in business requirements, market conditions, or technology may cause changes in the Product Backlog.

Multiple Scrum Teams often work together on the same product. One Product Backlog is used to describe the upcoming work on the product. A Product Backlog attribute that groups items may then be employed.

Product Backlog refinement is the act of adding detail, estimates, and order to items in the Product Backlog. This is an ongoing process in which the Product Owner and the Development Team collaborate on the details of Product Backlog items. During Product Backlog refinement, items are reviewed and revised. The Scrum Team decides how and when refinement is done. Refinement usually consumes no more than 10% of the capacity of the Development Team. However, Product Backlog items can be updated at any time by the Product Owner or at the Product Owner’s discretion.

Higher ordered Product Backlog items are usually clearer and more detailed than lower ordered ones. More precise estimates are made based on the greater clarity and increased detail; the lower the order, the less detail. Product Backlog items that will occupy the Development Team for the upcoming Sprint are refined so that any one item can reasonably be “Done” within the Sprint time-box. Product Backlog items that can be “Done” by the Development Team within one Sprint are deemed “Ready” for selection in a Sprint Planning. Product Backlog items usually acquire this degree of transparency through the above described refining activities.

The Development Team is responsible for all estimates. The Product Owner may influence the Development Team by helping it understand and select trade-offs, but the people who will perform the work make the final estimate.

Monitoring Progress Toward a Goal

At any point in time, the total work remaining to reach a goal can be summed. The Product Owner tracks this total work remaining at least every Sprint Review. The Product Owner compares this amount with work remaining at previous Sprint Reviews to assess progress toward completing projected work by the desired time for the goal. This information is made transparent to all stakeholders.

Various projective practices upon trending have been used to forecast progress, like burn-downs, burn-ups, or cumulative flows. These have proven useful. However, these do not replace the importance of empiricism. In complex environments, what will happen is unknown. Only what has happened may be used for forward-looking decision-making.

Sprint Backlog

The Sprint Backlog is the set of Product Backlog items selected for the Sprint, plus a plan for delivering the product Increment and realizing the Sprint Goal. The Sprint Backlog is a forecast by the Development Team about what functionality will be in the next Increment and the work needed to deliver that functionality into a “Done” Increment.

The Sprint Backlog makes visible all of the work that the Development Team identifies as necessary to meet the Sprint Goal.

The Sprint Backlog is a plan with enough detail that changes in progress can be understood in the Daily Scrum. The Development Team modifies the Sprint Backlog throughout the Sprint, and the Sprint Backlog emerges during the Sprint. This emergence occurs as the Development Team works through the plan and learns more about the work needed to achieve the Sprint Goal.

As new work is required, the Development Team adds it to the Sprint Backlog. As work is performed or completed, the estimated remaining work is updated. When elements of the plan are deemed unnecessary, they are removed. Only the Development Team can change its Sprint Backlog during a Sprint. The Sprint Backlog is a highly visible, real-time picture of the work that the Development Team plans to accomplish during the Sprint, and it belongs solely to the Development Team.

Monitoring Sprint Progress

At any point in time in a Sprint, the total work remaining in the Sprint Backlog can be summed. The Development Team tracks this total work remaining at least for every Daily Scrum to project the likelihood of achieving the Sprint Goal. By tracking the remaining work throughout the Sprint, the Development Team can manage its progress.
Increment

The Increment is the sum of all the Product Backlog items completed during a Sprint and the value of the increments of all previous Sprints. At the end of a Sprint, the new Increment must be “Done,” which means it must be in useable condition and meet the Scrum Team’s definition of “Done.” It must be in useable condition regardless of whether the Product Owner decides to actually release it.

Artifact Transparency

Scrum relies on transparency. Decisions to optimize value and control risk are made based on the perceived state of the artifacts. To the extent that transparency is complete, these decisions have a sound basis. To the extent that the artifacts are incompletely transparent, these decisions can be flawed, value may diminish and risk may increase.

The Scrum Master must work with the Product Owner, Development Team, and other involved parties to understand if the artifacts are completely transparent. There are practices for coping with incomplete transparency; the Scrum Master must help everyone apply the most appropriate practices in the absence of complete transparency. A Scrum Master can detect incomplete transparency by inspecting the artifacts, sensing patterns, listening closely to what is being said, and detecting differences between expected and real results.

The Scrum Master’s job is to work with the Scrum Team and the organization to increase the transparency of the artifacts. This work usually involves learning, convincing, and change. Transparency doesn’t occur overnight, but is a path.

Definition of "Done"
When a Product Backlog item or an Increment is described as “Done”, everyone must understand what “Done” means. Although this varies significantly per Scrum Team, members must have a shared understanding of what it means for work to be complete, to ensure transparency. This is the definition of “Done” for the Scrum Team and is used to assess when work is complete on the product Increment.

The same definition guides the Development Team in knowing how many Product Backlog items it can select during a Sprint Planning. The purpose of each Sprint is to deliver Increments of potentially releasable functionality that adhere to the Scrum Team’s current definition of “Done.” Development Teams deliver an Increment of product functionality every Sprint. This Increment is useable, so a Product Owner may choose to immediately release it. If the definition of "done" for an increment is part of the conventions, standards or guidelines of the development organization, all Scrum Teams must follow it as a minimum. If "done" for an increment is not a convention of the development organization, the Development Team of the Scrum Team must define a definition of “done” appropriate for the product. If there are multiple Scrum Teams working on the system or product release, the development teams on all of the Scrum Teams must mutually define the definition of “Done.”

Each Increment is additive to all prior Increments and thoroughly tested, ensuring that all Increments work together.

As Scrum Teams mature, it is expected that their definitions of “Done” will expand to include more stringent criteria for higher quality. Any one product or system should have a definition of “Done” that is a standard for any work done on it.

End Note

Scrum is free and offered in this Guide. Scrum’s roles, artifacts, events, and rules are immutable and although implementing only parts of Scrum is possible, the result is not Scrum. Scrum exists only in its entirety and functions well as a container for other techniques, methodologies, and practices.