Tuesday, June 21, 2011

Working with SQL Table Partitions

When a database table grows in size to the hundreds of gigabytes or more, it can become more difficult to load new data, remove old data, and maintain indexes.
Even the data that we want to load or remove may be very few records, making INSERT and DELETE operations on the table becomes impractical.

In this Scenario, How can we face this issue and How can we achieve Performance ?

Partitioning a large table divides the table and its indexes into smaller partitions, so that maintenance operations can be applied on a partition-by-partition basis, rather than on the entire table.

In addition, the SQL Server optimizer can direct properly filtered queries to appropriate partitions rather than the entire table.

Table partitioning was introduced in SQL Server 2005 Enterprise Edition and enhanced in SQL Server 2008 Enterprise.

Every table in SQL Server 2005 and SQL Server 2008 may be looked upon as partitioned, in the sense that each table is given a default partition in the sys.partitions catalog view. However, a table is not truly partitioned unless it is created using a partition scheme and function, which ordinary database tables by default do not have.

In SQL Server, there are two ways to partition:
a.
You can manually subdivide a large table's data into multiple physical tables.
b. you can use SQL Server's table partitioning feature to partition a single table.

1st way: Again there are two primary ways to partition data into multiple tables:

i. Horizontal Partitioning: where selected subsets of rows are placed in different tables
When a view is created over all the tables, and queries directed to the view, the result is a partitioned view.

Disadvantages: In a partitioned view, you have to manually apply the required constraints and operations, and maintenance can be complex and time-consuming.

ii. Vertical Partitioning: where the columns of a very wide table are spread across multiple tables containing distinct subsets of the columns with the same number of rows. The result is multiple tables containing the same number of rows but different columns, usually with the same primary key column in each table.

Often a view is defined across the multiple tables and queries directed against the view. SQL Server does not provide built-in support for vertical partitioning, but the new sparse columns feature of SQL Server 2008 can be a better solution for tables that require large numbers of columns

2nd Way: SQL Server's table partitioning differs from the above two approaches by partitioning a single table: here Multiple physical tables are no longer involved.

When a table is created as a partitioned table, SQL Server automatically places the table's rows in the correct partition, and SQL Server maintains the partitions behind the scenes.
You can then perform maintenance operations on individual partitions, and properly filtered queries will access only the correct partitions. But it is still one table as far as SQL Server is concerned.

A partitioned table is a unique kind of table in SQL Server. It depends on two pre-existing objects, the partition function and partition scheme, which are used only for partitioned tables and indexes


A partitioned table has a column identified as a partitioning column, and that column is referenced when the table is created. The partitioned table must be created on a partition scheme, which defines the filegroup storage locations for the partitioned table. The partition scheme in turn depends on a previously created partition function that defines the number of partitions the table will have and how the boundaries of the partitions are defined


Planning for Table Partitioning:

In order to successfully partition a large table, you must make a number of decisions. In particular, you need to:


Plan the partitioning:
Decide which table or tables can benefit from the increased manageability and availability of partitioning.
Decide the column or column combination upon which to base the partition.
Specify the partition boundaries in a partition function.
Plan on how to store the partitions in filegroups using a partition scheme.

Choosing a Table to Partition:

There is no firm rule or formula that would determine when a table is large enough to be partitioned, or whether even a very large table would benefit from partitioning.
Sometimes large tables may not require partitioning, some conditions where partitioning a table could benefit are:
Index maintenance on the table is costly or time-consuming and could benefit from re-indexing partitions of the table rather than the whole table at once.
Data must be aged out of the table periodically, and the delete process is currently too slow or blocks users trying to query the table.
New data is loaded periodically into the table, the load process is too slow or interferes with queries on the table, and the table data lends itself to a partition column based on ascending date or time.
In other words, make sure that the large table will actually benefit from partitioning; don’t partition it just because it’s big.


Choosing a Partition Column:

The partition column choice is critical, because after you have partitioned a table, choosing a different partition column will require re-creating the table, reloading all the data, and rebuilding all the indexes. Some things to note about the partition column:
The partition column must be a single column in the table (either a single column or a computed column).
If you have a combination of columns that form the best partition column, you can add a persisted computed column to the table that combines the values of the original columns and then partition on it.
In a clustered table, the partition column must be part of either the primary key or the clustered index. If the partition column is not part of the table's primary key, the partition column might allow NULL. Any data with a NULL in the partition column will reside in the leftmost partition.
You should also try to choose a partitioned column that will be used as a filter criterion in most of the queries run against the table. This enables partition elimination, where the query processor can eliminate inapplicable partitions from the query plan, and just access the partitions implied by the filter on the queries.

After you have identified the table and its partition column, you can then go through the steps of partitioning a table or index, which are:
Create or use an existing partition function that sets the correct range boundaries.
Create or use an existing partition scheme tied to the partition function.
Create the table using the partition scheme.

Benefits of Partitioned Tables and Indexes:

SQL Server’s partitioned tables and indexes offer a number of advantages when compared with partitioned views and other forms of partitioning data:
SQL Server automatically manages the placement of data in the proper partitions.
A partitioned table and its indexes appear as a normal database table with indexes, even though the table might have numerous partitions.
The table can be managed at the partition and filegroup level for ease of maintenance.
Partitioned tables support easier and faster data loading, aging, and archiving, as illustrated in the example above.
Application queries that are properly filtered on the partition column can perform better by making use of partition elimination and parallelism.
In cases where partitioned data will not be modified, you can mark some or most of a partitioned table's filegroups as read-only, making management of the filegroups easier.
In SQL Server 2008, you can compress individual partitions as well as control lock escalation at a partition level.

Partitioning large tables has some challenges:

There is a maximum of 1,000 partitions for a table.
You must manage filegroups and file placement if you place partitions on individual filegroups.
The metadata-only operations (SWITCH, MERGE, and SPLIT) can be blocked by other DML actions on the table at the time, until a schema-modification lock can be obtained.
Managing date or time-based data can be complex.
You cannot rebuild a partitioned index with the ONLINE option set to ON, because the entire table will be locked during the rebuild.
Automating changes to partitioned tables, as in a sliding window scenario, can be difficult, but Microsoft provides some tools to assist in automating the process.

Table Partitioning Best Practices:

Table partitioning is useful on very large data tables for primarily two reasons.
The major reason for partitioning is to gain better management and control of large tables by partitioning them. To gain better management of large tables, you can:
Rebuild and reorganize indexes by partition.
Use partition-aligned indexed views in switching operations.
Use a sliding window strategy for quickly bringing in new data and archiving old data.
Additionally, SQL Server's query optimizer can use partition elimination and parallelism to increase appropriately filtered query performance against partitioned tables. To make use of partition elimination:
Ensure that indexes are aligned with the partitioned table, and that indexed views are partition-aligned.
Ensure that queries against the partitioned tables have filters based on the partition column.
On data warehouse joins, keep the join column simple (such as an integer or date) and explicit, so as to take advantage of bitmap filtering for star joins.
In general, to take full advantage of table partitioning, you should:
Make sure that the configuration of max degree of parallelism is set sufficiently high to take advantage of parallel operations, or else add a MAXDOP query hint to fine-tune the degree of parallelism.
Maintain an empty partition on both ends of the partitioned table and ensure that only empty partitions are split and merged in a sliding window scenario.
Remember that RANGE RIGHT may be more convenient than RANGE LEFT in partition functions, especially when you are specifying date ranges.
Use data types without fractional components as partition columns, such as a date or an integer.
Always use a standard language-independent date format when specifying partition function boundary values.
Use an integer-based date and date dimension in data warehouses.
Use a single column of the table as the partitioned column whenever possible. If you must partition across more than one column, you can use a persisted computed column as the partitioning column. But then to achieve partition elimination, you must control queries to ensure they reference the partition column in their filters.
Use SWITCH with MERGE to drop partition data: Switch out the partition and remove the partition's boundary value using MERGE.
Use TRUNCATE TABLE to delete partition data by switching a partition out to a staging table and truncating the staging table.
Check for partition elimination in query plans.
Place read only data on read-only filegroups to reduce locking and simplify recovery for piecemeal restores.
Spread filegroups across all disks for maximum I/O performance.
Automate sliding window scenarios using available tools.

Example:

USE [master]
GO

CREATE DATABASE [SampleDB] ON PRIMARY
(
NAME = N'SampleDB',
FILENAME = N'C:\All Projects\TestDB\SampleDB.mdf' ,
SIZE = 3048KB ,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024KB
),

FILEGROUP [Filegroup1]
(
NAME = N'File1',
FILENAME = N'C:\All Projects\TestDB\File1.ndf' ,
SIZE = 2048KB ,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024KB
),

FILEGROUP [Filegroup2]
(
NAME = N'File2',
FILENAME = N'C:\All Projects\TestDB\File2.ndf' ,
SIZE = 2048KB ,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024KB
),

FILEGROUP [Filegroup3]
(
NAME = N'File3',
FILENAME = N'C:\All Projects\TestDB\File3.ndf' ,
SIZE = 2048KB ,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024KB
),

FILEGROUP [Filegroup4]
(
NAME = N'File4',
FILENAME = N'C:\All Projects\TestDB\File4.ndf' ,
SIZE = 2048KB ,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024KB
)
LOG ON
(
NAME = N'SampleDB_log',
FILENAME = N'C:\All Projects\TestDB\SampleDB_log.ldf' ,
SIZE = 1024KB ,
MAXSIZE = 2048GB ,
FILEGROWTH = 10%
)
GO

USE SampleDB
GO

CREATE PARTITION FUNCTION PF2_Left (int)
AS RANGE LEFT FOR VALUES (0, 10, 20);

CREATE PARTITION FUNCTION PF2_Right (int)
AS RANGE RIGHT FOR VALUES (0, 10, 20);

GO


CREATE PARTITION SCHEME PS2_Right
AS PARTITION PF2_Right
TO (Filegroup1, Filegroup2, Filegroup3, Filegroup4)


CREATE TABLE PartitionedTable
(
PartitionColumnID int not null,
Name varchar(100) not null
)
ON PS2_Right(PartitionColumnID);


INSERT PartitionedTable
select * from (values (31,'Amit'),(34,'Rahul'),(32,'Rajesh'),(33,'Ajay'),(28,'Ketan'))as tab(ID,NAME)


select * from PartitionedTable


More Examples:

REATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);
GO

CREATE PARTITION SCHEME myRangePS1
AS PARTITION myRangePF1
TO (test1fg, test2fg, test3fg, test4fg);



CREATE PARTITION FUNCTION myRangePF2 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);
GO

CREATE PARTITION SCHEME myRangePS2
AS PARTITION myRangePF2
TO ( test1fg, test1fg, test1fg, test2fg );


CREATE PARTITION FUNCTION myRangePF3 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);
GO

CREATE PARTITION SCHEME myRangePS3
AS PARTITION myRangePF3
ALL TO ( test1fg );



CREATE PARTITION FUNCTION myRangePF4 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);
GO

CREATE PARTITION SCHEME myRangePS4
AS PARTITION myRangePF4
TO (test1fg, test2fg, test3fg, test4fg, test5fg)

--Creating a RANGE LEFT partition function on an int column


CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);

Values
col1 <= 1 col1 > 1 AND col1 <= 100 col1 > 100 AND col1 <= 1000 col1 > 1000



--Creating a RANGE RIGHT partition function on an int column


CREATE PARTITION FUNCTION myRangePF2 (int)
AS RANGE RIGHT FOR VALUES (1, 100, 1000);

Values
col1 < 1 col1 >= 1 AND col1 < 100 col1 >= 100 AND col1 < 1000 col1 >= 1000




--Creating a RANGE RIGHT partition function on a datetime column

CREATE PARTITION FUNCTION [myDateRangePF1] (datetime)
AS RANGE RIGHT FOR VALUES ('20030201', '20030301', '20030401',
'20030501', '20030601', '20030701', '20030801',
'20030901', '20031001', '20031101', '20031201');

DROP PARTITION FUNCTION myRangePF;