Monday, February 14, 2011

SSIS Configurations

The Real task of creating SSIS packages lies when we are about to deploy them to certain environment, the point where most of the hard coded values needs a real exercise. In order to make this task easy SSIS provides us a concept called Configurations. In this blog we will guide you through the various kinds of configurations available in SSIS and also lets you know when to use what.

Everything goes fine when we design, develop and deploy the packages on/ to our local environment (server). The real problems arise when we try to deploy the same to other (Remote) environments. Since every Question has its own Answer we too have a solution (Configuration Files) for this kind of situations. Generally most of us during development hard code the connection strings, file paths, package paths etc. which will create a problem when they are moved to some other environment. So, by using Configuration file it becomes easy where we need to change at only one place instead of changing multiple places inside a solution. This configuration file in turn adds a dynamic execution feature to our SSIS Packages.
A configuration is a name-value pair. The name is a property within a package that needs to be changed, and the value is the value to be assigned to that property.
An Integration Services package consists of many objects for ex: Containers (this includes all tasks, as well as other containers such as the ForEach Loop Enumerator), Variables, Connection Managers, Data Flow Components etc. However you cannot apply a configuration to an object; you can only apply a configuration to a property of an object. For example, if you wanted to change the value returned when referencing a package variable, you would apply the configuration to the Value property of the variable, not to the variable object itself.
For Ex: \Package.Variables[User::MyVar].Properties[Value]


Configuration Types
There five basic types of Integration Services configurations.
1. XML Configuration File
This is the most commonly used configuration type. An XML file, which by default has the .dtsConfig suffix, is created which contains one or more configurations.
Advantage: They can be easily deployed onto multiple environments simply by copying the file.

Disadvantage: The path to the file must be hard-coded within the package. However this requirement can be alleviated by using indirect configurations. “Indirect configurations” is a term used to describe the practice of using an operating system environment variable to redirect the Integration Services execution engine to the location of the XML configuration file.
A configuration stored in an XML configuration file stores both the property path and the value outside the package.

2. Environment Variable Configuration
An environment variable configuration takes the value for a property from whatever is stored in a named environment variable. An environment variable configuration stores the property path inside the package and the value outside the package.
Note that an environment variable configuration by itself is not an example of an indirect configuration.

3. Parent Package Configuration
Parent package configurations are used to fetch a value from a variable in a calling package. Just like environment variable configurations, a parent package configuration stores the property path inside the package and the value outside the package.

4. Registry Configuration
The value to be applied to a package property is stored in a registry entry. Just like environment variable configurations and parent package configurations, a registry configuration stores the property path inside the package and the value outside the package.

5. SQL Server Configuration
SQL Server configurations are stored in a SQL Server table. The table can have any name you like, and can be in any database on any server that you like. If you are using a SQL Server configuration, then you can and should make it an indirect configuration. As explained earlier in this article, when using indirect configurations, an environment variable stores the location of the configurations themselves.

A configuration stored in SQL Server stores both the property path and the value outside the package.

Create a user variable. When you add configurations to a package, you are basically exposing package properties and allowing them to be updated with a new value that comes from a file, table, environment variable, registry key or parent package variable. Package configurations are disabled by default, and have to be enabled on each package in which you want to use them. You can enable package configurations by choosing the Package Configurations option on the SSIS menu.
• Select Enable Package Configurations Option.
• Click on Add button.
You can choose the configuration type, which in this case is an XML configuration file. Then you have two options for providing the location of the configuration file.

• The first option is known as the direct method, where the path to the file is stored inside of the package definition. This option works fine if you don’t have to deploy the package to different machines, or if the path provided is valid on every machine the package will be deploy to.
• The second method is called indirect, where you can pick the name of a Windows environment variable that would hold the path to the configuration file. This method is ideal when you have to deploy packages to different servers, and when the location of the XML file is subject to change, since you would only need to change the value of the environment variable where the package is deployed. Notice that the environment variable must exist prior to creating the package configuration, and in some instances you may have to close and re-open the Business Intelligence Development Studio (BIDS) session in order to get the name of the environment variable in the dropdown list.
When you press the Finish button, the configuration file is created and you should be able to see a new entry in the package configuration organizer.

An XML configuration file has two parts. The header contains metadata about the file itself, like creator, the name and ID of the package that was used when creating the file, and the creation date and time. The configuration section is where the path to the properties being updated and the configuration values to be used are stored.
In this example, we decide to provide “Run-time value” as new value for the variable being configured. For that we open the file in a text editor and change the line:
During Development
To
Run-time value from config file

Environment Variable Configurations:
With this type of package configuration, you have to create an environment variable for each package property you intend to update, and place the configuration value as the value of the environment variable.
We have to choose the environment variable to be used from a drop down list and next it lets you choose which object property is going to be affected by the configuration we are creating.

Registry Entry Configurations:
This type will let us store the configuration values in Windows registry entries similar to environment variable configurations.
After we select a registry key configuration type from the dropdown list, we have to choose the method to be used. The first option is the direct method, where the wizard expects a valid registry key name that exists under the Windows registry HKEY_CURRENT_USER key. The second option is the indirect method, where you provide the name of an environment variable that in turns contains the registry key name to be used by the configuration. The indirect method gives you the flexibility to change the name of the registry key or to point to a different one by updating the environment variable value.
For Ex: If we want to use a registry key that exists directly under HKEY_CURRENT_USER,
Expected value is: SampleSSISProjConfig

If you create a registry key to be used by the configuration that is not directly under HKEY_CURRENT_USER key, as shown below, then the wizard expects this value: SampleSSISProjConfig\config1. Next we select the property object we want to update trough the configuration.
There are a few other things we need to keep in mind when using this type of configurations:
• The wizard does not create or modify the registry keys; the registry keys must be manually created and edited.
• The registry keys must exist, either directly or indirectly, under HKEY_CURRENT_USER, and the configuration value has to be store in an entry called ‘value’.
• Registry keys under HKEY_CURRENT_USER are visible only to the user that created them, hence, the package can apply the configurations only when it is executed using the credentials of that user. If you execute the package via SQL Server Agent, a workaround is to set up a proxy account in SQL Server Agent. Make sure that you consider all the implications of this behavior, as this may complicate your deployment scenarios.
• You have to use one registry key per configuration value. If you are configuring multiple properties, consider using SQL Server or XML file configurations instead.
• You do not need to close and reopen the BIDS session each time you create or make changes to the registry keys being used by the configuration. This is an advantage over environment variable configuration type.
• You need access to modify the Windows registry settings on each machine were you intend to deploy the package. This makes this configuration type risky.
• Be extremely careful when using the registry editor, as any error could cause serious problems to the system. Always create a backup of the registry before making changes.

Parent Package Variable Configurations:
When we execute a package (the child) from another package (the parent) via the Execute Package task, we can use Parent Package Variable configurations in the child package to pass variable values from the parent. In the Package Configuration Wizard, in the child package, we have to specify the name of the variable (that exists in the parent) that holds the desired configuration value.
Notice that the child package is unaware of the existence of the parent package, and the name of the variable that we enter is not validated when we create the configuration. When using the direct method, we have to type the variable name exactly as it appears in the parent package. Alternatively, we can select an environment variable that contains the name of the parent package variable, thus adding the flexibility the indirect method offers.
The next step of the wizard allows us to select the property to be updated, in the same way as when using registry entry or environment variable configurations.
• By design, parent package variables are always applied last, regardless of where we place them in the configurations organizer.
• This type of configuration is applied only when the child package is executed via the Execute Package task.
• SSIS variable names are case-sensitive. Always we need to make sure that both spelling and casing of the variable name in the configuration entry are identical to the ones in the parent package.
• Standalone execution and unit testing of child packages using this type of configuration may be challenging when the child package does not run in the context of the parent. A simple but effective workaround is to always apply this type of configurations to variables in the child package, and then use expressions in the objects that need to be updated (connection managers, server names, and so forth) based on those variables. This way, if you need to run the child package without the parent, you can manually edit the variables in the child package in BIDS, or via the /SET option of the DTExec utility.
• Each package configuration entry can only affect one property in the package being configured.

SQL Server Configurations:
This configuration type offers almost the same level of flexibility and functionality as XML configuration files, with the difference that configuration information is stored in a SQL Server table instead of an XML file. The table can be created in any database that is accessible by the package at execution time. You can use the Package Configuration Wizard to create the table. This is the default structure of the table:
CREATE TABLE [dbo].[SSIS Configurations]
(
ConfigurationFilter NVARCHAR(255) NOT NULL,
ConfiguredValue NVARCHAR(255) NULL,
PackagePath NVARCHAR(255) NOT NULL,
ConfiguredValueType NVARCHAR(20) NOT NULL
)

Above fields are used as follows:
• ConfigurationFilter: This field is used by SSIS to indentify a set of property/values pairs that are part of the same configuration entry in the Package Configurations Organizer.
• ConfigurationValue: It stores the value that is used to update the package property specified in PackagePath column.
• PackagePath: The path that point to the property being configured.
• ConfiguredValueType: the SSIS data type of the property being configured.
When we create SQL Server package configurations, we first have to choose which method we would use to provide the connection information to the configuration table

With the direct method, the connection information, configuration table and filter are stored inside of the package. The indirect method instead allows storing that information in an environment variable. Notice that both methods use an SSIS connection manager and its connection string is hard-coded inside of the package. Therefore, if we use this configuration type, it is a good practice to ensure that the connection string in this connection manager can be updated from an external source. A common approach is to use a separate configuration (XML, Registry Key or environment variable) to update the connection manager when required.
The next step in the configuration wizard lets us choose the set of properties to be targeted by the configuration being created.

• SQL Server configurations are very flexible, as you can store multiple configuration properties/values in the same table and Multiple packages can use the same configuration table and filter.
• We can have a configuration filter for each entry in the configuration table, or we can use the same filter to group a set of entries. If we use the second approach, we need to make sure that all properties under the selected filter exist in the package being configured. Otherwise warning messages are generated for those properties in the configuration filter that do not exist in the target package.
• We can use existing database security and backup policies to protect the configuration table contents.
• A drawback of this configuration type is that it relies on an SSIS connection manager to get access to the configuration table, but does not have built-in support for updating its connection string. In other words, the connection string that points to the configuration table is hard-coded inside of the package, and we have to take extra steps in order to update it (for example, through an extra package configuration placed at the top of the Package Configurations Organizer, or in SQL Server 2008, by using the /CONN option of DTExec).
• The /CONN option of DTExec utility can be used to override the connection string used by the configuration only in SSIS 2008. This is because SQL Server 2008 Integration Services reloads configurations after applying DTExec command line options.
• The configuration wizard can create the table and insert the required rows when they do not exist, but only if the direct method is chosen. The wizard never includes any sensitive data in the ConfiguredValue column of the table, and you would need to manually update the table to add the required sensitive data.
• If we use the direct method, and provide an existing configuration filter, the wizard will prompt us to either reuse the configuration information in the table or to override it with the property values from the current package.
• When we use the indirect method, the environment variable must contain the name of the configuration connection manager, the name of the configuration table and the configuration filter. The syntax should be: “ConfigurationManagerName”;”Schema.ConfigurationTableName”;’ConfigurationFilter”

Additional Information:
1. .dtsConfig file is not a configuration, it is a collection of one or more configurations.
2. You can save some development time by using package templates. If you anticipate creating a large number of packages that use a common set of package configurations, you can simply include the required package configurations in a template, and then use the template when creating new packages.
3. The data flow task in SSIS does not allow changes to the pipeline structure at run time (number of columns, column names, and data types), and package configurations should not be used to attempt such changes in dataflow pipeline metadata, since it would cause validation errors in the package.
4. Multiple packages can use the same XML configuration file.
5. The package configuration wizard can create the XML file only when the direct method is used. If you decide to use the indirect method, consider creating the configuration by using the direct method first, and once the file is created, change the method to the indirect one.
6. When creating the file, the wizard does not include any sensitive information (such as passwords in connection strings) in the configuration file, and you have to edit the configuration file to add such information manually..
7. The Package Configurations Organizer does not show any value for Target Object and Target Property columns for XML file based configurations, as the file may contain multiple configurations
8. The environment variable(s) must exist on all machines where the package is going to be deployed.This is because the name of the environment variable is stored inside of the package.
9. The configuration wizard does not create the environment variables, and they are visible to the wizard only if they were created prior to opening the current BIDS session.

Behavior Changes Related to Package Configurations:
In Short: In SQL Server 2005 Integration Services, all package configurations, except Parent Package Variable configurations, are applied before applying the options specified with the DTExec command. Parent Package Variables are applied after applying DTExec options.
In SQL Server 2008 Integration Services, package configurations are applied twice, before and after applying the options of the DTExec command prompt utility. This should be seen as an improvement, since you can now use the /SET, /CONF or /CONN options of the DTExec utility to alter the original definition of the configurations. For example, you can now use the /CONN option to alter the connection manager being used in SQL Server configurations - something that was not possible with the 2005 version.
In Detail: In SQL Server 2005 Integration Services, events occur in the following order:
1. The utility first applies the design-time configurations. (Design-time configurations are the configurations that you created and associated with the package when you designed the package in BI Development Studio.)
2. The utility then applies the run-time options that you specified on the command line when you started the utility.
In SQL Server 2008 Integration Services, events occur in the following order:
1. The utility first applies the design-time configurations.
2. The utility then applies the run-time options that you specified on the command line when you started the utility.
3. Finally, the utility reloads and reapplies the design-time configurations.
Because the events occur in this order, you can dynamically redirect a configuration by using the /SET command-line option to change the configuration string.