Step 1
(Click on Start button -->Programs -->Microsoft SQL Server 2005 -->SQL Server Business Intelligence Development Studio)
Step 2
(File --> New--> Project)
Step 3
(Give Project name, Solution name and Directory name)
Creation of package in SSIS:
1. In Business Intelligence Development Studio, open the Integration Services project in which you want to create a package.
2. In Solution Explorer, right-click the SSIS Packages folder, and then click New SSIS Package.
3. Optionally, add control flow, data flow tasks, and event handlers to the package. For more information, see Creating Package Control Flow, Creating Package Data Flow, and Creating Package Event Handlers.
4. On the File menu, click Save Selected Items to save the new package.
Renaming Package:
1. In Solution Explorer, right-click the created Packages i.e. Package.dtsx, and then click Rename.
2. After rename child dialogue box appear.
3. Click Yes.
Adding an existing package in SSIS:
If you have sales.dtsx package in some location and you need to add this to your project. Here all your packages stored in file system.
1. In Solution Explorer, right-click the SSIS Packages folder, and then click Existing SSIS Package.
Executing the Package
With the package open in BIDS, you can run it using the standard Visual Studio tools for running a project. Choose any of these options:
• Right-click the package in Solution Explorer and select Execute Package.
• Click the Start Debugging toolbar button.
• Press F5.
SSIS will execute the package, highlighting the steps in the package as they are completed.
Green color highlight specify that execution is successful, Yellow color specify that execution is in progress and Red color indicates that execution stops due to some error.
We can add Data Viewer that is break point to the running package to see what data is coming from different transformation. We can achieve this by Right clicking the green line and select Data Viewer and then add the grid to it.
Working with Connection Manager:
SSIS uses connection managers to integrate different data sources into packages. SSIS includes a wide variety of different connection managers that allow you to move data around from place to place.
We can create Shared Data sources and Data sets as well for better performance, generic/common for all packages in the project. By using Shared Data sources and data sets we need not to create separate data source/dataset for each package. If we need to modify something is data source/data set, we will modify only on one location and these changes will effect on all packages.
Steps:
1. In Solution Explorer, right-click the Data Source folder, and then click New Data Source.
2. Create a data source or select the data connection
3. Provide the data source name and finish.
Step 1
Step 2
Step 3
Step 4
Package created in data source folder
Step 5:
Alternate way to create connection is as below
• Right click the area of connection manager and add your connection.
Step 6:
Another way to create the connection is by Adding Dynamic Connection to your package.
Firstly Add variable PackageFilePath with string data type. Then follow steps:
List of Connection Managers:
Connection Manager Handles
ADO Connection Manager Connecting to ADO objects such as a Recordset.
ADO.NET Connection Manager Connecting to data sources through an ADO.NET provider.
Analysis Services Connection Manager Connecting to an Analysis Services database or cube.
Excel Connection Manager Connecting to an Excel worksheet.
File Connection Manager Connecting to a file or folder.
Flat File Connection Manager Connecting to delimited or fixed width flat files.
FTP Connection Manager Connecting to an FTP data source.
HTTP Connection Manager Connecting to an HTTP data source.
MSMQ Connection Manager Connecting to a Microsoft Message Queue.
Multiple Files Connection Manager Connecting to a set of files, such as all text files on a particular hard drive.
Multiple Flat Files Connection Manager Connecting to a set of flat files.
ODBC Connection Manager Connecting to an ODBC data source.
OLE DB Connection Manager Connecting to an OLE DB data source.
SMO Connection Manager Connecting to a server via SMO.
SMTP Connection Manager Connecting to a Simple Mail Transfer Protocol server.
SQL Server Mobile Connection Manager Connecting to a SQL Server Mobile database.
WMI Connection Manager Connecting to Windows Management Instrumentation data.
SSIS Tabs:
1. Control Flow
2. Data Flow
3. Event Handler
4. Package Explorer
5. Progress Execution
Control Flow Tab:
The Control Flow tab of the Package Designer is where you tell SSIS what the package will do. You create your control flow by dragging and dropping items from the toolbox to the surface, and then dragging and dropping connections between the objects.
Tasks in Control Flow:
The objects you can drop here break up into different groups:
• Control Flow items: Control Flow items are things that SSIS can do, such as execute SQL statements or transfer objects from one SQL Server to another.
• Maintenance Plan tasks: are a special group of tasks that handle jobs such as checking database integrity and rebuilding indexes.
• Containers: are objects that can hold a group of tasks.
Control Flow items
1. Bulk Insert Task
Bulk Insert task copies the data from file to table. The Bulk Insert task provides an efficient way to copy large amounts of data into a SQL Server table or view.
If the Bulk Insert task uses a Flat File connection manager to access the source file, the task does not use the format specified in the Flat File connection manager. Instead, the Bulk Insert task uses either the format specified in a format file, or the values of the RowDelimiter and ColumnDelimiter properties of the task.
Properties:
• Connection: specifies the destination connection.
• Destination Table: specifies the destination table.
• ColumnDelimiter: specifies the column delimiter in the text file.ie Comma {,}
• RowDelimiter: specifies the row delimiter in the input data. i.e.{CR}{LF}
Before you use the Bulk Insert task, consider the following:
• The Bulk Insert task can transfer data only from a text file into a SQL Server table or view. To use the Bulk Insert task to transfer data from other database management systems (DBMSs), you must export the data from the source to a text file and then import the data from the text file into a SQL Server table or view.
• The destination must be a table or view in a SQL Server database. If the destination table or view already contains data, the new data is appended to the existing data when the Bulk Insert task runs. If you want to replace the data, run an Execute SQL task that runs a DELETE or TRUNCATE statement before you run the Bulk Insert task.
2. Data Flow Task
The Data Flow Task is a general purpose task for ETL (extract, transform, and load) operations on data. There's a separate design tab for building the details of a Data Flow Task.
3. Execute Package Task
The Execute Package task extends the enterprise capabilities of Integration Services by letting packages run other packages as part of a workflow.
Properties:
• Location: Specifies the location type of the package to be run. Usually ssis package stored as a file system.
• Connection: path of the package where it is located.ie in our project solution path.
4. Execute SQL Task
The Execute SQL Task can be used to execute any TSQL code such as truncating a staging table, executing queries, stored procedures etc. from a SSIS Package. This task can also be used in combination with For Loop and Foreach Loop containers to run multiple TSQL statements.
The Execute SQL task runs SQL statements or stored procedures from a package. The task can contain either a single SQL statement or multiple SQL statements that run sequentially. You can use the Execute SQL task for the following purposes:
• Truncate a table or view in preparation for inserting data.
• Create, alter, and drop database objects such as tables and views.
• Re-create fact and dimension tables before loading data into them.
• Run stored procedures.
• Save the row set returned from a query into a variable.
Properties:
• Connection : specify the connection manager
• SQL Statement: Insert your Sql query which you want to execute.
• Parameter Mapping: If you need to pass parameter then add the parameter and also specify the data type, parameter name to 0, 1 because position of your parameter passing.
5. File System Task
File System Task Perform file system operations such as copy or delete. The File System task performs operations on files and directories in the file system. For example, by using the File System task, a package can create, move, or delete directories and files. You can also use the File System task to set attributes on files and directories.
Properties:
• Destination Connection : specify the destination file
• Overwrite Destination: specify the value if you want to overwrite it or not.
• Operation: select an operation to perform.
• Source Connection: Specify the source file.
6. Script Task
The Script task provides code to perform functions that are not available in the built-in tasks and transformations that SQL Server Integration Services provides. The Script task can also combine functions in one script instead of using multiple tasks and transformations.
Properties:
• Script Language: specify the language you want to write your script.
• Entry Point: specify the entry point should be your main ( ).
• Then Click the edit script button below and write your script or code in that.
List of Tasks in Control Flow:
Task Purpose
ActiveX Script Execute an ActiveX Script
Analysis Services Execute DDL Execute DDL query statements against an Analysis Services server
Analysis Services Processing Process an Analysis Services cube
Bulk Insert Insert data from a file into a database
Data Mining Query Execute a data mining query
Execute DTS 2000 Package Execute a Data Transformation Services Package (DTS was the SQL Server 2000 version of SSIS)
Execute Package Execute an SSIS package
Execute Process Shell out to a Windows application
Execute SQL Run a SQL query
File System Perform file system operations such as copy or delete
FTP Perform FTP operations
Message Queue Send or receive messages via MSMQ
Script Execute a custom task
Send Mail Send e-mail
Transfer Database Transfer an entire database between two SQL Servers
Transfer Error Messages Transfer custom error messages between two SQL Servers
Transfer Jobs Transfer jobs between two SQL Servers
Transfer Logins Transfer logins between two SQL Servers
Transfer Master Stored Procedures Transfer stored procedures from the master database on one SQL Server to the master database on another SQL Server
Transfer SQL Server Objects Transfer objects between two SQL Servers
Web Service Execute a SOAP Web method
WMI Data Reader Read data via WMI
WMI Event Watcher Wait for a WMI event
XML Perform operations on XML data
List of Maintenance Plan Tasks:
Task Purpose
Back Up Database Back up an entire database to file or tape
Check Database Integrity Perform database consistency checks
Execute SQL Server Agent Job Run a job
Execute T-SQL Statement Run any T-SQL script
History Cleanup Clean out history tables for other maintenance tasks
Maintenance Cleanup Clean up files left by other maintenance tasks
Notify Operator Send e-mail to SQL Server operators
Rebuild Index Rebuild a SQL Server index
Reorganize Index Compacts and defragments an index
Shrink Database Shrinks a database
Update Statistics Update statistics used to calculate query plans
List of Containers:
Container Purpose
For Loop Repeat a task a fixed number of times
Foreach Repeat a task by enumerating over a group of objects
Sequence Group multiple tasks into a single unit for easier management
Data Flow Tab
The Data Flow tab of the Package Designer is where you specify the details of any Data Flow tasks that you've added on the Control Flow tab. SQL Server Integration Services provides three different types of data flow components: sources, transformations, and destinations. Sources extract data from data stores such as tables and views in relational databases, files, and Analysis Services databases. Transformations modify, summarize, and clean data. Destinations load data into data stores or create in-memory datasets.
Data Flows are made up of various objects that you drag and drop from the Toolbox:
• Data Flow Sources are ways that data gets into the system.
• Data Flow Transformations let you alter and manipulate the data in various ways.
• Data Flow Destinations are the places that you can send the transformed data.
Data Flow Sources:
Source Use
DataReader Extracts data from a database using a .NET DataReader
Excel Extracts data from an Excel workbook
Flat File Extracts data from a flat file
OLE DB Extracts data from a database using an OLE DB provider
Raw File Extracts data from a raw file
XML Extracts data from an XML file
1. OleDB Source
This is likely to be the most commonly used source in SSIS packages as it connects to a wide variety of OLE DB compliant databases, such as Oracle and SQL server. Oledb source Extracts data from a database using an OLE DB provider.
Properties:
• Connection Manager: specify the oledb connection manager and also select the data access mode i.e table or view, sql command, table or view with variable, sql command with variable.
There are four are available data access modes:
• Table or view
• Table or view name variable
• SQL Command
• SQL Command from variable
You will note that two of the available data access modes are essentially duplicates that allow the data access mode to be populated from a variable. The table view data access mode allows you to select the entire contents of any table or view within the database that you have connected to. The a SQL Command data access mode is much more powerful as it allows you to craft custom T-SQL queries to retrieve the specific data you require from the database, including the use of stored procedures to return data.
• Column: select columns which you want to display.
2. Flat File Source
Flat File Source extracts data from a flat file.
Properties:
• Connection Manager: specify the flat file connection manager.
• Column: select columns which you want to display.
3. Raw File Source
This particular Source extracts data from raw File.
Properties:
• Connection Manager: specify Access Mode File name or file name or variable and also specify the file name.
• Column: select columns which you want to display.
4. Excel Source
Properties:
• Connection Manager: specify the connection manager and also select the data access mode i.e table or view, sql command, table or view with variable, sql command with variable and also the name of particular sheet.
• Column: select columns which you want to display.
5. ADO NET Source
Properties:
• Connection Manager: specify the connection manager and also select the data access mode i.e table or view, sql command, table or view with variable, sql command with variable.
• Column: select columns which you want to display.
Data Flow Destination
Destination Use
Data Mining Model Training Sends data to an Analysis Services data mining model
DataReader Sends data to an in-memory ADO.NET DataReader
Dimension Processing Processes a cube dimension
Excel Sends data to an Excel worksheet
Flat File Sends data to a flat file
OLE DB Sends data to an OLE DB database
Partition Processing Processes an Analysis Services partition
Raw File Sends data to a raw file
Recordset Sends data to an in-memory ADO Recordset
SQL Server Sends data to a SQL Server database
SQL Server Mobile Sends data to a SQL Server Mobile database
1. OleDB Destination
Load data to an oledb database. This Transformation send data to an oledb database.
Properties:
• Connection Manager: specify the oledb connection manager and also select the data access mode i.e table or view, sql command, table or view with variable, sql command with variable.
There are four are available data access modes:
• Table or view
• Table or view name variable
• SQL Command
• SQL Command from variable
You will note that two of the available data access modes are essentially duplicates that allow the data access mode to be populated from a variable. The table view data access mode allows you to select the entire contents of any table or view within the database that you have connected to. The a SQL Command data access mode is much more powerful as it allows you to craft custom T-SQL queries to retrieve the specific data you require from the database, including the use of stored procedures to return data.
• Mapping Column: Map the input column accordingly.
2. Flat File Destination
Load data to file. This Transformation sends data to file.
Properties:
• Connection Manager: specify the flat file connection manager.
• Mapping Column: Map the input column accordingly.
3. Excel Destination
Load data to Excel worksheet. This Transformation sends data to Excel worksheet.
Properties:
• Connection Manager: specify the connection manager and also select the data access mode i.e table or view, sql command, table or view with variable, sql command with variable and also the name of particular sheet.
• Mapping Column: Map the input column accordingly.
4. Raw File Destination
Load data to Raw File. This Transformation sends data to Raw File.
Properties:
• Connection Manager: specify Access Mode File name or file name or variable and also specify the file name.
• Mapping Column: Map the input column accordingly.
Data Flow Transformation
1. Character Map
The Character Map transformation applies string functions, such as conversion from lowercase to uppercase, to character data. This transformation operates only on column data with a string data type.
You configure the Character Map transformation in the following ways:
• Specify the columns to convert.
• Specify the operations to apply to each column.
Properties:
Select input columns to which you want to apply string operations. i.e Lower to upper and vice versa.
2. Conditional Splits
Evaluates and direct rows to Database. The Conditional Split transformation can route data rows to different outputs depending on the content of the data. The implementation of the Conditional Split transformation is similar to a CASE decision structure in a programming language. The transformation evaluates expressions, and based on the results, directs the data row to the specified output.
Properties:
Specify the any condition used to direct the input rows to specific output.
You can configure the Conditional Split transformation in the following ways:
• Provide an expression that evaluates to a Boolean for each condition you want the transformation to test.
• Specify the order in which the conditions are evaluated. Order is significant, because a row is sent to the output corresponding to the first condition that evaluates to true.
• Specify the default output for the transformation. The transformation requires that a default output be specified.
3. Data Conversion
The Data Conversion transformation converts the data in an input column to a different data type and then copies it to a new output column. For example, a package can extract data from multiple sources, and then use this transformation to convert columns to the data type required by the destination data store. You can apply multiple conversions to a single input column.
Properties:
Specify the data type to particular column to which you want to change.
You can configure the Data Conversion transformation in the following ways:
• Specify the columns that contain the data to convert and the types of data conversions to perform.
• Set the column length of string data and the precision and scale on numeric data.
• If the length of an output column of string data is shorter than the length of its corresponding input column, the output data is truncated
4. Derived columns
The Derived Column transformation creates new column values by applying expressions to transformation input columns. An expression can contain any combination of variables, functions, operators, and columns from the transformation input. The result can be added as a new column or inserted into an existing column as a replacement value.
Properties:
5. Look Up
Look up Transformation Joins the input data set to the reference table, view or row set created by a SQL statement to lookup corresponding values. If some rows in the input data do not have corresponding rows in the lookup table then you must redirect such rows to a different output
Properties:
General Tab: You can configure the Lookup transformation to use the partial cache or no cache mode:
• Partial Cache
the rows with matching entries in the reference dataset and, optionally, the rows without matching entries in the dataset are stored in cache.
• No Cache
No data is loaded into cache whether you select partial cache or no cache; you use an OLE DB connection manager to connect to the reference dataset.
Connection: Specify the connection manager i.e Oledb connection manager and also select destination table or view or Sql query.
Columns: Map columns you need from inputs. The Lookup transformation automatically maps columns that have the same name and the same data type. Columns must have matching data types to be mapped.
6. Merge
The Merge transformation combines two sorted datasets into a single dataset. The Merge Transformation requires sorted data for its inputs. The Merge transformation also requires that the merged columns in its inputs have matching metadata. If the data has a string data type, the length of the column in the second input must be less than or equal to the length of the column in the first input with which it is merged
By including the Merge transformation in a data flow, you can perform the following tasks:
• Merge data from two data sources, such as tables and files.
• Create complex datasets by nesting Merge transformations.
• Remerge rows after correcting errors in the data.
Properties:
When to Use merge:
when you have to Union two very large sorted datasets, need the output to keep that sort, and cannot afford the memory overhead of a blocking Sort component after the Union which will have to receive all the data before it can do that sort.
7. Merge Join
The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. The Merge Join Transformation requires sorted data for its inputs. The Merge Join transformation requires that the joined columns have matching metadata.
You can configure the Merge Join transformation in the following ways:
• Specify the join is a FULL, LEFT, or INNER join.
• Specify the columns the join uses.
• Specify whether the transformation handles null values as equal to other nulls.
Properties:
8. Multicast
The Multicast transformation distributes its input to one or more outputs. This transformation is similar to the Conditional Split transformation. Both transformations direct an input to multiple outputs. The difference between the two is that the Multicast transformation directs every row to every output, and the Conditional Split directs a row to a single output. Using the Multicast transformation, a package can create logical copies of data.
9. OleDb Command
The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.
Properties:
Typically, the SQL statement includes parameters. The parameter values are stored in external columns in the transformation input, and mapping an input column to an external column maps an input column to a parameter.
10. Row Count
The Row Count transformation counts rows as they pass through a data flow and stores the final count in a variable.
Properties:
Here we need to add variable to store the counts of the rows.
A SSIS package supports two types of variable:
• System Variable: These variables contain package, task & transformation metadata such as name, Id
• User Variable: These variables are created by package designer and can contain any desire data stored in variety of data types.
Add variables
Select particular variable which we have created from the rowcount editor.
11. Slowly Changed Dimension
The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables.
The term slowly changing dimensions encompasses the following three different methods for handling changes to columns in a data warehouse dimension table:
• Type 0 – (Fixed Attribute) when the values in the column should not be changing. Fixed attribute changes indicate the column value must not change.
• Type 1 – (Changing Attribute) Update the columns in the dimension row without preserving any change history. Changing attribute changes overwrite existing records.
• Type 2- (Historical Attribute) preserve the change history in the dimension table and create a new row when there are changes. Historical attribute changes create new records instead of updating existing ones.
At run time, the Slowly Changing Dimension transformation first tries to match the incoming row to a record in the lookup table. If no match is found, the incoming row is a new record; therefore, the Slowly Changing Dimension transformation performs no additional work, and directs the row to New Output.
If a match is found, the Slowly Changing Dimension transformation detects whether the row contains changes. If the row contains changes, the Slowly Changing Dimension transformation identifies the update type for each column and directs the row to the Changing Attributes Updates Output, Fixed Attribute Output, Historical Attributes Inserts Output.
Properties:
Specify the connection manager and table then mention which is the business key of that particular table. The Slowly Changing Dimension transformation requires at least one business key column. The Slowly Changing Dimension transformation does not support null business keys.
Specify the columns which should be of type 0, type 1 and type 2.
Store the particular results of historical attribute to an existing column or new column in destination table. i.e below E.g. : “status” a new column added to destination table and also specify the value current or true.
Example of SDC
Alternate to SCD is CDC
SCD is very slowly in performance so alternative to SCD is CDC (concept of data capturing). We can achieve functionality of SDC in CDC by implementing lookup and conditional split. Below is the example of implementing SDC through CDC.
Output Comparison for both SCD & CDC:
Table Source table and Destination table
12. Sort
The Sort transformation sorts input data in ascending or descending order and copies the sorted data to the transformation output. You can apply multiple sorts to an input; each sort is identified by a numeral that determines the sort order. The column with the lowest number is sorted first; the sort column with the second lowest number is sorted next, and so on.
Properties:
Specify the sort type for particular column.
13. Union All
The Union All transformation combines multiple inputs into one output. The transformation inputs are added to the transformation output one after the other; no reordering of rows occurs.
The Merge transformation is similar to the Union All transformations. Use the Union All transformation instead of the Merge transformation in the following situations:
• The transformation inputs are not sorted.
• The combined output does not need to be sorted.
• The transformation has more than two inputs.
Difference between Merge & Union All
• Merge takes input from two Data Sources whereas union all takes input from more than two Data Sources.
• Before merge data should be sorted whereas in union all no such condition.
14. Term Lookup
The Term Lookup transformation matches terms extracted from text in a transformation input column with terms in a reference table. It then counts the number of times a term in the lookup table occurs in the input data set, and writes the count together with the term from the reference table to columns in the transformation output. This transformation is useful for creating a custom word list based on the input text, complete with word frequency statistics.
The Term Lookup transformation adds two columns to the transformation output, named by default Term and Frequency. Term contains a term from the lookup table and Frequency contains the number of times the term in the reference table occurs in the input data set.
• The value 0 indicates the column is passed through to the output only and is not used in the lookup.
• The value 1 indicates the column is used in the lookup only.
• The value 2 indicates the column is passed through to the output, and is also used in the lookup.
Properties:
Specify the destination table with connection manager.
Select the column to which you need to perform the operation. Eg: Below It will compare the firstname of source table of firstsname of destination table for an exact match of text/word.
Example:
Source Table and Destination Table here when we apply this transformation we will get the exact match of two tables.
Output:
15. Tern Extraction
The Term Extraction transformation extracts terms from text in a transformation input column, and then writes the terms to a transformation output column. The transformation works only with English text and it uses its own English dictionary and linguistic information about English.
You can use the Term Extraction transformation to discover the content of a data set. For example, text that contains e-mail messages may provide useful feedback about products, so that you could use the Term Extraction transformation to extract the topics of discussion in the messages, as a way of analyzing the feedback.
The Term Extraction transformation also stems nouns to extract only the singular form of a noun. For example, the transformation extracts man from men, mouse from mice, and bicycle from bicycles. The transformation uses its dictionary to stem nouns. Gerunds are treated as nouns if they are in the dictionary.
The output of the Term Extraction transformation includes only two columns. One column contains the extracted terms and the other column contains the score. The default names of the columns are Term and Score.
The Term Extraction transformation generates a score for each term that it extracts. The score can be either a TFIDF value or the raw frequency, meaning the number of times the normalized term appears in the input. In either case, the score is represented by a real number that is greater than 0. For example, the TFIDF score might have the value 0.5, and the frequency would be a value like 1.0 or 2.0.
Properties:
Here you need specify the term need to be extracted.
As for using Term Extraction and Term Lookup transformations in Text Mining, here are the scenarios we envision:
1. Key term extraction from texts
Given a collection of texts, we can use the Term Extraction transformation to extract "key terms" from the texts. We sort the extracted terms by their scores and use the top N terms as key terms. Thus we get a rush sense about the contents of the texts.
For example, if the texts are emails from customers to a company, then by using these transformations, it is possible to find the topics discussed in the emails, and to take business actions based on the customer feedback.
2. Automatic categorization of texts
Given a collection of texts, we can use the transformations to extract terms from the texts (using Term Extraction), and use them as features of the texts (using Term Lookup to feature the texts). In this way, each text is represented as a feature vector. We can then use the Data Mining tools in SQL Server (Decision Tree or Naive Bayes) to train a mining model, and conduct automatic classification of new texts by using the model.
For example, if the texts are customer feedback, then we can use the tools to perform automatic, routine categorization of the feedback into a number of predetermined categories.
3. Analysis of texts (similar to previous, but different goals and algorithms)
Given a collection of texts, we can use the transformations to extract terms from the texts (using Term Extraction), and use them as features of the texts (using Term Lookup to feature the texts). In this way, each text is represented as a feature vector. We can then use the Data Mining tools in SQL Server (Clustering or Association Rule) to train a mining model, and analyze the contents of the texts.
For example, if the texts are survey results from a Web site, we can use these tools to conduct analysis of the survey results. For example, by grouping similar opinions together in a single cluster, and by determining how many clusters there are.
4. Combining text mining data with tabular data
If one column in a database table contains texts and the other columns contain structured tabular data, then we can extract terms from the text column and add the extracted terms in additional columns. The additional columns and the existing columns become a larger table.
For example, imagine a table in which each row represents information about a person - age, gender, and so forth. It also contains the text of a message from the person. The terms extracted from the message can become additional "properties" of the person.
16. Fuzzy Lookup
The Fuzzy Lookup transformation performs data cleaning tasks such as standardizing data, correcting data, and providing missing values. The Fuzzy Lookup transformation differs from the Lookup transformation in its use of fuzzy matching. The Lookup transformation uses an equi-join to locate matching records in the reference table. It returns either an exact match or nothing from the reference table. In contrast, the Fuzzy Lookup transformation uses fuzzy matching to return one or more close matches from the reference table.
A lookup becomes Fuzzy when it can match to records that are similar, but not identical to, the lookup key. For example, it can match “Jon Smith” to “John Smith” – this component is very useful for helping consolide client data such as names and addresses from multiple systems.
An important thing to get to grips with is that when the Fuzzy Lookup outputs its matches, it also outputs a Similarity and Confidence score for the match. The Similarity score is easy to understand – it’s a measure on a 0-1 scale of how similar the matched item is to the lookup key. So for example, “John Smith” and “Jon Smith” rate a 0.89, whereas “John Smith” and “James Smith” rate a lower 0.62. The Confidence score is on the same scale but not fixed for a given match – essentially the higher it is, the lower the number of alternative matches it has found. So from my example, “Johnathan Fruitbat” has just two matches, and the highest Confidence is 0.38. In contrast “John James Walliams” has 15 matches and the highest Confidence is just 0.08.
A Fuzzy Lookup transformation frequently follows a Lookup transformation in a package data flow. First, the Lookup transformation tries to find an exact match. If it fails, the Fuzzy Lookup transformation provides close matches from the reference table. Basically term lookup works similar to soundex() function in Sql server.
Properties:
Fuzzy Lookup Indexes
In Fuzzy Lookup Specify the Destination connection manager to referenced table and option for the index that transformation uses. You can select the new index and can stored that Index or can select the existing index. Like a normal lookup, you need an input table and a lookup reference table. However in this case the lookup reference is a called an Index (this is a bit misleading as it is not an Index in the normal database sense of the word). These indexes store fragments of the lookup items for fuzzy searching. When you use a Fuzzy lookup you have 4 options on working with these indexes.
1. Create a new index every time (”Generate New Index”)
2. Create a new index and store it (as above, plus check “Store New Index” and provide a name for it)
3. Create a new index with maintenance features (as above, plus also check “Maintain Stored Index”)
4. Reuse an existing index (”Use Existing Index”, and select it)
Advanced settings
First is “Maximum number of matches to output per lookup”. This allows you to constrain the number of matches the fuzzy lookup will return per input key. So again referring to my example, if I set this to 10, even though “John James Walliams” has 15 possible matches, only 10 would be returned. “Johnathan Fruitbat” would still send back just 2 matches, as there are only two to return. An important takeaway here is that the Fuzzy Lookup does not return one row per input row – it can return many, or even zero.
Second is “Similarity Threshold” – this is a sliding scale running from 0 to 1 which sets a cut-off for the quality of returns from Fuzzy Lookup. In practice you rarely want to set this much below 0.75 – but you will need to do a few test runs to see which level is appropriate for your data.
Third is “Token Delimiters” – this is one of the means the Fuzzy Lookup uses to break your lookup keys into smaller pieces for fuzzy matching. You would only add to or remove from these if your data has special characters delimiting (or not delimiting) items within the data.
Example:
EG: If the Source table contains firstname as “preethi” and the destination table contains firstname as “prithi” so lookup will compare the firstname of source table with firstname with firstname of destination table. First it will check for the exact match, if not the closest match like Preethi is similar sound with prithi.
Output:
17. Fuzzy Grouping
Fuzzy grouping develops a master set of unique records from a data set containing a combination of unique and duplicate records. The Fuzzy Grouping transformation performs data cleaning tasks by identifying rows of data that are likely to be duplicates and selecting a canonical row of data to use in standardizing the data.
The transformation output includes all input columns, one or more columns with standardized data, and a column that contains the similarity score. The score is a decimal value between 0 and 1. The canonical row has a score of 1. Other rows in the fuzzy group have scores that indicate how well the row matches the canonical row. The closer the score is to 1, the more closely the row matches the canonical row. If the fuzzy group includes rows that are exact duplicates of the canonical row, these rows also have a score of 1. The transformation does not remove duplicate rows; it groups them by creating a key that relates the canonical row to similar rows.
The transformation produces one output row for each input row, with the following additional columns:
• _key_in, a column that uniquely identifies each row.
• _key_out, a column that identifies a group of duplicate rows. The _key_out column has the value of the _key_in column in the canonical data row. Rows with the same value in _key_out are part of the same group. The _key_out value for a group corresponds to the value of _key_in in the canonical data row.
• _score, a value between 0 and 1 that indicates the similarity of the input row to the canonical row. 1 indicates the exact match otherwise _score differs.
EXAMPLE:
Employee Table with following data with three rows as a duplicates
Below is my data flow task with Empolyee table as a source and I want to group the duplications and also group the similar records with exact match with similar match.
The first tab, Connection Manager is used to assign the connection. The next tab, columns is to set the columns that you need to group.
Then In column tab
There are two parameters which are important when configuring fuzzy grouping - Numerals and Comparison Flags.
The numerals option allows confuguration of the significance of numbers in the input stream when grouping text logically. The options are to consider leading, trailing , leading and trailing, or neither leading nor trailing numbers significant. This option would need to be considered when comparing address or similar types of information. Comparison flags have a few more options for grouping
Next is advanced tab
The Advanced tab contains the settings for controlling the fuzzy logic algorithms that assign groupings to text in the input stream. You can set the names of the three additional fields that will be added automatically to the output of this transformation. These fileds are named “_key_out”, “_key_in” and “_score” by default. As the rows flow through the pipeline, Fuzzy Grouping assigns an ID to each of them, which is the “_key_in”. When Fuzzy Grouping groups a collection of rows, it determines which row should be the representative. It then assigns all the “_key_out” columns of the rows in the group to the “_key_in” value of the representative. As a result, if you want to write only the representative rows to your output, you can filter the Fuzzy Grouping output through a conditional split transform that selects only rows where “_key_in” equals “_key_out”.
Output:
18. Import Column
The Import Column transformation reads data from files and adds the data to columns in a data flow. Using this transformation, a package can add text and images stored in separate files to a data flow.
The Import Column transformation is used to load binary data (photos, documents, media etc.) from the file system into a Data Flow. From the Data Flow it can then be loaded into a table, moved to a different location, modified – whatever you need to do.
Example with properties
Create particular table
CREATE TABLE [dbo].[ImportedFile](
[FileID] [int] IDENTITY(1,1) NOT NULL,
[Filename] [nvarchar](255) NOT NULL,
[File] [image] NOT NULL,
CONSTRAINT [PK__ImportedFile__7E6CC920] PRIMARY KEY CLUSTERED
(
[FileID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
In Script Component specify the connection manager location from where you are retrieving set of files from particular folder.
Edit the script and write down the below code in particular method
Public Overrides Sub CreateNewOutputRows()
Dim fileName As String
For Each fileName In Directory.GetFiles(Me.Connections.ImportFilesDir.AcquireConnection(Nothing).ToString())
Output0Buffer.AddRow()
Output0Buffer.Filename = fileName
Next
Output0Buffer.SetEndOfRowset()
End Sub
Select the input column that provides the file name and path of the binary data so the component can find the file it is going to import.
Edit the Import Column in input tab map the particular input and usage type.
Finally specify the destination location into which you need to place these file.
Output:
Particular files are stored in ImportedFile table
19. Export Column
The Export Column transformation reads data in a data flow and inserts the data into a file. For example, if the data flow contains product information, such as a picture of each product, you could use the Export Column transformation to save the images to files.
It exports Binary Data – which means things like Images, Documents and other media – which have been stored in a relational database. It exports them out to the file system.
Properties:
There’s not much to configure – you need to decide which column(s) you are going to export to the file system, and which column you will use to specify the full file path and file name (you cannot just set a folder globally and use a file name only).
The transformation uses pairs of input columns: One column contains a file name, and the other column contains data. Each row in the data set can specify a different file. As the transformation processes a row, the data is inserted into the specified file. At run time, the transformation creates the files, if they do not already exist, and then the transformation writes the data to the files.
• Allow Append – Will create a new file if none exists, otherwise it will add the data to the end of an existing file
• Force Truncate – If a file exists, it will overwrite it
Append Truncate File exists Results
False False No The transformation creates a new file and writes the data to the file.
True False No The transformation creates a new file and writes the data to the file.
False True No The transformation creates a new file and writes the data to the file.
True True No The transformation fails design time validation. It is not valid to set both properties to true.
False False Yes A run-time error occurs. The file exists, but the transformation cannot write to it.
False True Yes The transformation deletes and re-creates the file and writes the data to the file.
True False Yes The transformation opens the file and writes the data at the end of the file.
True True Yes The transformation fails design time validation. It is not valid to set both properties to true.
20. Script Component
The Script Component allows us to modify the data in a data flow path using managed code and we can use it to generate surrogate keys. The Script component hosts script and enables a package to include and run custom script code. The Script component provides an easy and quick way to include custom functions in a data flow. However, if you plan to reuse the script code in multiple packages, you should consider programming a custom component instead of using the Script component. The Script component can be used as a source, a transformation, or a destination. This component supports one input and multiple outputs. Depending on how the component is used, it supports either an input or outputs or both.
The script is invoked by every row in the input or output.
• If used as a source, the Script component supports multiple outputs.
• If used as a transformation, the Script component supports one input and multiple outputs.
• If used as a destination, the Script component supports one input.
When you drag a script component from the toolbox to the design surface you will be prompted as to whether the component is to be a source adapter, a destination adapter or a transformation. Select Transformation.
Then after that, specify the connection manager and edit the script by writing the code in window appears.
Quick reference to list of Transformations
Transformation Description Examples of when Transformation Would be Used
Aggregate Calculates aggregations such as SUM, COUNT, AVG, MIN and MAX based on the values of a given numeric column. This transformation produces additional output records. Adding aggregated information to your output. This can be useful for adding totals and sub-totals to your output.
Audit Includes auditing information, such as computer name where the package runs, package version ID, task name, etc. in the data flow. Creates advanced logs which indicate where and when the package was executed, how long it took to run the package and the outcome of execution.
Character Map Performs minor manipulations on string columns. Converts all letters to uppercase, lowercase, reverse bytes, etc. Applying string manipulations prior to loading data into the data warehouse. You can also apply the same manipulations to the data while it is being loaded into the warehouse.
Conditional Split Accepts an input and determines which destination to pipe the data into based on the result of an expression. Cleansing the data to extract specific rows from the source. If a specific column does not conform to the predefined format (perhaps it has leading spaces or zeros), move such records to the error file.
Copy Column Makes a copy of a single or multiple columns which will be further transformed by subsequent tasks in the package. Extracting columns that need to be cleansed of leading / trailing spaces, applying character map transformation to uppercase all data and then load it into the table.
Data Conversion Converts input columns from one data type to another. Converting columns extracted from the data source to the proper data type expected by the data warehouse. Having such transformation options allows us the freedom of moving data directly from its source into the destination without having an intermediary staging database.
Data Mining Query Queries a data mining model. Includes a query builder to assist you with development of Data Mining expressions (DMX) prediction queries. Evaluating the input data set against a data mining model developed with Analysis Services.
Derived Column Calculates new column value based on an existing column or multiple columns. Removing leading and trailing spaces from a column. Add title of courtesy (Mr., Mrs., Dr. , etc.) to the name.
Export Column Exports contents of large columns (TEXT, NTEXT, IMAGE data types) into files. Saving large strings or images into files while moving the rest of the columns into a transactional database or data warehouse.
Fuzzy Grouping Finds close or exact matches between multiple rows in the data source. Adds columns to the output including the values and similarity scores. Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr.", "Dr.", "doctor", "M.D." should all be considered equivalent.
Fuzzy Lookup Compares values in the input data source rows to values in the lookup table. Finds the exact matches as well as those values that are similar. Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr.", "Dr.", "doctor", "M.D." should all be considered equivalent.
Import Column Imports contents of a file and appends to the output. Can be used to append TEXT, NTEXT and IMAGE data columns to the input obtained from a separate data source. This transformation could be useful for web content developers. For example, suppose you offer college courses online. Normalized course meta-data, such as course_id, name, and description is stored in a typical relational table. Unstructured course meta-data, on the other hand, is stored in XML files. You can use Import Column transformation to add XML meta-data to a text column in your course table.
Lookup Joins the input data set to the reference table, view or row set created by a SQL statement to lookup corresponding values. If some rows in the input data do not have corresponding rows in the lookup table then you must redirect such rows to a different output. Obtaining additional data columns. For example, the majority of employee demographic information might be available in a flat file, but other data such as department where each employee works, their employment start date and job grade might be available from a table in relational database.
Merge Merges two sorted inputs into a single output based on the values of the key columns in each data set. Merged columns must have either identical or compatible data types. For example you can merge VARCHAR(30) and VARCHAR(50) columns. You cannot merge INT and DATETIME columns. Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using Merge transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources.
Merge Join Joins two sorted inputs using INNER JOIN, LEFT OUTER JOIN or FULL OUTER JOIN algorithm. You can specify columns used for joining inputs. Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using Merge Join transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources.
Note that Merge and Merge Join transformations can only combine two data sets at a time. However, you could use multiple Merge Join transformations to include additional data sets.
Multicast Similar to the conditional split transformation, but the entire data set is piped to multiple destinations. Populating the relational warehouse as well as the source file with the output of a derived column transformation.
OLEDB Command Runs a SQL command for each input data row. Normally your SQL statement will include a parameter (denoted by the question mark), for example: UPDATE employee_source SET has_been_loaded=1 WHERE employee_id=? Setting the value of a column with BIT data type (perhaps called "has_been_loaded") to 1 after the data row has been loaded into the warehouse. This way the subsequent loads will only attempt importing the rows that haven't made it to the warehouse as of yet.
Percentage Sampling Loads only a subset of your data, defined as the percentage of all rows in the data source. Note that rows are chosen randomly. Limiting the data set during development phases of your project. Your data sources might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy.
If you're simply trying to ensure that your warehouse functions properly and data values on transactional reports match the values obtained from your Analysis Services cubes you might wish to only load a subset of data into your cubes.
Pivot Pivots the normalized data set by certain column to create a more easily readable output. Similar to PIVOT command in Transact-SQL. You can think of this transformation as converting rows into columns. For example if your input rows have customer, account number and account balance columns the output will have the customer and one column for each account. Creating a row set that displays the table data in a more user-friendly format. The data set could be consumed by a web service or could be distributed to users through email.
Row count Counts the number of transformed rows and store in a variable. Determining the total size of your data set. You could also execute a different set of tasks based on the number of rows you have transformed. For example, if you increase the number of rows in your fact table by 5% you could perform no maintenance. If you increase the size of the table by 50% you might wish to rebuild the clustered index.
Row sampling Loads only a subset of your data, defined as the number of rows. Note that rows are chosen randomly. Limiting the data set during development phases of your project. Your data warehouse might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy.
If you're simply trying to ensure that your warehouse functions properly and data values on transactional reports match the values obtained from your Analysis Services cubes you might wish to only load a subset of data into your cubes.
Script Component Every data flow consists of three main components: source, destination and transformation. Script Component allows you to write transformations for otherwise un-supported source and destination file formats. Script component also allows you to perform transformations not directly available through the built-in transformation algorithms.
Custom transformations can call functions in managed assemblies, including .NET framework. This type of transformation can be used when the data source (or destination) file format cannot be managed by typical connection managers. For example, some log files might not have tabular data structures. At times you might also need to parse strings one character at a time to import only the needed data elements.
Much like Script Task the Script Component transformation must be written using Visual Basic .NET.
Slowly Changing Dimension Maintains historical values of the dimension members when new members are introduced. Useful for maintaining dimension tables in a data warehouse when maintaining historical dimension member values is necessary.
Sort Sorts input by column values. You can sort the input by multiple columns in either ascending or descending order. The transformation also allows you to specify the precedence of columns used for sorting. This transformation could also discard the rows with duplicate sort values. Ordering the data prior to loading it into a data warehouse. This could be useful if you're ordering your dimension by member name values as opposed to sorting by member keys.
You can also use Sort transformation prior to feeding the data as the input to the Merge Join or Merge transformation.
Term Extraction Extracts terms (nouns and noun phrases) from the input text into the transformation output column. Processing large text data and extracting main concepts. For example, you could extract the primary terms used in this section of SQLServerPedia by feeding the Term Extraction transformation the text column containing the entire section.
Term Lookup Extracts terms from the input column with TEXT data type and match them with same or similar terms found in the lookup table. Each term found in the lookup table is scanned for in the input column. If the term is found the transformation returns the value as well as the number of times it occurs in the row. You can configure this transformation to perform case-sensitive search. Analyzing large textual data for specific terms. For example, suppose you accept email feedback for latest version of your software. You might not have time to read through every single email messages that comes to the generic inbox. Instead you could use this task to look for specific terms of interest.
Union ALL Combines multiple inputs into a single output. Rows are sorted in the order they're added to the transformation. You can ignore some columns from each output, but each output column must be mapped to at least one input column. Import data from multiple disparate data sources into a single destination. For example, you could extract data from mail system, text file, Excel spreadsheet and Access database and populate a SQL Server table.
Unlike Merge and Merge Join transformations Union ALL can accept more than two inputs.
Unpivot
Opposite of Pivot transformation, Unpivot coverts columns into rows. It normalizes the input data set that has many duplicate values in multiple columns by creating multiple rows that have the same value in a single column.
For example if your input has a customer name and a separate column for checking and savings' accounts Unpivot can transform it into a row set that has customer, account and account balance columns. Massaging a semi-structured input data file and convert it into a normalized input prior to loading data into a warehouse.
Event Handlers:
SSIS packages also support a complete event system. You can attach event handlers to a variety of events for the package itself or for the individual tasks within a package. Events within a package “bubble up.” That is, suppose an error occurs within a task inside of a package. If you've defined an OnError event handler for the task, then that event handler is called. Otherwise, an OnError event handler for the package itself is called. If no event handler is defined for the package either, the event is ignored.
Event handlers are defined on the Event Handlers tab of the Package Designer. When you create an event handler, you handle the event by building an entire secondary SSIS package, and you have access to the full complement of data flows, control flows, and event handlers to deal with the original event.
The event handlers provided with are:
• OnError: The event handler for the OnError event. This event is raised by an executable when an error occurs.
• OnExecStatusChanged: The event handler for the OnExecStatusChanged event. This event is raised by an executable when its execution status changes.
• OnInformation: The event handler for the OnInformation event. This event is raised during the validation and execution of an executable to report information. This event conveys information only, no errors or warnings.
• OnPostExecute: The event handler for the OnPostExecute event. This event is raised by an executable immediately after it has finished running.
• OnPostValidate: The event handler for the OnPostValidate event. This event is raised by an executable when its validation is finished.
• OnPreExecute: The event handler for the OnPreExecute event. This event is raised by an executable immediately before it runs.
• OnPreValidate: The event handler for the OnPreValidate event. This event is raised by an executable when its validation starts.
• OnProgress: The event handler for the OnProgress event. This event is raised by an executable when measurable progress is made by the executable.
• OnQueryCancel: The event handler for the OnQueryCancel event. This event is raised by an executable to determine whether it should stop running.
• OnTaskFailed: The event handler for the OnTaskFailed event. This event is raised by a task when it fails.
• OnWarning: The event handler for the OnWarning event. This event is raised by an executable when a warning occurs.
Package Explorer :
The Package Explorer tab provides an Explorer -type view of package content. The view is built as you construct the package and provides a great way to understand the structure of the package the expanded view of a fairly basic package; it has only one executable (Run SQL Statement) one connection manager (LocalHost.DatabaseName), and no user-defined variables. You can imagine how important this view is to understanding and communicating to others the structure of complex packages!
Progress Execution
Execution Results tab is used to see detailed information of the package.
Package Properties
When you create a package in Business Intelligence Development Studio by using the graphical interface that Integration Services provides, you set the properties of the package object in the Properties window.
The Properties window provides a categorized and alphabetical list of properties. To arrange the Properties window by category, click the Categorized icon.
Properties by Category
When arranged by category, the Properties window groups properties in the following categories:
• Checkpoints
• Execution
• Forced Execution Value
• Identification
• Misc
• Security
• Transactions
• Version
Checkpoints
You can use the properties in this category to restart the package from a point of failure in the package control flow, instead of rerunning the package from the beginning of its control flow.
Property Description
CheckpointFileName The name of the file that captures the checkpoint information that enables a package to restart. When the package finishes successfully, this file is deleted.
CheckpointUsage Specifies when a package can be restarted. The values are Never, IfExists, and Always. The default value of this property is Never, which indicates that the package cannot be restarted.
SaveCheckpoints Specifies whether the checkpoints are written to the checkpoint file when the package runs. The default value of this property is False.
Steps:
• First Change Package level properties CheckpointFileName to path of your text file, CheckpointUsage to IfExists and SaveCheckpoints to true.
• Then change the each task level properties select data flow task in package go to properties and set FailPackageOnFailure to true and FailParentOnError to true.
Example:
To see the how checkpoints works let’s take not execution package
Right click the package properties
You can specify the each task level or container level properties if that particulat task or container fails to execute then it will run task or container where its stops.
Right click the Task or container then set below properties. Change the each task level properties select data flow task in package go to properties and set FailPackageOnFailure to true and FailParentOnError to true.
Output:
See its start execution from where its stops.
Execution
The properties in this category configure the run-time behavior of the package object.
Property Description
DelayValidation Indicates whether package validation is delayed until the package runs. The default value for this property is False.
Disable Indicates whether the package is disabled. The default value of this property is False.
DisableEventHandlers Specifies whether the package event handlers run. The default value of this property is False.
FailPackageOnFailure Specifies whether the package fails if an error occurs in a package component. The only valid value of this property is False.
FailParentOnError Specifies whether the parent container fails if an error occurs in a child container. The default value is of this property is False.
MaxConcurrentExecutables The number of executable files that the package can run concurrently. The default value of this property is -1, which indicates that there is no limit.
MaximumErrorCount The maximum number of errors that can occur before a package stops running. The default value of this property is 1.
PackagePriorityClass The Win32 thread priority class of the package thread. The values are Default, AboveNormal, Normal, BelowNormal, Idle. The default value of this property is Default. For more information, see DTSPriorityClass.
Forced Execution Value
The properties in this category configure an optional execution value for the package.
Property Description
ForcedExecutionValue If ForceExecutionValue is set to True, a value that specifies the optional execution value that the package returns. The default value of this property is 0.
ForcedExecutionValueType The data type of ForcedExecutionValue. The default value of this property is Int32.
ForceExecutionValue A Boolean value that specifies whether the optional execution value of the container should be forced to contain a particular value. The default value of this property is False.
Identification
The properties in this category provide information such as the unique identifier and name of the package.
Property Description
CreationDate The date that the package was created.
CreatorComputerName The name of the computer on which the package was created.
CreatorName The name of the person who created the package.
Description A description of package functionality.
ID The package GUID, which is assigned when the package is created. This property is read-only. To generate a new random value for the ID property, select
Name The name of the package.
PackageType The package type. The values are Default, DTSDesigner, DTSDesigner100, DTSWizard, SQLDBMaint, and SQLReplication. The default value of this property is Default. For more information, see DTSPackageType.
Security
The properties in this category are used to set the protection level of the package
Property Description
PackagePassword Specify the package password
ProtectionLevel The protection level of the package. The values are DontSaveSensitive, EncryptSensitiveWithUserKey, EncryptSensitiveWithPassword, EncryptAllWithPassword, and ServerStorage. The default value of this property is EncryptSensitiveWithUserKey.
Version
The properties in this category provide information about the version of the package object.
Property Description
VersionBuild The version number of the build of the package.
VersionComments Comments about the version of the package.
VersionGUID The GUID of the version of the package. This property is read-only.
VersionMajor The latest major version of the package.
VersionMinor The latest minor version of the package.
Implementing Logging in Packages
Integration Services includes logging features that write log entries when run-time events occur and can also write custom messages. Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers. The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files. Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not. For example, you can enable logging on an Execute SQL task without enabling logging on the parent package.
Step1: Right click the control area in your BISD
Step2: Configure SSIS log dialogue box appears
In Providers and Logs tabs
In Details tabs select OnError click ok
Step3: save the file in some location so you can configure the file location
Integration Services supports log entries on predefined events and provides custom log entries for many Integration Services objects. The Configure SSIS Logs dialog box in SSIS Designer lists these events and custom log entries.
The following table describes the predefined events that can be enabled to write log entries when run-time events occur. These log entries apply to executable, the package, and the tasks and containers that the package includes. The name of the log entry is the same as the name of the run-time event that was raised and caused the log entry to be written.
Events Description
OnError Writes a log entry when an error occurs.
OnExecStatusChanged Writes a log entry when the execution status of the executable changes.
OnInformation Writes a log entry during the validation and execution of an executable to report information.
OnPostExecute Writes a log entry immediately after the executable has finished running.
OnPostValidate Writes a log entry when the validation of the executable finishes.
OnPreExecute Writes a log entry immediately before the executable runs.
OnPreValidate Writes a log entry when the validation of the executable starts.
OnProgress Writes a log entry when measurable progress is made by the executable.
OnQueryCancel Writes a log entry at any juncture in the task processing where it is feasible to cancel execution.
OnTaskFailed Writes a log entry when a task fails.
OnVariableValueChanged Writes a log entry when the value of a variable changes.
OnWarning Writes a log entry when a warning occurs.
PipelineComponentTime For each data flow component, writes a log entry for each phase of validation and execution. The log entry specifies the processing time for each phase.
Diagnostic Writes a log entry that provides diagnostic information.
For example, you can log a message before and after every call to an external data provider
SSIS Package Configuration
As we develop, test and deploy SSIS packages, we need to specify different parameter values for things like database connection strings, file and folder paths, etc. without having to edit the individual SSIS packages to make these changes. SSIS provides several options for handling package configuration values such as environment variables, XML files, registry settings, and a table in a SQL Server database. In each case you are storing and maintaining configuration parameter values outside of the SSIS package then using those stored values when executing the package. Before we walk through the steps to setup SQL Server package configuration in an SSIS package, there are two preliminary things that need to be done. First, you need to decide on the database server and database name to store the configuration data; for our example we will use the local server and SSISConfig as the database name. While not a requirement to store the SQL Server package configuration in its own database, it is usually a good idea to do so. Second, you should setup an environment variable that holds the database connection string to the package configuration database on the machine(s) where your SSIS package will run. We will configure the SSIS package to get the database connection string for the package configuration database from the environment variable, allowing us the flexibility to change the configuration database as the SSIS package goes through the development, testing and deployment phases.
Now we are ready to setup an SSIS package to use SQL Server package configuration. To begin, open Business Intelligence Developer Studio (BIDS) or Visual Studio. We will perform the following steps:
• Create a new Integration Services project
• Add a new SSIS package to the project; the steps following are performed in this package
• Add Connection Managers for the SSISConfig and AdventureWorks databases (or any other database of your choice)
• Add an environment variable package configuration to assign the SSISConfig database ConnectionString property
• Add a SQL Server package configuration to assign the AdventureWorks database ConnectionString property
Deploying SSIS Project
Microsoft SQL Server Integration Services provides tools that make it easy to deploy packages to another computer. The deployment tools also manage many dependencies, such as configurations and files that the package needs. In this tutorial, you will learn how to use these tools to install packages and their dependencies on a target computer. After you have verified that the packages run successfully in Business Intelligence Development Studio, you will create the deployment bundle to use to install the packages.
First way to deploy SSIS Project:
Steps1:
In Solution Explorer, right-click the project, and then select properties
Step 2: Properties page appears change createdeploymentutility as true and Configuration to Development
Step 3: In bin\deployment\ folder of your project you will find two file one is manifest file and one is your package file. After deployment of your project manifest file is created that’s a file which we need. Open the manifest file. You will find package installation wizard
File System deployment: It will install your package in C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Packages
Sql Server Deployment: It will install sql server management studio.
Your package is installed in your sql Server management studio
Second way to deploy SSIS Project:
Directly save your project as dtsx project in particular location
Third way to deploy SSIS Project:
In Sql management studio directly import your package
SSIS Standards
Staging Side:
• All Staging packages name should Prefix with Stage word.
• All Staging packages should suffix with _ET letter to better understand.
• All Packages names should match with table name.
Data warehouse Side:
• All Loading packages should prefix with Load work.
• All Loading packages should suffix with _L letter to better understand.
• All packages names should match with dimension/fact/join/junk table name
Execute Staging packages package name:
• The name of the package which will execute all staging packages should be MasterStage.
Execute DW packages package name:
• The name of the package which will execute all loading packages should be MasterLoad.
Data Sources Name:
Connection manager’s name should be like this:
• Source Connection = Source
• Staging Connection = Staging
• DW Connection = Production
• Environment Setting/Connection = Config
Components Names:
• All components/objects at Source side, Transformation and Destination side should have appropriate names.
• Need to add annotations for each component and its process to understand the flow.
Variables:
All variables must be easily identified whether they are task based scope or package level scope e.g.
• L_ variablename local variable for some task/component scope.
• Pkg_variablename for package level scope.