Friday, September 16, 2011

What’s new in Denali CTP3? -- SSAS



Before we get into the details of the BI Semantic Model, let me first address an important question: What about existing Analysis Services cubes (also known as Unified Dimensional Model or UDM)?
The answer is very simple: when you upgrade your Analysis Services project or server to SQL Server “Denali”, every cube automatically becomes a BI Semantic Model. The BI Semantic Model is an evolution of the UDM from a pure multidimensional model into a hybrid model that offers all of the capabilities of the UDM and more. As such, the term Unified Dimensional Model is no longer used with SQL Server “Denali” and is subsumed by the BI Semantic Model. This is a key difference in the roadmap from what was disclosed last November at PASS 2010.
Now let’s get into the details of the BI Semantic Model – how to build a model and the technology that powers the model. The BI Semantic Model can be viewed conceptually as a three-layer model:
• Data Model: This is the conceptual data model that the model developer and the model consumer work with. The BI Semantic Model is a hybrid model supporting both multidimensional and tabular data modeling. The model developer can use BI Development Studio to create a BI Semantic Model using a multidimensional project or a tabular project. And the client tool can consume the model using a multidimensional or a tabular interface. Model developers will typically choose between multidimensional and tabular projects based on the needs of their application and their skill set. On the other hand, client tools will typically use either the multidimensional or tabular interface depending on the nature of the intended user experience. Rich analytical tools such as Excel will typically use the multidimensional interface while ad hoc reporting tools such as Crescent will use the tabular interface.
• Business Logic & Queries: This layer represents the intelligence or semantics in the model. Model developers can embed business logic in the model using the MDX (Multidimensional Expressions) or DAX (Data Analysis Expressions) calculation languages. And client tools consuming the model can express additional business logic using MDX or DAX queries. MDX was pioneered by the Analysis Services team over the last 12 years and has become the BI industry standard for multidimensional calculations and queries. DAX is an expression language based on Excel formulas that was introduced in PowerPivot and is built on tabular concepts. Again, model developers and client tools will choose between MDX and DAX based on application needs, skill set, user experience, etc. (A short pair of illustrative queries follows this list.)
• Data Access: This layer integrates data from multiple sources – relational databases, business applications, flat files, OData feeds, etc. There are two modes for retrieving and managing this data – cached and passthrough. The cached mode pulls in data from all the sources and stores it in a compressed data structure that is optimized for high speed data access. The passthrough mode pushes query processing and business logic down to the data source, thereby exploiting the capabilities of the source system and avoiding the need to copy the data. Obviously there is a tradeoff between the two modes and the model developer needs to make the choice based on application needs.
o There are two storage engines that can be used for caching the data – MOLAP and VertiPaq™. MOLAP is the storage engine that has been used in Analysis Services for many releases. It is optimized for OLAP and uses techniques such as pre-built aggregates, bitmap indexes, and compression to deliver great performance and scale. In SQL Server “Denali”, we are introducing the VertiPaq storage engine as an alternative to MOLAP. VertiPaq is an in-memory column store engine that combines state-of-the-art data compression and scanning algorithms to deliver blazing fast performance with no need for indexes, aggregates, or tuning.
o There are two options for passthrough – ROLAP and DirectQuery. ROLAP is complementary to MOLAP while DirectQuery is complementary to VertiPaq. Both attempt to push query evaluation, as much as possible, down into the data source. ROLAP is optimized for large fact tables and relatively small dimension tables, while DirectQuery is mostly neutral towards the backend database structure.
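
To make the two interfaces concrete, here is a minimal sketch of what the same question (total sales by product category) might look like through each one. The model, table, column, and measure names (Sales, Product[Category], Sales Amount) are hypothetical placeholders rather than names from any shipped sample; the DAX query form shown (EVALUATE) is the query syntax that arrives with the tabular model in “Denali”.

    -- Multidimensional interface: an MDX query such as Excel might send
    SELECT
        [Measures].[Sales Amount] ON COLUMNS,
        [Product].[Category].MEMBERS ON ROWS
    FROM [Sales]

    -- Tabular interface: an equivalent DAX query such as Crescent might send
    EVALUATE
    SUMMARIZE (
        Sales,
        Product[Category],
        "Sales Amount", SUM ( Sales[Amount] )
    )
    ORDER BY Product[Category]

Both ask the model the same question; the difference lies purely in which surface the client chooses to speak.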

The BI Semantic Model can be authored by BI professionals in the Visual Studio 2010 environment using a new project type that will be available as part of “Project Juneau”. Juneau is an integrated development environment for all of SQL Server and subsumes the Business Intelligence Development Studio (BIDS). When a business user creates a PowerPivot application, the model that is embedded inside the workbook is also a BI Semantic Model. When the workbook is published to SharePoint, the model is hosted inside an SSAS server and served up to other applications and services such as Excel Services, Reporting Services, etc. Since it is the same BI Semantic Model that is powering PowerPivot for Excel, PowerPivot for SharePoint, and Analysis Services, it enables a seamless transition of BI applications from Personal BI to Team BI to Organizational (or Professional) BI.
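
As a small, hedged illustration of that continuity (the server, site, and file names below are invented for the example), a client that connects through the Analysis Services OLE DB provider can point essentially the same connection string either at a PowerPivot workbook published to SharePoint or at a model hosted on an Analysis Services instance:

    PowerPivot for SharePoint:   Provider=MSOLAP; Data Source=http://teamsite/Shared Documents/SalesAnalysis.xlsx
    Analysis Services instance:  Provider=MSOLAP; Data Source=BISERVER\TABULAR; Initial Catalog=SalesAnalysis

In both cases the client is talking to the same BI Semantic Model, which is what makes moving from Personal to Team to Organizational BI largely a question of where the model is hosted.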

Now let’s talk about Analysis Services. While PowerPivot is targeted at business users to build Personal and Team BI applications, Analysis Services is the platform for building Professional BI applications. These are formal BI applications, sanctioned and funded by IT, with project plans, and are built and managed by BI professionals. Analysis Services is the industry-leading BI platform in this space today. With the introduction of the BI Semantic Model, there are two flavors of Analysis Services – one that runs the UDM (OLAP) model and one that runs the BISM model. This is a side-by-side offering – you can choose to run one instance of Analysis Services hosting a UDM model and another instance hosting a BISM model on the same server.

You might ask – Why do we have two types of models in Analysis Services? Which one should I use? Is the UDM going to be deprecated now that we have the BISM? The answer is NO! Analysis Services just got a lot better in SQL Server “Denali” with the BISM, VertiPaq, and DAX! However, the UDM is a mature and industry-leading technology and is here to stay. Let me repeat – UDM (OLAP) models are not being deprecated!

To explain the relationship between BISM and UDM, I’ll use the analogy of the C++ and C# programming languages. When C# was introduced along with the .NET Framework, C++ was a mature programming language. It was a powerful language that let you do pretty much whatever you wanted with the underlying hardware. At the same time, the learning curve for C++ was steep, and lots of programmers repeated common mistakes, resulting in poorly written applications that performed badly or crashed regularly. Sure, if you were an ace programmer, you could do wonders with C++. Then came C#, which offered a simpler, more flexible way to build applications with a faster time to solution; however, it did not claim to do everything that C++ could do. Over the past decade, more and more applications have been written in C# because of these benefits. However, there still are applications today that need and use the power of C++ (a great example is the VertiPaq engine!). There are also lots of existing applications written in C++ that could have been rewritten in C#, but the developers didn’t bother since the applications were working perfectly fine. There is a happy coexistence of C++ and C# today even though there is a lot of overlap in functionality between the two.

Just like in the above analogy, the UDM and BISM models will coexist in the Microsoft BI stack. Most existing UDM applications will continue to run as is and will be supported by Microsoft. Some existing UDM applications that are undergoing a major overhaul might be migrated to run on the BISM if appropriate. For new BI applications, the choice between UDM and BISM will depend on the complexity of the application. For BI applications that need the power of the multidimensional model and sophisticated calculation capabilities such as scoped assignments (a short example is sketched below), the UDM is the way to go. For the broad majority of BI applications that don’t need that level of complexity, the BISM will offer a rich, high-performance platform with faster time to solution.
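
For readers who have not used them, here is a rough sketch of what a scoped assignment looks like in a UDM calculation script. The cube, measure, and member names are hypothetical and the growth factor is made up; the point is only to show the shape of the construct.

    -- Hypothetical MDX calculation-script fragment using a scoped assignment:
    -- set the 2012 sales quota to 110% of actual 2011 sales for every cell in that slice.
    SCOPE ( [Measures].[Sales Quota], [Date].[Calendar Year].&[2012] );
        THIS = 1.1 * ( [Measures].[Sales Amount],
                       [Date].[Calendar Year].&[2011] );
    END SCOPE;

This kind of targeted overwrite of a region of cube space is exactly the sort of capability the multidimensional model excels at; the tabular side of the BISM expresses its logic through DAX measures and calculated columns instead.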

As for the Analysis Services roadmap, the UDM is a mature technology while the BISM is a brand new technology. By virtue of this, you should expect to see more investment put into the BISM and less into the UDM. However, make no mistake, we will continue to invest in the UDM based on customer and partner feedback and maintain its market-leading position. For example, one of the keynotes at the PASS 2010 conference featured a 12TB cube built by Yahoo! To address the needs of large-scale UDM models like these, we lifted the 4GB attribute string store limit in SQL Server “Denali”. In addition, we’ve added support for XEvents and other monitoring capabilities to better troubleshoot and manage large UDM models. We’ve also made performance and scalability improvements in the MDX query processor and MOLAP storage engine that underpin the UDM.

Business intelligence is going to be even more critical to the success of organizations, large and small, going forward. The Analysis Services team is committed to staying on the cutting edge of BI innovation and leading the industry with breakthrough technologies such as VertiPaq. At the same time, we recognize our rich heritage in OLAP and continue to be humbled by the success of our product and the vast ecosystem of customers and partners that it has helped us build. We look forward to your support in helping us take this product to even greater heights.