
Thursday 29 November 2007

BI Implementation Enabler: Agile Framework for Data Warehousing – Part 1


As part of the Business Intelligence Utopia series, I am going to focus on the implementation enablers in the next few posts. The first implementation enabler is an Agile framework for managing Business Intelligence systems.
BI systems are complex to manage for the following reasons:
  • They keep evolving over time – an enterprise data warehouse can never be completely built
  • BI drives business decisions – it needs “Total Alignment” with the corporate vision
  • The power of BI applications increases exponentially as the number of information consumers increases
  • Data warehouses need to be measured and calibrated against pre-set goals
  • Development and support have to be managed concurrently
Standard process methodologies like the Waterfall, Spiral and Iterative models are not well suited to managing Business Intelligence systems. One methodology I have seen work very well, having used it in multiple projects at Hexaware, is the “Agile Methodology”. The philosophy of the Agile framework fits in nicely to alleviate some of the complex issues in managing BI systems.
Agile Methodology – Definition
Agile development is a software development approach that “cycles” through the development phases, from gathering requirements to delivering functionality into a working release.
Basic Tenets:
  • Shared Vision and Small Teams working on specific functionality
  • Frequent Releases that make business sense
  • Relentlessly manage scope – Managing the scope-time-resources triad effectively
  • Creating a Multi-Release Framework with master plan and supporting architecture
  • Accommodate changes “gracefully”
BI practitioners would appreciate the fact that the basic tenets do provide solutions to some of the critical pain areas when it comes to managing enterprise BI systems.
The ultimate goal of any DW/BI project is to roll out new business functionality on a regular and rapid basis with a high degree of conformance to what is already there – this fits in well with the “Agile” philosophy.
The next few posts will illustrate the practical application of the Agile Framework to Business Intelligence systems.
BI Information Nugget – A website I came across recently and really liked is www.biblogs.com, a Business Intelligence blog aggregation site with blogs written by seasoned BI practitioners. Happy reading!

Wednesday 28 November 2007

Data Integration Challenge – Building Dynamic DI Systems – I


Another challenge in a data integration project, as with any other IT system, is to build a data integration environment that is agile and flexible enough to accommodate system changes related to business rules. The benefit of building a dynamic system is that code changes are needed less frequently or avoided altogether in many cases, so it should be standard practice to look for design opportunities that make a DI system dynamic.
A dynamic DI system accommodates variation in the incoming data and responds without system failures or incorrect data processing. In many such scenarios the DI environment can also be controlled by the business team with less support from the IT team.
The following are some of the design aspects that help make a DI system dynamic:
  1. Avoiding hard references, usage of parameter variables

  2. Usage of lookup tables for code conversion

  3. Setting and managing threshold values through tables

  4. Segregating data processing logic into common reusable components

  5. Ensuring that the required processes are controllable by the Business team with the required checks built in

Avoiding hard references, usage of parameter variables:
Items such as a database connection definition, or a “Business Unit Number” used to filter the required data while extracting from a source system, can be parameterized through variables. Parameterizing the database connection enables easier code movement across development, test and production environments, while parameterizing the Business Unit Number allows the same DI program to be run for another Business Unit simply by changing the value of the Business Unit Number parameter variable.
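As a minimal sketch of the idea (outside any particular DI tool – Informatica and DataStage have their own parameter-file mechanisms), the example below reads the connection and the Business Unit Number from a parameter file; the file name, section, keys and the orders query are hypothetical.

```python
# Minimal sketch of parameter-driven extraction; the parameter file name, keys and
# the query below are hypothetical and only illustrate the idea of avoiding hard references.
import configparser
import sqlite3  # stand-in for any relational source


def load_params(path):
    """Read connection and filter values from a file instead of hard-coding them."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    return dict(cfg["DI_PARAMS"])


def extract_orders(params):
    """The same extraction runs for any Business Unit by changing only the parameter file."""
    conn = sqlite3.connect(params["source_db"])            # dev/test/prod value
    cur = conn.execute(
        "SELECT order_id, amount FROM orders WHERE business_unit = ?",
        (params["business_unit_number"],),                  # e.g. '100' today, '200' tomorrow
    )
    return cur.fetchall()


# Example parameter file (di_params.ini):
# [DI_PARAMS]
# source_db = sales_dev.db
# business_unit_number = 100
```

Moving between environments, or running the job for a different Business Unit, then only requires editing the parameter file, never the code.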
Usage of lookup tables for code conversion
In scenarios where an incoming data value has to be converted to another “standard value”, we usually write IF…THEN…ELSE logic. Here again we can bring in dynamism by using a lookup table that carries the incoming value in one column and the standard value that replaces it in another column. The benefit is that if a standard value changes later, the lookup table can be updated without opening the code, and a new record can be inserted into the lookup table whenever the DI process needs to handle a code conversion for a new incoming value.
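A minimal sketch of the same idea, assuming a hypothetical code_lookup table with incoming_value and standard_value columns; the sample codes are made up for illustration.

```python
# Minimal sketch of table-driven code conversion replacing hard-coded IF/ELSE rules.
# The table name, column names and sample codes are hypothetical.
import sqlite3


def load_code_map(conn):
    """Load the conversion rules once; edit the table, not the code, when rules change."""
    rows = conn.execute("SELECT incoming_value, standard_value FROM code_lookup")
    return {incoming: standard for incoming, standard in rows}


def standardize(value, code_map):
    """Return the standard value, or pass the original through if no rule exists yet."""
    return code_map.get(value, value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code_lookup (incoming_value TEXT, standard_value TEXT)")
conn.executemany(
    "INSERT INTO code_lookup VALUES (?, ?)",
    [("M", "MALE"), ("F", "FEMALE"), ("U", "UNKNOWN")],
)
code_map = load_code_map(conn)
print(standardize("M", code_map))   # -> MALE
print(standardize("X", code_map))   # -> X (no rule yet; add a row, not an IF branch)
```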
We shall see more details and other design aspects in the coming days.

Monday 19 November 2007

Business Intelligence Utopia – Implementation Enablers


In this series of posts on Business Intelligence Utopia, I have presented some thoughts around 5 key technology enablers that in my view will take BI to the next level. They are:
1. Proliferation of transaction systems that support the analytical feedback loop
2. Real-time / Right-time Data Integration
3. Data Governance
4. Service Oriented Architecture
5. Extensible Data Models
All of these put together answer the question of “Where do you want to see Business Intelligence in the future”? I have 5 more to go to complete the discussion on the “Power of Ten” key enablers.
At this point, I felt that the question of “How to make it happen?” is as important as the technology enablers, and hence decided to digress a bit to focus on some implementation aspects of Business Intelligence Utopia. I call them the “Implementation Enablers”. These enablers focus on process methodologies, specific techniques etc. that help in managing the evolution of Business Intelligence within your organization.
There are 3 implementation enablers of interest, at this point in time:
  1. Agile Framework for BI

  2. Business Intelligence Calibration

  3. Function Point based Estimation technique for Data Warehousing

You can also visit the following link http://www.hexaware.com/webcastarchive1.html to listen to my recent webinar on “Agile Framework for Calibrating the Enterprise Data Warehouse” to get an idea of the implementation enablers.
To reiterate, my approach for future posts on this blog is to write about Business Intelligence Utopia along 2 dimensions:
Technology enablers – 10 of them have been identified. 5 of them were discussed already and 5 more to go.
Implementation enablers – 3 of them have been identified and will be elaborated in the next few weeks.
Bit of information – I recently took the TDWI (The Data Warehousing Institute) Business Intelligence Maturity assessment using their online tool and got some very good insights. If you are interested, please use the link http://tdwi.org/display.aspx?id=8500 to take the assessment for your organization.

Friday 2 November 2007

Data Integration Challenge – Understanding Lookup Process – III


In Part II we discussed ‘when to use’ and ‘when not to use’ each type of lookup process – the Direct Query lookup, the Join based lookup and the Cache file based lookup. Now we shall see the points to be considered for better performance of each of these lookup types.
In the case of Direct Query the following points are to be considered

  • Index on the lookup condition columns
  • Selecting only the required columns
In the case of Join based lookup, the following points are to be considered

  • Index on the columns that are used as part of Join conditions
  • Selecting only the required columns
In the case of Cache file based lookup, let us first try to understand the process of how these files are built and queried.
The key aspects of a Lookup Process are:

  • The SQL that pulls the data from the lookup table
  • The cache memory/files that hold the data
  • The Lookup Conditions that query the cache memory/file
  • The Output Columns that are returned from the cache files
Cache file build process:
Whether the product is Informatica or DataStage, when a lookup process is designed we define the ‘lookup conditions’ or ‘key fields’ and also the list of fields that need to be returned by the lookup query. Based on these definitions the required data is pulled from the lookup table and the cache file is populated with that data. The cache file structure is optimized for data retrieval, assuming that the cache file will be queried on a certain set of columns called the ‘lookup conditions’ or ‘key fields’.

In the case of Informatica, the cache is split into separate index and data files: the index file holds the fields that are part of the ‘lookup condition’ and the data file holds the fields to be returned. DataStage cache files are called hash files and are optimized based on the ‘key fields’.
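To illustrate the idea of a cache keyed on the lookup condition (this is only a conceptual sketch, not the physical index/data file layout used by Informatica or DataStage), here is a minimal example; the customer_dim table, its columns and the sample rows are hypothetical.

```python
# Conceptual sketch of a lookup cache: an 'index' part keyed on the lookup condition
# columns and a 'data' part holding the columns to be returned.
import sqlite3

LOOKUP_CONDITION = ("customer_id",)            # the 'key fields'
RETURN_COLUMNS = ("customer_name", "region")   # fields to be returned


def build_cache(conn):
    """Pull only the needed columns and key the in-memory cache on the lookup condition."""
    cols = ", ".join(LOOKUP_CONDITION + RETURN_COLUMNS)
    # Sorting by the lookup condition lets a tree-style index build with less node realignment
    rows = conn.execute(f"SELECT {cols} FROM customer_dim ORDER BY customer_id")
    cache = {}
    for row in rows:
        key = row[:len(LOOKUP_CONDITION)]           # the 'index' part
        cache[key] = row[len(LOOKUP_CONDITION):]    # the 'data' part
    return cache


# Tiny in-memory stand-in for the lookup table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (customer_id INTEGER, customer_name TEXT, region TEXT)")
conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)",
                 [(2, "Beta Corp", "EMEA"), (1, "Acme Inc", "APAC")])
cache = build_cache(conn)
print(cache[(1,)])   # -> ('Acme Inc', 'APAC')
```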
Cache file query process:

Irrespective of the product of choice, the following steps are involved internally when a lookup process is invoked.

Process:
  1. Get the inputs for the lookup query – the lookup condition values and the columns to be returned
  2. Load the cache file into memory

  3. Search for the record(s) matching the lookup condition values; in the case of Informatica this search happens on the ‘index file’

  4. Pull the required columns matching the condition and return them; in the case of Informatica, the result of the ‘index file’ search is used to locate and retrieve the data from the ‘data file’

During the search process, depending on memory availability, there could be many disk hits and page swaps.
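As a rough illustration of these four steps (not of how Informatica or DataStage implement them internally), the sketch below probes the in-memory cache built in the previous sketch; the function name and field names are hypothetical.

```python
# Illustrative sketch of the four steps above, reusing the `cache` dictionary from
# the build sketch; `lookup`, `customer_id` and the output field names are hypothetical.
def lookup(cache, customer_id):
    """Probe the cache on the lookup condition and return the output columns."""
    key = (customer_id,)                 # step 1: lookup condition values
    row = cache.get(key)                 # steps 2-3: search the in-memory 'index'
    if row is None:
        return None                      # no matching record in the lookup table
    customer_name, region = row          # step 4: return the required columns
    return {"customer_name": customer_name, "region": region}


# Example: lookup(cache, 1) -> {'customer_name': 'Acme Inc', 'region': 'APAC'}
```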
So in terms of performance tuning we could look at two levels:

  1. how to optimize the cache file building process

  2. how to optimize cache file query process

The following points should be considered for better performance of a cache file based lookup.

Optimize the cache file building process:
  • While retrieving the records to build the cache file, sort them by the lookup condition columns; this speeds up building the index (file) because the search tree is built with less node realignment
  • Select only the required fields, thereby reducing the cache file size
  • Reuse the same cache file for multiple requirements that have the same or slightly varied lookup conditions

Optimize the cache file query process:
  • Sort the records coming from the source by the lookup condition columns before querying the cache file; if the input records arrive in a continuous sorted order, the required index data is more likely to already be in memory, so page swapping and disk hits are reduced
  • Keep the lookup cache files on a dedicated separate disk; this reserves space for them and improves the response of writing to and reading from the disk
  • Avoid querying a recurring lookup condition repeatedly, again by sorting the incoming records by the lookup condition
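To make the last two query-side points concrete, here is a minimal sketch that reuses the cache dictionary built in the earlier sketch; resolve_regions, the customer_id field and the sample rows are hypothetical and only illustrate the sorting idea, not any product's internals.

```python
# Minimal sketch of the query-side tuning points: sorting the incoming records by
# the lookup condition so recurring keys reuse the previous result instead of
# probing the cache again. Field names are hypothetical.
def resolve_regions(input_rows, cache):
    """input_rows: dicts carrying 'customer_id'; cache: the dict built in the earlier sketch."""
    results = []
    last_key, last_value = object(), None            # sentinel that matches no real key
    for row in sorted(input_rows, key=lambda r: r["customer_id"]):
        key = (row["customer_id"],)
        if key != last_key:                          # recurring keys skip the probe entirely
            last_key, last_value = key, cache.get(key)
        results.append({**row, "lookup_result": last_value})
    return results


# Example: resolve_regions([{"customer_id": 1}, {"customer_id": 1}], cache)
# probes the cache only once for the two identical keys.
```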