
Monday 11 February 2013

A Proactive Approach to Building an Effective Data Warehouse

“We can’t solve problems by using the same kind of thinking we used when we created them.” – This famous quote attributed to Albert Einstein applies as much to Business Intelligence & Analytics as it does to anything else. Many organizations that turn to BI&A for help with strategic business concerns such as increasing customer churn, drops in quality levels, and missed revenue opportunities face disappointment. One of the important reasons for this is that the data that could provide such insights is simply not there. For example, to understand poor sales performance in a particular region during a year, it is not enough to have data about our sales plan, activities, opportunities, conversions and sales achieved or missed; it also requires an understanding of other disruptive forces such as competitor promotions, changes in customer preferences, and new entrants or alternatives.

Thomas Davenport, a household name in the BI&A community, explains in his book ‘Analytics at Work’ the analytical DELTA (Data, Enterprise, Leadership, Targets and Analysts), a framework that organizations can adopt to implement analytics effectively for better business decisions and results. He emphasizes that besides the necessity of having clean, integrated and enterprise-wide data in a warehouse, it is also important that the data enables the organization to measure something new and important.

Wednesday 9 April 2008

Data Integration Challenge – Initial Load – II

The other challenges during the initial load are:
System Performance
Catching Up
System Performance is always a challenge during the initial load, especially when many years of history data are to be loaded; there is an increase in the usage of system resources because of the huge data load, which wouldn’t happen during a regular incremental load. Some of the ways of handling the system performance issue during the initial load are:
Group the data loads by filters such as year or customer and load the related data in chunks. We could load the data for the months of Jan, Feb and Mar separately, or load the customers region-wise from NY, followed by NJ, and so on. Such grouping of records for loading eliminates the data surge and also provides a better way to perform data validation (see the sketch after this list)
As the data gets loaded into the warehouse, the volume of data required for lookups from the warehouse becomes huge, so we need to decide on the required lookup records based on the incoming data. For example, if the warehouse has data for all the regions North, South, East and West, and the incoming data currently has only North data, then we need to have an override filter and access only the data pertaining to North from the warehouse (also illustrated in the sketch after this list)
We could plan for and temporarily increase the available memory on the ETL server
Avoid sequential inserts; write the data to a file, sort it and bulk load
Determine and plan for the additional disk space required for the initial load files that are extracted and provided by the source systems
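To make the chunking and lookup-filtering points above concrete, here is a minimal Python sketch of the pattern. The in-memory source_rows, warehouse_lookup and load_chunk names are hypothetical stand-ins for the source extract, the warehouse lookup and the actual bulk load step; they are not part of any specific ETL tool.

# Illustrative sketch of the chunked initial load described above.
# All data here is in-memory; in a real ETL flow these would be
# source-system extracts and warehouse lookup queries (names are hypothetical).

from itertools import groupby
from operator import itemgetter

# Pretend source extract: several years of history across regions.
source_rows = [
    {"region": "North", "year": 2006, "customer": "C1", "amount": 100},
    {"region": "North", "year": 2007, "customer": "C2", "amount": 250},
    {"region": "South", "year": 2006, "customer": "C3", "amount": 175},
]

# Pretend warehouse lookup table (e.g. a customer dimension) spanning all regions.
warehouse_lookup = {
    ("North", "C1"): 11,
    ("North", "C2"): 12,
    ("South", "C3"): 21,
    ("West", "C9"): 91,
}

def load_chunk(rows, lookup):
    """Stand-in for the actual warehouse load (a bulk insert in practice)."""
    for row in rows:
        surrogate_key = lookup[(row["region"], row["customer"])]
        print(f"loading region={row['region']} year={row['year']} "
              f"customer_sk={surrogate_key} amount={row['amount']}")

# 1. Group the history data by a filter (here region, then year) and load it in
#    chunks, so a single run never processes the full multi-year volume at once.
keyfunc = itemgetter("region", "year")
for (region, year), chunk in groupby(sorted(source_rows, key=keyfunc), key=keyfunc):
    chunk = list(chunk)

    # 2. Override filter on the lookup: cache only the warehouse records for the
    #    region present in this chunk instead of the full lookup table.
    region_lookup = {k: v for k, v in warehouse_lookup.items() if k[0] == region}

    load_chunk(chunk, region_lookup)

The same idea applies whatever ETL tool is in use: restrict each run to one chunk of history and cache only the lookup rows that the chunk can actually reference.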
Catching Up is an interesting problem wherein the warehouse is not able to cope and deliver data as current as the source system (or at most one day old). This is most often due to ETL performance issues: even before the initial data is successfully loaded and verified in the DW, an additional set of new records has arrived from the source, and the DW is not able to catch up. Though at times some code tuning, running things in parallel or a hardware upgrade (usually a costly one at a later stage) could resolve such problems, in certain situations the problem can run into an unsolvable state wherein the complete ETL architecture has to be re-looked at.
One other way to manage such situations is to have the daily process of loading current data proceed independently, while a separate set of processes brings in the history data in parallel on a regular basis. In certain scenarios we might also need to build a process that runs and syncs up the current data with the old data, especially any aggregate data designed in the data model; a rough sketch of this pattern follows.
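Below is a minimal sketch of that pattern, again with hypothetical names and an in-memory stand-in for the warehouse: the daily load runs independently of the history backfill, and the aggregate is re-synced once both sets of data are in place.

# Daily incremental load runs independently of the history backfill;
# the aggregate is rebuilt afterwards. Names and the in-memory "warehouse"
# are illustrative assumptions, not part of any specific ETL product.

from collections import defaultdict

fact_sales = []                          # stand-in for the warehouse fact table
agg_sales_by_region = defaultdict(int)   # stand-in for a designed aggregate

def daily_incremental_load(todays_rows):
    """Runs every day on current data; unaffected by the backfill."""
    fact_sales.extend(todays_rows)
    for row in todays_rows:
        agg_sales_by_region[row["region"]] += row["amount"]

def history_backfill(history_rows):
    """Separate process that brings in the old data chunk by chunk."""
    fact_sales.extend(history_rows)

def resync_aggregates():
    """Rebuild the aggregate once backfilled history and current data co-exist."""
    agg_sales_by_region.clear()
    for row in fact_sales:
        agg_sales_by_region[row["region"]] += row["amount"]

# Current data keeps flowing while history arrives through a separate path.
daily_incremental_load([{"region": "North", "amount": 500}])
history_backfill([{"region": "North", "amount": 300},
                  {"region": "South", "amount": 200}])
resync_aggregates()
print(dict(agg_sales_by_region))   # {'North': 800, 'South': 200}

Keeping the two paths separate means the current-day load never waits on the history volumes, and the aggregate is rebuilt only after the history data has been verified.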