You need a Business Intelligence Data Warehouse if you plan to implement a Business Intelligence (BI) solution to support your decision-making and improve your business processes. But DWH development is linked with additional expenses. Is it possible to save your money by implementing BI solution without DWH? What are the advantages of a data warehouse? Is there any visible business value of a data warehouse?
Benefits of a Data Warehouse over a Database
You are using a Database (DB) during your daily activities for entering, storing and modification transactional (that is, statistical) business data. For example, the database of your accounting software or CRM software etc.
Database contains detailed information about what you sold to whom and when: the Сustomer #1 from Segment #1 bought three units of the SKU#1 on the 10th of March 2020).
There can be tens of thousands of such entries per day. So you can’t use these data as a basis for decision making without initial preparation.
To prepare the data for analysis, you have to:
- download the data from the DB (the source database);
- upload it to the special software (e.g. Excel, Power BI, Tableau, etc.);
- make your calculations. The more calculations you need to do, the more time they take, and the higher the chances of making a mistake are.
Only after this, the data can be used for decision making.
A Data Warehouse (DWH), as usual, is a set of databases. A data warehouse stores both statistical and aggregated data. A DWH is created primarily to analyze data for decision making.
A DWH database could be the source of the following aggregated and calculated data:
- Total Sales (by Location, Category, SKU, Period, and more). For example, all Сustomers from Segment #1 bought 100 000 units of goods from Category #1 brought $1,000,000 in March 2020;
- Total Sales Growth (by Location, Category, SKU, and more). For example, it increased by 100,000$ or 10% in March 2020 compared with March 2019.
- Budget Vs. Actual (by Location, Category, Period, Сustomer Segment, and more). For example, the actual variance is $10,000 or -10%.
- and so on.
These data can be used to create models, e.g. to predict demand for goods from Category #1 from Сustomers from the Segment #1.
The data for the analysis are automatically loaded and precalculated in the DWH so you don’t have to spend financial resources on specialists’ salaries to get analysis-ready information. This also negates the possibility of human error.
A data warehouse is different from a database in that it contains aggregated and calculated data for analytical purposes. This is why you can’t do without a DWH if you need analytics for making business decisions.
Data Warehouse Benefits for Businesses
It’s better to explain the benefits of data warehouse using a contrario reasoning: what you could lose without Data Warehouse?
You could lose without DWH:
- The confidence that your BI tool will run at all. The risk of performance issues (hanging, and crashing) is close to 100% without DWH when yoг deal with the big amount of data. Rather than analyzing data, you will waste time waiting for your BI tool to just work for you.
- The confidence that your BI tool will work correctly. The risk of business data loss is close to 100% without DWH. Would you like to make mistaken decisions just because the software does not load all the necessary data without you knowing it?
Risk of Performance Issues without Data Warehouse
Let’s imagine that the source database is your accounting software (like Quickbooks etc.) or your CRM software (like Salesforce or Zoho etc). And you connected it directly to the BI tool (like POWER BI or any else) WITHOUT DWH to get insightful reports and dashboards.
You could face the following performance issues while downloading data from the source database.
1. Performance Issues with the source database
Let’s imagine that on Monday, at 9 AM the part of your staff is trying to fill the QuickBooks/Salesforce database with new transactional data, and the rest of your staff using POWER BI is trying to get reports from the same QuickBooks/Salesforce database.
In this case, both QuickBooks/Salesforce and POWER BI may become unresponsive because high-load resource-intensive POWER BI requests slow down the source database performance.
In fact, your staff will waste time waiting for the software to just work.
If you connected your accounting/CRM software with the BI tool WITH DWH you prevent such a challenge.
With a DWH, performance issues with the source database could be excluded completely because the data will be loaded not from the source database directly, but from the DWH. In turn, the DWH automatically extracts all the data from the source database beyond regular working time, for example at night without hampering others' work.
2. Performance Issues with the BI tool
BI tools always load raw data for analysis (e.g. every transaction for every client at every moment of time). But analytics requires aggregated data (e.g. total monthly sales by product group).
These data need to be aggregated and calculated with some sort of software.
In this example, we are talking about POWER BI installed on your computer. Its speed depends on the hardware of your computer. So if these PCs weren’t built for such tasks, they can freeze, especially if the volume of data is huge. (A retail shop, for example, can reach 100,000 transactions per day, and to analyze sales over 6 months POWER BI has to process around 2 million transactions or lines of data).
Moreover, if you would like to look at the report you’ve been waiting for from a different viewpoint by applying a filter, all the calculations will be redone, because aggregated data wasn’t saved anywhere. A lot of time will be spent waiting for the reports.
In fact, WITHOUT a DWH or using in-memory approach you will waste time waiting for your BI tool to just work.
WITH a DWH, performance issues with the BI tool could be excluded completely because all aggregations and calculations are premade beforehand in DWH. And you get ready-for-analysis data. WITH a DWH, your BI tool does not spend resources on aggregations and calculations.
Risk of Business Data Loss without Data Warehouse
At which stages of working with a BI tool could business data loss happen?
1. Errors during the BI data update from a database source
Suppose you press the “refresh” button in POWER BI to view the dashboard with the current information from your accounting software of CRM.
But this is what could happen:
1.1. Data loss due to the temporary disconnection during the load.
Live loading of data takes time, say about 1 minute. Data exchange almost always happens over the network, be it a local one or the Internet. If there was a glitch during the load time, some data won’t load in the POWER BI.
As a result, some sales over a certain period won’t get into the reports, your analytical conclusions would be incorrect, and you won’t even notice that.
If the data from the accounting software or a CRM went through a DWH before being uploaded to POWER BI, your DWH would automatically check it for consistency (e.g. by checksum verification). Should the error occur, the DWH will load the missing data automatically.
1.2. Inability to access the data source because of the request limit
Suppose you’ve launched a Google Ads marketing campaign for several product groups. You need to estimate your advertising material efficiency for each of them. But you want to not only see which ads are clicked on, but which clicks get converted to revenue so you know where to put your budget to maximize your ROI.
To solve this problem you need to upload your QuickBooks sales data filtered by dates and commodity groups and combine it with the Google Ads data on clicks for each ad.
But you can only get the Google Ads data via API, which has API Limits and Quotas. The more of your employees need to get such reports and the more reports are compiled daily, the quicker you spend your quota. If you try to get another report after your limit has been reached, you won’t get data. The source would be inaccessible.
If the Google Ads data was loaded to a DWH before being transferred to POWER BI you wouldn’t have to make multiple Google Ads requests and use up your limits. You will just use the data that was requested once and stored in the DWH.
1.3. Denial of access to the data source. No connection to a remote DB.
Suppose you urgently need to get a report, but there is no connection with a remote DB, for example with QuickBooks Online or Salesforce CRM, because there is maintenance going on. As a result, at 9 AM on Monday you can’t even get the Friday reports, let alone the current ones.
If your CRM or accounting software data was routed through DWH before getting to POWER BI, you would still be able to get the reports based on the latest upload. As a rule, the upload happens at least once per day according to the predetermined schedule in the DWH.
And if the DWH can’t access the source, it uses a scheduler, using which any process can be automated to run at predefined intervals. The DWH will try over and over again until it gets the data. You can perform synchronizations as frequently as every 5 minutes.
1.4. No access to the historical data at the source
If you use third-party resources for getting data (e.g. SaaS platforms) their owners can wilfully clear your historical data and old transactions they deem not relevant anymore. Or they can push you towards using a more expensive subscription plan.
They do it because they don’t want to expand their DB servers, which costs money.
Routing the data from third-party resources to the DWH before it gets sent to POWER BI makes you independent of the third-party data storage policies.
2. Errors during data entry in the source DB
For example, your employee could erroneously enter the customer in the CRM or accounting software as “Dylan Soulfrank” while he already exists there as “Dylan Soilfrank” - one letter is different. QuickBooks or Salesforce will perceive it as two separate clients. As a result, you’ll get “Dylan Soulfrank” as a new customer, while “Dylan Soilfrank” will become a lost client because of no new transactions.
If you are making a personal customer forecast based on regression (turning to historical data), the data for both accounts would be wrong and you will make an incorrect sales estimate.
If the data from your accounting software or a CRM was routed through DWH before getting into POWER BI, your data warehouse expert would notice the spelling error in QuickBooks and would suspect a duplicate. This can be checked and fixed either in the QuickBooks DB, or in the DWH itself.
As you can see, connecting a source DB and the BI tool is not enough. You need to think about the place to aggregate and precalculate data and save the calculations. There are technical reasons preventing you from keeping this information in the source DB, as we’ve discussed in the article about differences between data warehouse and database. So you have only one option remaining: store the calculations data within RAM, which will lead to data loss and performance issues. This is why the concept of DWH was developed.