Enterprise Information Integration (EII) federates information from disparate sources and allows applications to make a single query to access a broad selection of corporate data. The advantage of EII over alternative integration approaches is that EII provides decision makers information on demand, enabling real-time decisions based on current data. In some cases, EII uses caching to speed response time and to minimize EII-related traffic on the network.

An EII platform operates by sending a query (or queries) to a remote data source for execution and then fetching the query results. Caching stores the query result locally. A cache of this kind is sometimes called a materialized view, meaning that the virtual EII views are brought together (materialized) and stored. While an EII application does not have to include caching capabilities, many common EII platforms either use a built-in database to cache query results or rely on an external database to perform that function.

A cache delivers results more quickly to an application than processing the requests and retrieving the information from remote data sources. A user can see the response to his or her query much faster (often by an order of magnitude or more). In addition, caching information on an EII server reduces the load on the network and on remote data sources.

Synching
Synching is an important area to address when caching is used in an EII configuration. If the cached data is not current, the query results retrieved from the cache can be meaningless - or even worse, misleading. A cache must therefore have settings that allow for data to be kept current. The definition of 'current' will, of course, vary by application and data type.

EII platforms address the issue of data currency using cache invalidation mechanisms. These are settings that apply to the cache as a whole, to a particular view, or to a set of results for a group of users. When a cached result set becomes invalid, the EII platform knows to execute the query against the remote data sources the next time the query is invoked.
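
The invalidation behavior described above can be sketched in a few lines. This is only an illustration, assuming a simple time-based invalidation setting per query; the names QueryCache, run_remote, and ttl_seconds are hypothetical and do not come from any particular EII product.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Caches query results and re-runs the query once an entry is invalid."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so the expiry logic can be tested (assumption).
        self._clock = clock
        # query text -> (expiry time, cached result)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(
        self,
        query: str,
        run_remote: Callable[[str], Any],
        ttl_seconds: float,
    ) -> Any:
        entry = self._entries.get(query)
        now = self._clock()
        if entry is not None and now < entry[0]:
            return entry[1]             # still valid: serve from the cache
        result = run_remote(query)      # missing or invalid: query the source
        self._entries[query] = (now + ttl_seconds, result)
        return result

    def invalidate(self, query: str) -> None:
        """Explicitly mark one cached result as invalid."""
        self._entries.pop(query, None)
```

A slowly changing result could be requested with ttl_seconds set to 86400 (one day), while fast-moving data would bypass the cache and be queried live.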

Caching works because not every back-end data source changes frequently. While stock prices, inventory levels, transaction volumes, and the like fluctuate during a day, things like interest rates, product prices, and yesterday's averages change no more than once a day. Since many EII views combine queries on data that fluctuates "in real time" with queries on data that changes less frequently, caching is very practical. Real-time data can be queried live, and other data can be cached with an invalidation time of, for example, one day.

Replication and Caching
People occasionally confuse simple data replication and caching. There are two key differences.

First, a cache contains the query result set rather than the source data itself. This corresponds to a much smaller amount of data, and it represents the information that a user wants. Queries to the cache require much less computational expense and time, and, of course, the whole process is automated. Caching with EII also lessens concerns over data ownership. Quite often, source system administrators will allow access (i.e. queries) to their system, but get squeamish when people mention the word 'replication'.

Second, a cache in EII typically holds only a piece of an entire view, as described above. The query result in the cache is only a sub-query result that the EII server combines with live sub-queries against other back-end sources.
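
As a rough illustration of a view built from one cached sub-query and one live sub-query, consider the sketch below. The data (cached product prices joined with live stock levels) and the names build_view and fetch_live_stock are assumptions made for the example, not part of any EII platform.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def build_view(
    cached_prices: List[Row],
    fetch_live_stock: Callable[[], List[Row]],
) -> List[Row]:
    """Join a cached sub-query result with a live one on product_id."""
    # Cached part: prices change rarely, so they come from the cache.
    price_by_id = {row["product_id"]: row["price"] for row in cached_prices}
    view = []
    for stock_row in fetch_live_stock():      # live part: queried every time
        product_id = stock_row["product_id"]
        if product_id in price_by_id:         # inner join on product_id
            view.append({
                "product_id": product_id,
                "in_stock": stock_row["in_stock"],
                "price": price_by_id[product_id],
            })
    return view
```

Only the stock query touches a remote source on each call; the price data is reused until its cache entry is invalidated.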

Advanced Caching Options: Auto-Populate, Rule Checking and Lineage
In the example above on cache invalidation time, we mentioned that a query would be executed again if the cache became invalid. While this works, in many cases it makes more sense to pre-populate a cache. For example, a mortgage broker's office might want a variety of daily interest rates loaded into the cache before the work day starts. Auto-population allows this by running specific queries on a set schedule.
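
A scheduled pre-population pass might look roughly like the following. It is a minimal sketch that assumes a once-a-day load time per query and a periodic caller (for example, a job that runs every few minutes); prepopulate, schedule, and run_remote are hypothetical names, and the sketch only checks the current day's load time.

```python
from datetime import datetime, time as dtime
from typing import Any, Callable, Dict, List, Tuple


def prepopulate(
    cache: Dict[str, Any],
    schedule: List[Tuple[str, dtime]],
    run_remote: Callable[[str], Any],
    last_run: datetime,
    now: datetime,
) -> List[str]:
    """Run each scheduled query whose load time today fell in (last_run, now]."""
    loaded = []
    for query, load_at in schedule:
        due = datetime.combine(now.date(), load_at)
        if last_run < due <= now:
            cache[query] = run_remote(query)   # refresh before users ask
            loaded.append(query)
    return loaded
```

In the mortgage example, the rate queries would be given a load time shortly before the office opens, so the first user of the day is served from the cache.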

Another advanced caching feature involves data validation. The idea is that the cache can perform a set of rule checks - often as a stored procedure - on the cached data. So, if data is missing or obviously incorrect, the EII platform can either query the back end systems or simply alert the user. This capability does borrow a page from the ETL playbook, but should not be confused with the data cleansing or bulk transformation those products provide.
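
The text describes rule checks that often run as stored procedures. The sketch below swaps in plain Python functions to show the idea; the rules themselves (a rate must be present and fall within 0 to 25 percent) and all names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Rule = Callable[[Row], Optional[str]]   # returns a problem description, or None


def rate_present(row: Row) -> Optional[str]:
    return None if row.get("rate") is not None else "rate is missing"


def rate_in_range(row: Row) -> Optional[str]:
    rate = row.get("rate")
    if rate is not None and not (0.0 <= rate <= 25.0):
        return f"rate {rate} is outside 0-25%"
    return None


def check_cached_rows(rows: List[Row], rules: List[Rule]) -> List[str]:
    """Apply every rule to every cached row and collect the violations."""
    problems = []
    for index, row in enumerate(rows):
        for rule in rules:
            message = rule(row)
            if message is not None:
                problems.append(f"row {index}: {message}")
    return problems
```

An empty list means the cached data passed; a non-empty list is the point at which a platform could re-query the back-end systems or alert the user.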

Lastly, a useful feature of caching is data lineage. With many users pulling data together from multiple sources, auditors and executives must know where the data came from, when, and at whose request. Data lineage requires caching so a copy of the data can be viewed along with its lineage. Generally, system logs do not contain enough information for use in this context. Imagine trying to reconstruct a query used to make a decision without the data itself. Doing so is extremely hard, and potentially impossible in a timeframe that would satisfy an angry executive or overzealous auditor.
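
One way to picture lineage is as metadata stored next to a frozen copy of the result set. The record below is a sketch under that assumption; the field names (sources, requested_by, fetched_at) simply mirror the "where, at whose request, and when" questions above and are not taken from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CachedResult:
    """A cached result set stored together with its lineage."""
    query: str
    rows: Tuple[Any, ...]
    sources: Tuple[str, ...]   # where the data came from
    requested_by: str          # at whose request
    fetched_at: datetime       # when


def cache_with_lineage(
    query: str,
    rows: Sequence[Any],
    sources: Sequence[str],
    requested_by: str,
    fetched_at: Optional[datetime] = None,
) -> CachedResult:
    """Freeze a copy of the rows alongside the facts an auditor would ask for."""
    return CachedResult(
        query=query,
        rows=tuple(rows),
        sources=tuple(sources),
        requested_by=requested_by,
        fetched_at=fetched_at or datetime.now(timezone.utc),
    )


def lineage_report(result: CachedResult) -> str:
    """Summarize the lineage of one cached result in a single line."""
    return (
        f"{len(result.rows)} row(s) from {', '.join(result.sources)}, "
        f"requested by {result.requested_by} at {result.fetched_at.isoformat()}"
    )
```

Because the rows are stored with the record, the data behind a past decision can be inspected directly rather than reconstructed from logs.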

Additional Cache Considerations
Given the advanced capabilities needed for caching in EII, there are several design considerations. First, simply using the file system as a cache is not adequate. Running rule checks with stored procedures, inserting metadata for data lineage, or processing transactions is next to impossible on a plain file system.

Also, a cache should accommodate a mix of data sources. SQL result sets should be as easily cached and manipulated as XML data from transactions or Web Services. This is because EII applications often mix data together in views and need cached result sets of disparate data.
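
To make the point about mixed data concrete, here is a small sketch of a database-backed cache that holds both SQL-style row sets and XML documents. It uses SQLite from the Python standard library purely as a stand-in for whatever built-in or external database a platform uses; the table layout and function names are assumptions for illustration.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET
from typing import Any, List


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache table; the schema is a hypothetical layout."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, format TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def put_rows(conn: sqlite3.Connection, key: str, rows: List[list]) -> None:
    """Cache a SQL-style result set, serialized as JSON."""
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, 'rows', ?)",
        (key, json.dumps(rows)),
    )
    conn.commit()


def put_xml(conn: sqlite3.Connection, key: str, xml_text: str) -> None:
    """Cache an XML document, rejecting malformed input first."""
    ET.fromstring(xml_text)   # raises ParseError if the XML is not well-formed
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, 'xml', ?)",
        (key, xml_text),
    )
    conn.commit()


def get(conn: sqlite3.Connection, key: str) -> Any:
    """Return rows as lists, XML as a parsed element, or None if not cached."""
    found = conn.execute(
        "SELECT format, payload FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if found is None:
        return None
    fmt, payload = found
    return json.loads(payload) if fmt == "rows" else ET.fromstring(payload)
```

Keeping both kinds of result in one transactional store is what allows rule checks and lineage metadata to be applied uniformly, which a directory of files does not easily support.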

For an example of how intelligent caching can streamline your EII, please contact Ipedo.


 
