Home > The EII Zone > Tech Focus - Query Response Optimization

In a typical enterprise information integration (EII) scenario, applications query an EII server, which parses the request to different remote data sources. To speed the flow of information to the end-users, EII products use a combination of query optimization and other information staging and streaming techniques. Query optimization can greatly improve application performance, as well as reducing the load on the production systems that provide the source data.

There are two general approaches that EII products use to optimize queries: cost-based and rule-based, although there is overlap between the two methods. For example, both methods apply rules, although cost-based optimizations use input calculated from the data sources to determine what querying techniques to apply while rule-based optimizations follow a pre-determined methodology that does not rely on information about the stored data. Generally, EII products try to use cost-based algorithms when they can, but also have rule-based approaches to use when they cannot obtain cost information.

In addition to query optimization, EII products use other means such as caching and streaming to expedite information flow to the user. Table 1 lists a few of the ways EII servers speed query response. This page will discuss several commonly used techniques to maximize query performance and minimize resource consumption.

Table 1 - EII Servers Use Several Methods to Speed Query Response

Cost-based Query Optimization
•  Join optimization
•  Index analysis

Rule-based Query Optimization
•  Query statement rewriting
•  Pushdown analysis
•  Predicate reordering

Data Flow Management
•  Caching
•  Streaming results



Cost-based optimization
Cost-based optimization considers the capabilities and runtime statistics of the remote data source to calculate a cost estimate. This determines whether and how the EII server should send a SQL operation to the remote data sources for execution. The runtime statistics include the number of rows in a table, the existence of an index on a column, the selectivity of an index (number of different values in the column), and the selectivity of a predicate on a column (expected percentage of rows that meet the match criteria), among other factors. Cost-based optimizers generally pursue a low-cost option, as opposed to the lowest-cost option, as spending resources on its own calculation time would delay getting query results back to the user.

Cost-based optimization pays the greatest dividends in joining data from different sources. In these cases, the query engine decides the best join order and chooses the best join algorithm (nested loop or hash join) based on index information and cost estimates from the different tables. For example, the query optimizer will choose the smaller table with a unique index as the inner table of nested loop join. If a join query has selection conditions on two joined tables besides the join condition, the query optimizer may ask the remote sources to execute the selections first, then fetch the results back and do a local hash-join.

Rule-based optimization
A common form of rule-based optimization involves pushdown analysis. The goal of pushdown analysis is to determine whether a remote relational data source performs a SQL operation efficiently so as to reduce the size of the data set that the EII server must handle. This becomes especially important for table join operations involving complex query criteria, in which large amounts of filtered information may need to flow between different databases. The EII server uses a set of rules and heuristics to determine which pushdown operations will best shift the load to the data source and minimize overall system data handling.

The query engine inside the EII server takes a SQL request from a client, parses and optimizes the query to decide which SQL operations to perform locally and which operations to send to the remote data source for execution. The EII query engine will push down the search condition as much as possible to the data source for execution, because it avoids fetching a big result set back from the remote source. It also sends join operations on tables from the same data source to the remote resource for execution instead of fetching the tables back and doing the join locally. In addition, the EII query engine will push down any sorting criteria if possible so that result are returned from the remote sources sorted and minimize the need to re-sort the final result within the EII query engine.

Streaming
The query engine uses streaming techniques to chain together the consumer and producer of a query result set. The consumer of a query result set will ask its producer for the next row when it needs a new row. This avoids generating a huge intermediate query result set and consuming too much memory.

Also, the query engine will return the first results to the client as soon as the first result rows are generated without waiting for the entire query processing to finish. This dramatically speeds query response time.

Caching
The EII query engine often needs to send a query to a remote data source for execution and then fetch the result back. Caching is used to store the query result locally so the EII server can answer the query from the cache rather than sending it to the remote data source. The EII query engine often provides mechanisms to synch up the cache with the remote source and invalidate the cache intelligently.

For an example of how query optimization can streamline your EII, please contact Ipedo.
 

Company | Products | Solutions | News & Events | Developers | Contact Us | Site Map