Sunday, March 21, 2010

Oracle Database 10g CBO (Cost based optimizer) and cost model

Oracle Database 10g CBO (Cost based optimizer) and cost model

The CBO chooses the best path for your queries, based on what it knows about your data and by leveraging Oracle database features such as bitmap indexes, function-based indexes, hash joins, index-organized tables, and partitioning, whereas the RBO just follows established rules (heuristics). With the release of Oracle Database 10g, the RBO's obsolescence is official and the CBO has been significantly improved yet again.

The CBO now has a special tuning mode that is used by many other Oracle Database 10g features, including the SQL Tuning Advisor and SQL Access Advisor.

When a SQL Query submitted

SQL queries submitted to the Oracle database engine first run through the parser, which checks syntax and analyzes semantics. The result of this run-through is a set of query blocks that is sent to the optimizer, which comprises three main functional subsystems—specifically, the query transformation engine, the estimator, and the plan generator.

1. Query transformation engine

The query transformation engine applies heuristics- or rules-based algorithms to the query blocks. Some common transformations include select-join, group-by, and distinct-view merging and materialized view rewrite.

Common Transformations

Oracle performs numerous query transformations, changing and rewriting queries into equivalent but more optimal forms. Some of these transformations are heuristics- or rules-based transformations, meaning they are based on the structural properties of the query—number of tables, types of join and filters, presence of grouping clauses, for example—and not on selectivity, cardinality, distribution, join order, or related costs of various database operations.

Other transformations, such as materialized view rewrite, star transformation, and OR-expansion, are cost-based transformations in which a query is copied, transformed, and costed repeatedly (and recursively, each time using a new set of transformations), until at the end of the process, one or more transformations is selected to be applied to the original query, if it results in an optimal cost. Here's a summary of the most common transformations:

Heuristics-based transformations.

  • Select-join
  • Select-join-project view merging
  • Group-by and distinct-view merging
  • Subquery unnesting
  • Filter predicate push-down in a view
  • Predicate move-around
  • Common subexpression elimination
  • Join predicate push down
  • Outer to inner join conversion
  • Subsumed subquery removal through window function,

Cost-based transformations.

  • OR expansion—converts a query with ORs in the WHERE-clause into a UNION ALL of several queries without ORs, which is highly beneficial when the ORs refer to restrictions of different tables
  • Materialized view rewrite
  • Star transformation
  • Conversion of set operator intersect into join and minus into anti-join

A large number of query blocks and interdependence of the transformations themselves add to the complexity of any cost-based transformation strategy. With Oracle Database 10g, the transformation engine has been redesigned as an extensible framework that can support new algorithms as needed, to ensure that Oracle can always quickly select the optimal transformation.

In addition, the Oracle Database 10g transformation framework supports several new state space search algorithms that ensure that the optimizer finds the best plan quickly, even when transformations are complex.

2. The estimator

The estimator measures cardinality (the number of rows); selectivity (the fraction of rows from a row set filtered by predicate); and cost, defined as resources (disk I/O, CPU usage, and memory) of the various operations (scanning a table or joining tables together, for instance) that will yield the data. The accuracy of these estimates depends on statistics—the object statistics and now, in Oracle Database 10g, system statistics. SQL profiles specific to any query are also used at this point to effectively estimate costs.

Oracle Database 10g CBO Uses a New Cost Model

More important,However, the cost model itself has changed. In Oracle9i Database, the CBO used an I/O cost model that evaluated everything primarily in terms of single block reads, largely ignoring CPU costs (or using constants to estimate CPU costs). In Oracle Database 10g, the cost model now includes system statistics that realistically capture CPU and I/O on your system with respect to the Oracle workload.

In Oracle Database 10g, the default cost model is CPU plus I/O, in which the cost unit is time: The CBO estimates execution time for a query by estimating the number of I/O operations, the type of I/O operations, and the number of CPU cycles the database will perform while executing the query. These estimates depend on the existence of system statistics, which the CBO uses to convert the number of CPU cycles and the number of I/Os into execution time. (Note that some operations, such as bitmap merge, sort, or hash joins, may not require any I/O but that all operations include a CPU component.)

Oracle Database 10g gathers two types of system statistics—statistics captured without a workload (noworkload) and statistics captured with a workload. Noworkload statistics capture I/O system performance—average I/O seek time and transfer speed—and CPU speed. When gathering noworkload statistics, the CBO issues sample reads of different sizes from the database's datafiles; it times every read and then uses statistical methods to compute average seek time and transfer speed. This takes from a few seconds to a few minutes. The CBO computes CPU speed in millions of cycles per second.

Workload statistics make the CBO aware of the workload. The system statistics captured during workload conditions identify whether the system is I/O- or CPU-bound; the CBO uses the data to adjust the cost of the plans accordingly. To gather workload statistics, execute these commands at the start and the conclusion of your workload:

dbms_stats.gather_system_stats(gathering_mode=>'start')
 
dbms_stats.gather_system_stats(gathering_mode=>'stop')

You can see the value by following query.

SQL> select sname, pname, pval1
from sys.aux_stats$;
 

CPUSPEEDNW, IOSEEKTIM, and IOTFRSPEED are noworkload statistics; SREADTIM, MREADTIM, CPUSPEED, MBRC, MAXTHR, and SLAVETHR represent workload statistics.

If both workload and noworkload statistics are available, the optimizer uses workload statistics.

  • SREADTIM—single block read time (milliseconds)—is the average time Oracle takes to read a single block.
  • MREADTIM—multiblock read time (milliseconds)—is the average time taken to read sequentially.
  • MBRC—multiblock read count—is the blocks, on average, read during multiblock sequential reads. The CBO uses MBRC instead of thedb_ multiblock_read_count parameter during query optimization to compute costs for table and fast full index scans.
  • MAXTHR—maximum I/O system throughput—is captured only if the database runs parallel queries.
  • SLAVETHR—maximum slave I/O throughput—is captured only if the database runs parallel queries.

Another benefit of the Oracle Database 10g cost model is that it enables the CBO to use other techniques to create the fastest plan possible—for example, by reordering predicates in a query. Because the CBO knows the costs associated with a given query in terms of CPU and because the main ingredient in the cost of a predicate is CPU, the CBO can calculate costs with alternative predicate orderings and reorganize the predicates in the most efficient order.

For example, assume that for the following select statement, b <> 0 returns all rows in table t:

select * from t 
where a > 0 and b <>

If predicate order is a > 0 and b <> 0, then the predicate a > 0 will be evaluated just once.

The algorithms for ordering predicates consider the costs of various combinations of the predicate order and selectivity when determining the best order. This feature provides considerable improvement, especially for nonselective and expensive predicates.

3. The plan generator

The plan generator recombines the query in various ways, constructing alternative plans by using different join methods and join orders. The more table joins in the original query, the more alternative plans are generated and the longer the time it may take the CBO to find the "best" plan. Conversely, if there are few joins in a particular query or the joins are in the best order to begin with, the CBO will find the optimal plan quickly.

Ultimately, the CBO picks the best plan—the one with the lowest estimated cost relative to other plans. But precisely how the CBO measures the cost of processing any SQL statement—the cost model—is influenced by many factors, including several initialization parameters , starting with the optimizer_mode. The optimizer_mode sets the overall performance goal for the CBO in terms of throughput (the minimal time to return all rows) versus response time (the minimal time to return the first rows).

Conclusion

Oracle's cost-based optimizer is a strategic subsystem of Oracle Database. Given that the CBO improves with each new release of Oracle Database, you'll want to consider the ramifications that changes to the CBO may have on your Oracle database applications. If you're migrating an existing system to Oracle Database 10g, be sure to back up all the statistics your system uses for daily operations and test these statistics out with your applications in a test environment built around the new version of the database before moving into production.

Annexure - A

Initialization Parameters that Affect the CBO

select * from v$sys_optimizer_env ;

Optimizer_mode sets an overall performance goal for the CBO in terms of throughput versus response time. In Oracle Database 10g, the default setting (all_rows), primes the CBO to select for overall best throughput—the optimizer will choose the plan that delivers all the rows in a query in the shortest possible overall time. On the other hand, the first_rows setting will cause the optimizer to provide fastest response time by choosing the plan that delivers an initial subset of rows more quickly, and the remainder of the query in a longer period of time than the all_rows setting.

The "first_rows_n" setting (new as of Oracle9i Database) lets you balance the possible variances in this continuum (throughput versus response) more directly, by letting you set the number of rows to 1, 10, 100, or 1000. In Oracle Database 10g, if the query contains a rownum predicate, the CBO automatically switches to first_rows_n mode, deriving the value of n from the rownum predicate.

Another important optimizer initialization parameter is optimizer_dynamic_sampling. Statistics on database objects (size of tables, indexes, number of rows in tables, number of distinct values in columns, percentage of table below the high-water mark, and so on) play a key role in estimating costs (see Table 2), and if the CBO can't find up-to-date statistics to work with during the cost estimation phase of statement processing, it can dynamically sample statistical data at that time—if this parameter is set. The default level in Oracle Database 10g is 2, which means the CBO will use dynamic sampling on nonanalyzed tables to collect necessary statistics from them while optimizing the query.

Annexure - B

SQL profile

================

One of the artifacts produced by the SQL tuning mode is a SQL profile—auxiliary statistics specific to a given statement—that can be persisted to the data dictionary and used at runtime (during normal CBO operations) to improve SQL performance. SQL profiles are particularly useful for packaged applications (or any application for which you cannot access the source code).

SQL profiles use sampling to collect additional information; partial execution techniques to verify and adjust cost estimates; and execution history information for the SQL statement to modify parameter settings, if needed.

No comments: