Find us on Google+ Google+

June 3, 2010

Oracle Statistics

Whenever a valid SQL statement is processed Oracle has to decide how to retrieve the necessary data. This decision can be made using one of two methods:
  • Rule Based Optimizer (RBO) - This method is used if the server has no internal statistics relating to the objects referenced by the statement. This method is no longer favoured by Oracle and was desupported from Oracle 10g.
  • Cost Based Optimizer (CBO) - This method is used if internal statistics are present. The CBO checks several possible execution plans and selects the one with the lowest cost, where cost relates to system resources. Since Oracle 8i the Cost Based Optimizer (CBO) is the preferred optimizer for Oracle.
Oracle statistics tell us the size of the tables, the distribution of values within columns, and other important information so that SQL statements will always generate the best execution plans. If new objects are created, or the amount of data in the database changes the statistics will no longer represent the real state of the database so the CBO decision process may be seriously impaired.

Oracle can do things in several different ways, e.g. select might be done by table scan or by using indexes. It uses statistics, a variety of counts and averages and other numbers, to figure out the best way to do things. It does the figuring automatically, using the Cost Based Optimizer. DBA job is to make sure the numbers are good enough for that optimizer to work properly.

Oracle statistics may refer to historical performance statistics that are kept in STATSPACK or AWR, but more common use of the term Oracle statistics is about Oracle optimizer Metadata statistics in order to provide the cost-based SQL optimizer with the information about the nature of the tables. The statistics mentioned here are optimizer statistics, which are created for the purposes of query optimization and are stored in the data dictionary. These statistics should not be confused with performance statistics visible through V$ views.

The optimizer is influenced by the following factors:
  • OPTIMIZER_MODE in the initialization file
  • Statistics in the data dictionary
  • Hints

OPTIMIZER_MODE can have the following values:
CHOOSE
ALL_ROWS
FIRST_ROWS
RULE

If we provide Oracle with good statistics about the schema the CBO will almost always generate an optimal execution plan. The areas of schema analysis include:

  • Object statistics - Statistics for all tables, partitions, IOTs, etc should be sampled with a deep and statistically valid sample size.
  • Critical columns - Those columns that are regularly-referenced in SQL statements that are:
    • Heavily skewed columns - This helps the CBO properly choose between an index range scan and a full table scan.
    • Foreign key columns - For n-way table joins, the CBO needs to determine the optimal table join order and knowing the cardinality of the intermediate results sets is critical.
  • External statistics - Oracle will sample the CPU cost and I/O cost during statistics collection and use this information to determine the optimal execution plan, based on optimizer_mode. External statistics are most useful for SQL running in the all_rows optimizer mode.
Optimizer statistics are a collection of data that describe more details about the database and the objects in the database. These statistics are used by the query optimizer to choose the best execution plan for each SQL statement. Optimizer statistics include the following:
  • Table statistics
    • Number of rows
    • Number of blocks
    • Average row length
  • Column statistics
    • Number of distinct values (NDV) in column
    • Number of nulls in column
    • Data distribution (histogram)
  • Index statistics
    • Number of leaf blocks
    • Levels
    • Clustering factor
  • System statistics
    • I/O performance and utilization
    • CPU performance and utilization
The optimizer statistics are stored in the data dictionary. They can be viewed using data dictionary views. Only statistics stored in the dictionary itself have an impact on the cost-based optimizer.

When statistics are updated for a database object, Oracle invalidates any currently parsed SQL statements that access the object. The next time such a statement executes, the statement is re-parsed and the optimizer automatically chooses a new execution plan based on the new statistics. Distributed statements accessing objects with new statistics on remote databases are not invalidated. The new statistics take effect the next time the SQL statement is parsed.

Because the objects in a database can be constantly changing, statistics must be regularly updated so that they accurately describe these database objects. Statistics are maintained automatically by Oracle or we can maintain the optimizer statistics manually using the DBMS_STATS package.

DBMS_STATS package provides procedures for managing statistics. We can save and restore copies of statistics. You can export statistics from one system and import those statistics into another system. For example, you could export statistics from a production system to a test system. We can lock statistics to prevent those statistics from changing.

For data warehouses and database using the all_rows optimizer_mode, from Oracle9i release 2 we can collect the external cpu_cost and io_cost metrics. The ability to save and re-use schema statistics is important:
  • Bi-Modal databases - Many databases get huge benefits from using two sets of stats, one for OLTP (daytime), and another for batch (evening jobs).
  • Test databases - Many Oracle professionals will export their production statistics into the development instances so that the test execution plans more closely resemble the production database.
Creating statistics
In order to make good use of the CBO, we need to create statistics for the data in the database. There are several options to create statistics.

Automatic Statistics Gathering
The recommended approach to gathering statistics is to allow Oracle to automatically gather the statistics. Oracle gathers statistics on all database objects automatically and maintains those statistics in a regularly-scheduled maintenance job. Automated statistics collection eliminates many of the manual tasks associated with managing the query optimizer, and significantly reduces the chances of getting poor execution plans because of missing or stale statistics.

GATHER_STATS_JOB
Optimizer statistics are automatically gathered with the job GATHER_STATS_JOB. This job gathers statistics on all objects in the database which have missing statistics and stale statistics.

This job is created automatically at database creation time and is managed by the Scheduler. The Scheduler runs this job when the maintenance window is opened. By default, the maintenance window opens every night from 10 P.M. to 6 A.M. and all day on weekends. The stop_on_window_close attribute controls whether the GATHER_STATS_JOB continues when the maintenance window closes. The default setting for the stop_on_window_close attribute is TRUE, causing Scheduler to terminate GATHER_STATS_JOB when the maintenance window closes. The remaining objects are then processed in the next maintenance window.

The GATHER_STATS_JOB job gathers optimizer statistics by calling the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure. The GATHER_DATABASE_STATS_JOB_PROC procedure collects statistics on database objects when the object has no previously gathered statistics or the existing statistics are stale because the underlying object has been modified significantly (more than 10% of the rows).The GATHER_DATABASE_STATS_JOB_PROC is an internal procedure, but its operates in a very similar fashion to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The primary difference is that the GATHER_DATABASE_STATS_JOB_PROC procedure prioritizes the database objects that require statistics, so that those objects which most need updated statistics are processed first. This ensures that the most-needed statistics are gathered before the maintenance window closes.

Enabling Automatic Statistics Gathering
Automatic statistics gathering is enabled by default when a database is created, or when a database is upgraded from an earlier database release. We can verify that the job exists by viewing the DBA_SCHEDULER_JOBS view:
SQL> SELECT * FROM DBA_SCHEDULER_JOBS WHERE JOB_NAME = 'GATHER_STATS_JOB';

In situations when you want to disable automatic statistics gathering, then disable the GATHER_STATS_JOB as follows:
BEGIN
DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
END;
/

Automatic statistics gathering relies on the modification monitoring feature. If this feature is disabled, then the automatic statistics gathering job is not able to detect stale statistics. This feature is enabled when the STATISTICS_LEVEL parameter is set to TYPICAL (default) or ALL.

When to Use Manual Statistics
Automatic statistics gathering should be sufficient for most database objects which are being modified at a moderate speed. However, there are cases where automatic statistics gathering may not be adequate. Because the automatic statistics gathering runs during an overnight batch window, the statistics on tables which are significantly modified during the day may become stale. There are typically two types of such objects:
  • Volatile tables that are being deleted or truncated and rebuilt during the course of the day.
  • Objects which are the target of large bulk loads which add 10% or more to the object's total size.
For highly volatile tables, there are two approaches:
  • The statistics on these tables can be set to NULL. When Oracle encounters a table with no statistics, Oracle dynamically gathers the necessary statistics as part of query optimization. This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter, and this parameter should be set to a value of 2 or higher. The default value is 2. The statistics can set to NULL by deleting and then locking the statistics:
  • BEGIN
  • DBMS_STATS.DELETE_TABLE_STATS('SCOTT','EMP');
  • DBMS_STATS.LOCK_TABLE_STATS('SCOTT','EMP');
  • END;
  • /
  • The statistics on these tables can be set to values that represent the typical state of the table. We should gather statistics on the table when the tables have a representative number of rows, and then lock the statistics.
This is more effective than the GATHER_STATS_JOB, because any statistics generated on the table during the overnight batch window may not be the most appropriate statistics for the daytime workload.For tables which are being bulk-loaded, the statistics-gathering procedures should be run on those tables immediately following the load process, preferably as part of the same script or job that is running the bulk load.

For external tables, statistics are not collected during GATHER_SCHEMA_STATS, GATHER_DATABASE_STATS, and automatic statistics gathering processing. However, you can collect statistics on an individual external table using GATHER_TABLE_STATS. Sampling on external tables is not supported so the ESTIMATE_PERCENT option should be explicitly set to NULL. Because data manipulation is not allowed against external tables, it is sufficient to analyze external tables when the corresponding file changes.

If the monitoring feature is disabled by setting STATISTICS_LEVEL to BASIC, automatic statistics gathering cannot detect stale statistics. In this case statistics need to be manually gathered.

Another area in which statistics need to be manually gathered is the system statistics. These statistics are not automatically gathered.

Statistics on fixed objects, such as the dynamic performance tables, need to be manually collected using GATHER_FIXED_OBJECTS_STATS procedure. Fixed objects record current database activity; statistics gathering should be done when database has representative activity.

Whenever statistics in dictionary are modified, old versions of statistics are saved automatically for future restoring. Statistics can be restored using RESTORE procedures of DBMS_STATS package.

In some cases, we may want to prevent any new statistics from being gathered on a table or schema by the DBMS_STATS_JOB process, such as highly volatile tables. In those cases, the DBMS_STATS package provides procedures for locking the statistics for a table or schema.

Scheduling Stats
Scheduling the gathering of statistics using DBMS_JOB is the easiest way to make sure they are always up to date:

SET SERVEROUTPUT ON
DECLARE
l_job NUMBER;
BEGIN
DBMS_JOB.submit(l_job, 'BEGIN DBMS_STATS.gather_schema_stats(''SCOTT''); END;',
SYSDATE,'SYSDATE + 1');
COMMIT;
DBMS_OUTPUT.put_line('Job: ' || l_job);
END;
/

The above code sets up a job to gather statistics for SCOTT for the current time every day. We can list the current jobs on the server using the DBA_JOBS and DBA_JOBS_RUNNING views.

Existing jobs can be removed using:
EXEC DBMS_JOB.remove(X);
COMMIT;
Where 'X' is the number of the job to be removed.

Manual Statistics Gathering
If you choose not to use automatic statistics gathering, then you need to manually collect statistics in all schemas, including system schemas. If the data in the database changes regularly, you also need to gather statistics regularly to ensure that the statistics accurately represent characteristics of your database objects.

The preferred tool for collecting statistics used to be the ANALYZE command. Over the past few releases, the DBMS_STATS package in the PL/SQL Packages and Types reference has taken over the statistics functions, and left the ANALYZE command with more mundane 'health check' work like analyzing chained rows.

Analyze Statement
The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows.

The ANALYZE command is available for all versions of Oracle, however to obtain faster and better statistics use the procedures supplied - in 7.3.4 and 8.0 DBMS_UTILITY.ANALYZE_SCHEMA, and in 8i and above - DBMS_STATS.GATHER_SCHEMA_STATS.

The analyze table can be used to create statistics for 1 table, index or cluster.
Syntax:
ANALYZE table tableName {compute|estimate|delete} statistics options
ANALYZE index indexName {compute|estimate|delete} statistics options
ANALYZE cluster clusterName {compute|estimate|delete} statistics options

ANALYZE TABLE emp COMPUTE STATISTICS;
ANALYZE TABLE emp COMPUTE STATISTICS FOR COLUMNS sal SIZE 10;
ANALYZE TABLE emp PARTITION (p1) COMPUTE STATISTICS;
ANALYZE INDEX emp_pk COMPUTE STATISTICS;

ANALYZE TABLE emp ESTIMATE STATISTICS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 500 ROWS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 15 PERCENT;
ANALYZE TABLE emp ESTIMATE STATISTICS FOR ALL COLUMNS;

ANALYZE TABLE emp DELETE STATISTICS;
ANALYZE INDEX emp_pk DELETE STATISTICS;

ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE;
ANALYZE INDEX emp_pk VALIDATE STRUCTURE;
ANALYZE CLUSTER emp_custs VALIDATE STRUCTURE CASCADE;

ANALYZE TABLE emp VALIDATE REF UPDATE;
ANALYZE TABLE emp LIST CHAINED ROWS INTO cr;

Note: Do not use the COMPUTE and ESTIMATE clauses of ANALYZE statement to collect optimizer statistics. These clauses are supported solely for backward compatibility and may be removed in a future release. The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers statistics more efficiently.

We may continue to use ANALYZE statement to for other purposes not related to optimizer statistics collection:

  • To use the VALIDATE or LIST CHAINED ROWS clauses
  • To collect information on free list blocks
  • To sample a number (rather than a percentage) of rows
DBMS_UTILITY
The DBMS_UTILITY package can be used to gather statistics for a whole schema or database. With DBMS_UTILITY.ANALYZE_SCHEMA you can gather all the statistics for all the tables, clusters and indexes of a schema. Both methods follow the same format as the ANALYZE statement:

EXEC DBMS_UTILITY.ANALYZE_SCHEMA('SCOTT','COMPUTE');
EXEC DBMS_UTILITY.ANALYZE_SCHEMA('SCOTT','ESTIMATE',ESTIMATE_ROWS=>100);
EXEC DBMS_UTILITY.ANALYZE_SCHEMA('SCOTT','ESTIMATE',ESTIMATE_PERCENT=>25);
EXEC DBMS_UTILITY.ANALYZE_SCHEMA('SCOTT','DELETE');EXEC DBMS_UTILITY.ANALYZE_DATABASE('COMPUTE');
EXEC DBMS_UTILITY.ANALYZE_DATABASE('ESTIMATE',ESTIMATE_ROWS=>100);
EXEC DBMS_UTILITY.ANALYZE_DATABASE('ESTIMATE',ESTIMATE_PERCENT=>15);

DBMS_STATS
The DBMS_STATS package was introduced in Oracle 8i and is Oracles preferred method of gathering object statistics. Oracle list a number of benefits to using it including parallel execution, long term storage of statistics and transfer of statistics between servers. This PL/SQL package is also used to modify, view, export, import, and delete statistics. It follows a similar format to the other methods.

The DBMS_STATS package can gather statistics on table and indexes, and well as individual columns and partitions of tables. It does not gather cluster statistics; however, we can use DBMS_STATS to gather statistics on the individual tables instead of the whole cluster.

When we generate statistics for a table, column, or index, if the data dictionary already contains statistics for the object, then Oracle updates the existing statistics. The older statistics are saved and can be restored later if necessary.

Procedures in the DBMS_STATS package for gathering statistics on database objects:


Procedure Collects
GATHER_INDEX_STATS Index statistics
GATHER_TABLE_STATS Table, column, and index statistics
GATHER_SCHEMA_STATS Statistics for all objects in a schema
GATHER_DICTIONARY_STATSStatistics for all dictionary objects
GATHER_DATABASE_STATS Statistics for all objects in a database

EXEC DBMS_STATS.GATHER_DATABASE_STATS;
EXEC DBMS_STATS.GATHER_DATABASE_STATS(ESTIMATE_PERCENT=>20);

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname, estimate_percent, block_sample, method_opt, degree, granularity, cascade, stattab, statid, options, statown, no_invalidate, gather_temp, gather_fixed);
EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT');
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'MRT');
EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT',ESTIMATE_PERCENT=>10);

EXEC DBMS_STATS.GATHER_TABLE_STATS('SCOTT','EMP');
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCOTT','EMP',ESTIMATE_PERCENT=>15);

EXEC DBMS_STATS.GATHER_INDEX_STATS('SCOTT','EMP_PK');
EXEC DBMS_STATS.GATHER_INDEX_STATS('SCOTT','EMP_PK',ESTIMATE_PERCENT=>15);

This package also gives us the ability to delete statistics:
EXEC DBMS_STATS.DELETE_DATABASE_STATS;
EXEC DBMS_STATS.DELETE_SCHEMA_STATS('SCOTT');
EXEC DBMS_STATS.DELETE_TABLE_STATS('SCOTT','EMP');
EXEC DBMS_STATS.DELETE_INDEX_STATS('SCOTT','EMP_PK');
EXEC DBMS_STATS.DELETE_PENDING_STATS('SH','SALES');

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'"DWH"',OPTIONS=>'GATHER AUTO');
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'PERFSTAT',CASCADE=>TRUE);

When gathering statistics on system schemas, we can use the procedure DBMS_STATS.GATHER_DICTIONARY_STATS. This procedure gathers statistics for all system schemas, including SYS and SYSTEM, and other optional schemas, such as CTXSYS and DRSYS.

Statistics Gathering Using Sampling
The statistics-gathering operations can utilize sampling to estimate statistics. Sampling is an important technique for gathering statistics. Gathering statistics without sampling requires full table scans and sorts of entire tables. Sampling minimizes the resources necessary to gather statistics.

Sampling is specified using the ESTIMATE_PERCENT argument to the DBMS_STATS procedures. While the sampling percentage can be set to any value, Oracle Corporation recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy. AUTO_SAMPLE_SIZE lets Oracle determine the best sample size necessary for good statistics, based on the statistical property of the object. Because each type of statistics has different requirements, the size of the actual sample taken may not be the same across the table, columns, or indexes. For example, to collect table and column statistics for all tables in the SCOTT schema with auto-sampling, you could use:
EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT',DBMS_STATS.AUTO_SAMPLE_SIZE);
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'SCOTT', ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE);

When the ESTIMATE_PERCENT parameter is manually specified, the DBMS_STATS gathering procedures may automatically increase the sampling percentage if the specified percentage did not produce a large enough sample. This ensures the stability of the estimated values by reducing fluctuations.
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'SCOTT',ESTIMATE_PERCENT=>25);

Parallel Statistics Gathering
The statistics-gathering operations can run either serially or in parallel. The degree of parallelism can be specified with the DEGREE argument to the DBMS_STATS gathering procedures. Parallel statistics gathering can be used in conjunction with sampling. Oracle recommends setting the DEGREE parameter to DBMS_STATS.AUTO_DEGREE. This setting allows Oracle to choose an appropriate degree of parallelism based on the size of the object and the settings for the parallel-related init.ora parameters.

Note that certain types of index statistics are not gathered in parallel, including cluster indexes, domain indexes, and bitmap join indexes.

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'SCOTT', ESTIMATE_PERCENT=> DBMS_STATS.AUTO_SAMPLE_SIZE, METHOD_OPT=> 'FOR ALL COLUMNS SIZE AUTO',DEGREE=>7);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>’DWH, METHOD_OPT=>’FOR ALL COLUMNS SIZE AUTO’,DEGREE=>6,ESTIMATE_PERCENT=>5, NO_INVALIDATE=>FALSE);

Statistics on Partitioned Objects
For partitioned tables and indexes, DBMS_STATS can gather separate statistics for each partition, as well as global statistics for the entire table or index. Similarly, for composite partitioning, DBMS_STATS can gather separate statistics for subpartitions, partitions, and the entire table or index. The type of partitioning statistics to be gathered is specified in the GRANULARITY argument to the DBMS_STATS gathering procedures.

Depending on the SQL statement being optimized, the optimizer can choose to use either the partition (or subpartition) statistics or the global statistics. Both types of statistics are important for most applications, and Oracle recommends setting the GRANULARITY parameter to AUTO to gather both types of partition statistics.

Column Statistics and Histograms
When gathering statistics on a table, DBMS_STATS gathers information about the data distribution of the columns within the table. The most basic information about the data distribution is the maximum value and minimum value of the column. However, this level of statistics may be insufficient for the optimizer's needs if the data within the column is skewed. For skewed data distributions, histograms can also be created as part of the column statistics to describe the data distribution of a given column.

Histograms are specified using the METHOD_OPT argument of the DBMS_STATS gathering procedures. Oracle recommends setting the METHOD_OPT to FOR ALL COLUMNS SIZE AUTO. With this setting, Oracle automatically determines which columns require histograms and the number of buckets (size) of each histogram. You can also manually specify which columns should have histograms and the size of each histogram.

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES',method_opt=>'FOR COLUMNS (empno, deptno)');
EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES',method_op =>'FOR COLUMNS (sal+comm)');

Note: If you need to remove all rows from a table when using DBMS_STATS, use TRUNCATE instead of dropping and re-creating the same table. When a table is dropped, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures will be lost. Without this data, these features will not function properly.

Determining Stale Statistics
Statistics must be regularly gathered on database objects as those database objects are modified over time. In order to determine whether or not given database object needs new database statistics, Oracle provides a table monitoring facility. This monitoring is enabled by default when STATISTICS_LEVEL is set to TYPICAL or ALL. Monitoring tracks the approximate number of INSERTs, UPDATEs, and DELETEs for that table, as well as whether the table has been truncated, since the last time statistics were gathered. The information about changes of tables can be viewed in the USER_TAB_MODIFICATIONS view. Following a data-modification, there may be a few minutes delay while Oracle propagates the information to this view. Use the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure to immediately reflect the outstanding monitored information kept in the memory.

-- Table level
ALTER TABLE emp NOMONITORING;
ALTER TABLE emp MONITORING;

-- Schema level
EXEC DBMS_STATS.alter_schema_tab_monitoring('SCOTT', TRUE);
EXEC DBMS_STATS.alter_schema_tab_monitoring('SCOTT', FALSE);

-- Database level
EXEC DBMS_STATS.alter_database_tab_monitoring(TRUE);
EXEC DBMS_STATS.alter_database_tab_monitoring(FALSE);

The GATHER_DATABASE_STATS or GATHER_SCHEMA_STATS procedures gather new statistics for tables with stale statistics when the OPTIONS parameter is set to GATHER STALE or GATHER AUTO. If a monitored table has been modified more than 10%, then these statistics are considered stale and gathered again.

User-defined Statistics
You can create user-defined optimizer statistics to support user-defined indexes and functions. When you associate a statistics type with a column or domain index, Oracle calls the statistics collection method in the statistics type whenever statistics are gathered for database objects.

You should gather new column statistics on a table after creating a function-based index, to allow Oracle to collect column statistics equivalent information for the expression. This is done by calling the statistics-gathering procedure with the METHOD_OPT argument set to FOR ALL HIDDEN COLUMNS.

When to Gather Statistics
When gathering statistics manually, we not only need to determine how to gather statistics, but also when and how often to gather new statistics.

For an application in which tables are being incrementally modified, we may only need to gather new statistics every week or every month. The simplest way to gather statistics in these environments is to use a script or job scheduling tool to regularly run the GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures. The frequency of collection intervals should balance the task of providing accurate statistics for the optimizer against the processing overhead incurred by the statistics collection process.

For tables which are being substantially modified in batch operations, such as with bulk loads, statistics should be gathered on those tables as part of the batch operation. The DBMS_STATS procedure should be called as soon as the load operation completes.

For partitioned tables, there are often cases in which only a single partition is modified. In those cases, statistics can be gathered only on those partitions rather than gathering statistics for the entire table. However, gathering global statistics for the partitioned table may still be necessary.

Transferring Statistics between databases
It is possible to transfer statistics between servers allowing consistent execution plans between servers with varying amounts of data. First the statistics must be collected into a statistics table. It can be very handy to use production statistics on development database, so that we can forecast the optimizer behaviour.

Statistics can be exported and imported from the data dictionary to user-owned tables. This enables you to create multiple versions of statistics for the same schema. It also enables you to copy statistics from one database to another database. You may want to do this to copy the statistics from a production database to a scaled-down test database.

Note: Exporting and importing statistics is a distinct concept from the EXP and IMP utilities of the database. The DBMS_STATS export and import packages do utilize IMP and EXP dumpfiles.

Before exporting statistics, you first need to create a table for holding the statistics. This statistics table is created using the procedure DBMS_STATS.CREATE_STAT_TABLE. After this table is created, then you can export statistics from the data dictionary into your statistics table using the DBMS_STATS.EXPORT_*_STATS procedures. The statistics can then be imported using the DBMS_STATS.IMPORT_*_STATS procedures.

Note that the optimizer does not use statistics stored in a user-owned table. The only statistics used by the optimizer are the statistics stored in the data dictionary. In order to have the optimizer use the statistics in user-owned tables, you must import those statistics into the data dictionary using the statistics import procedures.

In order to move statistics from one database to another, you must first export the statistics on the first database, then copy the statistics table to the second database, using the EXP and IMP utilities or other mechanisms, and finally import the statistics into the second database.

Note: The EXP and IMP utilities export and import optimizer statistics from the database along with the table. One exception is that statistics are not exported with the data if a table has columns with system-generated names.

In the following example the statistics for the APPSCHEMA user are collected into a new table, STATS_TAB, which is owned by DBASCHEMA:

1. Create the statistics table.
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname =>'SCHEMA_NAME', stat_tab => 'STATS_TABLE', tblspace => 'STATS_TABLESPACE');

SQL> EXEC DBMS_STATS.CREATE_STAT_TABLE('DBASCHEMA','STATS_TAB');

2. Export statistics to statistics table.
EXEC DBMS_STATS.EXPORT_SCHEMA_STATS('ORIGINAL_SCHEMA', 'STATS_TABLE', NULL, 'STATS_TABLE_OWNER');

SQL> EXEC DBMS_STATS.EXPORT_SCHEMA_STATS('APPSCHEMA','STATS_TAB',NULL, 'DBASCHEMA');
(or)
EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(OWNNAME=>'APPSCHEMA', STATTAB=>'STAT_TAB',STATID=>'030610',STATOWN=>'DBASCHEMA');

3. This table can be transferred to another server using any one of the below methods.
SQLPlus Copy:
SQL> insert into dbaschema.stats_tab select * from dbaschema.stats_tab@source;

Export/Import:
exp file=stats.dmp log=stats_exp.log tables=dbaschema.stats_tab
imp file=stats.dmp log=stats_imp.log

Data Pump:
expdp directory=dpump_dir dumpfile=stats.dmp logfile=stats_exp.log tables= dbaschema.stats_tab
impdp directory=dpump_dir dumpfile=stats.dmp logfile=stats_imp.log

4. Import statistics into the data dictionary.
EXEC DBMS_STATS.IMPORT_SCHEMA_STATS('NEW_SCHEMA', 'STATS_TABLE', NULL, 'SYSTEM');

SQL> EXEC DBMS_STATS.IMPORT_SCHEMA_STATS('APPSCHEMA','STATS_TAB',NULL, 'DBASCHEMA');
(or)
EXEC DBMS_STATS.IMPORT_SCHEMA_STATS(OWNNAME=>'APPSCHEMA', STATTAB=>'STAT_TAB',STATID=>'030610',STATOWN=>'DBASCHEMA');

5. Drop the statistics table (optional step).
EXEC DBMS_STATS.DROP_STAT_TABLE('SYSTEM','STATS_TABLE');
SQL> EXEC DBMS_STATS.DROP_STAT_TABLE('DBASCHEMA','STATS_TAB');

Getting top-quality stats
Because Oracle9i schema statistics work best with external system load, we like to schedule a valid sample (using dbms_stats.auto_sample_size) during regular working hours. For example, here we refresh statistics using the "auto" option which works with the table monitoring facility to only re-analyze those Oracle tables that have experienced more than a 10% change in row content:
begin
dbms_stats.gather_schema_stats(ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7);
end;
/

Optimizer Hints
ALL_ROWS
FIRST_ROWS
FIRST_n_ROWS
APPEND
FULL
INDEX
DYNAMIC_SAMPLING
BYPASS_RECURSIVE_CHECK
BYPASS_RECURSIVE_CHECK APPEND


Examples:
SELECT /*+ ALL_ROWS */ empid, last_name, sal FROM emp;
SELECT /*+ FIRST_ROWS */ * FROM emp;
SELECT /*+ FIRST_20_ROWS */ * FROM emp;
SELECT /*+ FIRST_ROWS(100) */ empid, last_name, sal FROM emp;

System Statistics
System statistics describe the system's hardware characteristics, such as I/O and CPU performance and utilization, to the query optimizer. When choosing an execution plan, the optimizer estimates the I/O and CPU resources required for each query. System statistics enable the query optimizer to more accurately estimate I/O and CPU costs, enabling the query optimizer to choose a better execution plan.

When Oracle gathers system statistics, it analyzes system activity in a specified time period (workload statistics) or simulates a workload (noworkload statistics). The statistics are collected using the DBMS_STATS.GATHER_SYSTEM_STATS procedure. Oracle highly recommends that you gather system statistics.

Note: You must have DBA privileges or GATHER_SYSTEM_STATISTICS role to update dictionary system statistics.

EXEC DBMS_STATS.GATHER_SYSTEM_STATS (interval=>720, stattab=>'mystats', statid=>'OLTP');
EXEC DBMS_STATS.IMPORT_SYSTEM_STATS('mystats', 'OLTP');Unlike table, index, or column statistics, Oracle does not invalidate already parsed SQL statements when system statistics get updated. All new SQL statements are parsed using new statistics.

These options better facilitate the gathering process to the physical database and workload: when workload system statistics are gathered, noworkload system statistics will be ignored. Noworkload system statistics are initialized to default values at the first database startup.

Workload Statistics
Workload statistics, introduced in Oracle 9i, gather single and multiblock read times, mbrc, CPU speed (cpuspeed), maximum system throughput, and average slave throughput. The sreadtim, mreadtim, and mbrc are computed by comparing the number of physical sequential and random reads between two points in time from the beginning to the end of a workload. These values are implemented through counters that change when the buffer cache completes synchronous read requests. Since the counters are in the buffer cache, they include not only I/O delays, but also waits related to latch contention and task switching. Workload statistics thus depend on the activity the system had during the workload window. If system is I/O bound—both latch contention and I/O throughput—it will be reflected in the statistics and will therefore promote a less I/O intensive plan after the statistics are used. Furthermore, workload statistics gathering does not generate additional overhead.

In Oracle release 9.2, maximum I/O throughput and average slave throughput were added to set a lower limit for a full table scan (FTS).
To gather workload statistics, either:

  • Run the dbms_stats.gather_system_stats('start') procedure at the beginning of the workload window, then the dbms_stats.gather_system_stats('stop') procedure at the end of the workload window.
  • Run dbms_stats.gather_system_stats('interval', interval=>N) where N is the number of minutes when statistics gathering will be stopped automatically.
To delete system statistics, run dbms_stats.delete_system_stats(). Workload statistics will be deleted and reset to the default noworkload statistics.

Noworkload Statistics
Noworkload statistics consist of I/O transfer speed, I/O seek time, and CPU speed (cpuspeednw). The major difference between workload statistics and noworkload statistics lies in the gathering method.

Noworkload statistics gather data by submitting random reads against all data files, while workload statistics uses counters updated when database activity occurs. isseektim represents the time it takes to position the disk head to read data. Its value usually varies from 5 ms to 15 ms, depending on disk rotation speed and the disk or RAID specification. The I/O transfer speed represents the speed at which one operating system process can read data from the I/O subsystem. Its value varies greatly, from a few MBs per second to hundreds of MBs per second. Oracle uses relatively conservative default settings for I/O transfer speed.

In Oracle 10g, Oracle uses noworkload statistics and the CPU cost model by default. The values of noworkload statistics are
initialized to defaults at the first instance startup:

ioseektim = 10ms
iotrfspeed = 4096 bytes/ms
cpuspeednw = gathered value, varies based on system

If workload statistics are gathered, noworkload statistics will be ignored and Oracle will use workload statistics instead. To gather noworkload statistics, run dbms_stats.gather_system_stats() with no arguments. There will be an overhead on the I/O system during the gathering process of noworkload statistics. The gathering process may take from a few seconds to several minutes, depending on I/O performance and database size.

The information is analyzed and verified for consistency. In some cases, the value of noworkload statistics may remain its default value. In such cases, repeat the statistics gathering process or set the value manually to values that the I/O system has according to its specifications by using the dbms_stats.set_system_stats procedure.

Managing Statistics
Restoring Previous Versions of Statistics
Whenever statistics in dictionary are modified, old versions of statistics are saved automatically for future restoring. Statistics can be restored using RESTORE procedures of DBMS_STATS package. These procedures use a time stamp as an argument and restore statistics as of that time stamp. This is useful in case newly collected statistics leads to some sub-optimal execution plans and the administrator wants to revert to the previous set of statistics.There are dictionary views that display the time of statistics modifications. These views are useful in determining the time stamp to be used for statistics restoration.

  • Catalog view DBA_OPTSTAT_OPERATIONS contain history of statistics operations performed at schema and database level using DBMS_STATS.
  • The views *_TAB_STATS_HISTORY views (ALL, DBA, or USER) contain a history of table statistics modifications.
The old statistics are purged automatically at regular intervals based on the statistics history retention setting and the time of the recent analysis of the system. Retention is configurable using the ALTER_STATS_HISTORY_RETENTION procedure of DBMS_STATS. The default value is 31 days, which means that you would be able to restore the optimizer statistics to any time in last 31 days.

Automatic purging is enabled when STATISTICS_LEVEL parameter is set to TYPICAL or ALL. If automatic purging is disabled, the old versions of statistics need to be purged manually using the PURGE_STATS procedure.

The other DBMS_STATS procedures related to restoring and purging statistics include:
  • PURGE_STATS: This procedure can be used to manually purge old versions beyond a time stamp.
  • GET_STATS_HISTORY_RENTENTION: This function can be used to get the current statistics history retention value.
  • GET_STATS_HISTORY_AVAILABILITY: This function gets the oldest time stamp where statistics history is available. Users cannot restore statistics to a time stamp older than the oldest time stamp.
When restoring previous versions of statistics, the following limitations apply:
  • RESTORE procedures cannot restore user-defined statistics.
  • Old versions of statistics are not stored when the ANALYZE command has been used for collecting statistics.
Note: If you need to remove all rows from a table when using DBMS_STATS, use TRUNCATE instead of dropping and re-creating the same table. When a table is dropped, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures will be lost. Without this data, these features will not function properly.

Restoring Statistics versus Importing or Exporting Statistics
The functionality for restoring statistics is similar in some respects to the functionality of importing and exporting statistics. In general, you should use the restore capability when:
  • You want to recover older versions of the statistics. For example, to restore the optimizer behaviour to an earlier date.
  • You want the database to manage the retention and purging of statistics histories.
You should use EXPORT/IMPORT_*_STATS procedures when:
  • You want to experiment with multiple sets of statistics and change the values back and forth.
  • You want to move the statistics from one database to another database. For example, moving statistics from a production system to a test system.
  • You want to preserve a known set of statistics for a longer period of time than the desired retention date for restoring statistics.

Locking Statistics for a Table or Schema
Statistics for a table or schema can be locked. Once statistics are locked, no modifications can be made to those statistics until the statistics have been unlocked. These locking procedures are useful in a static environment in which you want to guarantee that the statistics never change.

The DBMS_STATS package provides two procedures for locking and two procedures for unlocking statistics:
  • LOCK_SCHEMA_STATS
  • LOCK_TABLE_STATS
  • UNLOCK_SCHEMA_STATS
  • UNLOCK_TABLE_STATS
EXEC DBMS_STATS.LOCK_SCHEMA_STATS('AP');
EXEC DBMS_STATS.UNLOCK_SCHEMA_STATS('AP');

Setting Statistics
We can set table, column, index, and system statistics using the SET_*_STATISTICS procedures. Setting statistics in the manner is not recommended, because inaccurate or inconsistent statistics can lead to poor performance.

Dynamic Sampling
The purpose of dynamic sampling is to improve server performance by determining more accurate estimates for predicate selectivity and statistics for tables and indexes. The statistics for tables and indexes include table block counts, applicable index block counts, table cardinalities, and relevant join column statistics. These more accurate estimates allow the optimizer to produce better performing plans.

You can use dynamic sampling to:
  • Estimate single-table predicate selectivities when collected statistics cannot be used or are likely to lead to significant errors in estimation.
  • Estimate statistics for tables and relevant indexes without statistics.
  • Estimate statistics for tables and relevant indexes whose statistics are too out of date to trust.
This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter. For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2(default) or higher.

The primary performance attribute is compile time. Oracle determines at compile time whether a query would benefit from dynamic sampling. If so, a recursive SQL statement is issued to scan a small random sample of the table's blocks, and to apply the relevant single table predicates to estimate predicate selectivities. The sample cardinality can also be used, in some cases, to estimate table cardinality. Any relevant column and index statistics are also collected. Depending on the value of the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter, a certain number of blocks are read by the dynamic sampling query.

For a query that normally completes quickly (in less than a few seconds), we will not want to incur the cost of dynamic sampling. However, dynamic sampling can be beneficial under any of the following conditions:
  • A better plan can be found using dynamic sampling.
  • The sampling time is a small fraction of total execution time for the query.
  • The query will be executed many times.
Dynamic sampling can be applied to a subset of a single table's predicates and combined with standard selectivity estimates of predicates for which dynamic sampling is not done.

We control dynamic sampling with the OPTIMIZER_DYNAMIC_SAMPLING parameter, which can be set to a value from 0 to 10. The default is 2.
  • A value of 0 means dynamic sampling will not be done.
  • Increasing the value of the parameter results in more aggressive application of dynamic sampling, in terms of both the type of tables sampled (analyzed or unanalyzed) and the amount of I/O spent on sampling.
Dynamic sampling is repeatable if no rows have been inserted, deleted, or updated in the table being sampled. The parameter OPTIMIZER_FEATURES_ENABLE turns off dynamic sampling if set to a version prior to 9.2.0.

Dynamic Sampling Levels
The sampling levels are as follows if the dynamic sampling level used is from a cursor hint or from the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter:
  • Level 0: Do not use dynamic sampling.
  • Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
  • Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
  • Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
  • Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
  • Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
  • Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
The sampling levels are as follows if the dynamic sampling level for a table is set using the DYNAMIC_SAMPLING optimizer hint:
  • Level 0: Do not use dynamic sampling.
  • Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).
  • Levels 2, 3, 4, 5, 6, 7, 8, and 9: The number of blocks sampled is 2, 4, 8, 16, 32, 64, 128, or 256 times the default number of dynamic sampling blocks respectively.
  • Level 10: Read all blocks in the table.
Handling Missing Statistics
When Oracle encounters a table with missing statistics, Oracle dynamically gathers the necessary statistics needed by the optimizer. However, for certain types of tables, Oracle does not perform dynamic sampling. These include remote tables and external tables. In those cases and also when dynamic sampling has been disabled, the optimizer uses default values for its statistics.

Default Table Values When Statistics Are Missing

Table Statistic Default Value Used by Optimizer
Cardinality num_of_blocks * (block_size - cache_layer) / avg_row_len
Average row length 100 bytes
Number of blocks 100 or actual value based on the extent map
Remote cardinality 2000 rows
Remote average row length 100 bytes
Default Index Values When Statistics Are Missing

Index Statistic Default Value Used by Optimizer
Levels 1
Leaf blocks 25
Leaf blocks/key 1
Data blocks/key 1
Distinct keys 100
Clustering factor 800

Viewing Statistics
Statistics on Tables, Indexes and Columns
Statistics on tables, indexes, and columns are stored in the data dictionary. To view statistics in the data dictionary, query the appropriate data dictionary view (USER, ALL, or DBA). These DBA_* views include the following:

· DBA_TAB_STATISTICS
· ALL_TAB_STATISTICS
· USER_TAB_STATISTICS
· DBA_TAB_COL_STATISTICS
· ALL_TAB_COL_STATISTICS
· USER_TAB_COL_STATISTICS
· DBA_TAB_HISTOGRAMS
· ALL_TAB_HISTOGRAMS
· USER_TAB_HISTOGRAMS

· DBA_TABLES
· DBA_OBJECT_TABLES
· DBA_TAB_HISTOGRAMS
· DBA_INDEXES
· DBA_IND_STATISTICS
· DBA_CLUSTERS
· DBA_TAB_PARTITIONS
· DBA_TAB_SUBPARTITIONS
· DBA_IND_PARTITIONS
· DBA_IND_SUBPARTITIONS
· DBA_PART_COL_STATISTICS
· DBA_PART_HISTOGRAMS
· DBA_SUBPART_COL_STATISTICS
· DBA_SUBPART_HISTOGRAMS

Viewing Histograms
Column statistics may be stored as histograms. These histograms provide accurate estimates of the distribution of column data. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with non uniform data distributions.

Oracle uses two types of histograms for column statistics: height-balanced histograms and frequency histograms. The type of histogram is stored in the HISTOGRAM column of the *TAB_COL_STATISTICS views (USER and DBA). This column can have values of HEIGHT BALANCED, FREQUENCY, or NONE.

Height-Balanced Histograms
In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same number of rows. The useful information that the histogram provides is where in the range of values the endpoints fall. Height-balanced histograms can be viewed using the *TAB_HISTOGRAMS tables.

Example for Viewing Height-Balanced Histogram Statistics
BEGIN
DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'OE', TABNAME => 'INVENTORIES',
METHOD_OPT => 'FOR COLUMNS SIZE 10 QUANTITY_ON_HAND');
END;
/

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM
FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'QUANTITY_ON_HAND';

COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
QUANTITY_ON_HAND 237 10 HEIGHT BALANCED

SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE
FROM USER_HISTOGRAMS
WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'QUANTITY_ON_HAND'
ORDER BY ENDPOINT_NUMBER;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
0 0
1 27
2 42
3 57
4 74
5 98
6 123
7 149
8 175
9 202
10 353

In the query output, one row corresponds to one bucket in the histogram.

Frequency Histograms
In a frequency histogram, each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of occurrences of that single value. Frequency histograms are automatically created instead of height-balanced histograms when the number of distinct values is less than or equal to the number of histogram buckets specified. Frequency histograms can be viewed using the *TAB_HISTOGRAMS tables.

Example for Viewing Frequency Histogram Statistics
BEGIN
DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'OE', TABNAME => 'INVENTORIES',
METHOD_OPT => 'FOR COLUMNS SIZE 20 WAREHOUSE_ID');
END;
/

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM
FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'WAREHOUSE_ID';

COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
WAREHOUSE_ID 9 9 FREQUENCY

SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE
FROM USER_HISTOGRAMS
WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'WAREHOUSE_ID'
ORDER BY ENDPOINT_NUMBER;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
36 1
213 2
261 3
370 4
484 5
692 6
798 7
984 8
1112 9

Issues
  • Exclude dataload tables from your regular stats gathering, unless you know they will be full at the time that stats are gathered.
  • Gathering stats for the SYS schema can make the system run slower, not faster.
  • Gathering statistics can be very resource intensive for the server so avoid peak workload times or gather stale stats only.
  • Even if scheduled, it may be necessary to gather fresh statistics after database maintenance or large data loads.
  • If a table goes from 1 row to 200 rows, that's a significant change. When a table goes from 100,000 rows to 150,000 rows, that's not a terribly significant change. When a table goes from 1000 rows all with identical values in commonly-queried column X to 1000 rows with nearly unique values in column X, that's a significant change.
Statistics store information about item counts and relative frequencies. Things that will let it "guess" at how many rows will match a given criteria. When it guesses wrong, the optimizer can pick a very suboptimal query plan.

Source:Internet