Saturday, October 10, 2009

Visiting Data as Few Times as Possible

Visiting Data as Few Times as Possible

Applications should try to access each row only once. This reduces network traffic and
reduces database load. Consider doing the following:

■ Combine Multiples Scans with CASE Statements
■ Use DML with RETURNING Clause
■ Modify All the Data Needed in One Statement

Combine Multiples Scans with CASE Statements
=============================================
Often, it is necessary to calculate different aggregates on various sets of tables. Usually, this is done with multiple scans on the table, but it is easy to calculate all the aggregates with one single scan. Eliminating n-1 scans can greatly improve
performance.

Combining multiple scans into one scan can be done by moving the WHERE condition
of each scan into a CASE statement, which filters the data for the aggregation. For eachaggregation, there could be another column that retrieves the data.

The following example asks for the count of all employees who earn less then 2000,
between 2000 and 4000, and more than 4000 each month. This can be done with three
separate queries:

SELECT COUNT (*)
FROM employees
WHERE salary < 2000;
SELECT COUNT (*)
FROM employees
WHERE salary BETWEEN 2000 AND 4000;
SELECT COUNT (*)
FROM employees
WHERE salary>4000;

However, it is more efficient to run the entire query in a single statement. Each number is calculated as one column. The count uses a filter with the CASE statement to count only the rows where the condition is valid.

For example:
SELECT COUNT (CASE WHEN salary < 2000
THEN 1 ELSE null END) count1,
COUNT (CASE WHEN salary BETWEEN 2001 AND 4000
THEN 1 ELSE null END) count2,
COUNT (CASE WHEN salary > 4000
THEN 1 ELSE null END) count3
FROM employees;

This is a very simple example. The ranges could be overlapping, the functions for the
aggregates could be different, and so on.

Use DML with RETURNING Clause
==============================

When appropriate, use INSERT, UPDATE, or DELETE... RETURNING to select and
modify data with a single call. This technique improves performance by reducing the
number of calls to the database.

No comments: