How to Accelerate Queries on Big Tables

Tables that have rows more than 100M in numbers may have a performance degradation on queries, especially when the queries must join multiple tables, which can make the execution time grow exponentially. Here are some suggestions that may speed up queries in practices.

Proper Indexes

You may not have built the proper indexes to facilitate your queries to execute faster. Or, one possibility, you may think the indexes are already there, but in fact, they are not good enough to make the database engine to choose any one of them to run queries, then full table scan is applied instead.

A proper index must include some (or parts) of crucial columns listed in where clause of concerned queries, and the most important thing is that they must be arranged in correct orders. Misplacing the order of columns in composite indexes can make the indexes useless.

For ensuring the index is selected to run queries by the database engine, you may add the keyword explain before select statements to check the execution path.

Note 1: For really small tables, let's say hundreds of row or less, you don't have to build any index, table retrieving are fast enough on themselves.

Note 2: Bitmap indexes are adequate for tables contain monotonous values in a column, e.g. 5 to 10 kinds of value. Otherwise, binary tree should be the index type.

Table Partitions with the Right Key.

Table partitions can divide relevant data into several physical segments by some definitions called Partition Key. So as to make IO faster than normal because of shorter distance among data, but the prerequisites is that you have to select the right column(s) to be the partition key.

The right partition key is usually selected from where clause of concerned queries. If you can't find any column(s) to be the right key to partition, then don't. Otherwise, things will get worse.

Note: For small-scaled tables with partitioned purposely, there will be a performance penalty waiting for you.

Materialized Views

If there's no "right", not even "proper" partition key can be selected on the big table to partition. Materialized views may help the case. This technique allows you to aggregate potentially or likely to be selected data in advance instead of querying the big table directly. It's especially useful when there're multiple tables have to be joined together.

There maybe only one 10th or even one 100th really useful data out of the big table that is preserved in the materialized view, consequently it dramatically saves execution time. With the nice features, materialized views are widely used in data warehousing and reporting services. Furthermore, the technique can corporate with table partitions to accelerate snapshot of materialized views.

On the negative side, the cost of materialized views is that you must have a clear and maybe complex procedure to maintain them automatically or manually to protect data from any logical errors. Worse things could happen, and may nobody knows how to recover it.

Table Compression

We may think table compression only benefits for saving lots of disk space, but in fact, it also make some progress on IO access by costing some CPU power instead. Therefore, it can accelerate the overall execution of queries sometimes, especially when the result sets are very heavy (e.g. OLAP-liked data).

The point is, you have to choose the proper compression type. Generally, high CPU-powered database deserves more advanced compression, low CPU-powered servers should apply the basic compression.

How to Accelerate Queries on Big Tables

Proper Indexes

Table Partitions with the Right Key.

Materialized Views

Table Compression

Leave a Reply Cancel reply