There could be several tables and indexes with over 100 million rows in the database, and you may not want to consume precious resources analyzing them at an unnecessarily high sampling percentage. A lower sampling percentage saves time and resources, but it can also produce inaccurate statistics and lead the CBO to choose a wrong execution plan.
Basically, the higher the estimate percentage, the better. But with limited resources, you can only choose a sufficient, good-enough percentage. How high is sufficient? It really depends on the evenness of the table: if every row in the table is very similar or uniform in length, block usage and column characteristics (e.g. distinct values, null values), you can choose a low sampling percentage, say, 10%; even 5% might be good enough.
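As a sketch of how a low sampling percentage is requested, here is a DBMS_STATS call; the schema and table names (SCOTT, BIG_TABLE) are placeholders for illustration, not names from this post:

```sql
-- Gather statistics on a fairly uniform table with a 10% sample.
-- SCOTT and BIG_TABLE are hypothetical names.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'BIG_TABLE',
    estimate_percent => 10,
    cascade          => TRUE  -- gather statistics on the table's indexes as well
  );
END;
/
```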
On the other hand, if the data in the table is not evenly distributed, you have to use a higher estimate percentage, for example, 30% or more. Please note that with an estimate percentage larger than 49% (i.e. 50% ~ 100%), the database will COMPUTE statistics on the whole table instead.
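The sampling-versus-compute distinction can also be seen in the legacy ANALYZE syntax; BIG_TABLE is again a placeholder name:

```sql
-- Estimate statistics from a 30 percent sample.
ANALYZE TABLE big_table ESTIMATE STATISTICS SAMPLE 30 PERCENT;

-- With a sample of 50 percent or more, the database computes
-- statistics over the whole table, equivalent to:
ANALYZE TABLE big_table COMPUTE STATISTICS;
```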
For an unpredictable table, I recommend using 30% as a starting point and decreasing the percentage gradually. This value usually gives me fairly reliable statistics. Believe it or not, the difference in statistics between a 20% and a 30% sample can make the same statement execute with different SQL plans, which once cost me a lot of time to troubleshoot.
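The tuning loop above can be sketched as follows; the names are placeholders, and after each step you would verify that the execution plans of your critical statements have not changed:

```sql
-- Start with a 30% sample on the unpredictable table,
-- then step the percentage down (e.g. 30 -> 25 -> 20),
-- re-checking plan stability after each gather.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'BIG_TABLE',
    estimate_percent => 30
  );
END;
/
```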
How High Estimate Percentage of Statistics is Sufficient Enough
- by Ed Chen