How do query optimizers work?

The result of the parsing step is a query tree: a basic list of the processes needed to execute the query. It provides basic instructions, but does not yet include specifics, such as which indexes or joins to use.

Optimization is the process that we will reference most often here. The optimizer operates much like a chess engine or any other game-playing computer. It needs to consider an immense number of possible moves as quickly as possible, discard the poor choices, and finish with the best possible move. At any point in time, there may be millions of combinations of moves available for the computer to consider, of which only a handful will be the best possible moves.

Anyone who has played chess against a computer knows that the less time the computer has, the more likely it is to make an error. In the world of SQL Server, we talk about execution plans instead of chess moves. The execution plan is the set of specific steps that the execution engine will follow to process a query. For every query, the optimizer has many choices to make to arrive at that execution plan, and it must make them in a very short span of time.

Any execution plan that the optimizer considers must return the same results, but the performance of each plan may differ significantly. Query optimization is a CPU-intensive operation.

Sifting through plans requires significant computing resources, and finding the best plan may require more time than is available. As a result, a balance must be maintained between the resources needed to optimize the query, the resources required to execute the query, and the time we must wait for the entire process to complete.

As a result, the optimizer is not built to select the perfect execution plan, but instead to find the best plan it can within a set amount of time. It may not be the ideal execution plan, but we accept that as a limitation of how a process with so many possibilities must operate. The metric used to judge execution plans and decide which to consider is query cost.

The cost has no unit; it is a relative measure of the resources required to execute each step of an execution plan. The overall query cost is the sum of the costs of each step, and you can view these costs in any execution plan. While query cost is a useful metric for understanding how SQL Server has optimized a particular query, it is important to remember that its primary purpose is to aid the query optimizer in choosing good execution plans.

It is not a direct measure of IO, CPU, memory, duration, or any other metric that matters to an application user waiting for query execution to complete. A low query cost may not indicate a fast query or the best plan. Alternatively, a high query cost may sometimes be acceptable.

As the query optimizer churns through candidate execution plans, it ranks them from lowest cost to highest cost. Eventually, the optimizer either finds a sufficiently cheap plan, exhausts the candidate plans, or runs out of time and settles for the cheapest plan found so far.
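The selection loop described above can be sketched as a cost-ranked search under a time budget. This is a toy illustration, not SQL Server's actual algorithm; the plan names, costs, and the `choose_plan` helper are all invented for the example.

```python
import time

# Hypothetical candidate plans with pre-estimated, unitless relative costs.
candidates = [
    {"plan": "hash join, clustered index scan", "cost": 412.7},
    {"plan": "nested loops, index seek",        "cost": 3.2},
    {"plan": "merge join, two index scans",     "cost": 95.0},
]

def choose_plan(candidates, budget_seconds=0.05):
    """Keep the cheapest plan seen so far; stop when the budget runs out."""
    deadline = time.monotonic() + budget_seconds
    best = None
    for cand in candidates:
        if best is None or cand["cost"] < best["cost"]:
            best = cand
        if time.monotonic() >= deadline:
            break  # a "good enough" plan beats a perfect plan found too late
    return best

print(choose_plan(candidates)["plan"])  # nested loops, index seek
```

The key design point is the deadline check inside the loop: the search is cut off by elapsed time, not by having examined every candidate.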

Execution is the final step. SQL Server takes the execution plan that was identified in the optimization step and follows those instructions in order to execute the query.

A note on plan reuse: Because optimizing is an inherently expensive process, SQL Server maintains an execution plan cache that stores details about each query executed on a server and the plan that was chosen for it.

Typically, databases experience the same queries executed over and over again, such as a web search, order placement, or social media post. Reuse allows us to avoid the expensive optimization process and rely on the work we have previously done to optimize a query.

When a query is executed that already has a valid plan in cache, that plan will be chosen rather than going through the process of building a new one. This saves computing resources and speeds up query execution immensely.
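The cache-then-reuse behavior can be illustrated with a toy cache keyed by query text. This is only a sketch of the idea; SQL Server's real plan cache keys on more than the raw SQL string and handles invalidation, memory pressure, and recompilation.

```python
# Toy plan cache: key by query text, reuse the stored "plan" when present.
plan_cache = {}
optimizations = 0

def expensive_optimize(sql):
    """Stand-in for the costly optimization process."""
    global optimizations
    optimizations += 1
    return "plan for: " + sql

def get_plan(sql):
    if sql not in plan_cache:      # cache miss: pay the optimization cost
        plan_cache[sql] = expensive_optimize(sql)
    return plan_cache[sql]         # cache hit: reuse the previous plan

get_plan("SELECT * FROM orders WHERE id = ?")
get_plan("SELECT * FROM orders WHERE id = ?")  # reused, no re-optimization
print(optimizations)  # 1
```

Note that parameterized queries (`id = ?`) make reuse effective: the same plan serves many different parameter values.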

With the basics out of the way, we can use these processes to identify tips, tricks, and patterns in query structure that can be indicative of poor performance. Data may be accessed from an index via either a scan or a seek.

A seek is a targeted selection of rows from the table based on a typically narrow filter, whereas a scan reads an entire index to return the requested data. If a table contains a million rows, a scan must traverse all million rows to service the query. If there is a legitimate need to return a great deal of data from a table, then an index scan may be the correct operation.

If we needed to return a large share of the rows from a million-row table, then an index scan makes sense. If we only need to return 10 rows, then a seek would be far more efficient. We can quickly spot the index scan in the top-right corner of the execution plan.
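The scan-versus-seek distinction is visible in any database that exposes its plans. As an illustrative stand-in (the article discusses SQL Server, but SQLite's `EXPLAIN QUERY PLAN` exhibits the same distinction, reporting `SEARCH` for a seek and `SCAN` for a scan), the table and data here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("name%d" % i,) for i in range(1000)])

# Seek: the integer primary key lets the engine jump straight to one row.
seek = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM people WHERE id = 10").fetchall()

# Scan: filtering on an unindexed column forces a full-table scan.
scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM people WHERE name = 'name10'").fetchall()

print(seek[0][3])  # a SEARCH using the primary key
print(scan[0][3])  # a SCAN of the whole table
```

Adding an index on `name` would turn the second plan into a seek as well.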

Many solutions are available once we have identified an undesired index scan. The faster we can slice our data set down to only the rows we need, the more efficient query execution will be. When evaluating a WHERE clause, any expressions involved must be resolved before our data is returned. If a function must be evaluated against each row to determine the result set, then the entire data set will need to be scanned to complete that evaluation.
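This effect is easy to reproduce. Again using SQLite's `EXPLAIN QUERY PLAN` as a stand-in (the table, index, and data are invented for the example), wrapping the indexed column in a function hides it from the index, while an equivalent range predicate on the bare column can use it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (last_name TEXT)")
conn.execute("CREATE INDEX ix_last_name ON person (last_name)")
conn.executemany("INSERT INTO person VALUES (?)",
                 [("Smith",), ("Adams",), ("Baker",)])

# Function on the column: the index cannot be used for a targeted lookup.
wrapped = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person "
    "WHERE substr(last_name, 1, 1) = 'A'").fetchall()

# Equivalent range predicate on the bare column: the index can be used.
sargable = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person "
    "WHERE last_name >= 'A' AND last_name < 'B'").fetchall()

print(wrapped[0][3])   # a SCAN: every row is evaluated
print(sargable[0][3])  # a SEARCH using ix_last_name
```

Rewriting predicates so the bare column is compared against an expression (often called making them "sargable") is one of the most common fixes for an unwanted scan.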

Database users do not typically interact with a query optimizer, which works in the background. SQL queries may be simple or complex statements. Ideally, each SQL statement makes minimal use of valuable resources, such as disk reads and server memory; the query optimizer works to ensure this, as well as to expedite the execution of each SQL query.

For example, a query optimizer may generate a series of query plans based on resource costs. One query plan may involve reading a table to retrieve a subset of its data, while another may involve using table indexes for quick data reading. Optimizers that choose among plans this way are known as cost-based optimizers. A query optimizer may select different query plans for the same query, depending on environmental circumstances. For example, suppose a user runs a query that selects approximately half of a table's data.

The user runs the query when the server is heavily tasked with multiple simultaneous connections. In this scenario, the query optimizer may decide to use a query plan that calls on the created table indexes to satisfy the query, based on limited resources.

This ensures minimal server drain by the query.

Many people ask about explicit versus implicit joins in PostgreSQL. Basically, both variants are the same: for most commonly seen queries, the planner will treat them the same way.

Mind that explicit joins work with and without parentheses. However, one parameter is of great importance here: join_collapse_limit. Up to that many join items, the planner rewrites explicit JOIN constructs into a plain FROM list, so an explicit join is planned just like an implicit one; beyond the limit, the explicit join order is kept as written.
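The "both forms are the same" claim is easy to check in any engine that exposes plans. Using SQLite as a stand-in (the tables here are invented; PostgreSQL users would compare `EXPLAIN` output instead), the explicit and implicit spellings of the same join produce the identical plan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER)")

def plan(sql):
    """Return the plan steps for a query as a list of description strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

explicit = plan("SELECT * FROM a JOIN b ON b.a_id = a.id")
implicit = plan("SELECT * FROM a, b WHERE b.a_id = a.id")

print(explicit == implicit)  # the planner treats both forms identically
```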

PostgreSQL offers various join strategies, including hash joins, merge joins, nested loops, and more; we have already shared some of this information in previous posts on PostgreSQL join strategies. For outer joins, the planner usually has fewer options than in the case of inner joins, though some reordering optimizations are still possible.

While this theoretical explanation is correct, most people will have no clue what it means in real life. Therefore, I have compiled a real-world example showing how PostgreSQL actually reorders a join. What we see is that the PostgreSQL optimizer decides to join x with y first and then with z. Run the same query with slightly altered parameters and the difference appears: PostgreSQL again starts with x, but then joins z first, before adding y.

Note that this optimization happens automatically. One reason why the optimizer can make this decision is because of the existence of optimizer support functions which were added to PostgreSQL a while ago.

The reordering works because support functions give the planner a chance to estimate how many rows are returned from each part of the query. If you use tables instead of set-returning functions, support functions are irrelevant, since the planner already has statistics for tables.
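The intuition behind the reordering can be sketched with a greedy toy model: given row-count estimates for each input, join the smallest estimated inputs first. The estimates and the `join_order` helper are invented for illustration; PostgreSQL's actual join-order search is far more sophisticated.

```python
# Hypothetical row-count estimates for the three parts of a join.
estimates = {"x": 10, "y": 1_000_000, "z": 500}

def join_order(estimates):
    """Greedy sketch: start with the smallest estimated inputs."""
    return sorted(estimates, key=estimates.get)

print(join_order(estimates))  # ['x', 'z', 'y']
```

This mirrors the example above: once the planner estimates that z produces far fewer rows than y, it joins x with z first and adds y last.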

Not every join in a query is actually executed by PostgreSQL. The optimizer knows the concept of join pruning and can eliminate pointless joins quite efficiently. For pruning to apply, we need to make sure that both sides actually have primary keys, or some other kind of unique constraint.

In this case, the join has to be executed. As you can see, PostgreSQL has decided on a hash join. The next example contains only a small variation of the query.

If the right side is not unique, the join might increase the number of rows returned, so pruning is only possible when the right side is unique. While it is certainly good to have join pruning in the PostgreSQL optimizer, be aware that the planner is essentially fixing something that should not exist in the first place.
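Why uniqueness matters for pruning can be shown with plain lists standing in for tables (the data here is invented). A join against a unique right side preserves the left side's row count, so dropping the join changes nothing; a non-unique right side can multiply rows, so the join must run:

```python
# Unique right side: each left row matches at most one right row,
# so the join preserves the row count and can safely be pruned.
left = [1, 2, 3]
unique_right = {1: "a", 2: "b", 3: "c"}
joined = [(l, unique_right[l]) for l in left]
print(len(joined))  # 3, same as len(left)

# Non-unique right side: key 1 appears twice, so the join multiplies
# rows and cannot be eliminated without changing the result.
dup_right = [(1, "a"), (1, "a2"), (2, "b"), (3, "c")]
joined2 = [(l, r) for l in left for (k, r) in dup_right if k == l]
print(len(joined2))  # 4, more rows than the left side
```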

This is way more efficient than some sort of nested loop. In short: the nested loop is replaced with a join, which can yield significant performance gains.

Every database relies heavily on sorting, which is necessary to handle many different types of queries and to optimize various workloads.

If you want to know what hint bits are and how they operate, consider checking out our post about hint bits in PostgreSQL. Without an index, PostgreSQL needs more than one core to process the query in 86 milliseconds, which is a lot. But what happens if we create an index? The most obvious observation is that the query now executes in a fraction of a millisecond.

However, what is most important here is that we do NOT see a sort step. PostgreSQL knows that the index returns data in sorted order (sorted by id), and thus there is no need to sort the data all over again.
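The disappearing sort step is also easy to observe elsewhere. Using SQLite's `EXPLAIN QUERY PLAN` as a stand-in once more (table and data invented; PostgreSQL users would look for a `Sort` node in `EXPLAIN` output), the sort step vanishes as soon as an index supplies the rows in order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "x") for i in range(1000)])

def plan(sql):
    """Return the plan steps for a query as a list of description strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without an index, ORDER BY needs an explicit sort step.
before = plan("SELECT * FROM t ORDER BY id")

conn.execute("CREATE INDEX ix_t_id ON t (id)")

# With the index, rows come back already ordered: no sort step appears.
after = plan("SELECT * FROM t ORDER BY id")

print(before)  # includes a temporary B-tree sort step
print(after)   # just a scan of the index, no sort
```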
