This is a discussion on Weird speed problem within the SQL Server forums, part of the Microsoft SQL Server category; --> I have a table on a database that contains 18million records. I need to design a system that queries ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a table on a database that contains 18million records. I need to design a system that queries this table to produce fast counts. I have got counts for multiple criteria down to only a few seconds. Most take under a second however I have a few queries that seam to take longer which I am working on reducing the time. I have found some strange behavour in the way SQL Server works. Take the following two queries which produce exactly the same result: --------------------------------- select count(*) from dbo.table where column1='value1' and column2='value2' -------------------------------- and -------------------------------- select count(*) from (select id from table where column1 = 'value1') as value1 join (select id from table where column2 = 'value2') as value2 on value1.id = value2.id --------------------------------- I would assume that the first query should run faster then the second query. When I look at the query plans, they are almost identical cost wise. The first takes about 53% of the cost and the second takes 47%. Yet, the first query takes about 25 seconds to run and the second takes only 5 seconds. Does anyone know of a reason why there would be such a difference in query speed? |
| |||
| On Oct 25, 3:13 pm, DBMonitor <spamawa...@yahoo.com.au> wrote: > I have a table on a database that contains 18million records. I need > to design a system that queries this table to produce fast counts. > > I have got counts for multiple criteria down to only a few seconds. > Most take under a second however I have a few queries that seam to > take longer which I am working on reducing the time. > > I have found some strange behavour in the way SQL Server works. > > Take the following two queries which produce exactly the same result: > > --------------------------------- > select count(*) > from dbo.table > where column1='value1' > and column2='value2' > -------------------------------- > and > > -------------------------------- > select count(*) > from (select id from table where column1 = 'value1') as value1 > join (select id from table where column2 = 'value2') as value2 > on value1.id = value2.id > --------------------------------- > > I would assume that the first query should run faster then the second > query. When I look at the query plans, they are almost identical cost > wise. The first takes about 53% of the cost and the second takes 47%. > > Yet, the first query takes about 25 seconds to run and the second > takes only 5 seconds. > > Does anyone know of a reason why there would be such a difference in > query speed? BTW: Cubes are not an option in this senario. |
| |||
| My guess - and I can only guess from the given information - is that there is an index on dbo.table(column1), and an index on dbo.table(column2), but no index on dbo.table(column1,column2). In the first example the optimizer chooses to use one of the indexes, then has to scan to resolve the other test. In the second version of the query the optimizer is using both indexes and then matching up the results. If this combination of columns is queried often you might consider changing one of the two indexes to add the other column as the second column of the index. So you might end up with the two indexes as (column1), and (column2,column1). You would not need to indexes with the same first column. Roy Harvey Beacon Falls, CT On Wed, 24 Oct 2007 22:13:11 -0700, DBMonitor <spamawayau@yahoo.com.au> wrote: >I have a table on a database that contains 18million records. I need >to design a system that queries this table to produce fast counts. > >I have got counts for multiple criteria down to only a few seconds. >Most take under a second however I have a few queries that seam to >take longer which I am working on reducing the time. > >I have found some strange behavour in the way SQL Server works. > >Take the following two queries which produce exactly the same result: > >--------------------------------- >select count(*) >from dbo.table >where column1='value1' >and column2='value2' >-------------------------------- >and > >-------------------------------- >select count(*) >from (select id from table where column1 = 'value1') as value1 >join (select id from table where column2 = 'value2') as value2 >on value1.id = value2.id >--------------------------------- > >I would assume that the first query should run faster then the second >query. When I look at the query plans, they are almost identical cost >wise. The first takes about 53% of the cost and the second takes 47%. > >Yet, the first query takes about 25 seconds to run and the second >takes only 5 seconds. > >Does anyone know of a reason why there would be such a difference in >query speed? |
| |||
| On Wed, 24 Oct 2007 22:13:11 -0700, DBMonitor wrote: >Does anyone know of a reason why there would be such a difference in >query speed? Hi DBMonitor, In addition to the suggestion posted by Roy, have you considered that this may be caused by cachhing? In other words, if you run the second query first and the first query last, or if you run both queries repeatedly, does that change anything to the execution time? -- Hugo Kornelis, SQL Server MVP My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis |
| |||
| On Oct 26, 5:48 am, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID> wrote: > On Wed, 24 Oct 2007 22:13:11 -0700, DBMonitor wrote: > >Does anyone know of a reason why there would be such a difference in > >query speed? > > Hi DBMonitor, > > In addition to the suggestion posted by Roy, have you considered that > this may be caused by cachhing? In other words, if you run the second > query first and the first query last, or if you run both queries > repeatedly, does that change anything to the execution time? > > -- > Hugo Kornelis, SQL Server MVP > My SQL Server blog:http://sqlblog.com/blogs/hugo_kornelis Thanks for your replys. Yes, there is an index on column 1 and one on colunm 2 but none for both. However the user could search on one of about 30 columns on the table therefore the search is dynamic. Both query plans use indexes on for both the tables though. The plans are almost identical. In order to ensure the cache was not the problem I did a DBCC DROPCLEANBUFFERS before executing each query. It does not matter what order they are run also. The query with the subqueries always runs faster then the standard query. |
| |||
| DBMonitor (spamawayau@yahoo.com.au) writes: > Both query plans use indexes on for both the tables though. The plans > are almost identical. "Almost". Apparently, there is a subtle, but important difference. Would it be posible for you to post the query plans? If you are on SQL 2005 you can save the graphical execution plan in a file and post that in an attachment. (Or put it on a web site with a link to it.) -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| |||
| On Oct 26, 7:39 am, Erland Sommarskog <esq...@sommarskog.se> wrote: > DBMonitor (spamawa...@yahoo.com.au) writes: > > Both query plans use indexes on for both the tables though. The plans > > are almost identical. > > "Almost". Apparently, there is a subtle, but important difference. > > Would it be posible for you to post the query plans? If you are on SQL 2005 > you can save the graphical execution plan in a file and post that in an > attachment. (Or put it on a web site with a link to it.) The query plans are as follows: ----------------- Subquery Query (Total cost 41.52%) SELECT (0%)<-CS (0%)<-HM IJ (68%)<-IS Col1 (11%) ..................................<-IS Col2 (20%) Standard Query (Total Cost 58.48%) SELECT (0%)<-CS (0%)<-HM IJ (78%)<-IS Col1 (8%) ..................................<-IS Col2 (14%) KEY: CS - Comput Scalar HM IJ - Hash Match/Inner Join IS - Index Seek ------------------------- The main difference is the hash match process. For the sub querys, the row size is 15 for the query with the sub queries and 21 for the standard query however the sub query query has more rows on it. The only thing I can think of which is happening is some sort of page io sharing problem. The database server is running on a virtual machine and the last wait type for the queries are always 'PAGEIOLATCH_SH' and the process runs on one thread. When I run it on a dedicated server, the times to run the queries are almost identical and the lastwaittypes change to CXPACKET and uses multiple threads. |
| ||||
| DBMonitor (spamawayau@yahoo.com.au) writes: > The query plans are as follows: > > ----------------- > Subquery Query (Total cost 41.52%) > SELECT (0%)<-CS (0%)<-HM IJ (68%)<-IS Col1 (11%) > .................................<-IS Col2 (20%) > > Standard Query (Total Cost 58.48%) > SELECT (0%)<-CS (0%)<-HM IJ (78%)<-IS Col1 (8%) > .................................<-IS Col2 (14%) > > KEY: > CS - Comput Scalar > HM IJ - Hash Match/Inner Join > IS - Index Seek > I will have to confess that I hoped to see the full plans. Oh well. Your choice. > When I run it on a dedicated server, the times to run the queries are > almost identical and the lastwaittypes change to CXPACKET and uses > multiple threads. What happens if you add OPTION (MAXDOP 1) to force a non-parallel plan? -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |