This is a discussion on RE: data distribution question within the Informix forums, part of the Database Server Software category; --> bin# # rows in bucket # unique values highest value in this bin 2: ( 1973055, 9364, 25763) 3: ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| bin# # rows in bucket # unique values highest value in this bin 2: ( 1973055, 9364, 25763) 3: ( 1973055, 9915, 35887) Doesn't look too bad. You've got 1973055 rows in bin #3 representing values between 25764 and 35887 and have 9915 unique values in this range. Calculate this out and you have 1973055 / 9915 = 199 rows for each value between 25764 and 35887. If I'm looking for a row with value of 28742 this index would narrow the search down to approx. 199 rows. Now, if I am looking for a row with even numbered values a sequential scan would be better. It all depends on what info you are trying to get to. Check out this link for some good info on distributions and update statistics by some guy, http://www-1.ibm.com/partnerworld/pw...bizinformix_08 mar.html (Did I get it right John?) Andrew Ford -----Original Message----- From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus Sent: Wednesday, August 02, 2006 6:08 PM To: informix-list@iiug.org Subject: data distribution question With the below values for the mbntfhst_token column, which is being joined to in the query, I would think these distributions show way too much duplication for an index to be effective. Yet, when I force a scan on that table, the cost goes way way up. What am I missing here ? Thank you. Floyd 1: ( 1973055, 13906, 14381) 2: ( 1973055, 9364, 25763) 3: ( 1973055, 9915, 35887) 4: ( 1973055, 5803, 41746) 5: ( 1973055, 4832, 46654) 6: ( 1973055, 5731, 52437) 7: ( 1973055, 5548, 58053) 8: ( 1973055, 6483, 65065) 9: ( 1973055, 11890, 77038) 10: ( 1973055, 10660, 88268) 11: ( 1973055, 5761, 94404) 12: ( 1973055, 4753, 99639) 13: ( 1973055, 8261, 107909) 14: ( 1973055, 7536, 116822) 15: ( 1973055, 4875, 123111) 16: ( 1973055, 3135, 127073) 17: ( 1973055, 5571, 134148) .... .... Also, take a look at the overflow bucket for that column. 1: ( 1509940, 206221) 2: ( 499152, 249386) 3: ( 568126, 250016) 4: ( 522169, 276801) 5: ( 495874, 277207) 6: ( 564699, 305435) 7: ( 584721, 305658) 8: ( 507669, 305889) 9: ( 496176, 324590) 10: ( 545764, 335032) 11: ( 575454, 337654) 12: ( 662279, 338354) 13: ( 655045, 357904) 14: ( 1352666, 359863) 15: ( 684749, 368087) 16: ( 521739, 368452) 17: ( 687000, 371474) 18: ( 528608, 371973) 19: ( 679561, 388584) 20: ( 538942, 392313) 21: ( 494174, 393604) 22: ( 546620, 403774) .... .... ======================== -<<Floyd Wellershaus>>- Database Administrator Unix Administrator email: fwellers@yahoo.com Home: 703-430-0805 Cell: 703-477-6045 ======================== http://www.one.org/ |
| ||||
| On Wed, 2 Aug 2006 18:38:34 -0400, "Ford, Andrew G" <Andrew_G_Ford@homedepot.com> wrote: > > >bin# # rows in bucket # unique values highest value in >this bin > > 2: ( 1973055, 9364, >25763) > > 3: ( 1973055, 9915, >35887) > >Doesn't look too bad. You've got 1973055 rows in bin #3 representing >values between 25764 and 35887 and have 9915 unique values in this >range. Calculate this out and you have 1973055 / 9915 = 199 rows for >each value between 25764 and 35887. > >If I'm looking for a row with value of 28742 this index would narrow the >search down to approx. 199 rows. > >Now, if I am looking for a row with even numbered values a sequential >scan would be better. It all depends on what info you are trying to get >to. You lost me with the sequential scan being better for "even numbered values". If I need 200 records out of the millions on the table, it seems that an indexed read would be better in this case. Like you said, all depends on that data is coming back. > >Check out this link for some good info on distributions and update >statistics by some guy, >http://www-1.ibm.com/partnerworld/pw...bizinformix_08 >mar.html > > >(Did I get it right John?) > > Pretty well done . . . . . JWC > > >Andrew Ford > > >-----Original Message----- >From: informix-list-bounces@iiug.org >[mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus >Sent: Wednesday, August 02, 2006 6:08 PM >To: informix-list@iiug.org >Subject: data distribution question > > >With the below values for the mbntfhst_token column, which is being >joined to in the query, I would think these distributions show way too >much duplication for an index to be effective. Yet, when I force a scan >on that table, the cost goes way way up. > >What am I missing here ? > >Thank you. > >Floyd > > >1: ( 1973055, 13906, 14381) > > 2: ( 1973055, 9364, 25763) > > 3: ( 1973055, 9915, 35887) > > 4: ( 1973055, 5803, 41746) > > 5: ( 1973055, 4832, 46654) > > 6: ( 1973055, 5731, 52437) > > 7: ( 1973055, 5548, 58053) > > 8: ( 1973055, 6483, 65065) > > 9: ( 1973055, 11890, 77038) > > 10: ( 1973055, 10660, 88268) > > 11: ( 1973055, 5761, 94404) > > 12: ( 1973055, 4753, 99639) > > 13: ( 1973055, 8261, 107909) > > 14: ( 1973055, 7536, 116822) > > 15: ( 1973055, 4875, 123111) > > 16: ( 1973055, 3135, 127073) > > 17: ( 1973055, 5571, 134148) > >... > >... > > > >Also, take a look at the overflow bucket for that column. > > 1: ( 1509940, 206221) > > 2: ( 499152, 249386) > > 3: ( 568126, 250016) > > 4: ( 522169, 276801) > > 5: ( 495874, 277207) > > 6: ( 564699, 305435) > > 7: ( 584721, 305658) > > 8: ( 507669, 305889) > > 9: ( 496176, 324590) > > 10: ( 545764, 335032) > > 11: ( 575454, 337654) > > 12: ( 662279, 338354) > > 13: ( 655045, 357904) > > 14: ( 1352666, 359863) > > 15: ( 684749, 368087) > > 16: ( 521739, 368452) > > 17: ( 687000, 371474) > > 18: ( 528608, 371973) > > 19: ( 679561, 388584) > > 20: ( 538942, 392313) > > 21: ( 494174, 393604) > > 22: ( 546620, 403774) > >... > >... > > > > > > > >======================== >-<<Floyd Wellershaus>>- >Database Administrator >Unix Administrator > > >email: fwellers@yahoo.com > >Home: 703-430-0805 > >Cell: 703-477-6045 >======================== > >http://www.one.org/ |
| Thread Tools | |
| Display Modes | |
|
|