Unix Technical Forum

RE: data distribution question

This is a discussion on RE: data distribution question within the Informix forums, part of the Database Server Software category; --> bin# # rows in bucket # unique values highest value in this bin 2: ( 1973055, 9364, 25763) 3: ...


Go Back   Unix Technical Forum > Database Server Software > Informix

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-20-2008, 12:43 PM
Ford, Andrew G
 
Posts: n/a
Default RE: data distribution question



bin# # rows in bucket # unique values highest value in
this bin

2: ( 1973055, 9364,
25763)

3: ( 1973055, 9915,
35887)

Doesn't look too bad. You've got 1973055 rows in bin #3 representing
values between 25764 and 35887 and have 9915 unique values in this
range. Calculate this out and you have 1973055 / 9915 = 199 rows for
each value between 25764 and 35887.

If I'm looking for a row with value of 28742 this index would narrow the
search down to approx. 199 rows.

Now, if I am looking for a row with even numbered values a sequential
scan would be better. It all depends on what info you are trying to get
to.

Check out this link for some good info on distributions and update
statistics by some guy,
http://www-1.ibm.com/partnerworld/pw...bizinformix_08
mar.html


(Did I get it right John?)




Andrew Ford


-----Original Message-----
From: informix-list-bounces@iiug.org
[mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus
Sent: Wednesday, August 02, 2006 6:08 PM
To: informix-list@iiug.org
Subject: data distribution question


With the below values for the mbntfhst_token column, which is being
joined to in the query, I would think these distributions show way too
much duplication for an index to be effective. Yet, when I force a scan
on that table, the cost goes way way up.

What am I missing here ?

Thank you.

Floyd


1: ( 1973055, 13906, 14381)

2: ( 1973055, 9364, 25763)

3: ( 1973055, 9915, 35887)

4: ( 1973055, 5803, 41746)

5: ( 1973055, 4832, 46654)

6: ( 1973055, 5731, 52437)

7: ( 1973055, 5548, 58053)

8: ( 1973055, 6483, 65065)

9: ( 1973055, 11890, 77038)

10: ( 1973055, 10660, 88268)

11: ( 1973055, 5761, 94404)

12: ( 1973055, 4753, 99639)

13: ( 1973055, 8261, 107909)

14: ( 1973055, 7536, 116822)

15: ( 1973055, 4875, 123111)

16: ( 1973055, 3135, 127073)

17: ( 1973055, 5571, 134148)

....

....



Also, take a look at the overflow bucket for that column.

1: ( 1509940, 206221)

2: ( 499152, 249386)

3: ( 568126, 250016)

4: ( 522169, 276801)

5: ( 495874, 277207)

6: ( 564699, 305435)

7: ( 584721, 305658)

8: ( 507669, 305889)

9: ( 496176, 324590)

10: ( 545764, 335032)

11: ( 575454, 337654)

12: ( 662279, 338354)

13: ( 655045, 357904)

14: ( 1352666, 359863)

15: ( 684749, 368087)

16: ( 521739, 368452)

17: ( 687000, 371474)

18: ( 528608, 371973)

19: ( 679561, 388584)

20: ( 538942, 392313)

21: ( 494174, 393604)

22: ( 546620, 403774)

....

....







========================
-<<Floyd Wellershaus>>-
Database Administrator
Unix Administrator


email: fwellers@yahoo.com

Home: 703-430-0805

Cell: 703-477-6045
========================

http://www.one.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-20-2008, 12:43 PM
John Carlson
 
Posts: n/a
Default Re: data distribution question

On Wed, 2 Aug 2006 18:38:34 -0400, "Ford, Andrew G"
<Andrew_G_Ford@homedepot.com> wrote:

>
>
>bin# # rows in bucket # unique values highest value in
>this bin
>
> 2: ( 1973055, 9364,
>25763)
>
> 3: ( 1973055, 9915,
>35887)
>
>Doesn't look too bad. You've got 1973055 rows in bin #3 representing
>values between 25764 and 35887 and have 9915 unique values in this
>range. Calculate this out and you have 1973055 / 9915 = 199 rows for
>each value between 25764 and 35887.
>
>If I'm looking for a row with value of 28742 this index would narrow the
>search down to approx. 199 rows.
>
>Now, if I am looking for a row with even numbered values a sequential
>scan would be better. It all depends on what info you are trying to get
>to.


You lost me with the sequential scan being better for "even numbered
values". If I need 200 records out of the millions on the table, it
seems that an indexed read would be better in this case. Like you
said, all depends on that data is coming back.

>
>Check out this link for some good info on distributions and update
>statistics by some guy,
>http://www-1.ibm.com/partnerworld/pw...bizinformix_08
>mar.html
>
>
>(Did I get it right John?)
>
>


Pretty well done . . . . .

JWC



>
>
>Andrew Ford
>
>
>-----Original Message-----
>From: informix-list-bounces@iiug.org
>[mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus
>Sent: Wednesday, August 02, 2006 6:08 PM
>To: informix-list@iiug.org
>Subject: data distribution question
>
>
>With the below values for the mbntfhst_token column, which is being
>joined to in the query, I would think these distributions show way too
>much duplication for an index to be effective. Yet, when I force a scan
>on that table, the cost goes way way up.
>
>What am I missing here ?
>
>Thank you.
>
>Floyd
>
>
>1: ( 1973055, 13906, 14381)
>
> 2: ( 1973055, 9364, 25763)
>
> 3: ( 1973055, 9915, 35887)
>
> 4: ( 1973055, 5803, 41746)
>
> 5: ( 1973055, 4832, 46654)
>
> 6: ( 1973055, 5731, 52437)
>
> 7: ( 1973055, 5548, 58053)
>
> 8: ( 1973055, 6483, 65065)
>
> 9: ( 1973055, 11890, 77038)
>
> 10: ( 1973055, 10660, 88268)
>
> 11: ( 1973055, 5761, 94404)
>
> 12: ( 1973055, 4753, 99639)
>
> 13: ( 1973055, 8261, 107909)
>
> 14: ( 1973055, 7536, 116822)
>
> 15: ( 1973055, 4875, 123111)
>
> 16: ( 1973055, 3135, 127073)
>
> 17: ( 1973055, 5571, 134148)
>
>...
>
>...
>
>
>
>Also, take a look at the overflow bucket for that column.
>
> 1: ( 1509940, 206221)
>
> 2: ( 499152, 249386)
>
> 3: ( 568126, 250016)
>
> 4: ( 522169, 276801)
>
> 5: ( 495874, 277207)
>
> 6: ( 564699, 305435)
>
> 7: ( 584721, 305658)
>
> 8: ( 507669, 305889)
>
> 9: ( 496176, 324590)
>
> 10: ( 545764, 335032)
>
> 11: ( 575454, 337654)
>
> 12: ( 662279, 338354)
>
> 13: ( 655045, 357904)
>
> 14: ( 1352666, 359863)
>
> 15: ( 684749, 368087)
>
> 16: ( 521739, 368452)
>
> 17: ( 687000, 371474)
>
> 18: ( 528608, 371973)
>
> 19: ( 679561, 388584)
>
> 20: ( 538942, 392313)
>
> 21: ( 494174, 393604)
>
> 22: ( 546620, 403774)
>
>...
>
>...
>
>
>
>
>
>
>
>========================
>-<<Floyd Wellershaus>>-
>Database Administrator
>Unix Administrator
>
>
>email: fwellers@yahoo.com
>
>Home: 703-430-0805
>
>Cell: 703-477-6045
>========================
>
>http://www.one.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:27 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com