View Single Post

   
  #2 (permalink)  
Old 04-20-2008, 07:52 AM
Jonathan Leffler
 
Posts: n/a
Default Re: "index size" for a particular table by querying the sysstem tables

DL Redden wrote:
> Without doing any work or research on this I wonder if the
> difference is in the 4 byte overhead for each index. This theory
> stems from the large chunk capabilities of 9.4. I have no basis for
> this theory it was just a passing thought.


The '+4' factor is a bit questionable in general. For non-fragmented
tables, the value should be '+5' - 4 bytes for the rowid plus 1 byte
for the delete marker. If you have a fragmented table, it needs to be
'+9' (4 bytes fragid, 4 bytes rowid, 1 byte delete marker).

But there are a couple of more deep-seated problems...

> "June C. Hunt" <june.c.hunt@gmail.com> wrote:
> REBELLO, Rulesh Felix wrote [with fixups]:
>>Can I find the "index size" for a particular table by querying the
>>system tables .. ..???
>>
>> It is the similar output which we get in the dbschema utility or
>> the info in the .sql file of a dbexport.

>
> I might only be able to help a small bit. The following was posted to
> c.d.i. by Rajib Sarkar of IBM back on 2002-06-10:
>
> Step 1:
> =========
> select sum(c.collength)
> from syscolumns c, sysindexes i
> where i.tabid = c.tabid
> and i.tabid =
> and (c.colno = ABS(i.part1) or
> c.colno = ABS(i.part2) or
> c.colno = ABS(i.part3) or
> c.colno = ABS(i.part4) or
> c.colno = ABS(i.part5) or
> c.colno = ABS(i.part6) or
> c.colno = ABS(i.part7) or
> c.colno = ABS(i.part8) or
> c.colno = ABS(i.part9) or
> c.colno = ABS(i.part10) or
> c.colno = ABS(i.part11) or
> c.colno = ABS(i.part12) or
> c.colno = ABS(i.part13) or
> c.colno = ABS(i.part14) or
> c.colno = ABS(i.part15) or
> c.colno = ABS(i.part16));



The use of ABS is good - in the rare cases where you have an index
sorted into descending order (or a key within an index sorted in
descending order), the column number is negative.

Unfortunately, the use of SUM(c.collength) is woefully wrong. It will
work correctly for columns of the types: CHAR, NCHAR, VARCHAR(n),
VARCHAR(n,0), NVARCHAR(n), NVARCHAR(n,0), INT, SMALLINT, FLOAT,
SMALLFLOAT, SERIAL, SERIAL8, INT8, BOOLEAN, DATE. I'll take LVARCHAR
under advisement - offhand, I'm not sure.

The problem children are DECIMAL, MONEY, DATETIME, INTERVAL and
[N]VARCHAR(n,m) for m>0. In each case, the ColLength value is
encoded, and the encoding is moderately complex. The easiest is the
VARCHAR values; the lower byte is the maximum length (n), but the
upper byte contains the minimum length (m). So, a VARCHAR(40,10) is
encoded as 10 * 256 + 40 = 2600, and adding that into your index size
gives an erroneous value. DECIMAL and MONEY values are encoded with
scale and precision - so DECIMAL(16,4) is 16 * 256 + 4 = 4100 (he
says, working from a header file and not from an actual database;
there's an outside chance any given calculation is wrong, but the
overall principle remains accurate). The actual length on disk - and
in index - for DECIMAL(16,4) is 9 bytes. The encoding of DATETIME is
still more complex - and of INTERVAL more complex yet. But the
numbers in the collength column are certainly not simply addable as
shown in the query above.

> This will just give you the width of the index, now you would require to
> add the overhead to it, and the formula would be:
> Step 2
> =========
> select count(*) from sysindexes where tabid = ;
>
> index length = ( + 4 * ) * 3/2;


The second step is interesting - I can't think why there would be a
3/2 factor in the index size - unless the number of rows was included.
Then the computation would be for the total size in bytes used by an
index, and you do need a fudge factor in there, and 1.5 is a possible
value to use. But that isn't what was requested; the request was for
the width of an index as reported by dbschema or dbexport, and that is
the size of one key entry, and that does not need a fudge factor.

There is no difference between 32-bit and 64-bit indexes.

> I've tested this formula against IDS 9.20.UC3 and get the index size
> that I expect. I've also tested this formula against IDS 9.40.FC2 and
> it does not come out to be the value I would expect. I'm not sure if
> the difference is due entirely to the version of IDS, in some portion
> to the 32 versus 64-bit difference, a combination of the two, or
> something completely different. I've spent some time looking at the
> Administrator's Guide (Art Kagel referenced that in a post that dealt
> with this same issue) and the Performance Guide (the Admin Guide
> references the Performance Guide), but haven't found an answer yet.
>
> I will continue to look for a formula that will work with IDS 9.4 (I
> can be stubborn that way), but would be happy to end my search if
> anyone else cares to post the correct answer. If not, I'll post what
> I find if and when I get a chance to get back to this....




--
Jonathan Leffler #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Reply With Quote