Unix Technical Forum

Re: SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

This is a discussion on Re: SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5) within the pgsql Hackers forums, part of the PostgreSQL category; --> Simon Riggs wrote: > Not sure what is going on here: why is SUSE not listed on the > ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 03:15 AM
Peter Eisentraut
 
Posts: n/a
Default Re: SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

Simon Riggs wrote:
> Not sure what is going on here: why is SUSE not listed on the
> supported platforms list? (still)


RC5 contains:

SUSE Linux x86 8.0.0 Peter Eisentraut (<peter_e@gmx.net>), 2005-01-10
9.1

In the meantime I have received confirmation from Reinhard Max that his
test methods match our requirements, so the list will be completed with
the other platforms he reported in due time.

I'm sorry, but I won't just add "it works" notices if it's not clear
what kind of testing was done.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 03:15 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Much Ado About COUNT(*)

Greg Stark wrote:

>I think part of the problem is that there's a bunch of features related to
>these types of queries and the lines between them blur.
>
>You seem to be talking about putting visibility information inside indexes for
>so index-only plans can be performed. But you're also talking about queries
>like "select count(*) from foo" with no where clauses. Such a query wouldn't
>be helped by index-only scans.
>
>Perhaps you're thinking about caching the total number of records in a global
>piece of state like a materialized view? That would be a nice feature but I
>think it should done as a general materialized view implementation, not a
>special case solution for just this one query.
>
>Perhaps you're thinking of the min/max problem of being able to use indexes to
>pick out just the tuples satisfying the min/max constraint. That seems to me
>to be one of the more tractable problems in this area but it would still
>require lots of work.
>
>I suggest you post a specific query you find is slow. Then discuss how you
>think it ought to be executed and why.
>
>
>

You are correct, I am proposing to add visibility to the indexes.

As for unqualified counts, I believe that they could take advantage of
an index-only scan as it requires much less I/O to perform an index scan
than a sequential scan on large tables.

Min/Max would also take advantage of index only scans but say, for
example, that someone has the following:

Relation SOME_USERS
user_id BIGINT PK
user_nm varchar(32) UNIQUE INDEX
some_other_attributes...

If an application needs the user names, it would run SELECT user_nm FROM
SOME_USERS... in the current implementation this would require a
sequential scan. On a relation which contains 1M+ tuples, this requires
either a lot of I/O or a lot of cache. An index scan would immensely
speed up this query.





---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-11-2008, 03:16 AM
Tom Lane
 
Posts: n/a
Default Re: Much Ado About COUNT(*)

"Jonah H. Harris" <jharris@tvi.edu> writes:
> Tom, Bruce, and others involved in this recurring TODO discussion…
> First, let me start by saying that I understand this has been discussed
> many times before; however, I’d like to see what the current state of
> affairs is regarding the possibility of using a unique index scan to
> speed up the COUNT aggregate.


It's not happening, because no one has come up with a workable proposal.
In particular, we're not willing to slow down every other operation in
order to make COUNT-*-with-no-WHERE-clause faster.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-11-2008, 03:16 AM
Reinhard Max
 
Posts: n/a
Default Re: SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release

On Wed, 12 Jan 2005 at 16:29, Peter Eisentraut wrote:

> In the meantime I have received confirmation from Reinhard Max that
> his test methods match our requirements, so the list will be
> completed with the other platforms he reported in due time.


Today I've updated the RPMs on the FTP server to RC5, which implicitly
means that PostgreSQL passes the regression tests on the provided
platforms.
ftp://ftp.suse.com/pub/projects/post...resql-8.0.0rc5

I have also successfully compiled and tested RC5 (but not yet created
RPMs) on the following platforms:

8.1-i386
8.2-i386
sles8-i386
sles8-ia64
sles8-ppc
sles8-ppc64
sles8-s390
sles8-s390x

The only failure I have to report is sles8-x86_64, where I am getting
segfaults from psql during the regression tests. I am still
investigating what's going on there....

cu
Reinhard

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-11-2008, 03:16 AM
Reinhard Max
 
Posts: n/a
Default segfault caused by heimdal (was: SUSE port)

On Wed, 12 Jan 2005 at 18:20, Reinhard Max wrote:

> I am still not sure whether the kerberos library, glibc, or
> PostgreSQL is to blame, or if it's a combination of bugs in these
> components that triggers the segfault.


The problem is, that the heimdal implementation of kerberos5 used on
sles8 needs an extra include statement for com_err.h in
src/interfaces/libpq/fe-auth.c to get the prototype for
error_message(), while on newer SUSE-releases using the MIT Kerberos5
implementation this prototype is provided by krb5.h itself.

So I suspect this bug might hit everyone using heimdal, but it only
gets triggered when one of the calls to kerberos in
src/interfaces/libpq/fe-auth.c returns with an error.

I am still not sure why the crash only happened on x86_64.

cu
Reinhard

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-11-2008, 03:16 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Much Ado About COUNT(*)

Tom Lane wrote:

>The fundamental problem is that you can't do it without adding at least
>16 bytes, probably 20, to the size of an index tuple header. That would
>double the physical size of an index on a simple column (eg an integer
>or timestamp). The extra I/O costs and extra maintenance costs are
>unattractive to say the least. And it takes away some of the
>justification for the whole thing, which is that reading an index is
>much cheaper than reading the main table. That's only true if the index
>is much smaller than the main table ...
>
> regards, tom lane
>
>

I recognize the added cost of implementing index only scans. As storage
is relatively cheap these days, everyone I know is more concerned about
faster access to data. Similarly, it would still be faster to scan the
indexes than to perform a sequential scan over the entire relation for
this case. I also acknowledge that it would be a negative impact to
indexes where this type of acces isn't required, as you suggested and
which is more than likely not the case. I just wonder what more people
would be happier with and whether the added 16-20 bytes would be
extremely noticable considering most 1-3 year old hardware.



---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-11-2008, 03:16 AM
Andrew Dunstan
 
Posts: n/a
Default Re: Much Ado About COUNT(*)

Jonah H. Harris said:
> Tom Lane wrote:
>
>>The fundamental problem is that you can't do it without adding at least
>>16 bytes, probably 20, to the size of an index tuple header. That
>>would double the physical size of an index on a simple column (eg an
>>integer or timestamp). The extra I/O costs and extra maintenance costs
>>are unattractive to say the least. And it takes away some of the
>>justification for the whole thing, which is that reading an index is
>>much cheaper than reading the main table. That's only true if the
>>index is much smaller than the main table ...
>>

> I recognize the added cost of implementing index only scans. As
> storage is relatively cheap these days, everyone I know is more
> concerned about faster access to data. Similarly, it would still be
> faster to scan the indexes than to perform a sequential scan over the
> entire relation for this case. I also acknowledge that it would be a
> negative impact to indexes where this type of acces isn't required, as
> you suggested and which is more than likely not the case. I just
> wonder what more people would be happier with and whether the added
> 16-20 bytes would be
> extremely noticable considering most 1-3 year old hardware.
>
>



Monetary cost is not the issue - cost in time is the issue.

cheers

andrew



---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-11-2008, 03:16 AM
Rod Taylor
 
Posts: n/a
Default Re: Much Ado About COUNT(*)

> >The index could then make sensible the reasonably common practice of
> >using a covered index - i.e. putting additional columns into the index
> >to satisfy the whole query just from the index.


> I am willing to take it on and I understand that the workload is mine.
> As long as everyone gives me some suggestions, I'm good it being optional.


If nobody is working on it, you may find that the below TODO item might
accomplish most of what you're looking for as well as generally
improving performance. The count(*) on a where clause would result in
one index scan and one partial sequential heap scan. Not as fast for the
specific examples you've shown, but far better than today and covers
many other cases as well.

Fetch heap pages matching index entries in sequential order

Rather than randomly accessing heap pages based on index
entries, mark heap pages needing access in a bitmap and do the
lookups in sequential order. Another method would be to sort
heap ctids matching the index before accessing the heap rows.


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-11-2008, 03:16 AM
Greg Stark
 
Posts: n/a
Default Re: Much Ado About COUNT(*)


"Jonah H. Harris" <jharris@tvi.edu> writes:

> You are correct, I am proposing to add visibility to the indexes.


Then I think the only way you'll get any support is if it's an option. Since
it would incur a performance penalty on updates and deletes.

> As for unqualified counts, I believe that they could take advantage of an
> index-only scan as it requires much less I/O to perform an index scan than a
> sequential scan on large tables.


No, sequential scans require slightly more i/o than index scans. More
importantly they require random access i/o instead of sequential i/o which is
much slower.

Though this depends. If the tuple is very wide then the index might be faster
to scan since it would only contain the data from the fields being indexed.

This brings to mind another approach. It might be handy to split the heap for
a table into multiple heaps. The visibility information would only be in one
of the heaps. This would be a big win if many of the fields were rarely used,
especially if they're rarely used by sequential scans.


> Relation SOME_USERS
> user_id BIGINT PK
> user_nm varchar(32) UNIQUE INDEX
> some_other_attributes...


What's with the fetish with unique indexes? None of this is any different for
unique indexes versus non-unique indexes.


--
greg


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-11-2008, 03:16 AM
Greg Stark
 
Posts: n/a
Default Re: Much Ado About COUNT(*)

"Jonah H. Harris" <jharris@tvi.edu> writes:

> Looking at the message boards, there is significant interest in the COUNT(*)
> aspect. However, rather than solely address the COUNT(*) TODO item, why not fix
> it and add additional functionality found in commercial databases as well? I
> believe Oracle has had this feature since 7.3 and I know people take advantage
> of it.


I think part of the problem is that there's a bunch of features related to
these types of queries and the lines between them blur.

You seem to be talking about putting visibility information inside indexes for
so index-only plans can be performed. But you're also talking about queries
like "select count(*) from foo" with no where clauses. Such a query wouldn't
be helped by index-only scans.

Perhaps you're thinking about caching the total number of records in a global
piece of state like a materialized view? That would be a nice feature but I
think it should done as a general materialized view implementation, not a
special case solution for just this one query.

Perhaps you're thinking of the min/max problem of being able to use indexes to
pick out just the tuples satisfying the min/max constraint. That seems to me
to be one of the more tractable problems in this area but it would still
require lots of work.

I suggest you post a specific query you find is slow. Then discuss how you
think it ought to be executed and why.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:30 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com