Unix Technical Forum

Optmal tags design?

This is a discussion on Optmal tags design? within the Pgsql Performance forums, part of the PostgreSQL category; --> I am planning to add a tags (as in the "web 2.0" thing) feature to my web based application. ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:14 AM
lists@on-track.ca
 
Posts: n/a
Default Optmal tags design?

I am planning to add a tags (as in the "web 2.0" thing) feature to my web
based application. I would like some feedback from the experts here on
what the best database design for that would be.

The possibilities I have come up with are:
* A tags table containing the tag and id number of what it links to.
select pid from tags where tag='bla'
select tag from tags where pid=xxx.

* a tags table where each tag exists only once, and a table with the tag
ID and picture ID to link them together.

select pid from tags inner join picture_tags using(tag_id) where tag='bla'
select tag from tags inner join picture_tags using(tag_id) where pid='xxx'

* A full text index in the picture table containing the tags

select pid from pictures where tags @@ to_tsquery('bla')
(or the non-fti version)
select pid from pictures where tags ~* '.*bla.*'

select tags from pictures where pid=xxx;

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:14 AM
Mark Lewis
 
Posts: n/a
Default Re: Optmal tags design?



On Wed, 2007-07-18 at 14:26 -0700, lists@on-track.ca wrote:
> I am planning to add a tags (as in the "web 2.0" thing) feature to my web
> based application. I would like some feedback from the experts here on
> what the best database design for that would be.
>
> The possibilities I have come up with are:
> * A tags table containing the tag and id number of what it links to.
> select pid from tags where tag='bla'
> select tag from tags where pid=xxx.


Properly indexed, this schema can handle common lookups such as 'show me
all pictures with tag X'.

The problem here is that any operation involving all tags (for example,
'show me a list of all tags in the database') may be slow and/or
awkward.

> * a tags table where each tag exists only once, and a table with the tag
> ID and picture ID to link them together.


This sounds the most reasonable, and is the "right way" to do it in the
relational model. Can handle common queries such as 'show me all
pictures with tag X'. Can also easily perform queries such as 'show me
a list of all tags in the database'.

This also gives you a logical place to store additional information for
each tag, such as the user and timestamp of the first usage of the tag,
or a cache of the approximate number of pictures with that tag (for a
future performance optimization, maybe), or whatever else you can think
up that might be useful to store on a per-tag level.

> select pid from tags inner join picture_tags using(tag_id) where tag='bla'
> select tag from tags inner join picture_tags using(tag_id) where pid='xxx'
>
> * A full text index in the picture table containing the tags
>
> select pid from pictures where tags @@ to_tsquery('bla')
> (or the non-fti version)
> select pid from pictures where tags ~* '.*bla.*'
>
> select tags from pictures where pid=xxx;


I'm not experienced with full text indexing so perhaps I'm wrong about
this, but it seems like it would give you approximately the same
flexibility as #1 in terms of your data model. The only reason I can
think of why you might want this over #1 would be for a performance
improvement, but if there's a reasonably small number of distinct tags
and/or distinct tags per picture I can't imagine it being much faster
than #1.

-- Mark Lewis

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:14 AM
Bryan Murphy
 
Posts: n/a
Default Re: Optmal tags design?

We store tags on our items like this like this:

Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE

Item.ID INT NOT NULL PRIMARY KEY

ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE

with appropriate indexes on the columns we need to frequently query.

We have about 3 million tag bindings right now, and have not run into any
performance issues related to tagging other than generating tag clouds
(which we pre-calculate anyway).

I'll have to get back to you when we get up to 10's, or even 100's of
millions and let you know how it scaled.

Bryan

On 7/18/07, lists@on-track.ca <lists@on-track.ca> wrote:
>
> I am planning to add a tags (as in the "web 2.0" thing) feature to my web
> based application. I would like some feedback from the experts here on
> what the best database design for that would be.
>
> The possibilities I have come up with are:
> * A tags table containing the tag and id number of what it links to.
> select pid from tags where tag='bla'
> select tag from tags where pid=xxx.
>
> * a tags table where each tag exists only once, and a table with the tag
> ID and picture ID to link them together.
>
> select pid from tags inner join picture_tags using(tag_id) where tag='bla'
> select tag from tags inner join picture_tags using(tag_id) where pid='xxx'
>
> * A full text index in the picture table containing the tags
>
> select pid from pictures where tags @@ to_tsquery('bla')
> (or the non-fti version)
> select pid from pictures where tags ~* '.*bla.*'
>
> select tags from pictures where pid=xxx;
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:49 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com