View Single Post

   
  #1 (permalink)  
Old 02-28-2008, 07:14 AM
Michal Stankoviansky
 
Posts: n/a
Default Is self-join appropriate?

Please forgive me if this post is too long. I was trying to solve this
(probably simple) issue for all day but I'm not as good in SQL as I
thought I was.

I'm trying to learn JOINs in MySQL 5.0.16 on Windows XP.

Let's say we have a simple web discussion board database.

The 2 kB SQL dump for testing is available here:
http://www.metallica2.com/stuff/test_dump.sql



Tables description:
===================

--> board - (a better name would be board_posts) contains all the posts
made by users in topics; first post in a topic has `answer_to` field
set to zero; all replies in the topic have same value in `answer_to`
field as the `id` of the first post in a topic
--> board_topics - contains topics' subjects
--> users - contains usernames



Tables' structure:
==================


--> table `board`:

+-------------+----------------------+------+-----+
| Field | Type | Null | Key |
+-------------+----------------------+------+-----+
| id | int(10) unsigned | NO | PRI |
| user_id | smallint(5) unsigned | NO | |
| time_posted | datetime | NO | MUL |
| post_text | text | NO | |
| answer_to | int(10) unsigned | NO | |
+-------------+----------------------+------+-----+

Fields explanation:
- user_id - user who posted the post
- answer_to - contains id of the first post in the topic this post
belongs to


--> table `board_topics`:

+---------------+----------------------+------+-----+
| Field | Type | Null | Key |
+---------------+----------------------+------+-----+
| id | int(10) unsigned | NO | PRI |
| topic_subject | varchar(200) | NO | |
| categ_id | tinyint(1) unsigned | NO | |
+---------------+----------------------+------+-----+

Fields explanation:
- id - same value as the `id` of the first post in this topic in the
`board` table
- please ignore categ_id


--> table `users`:

+----------+----------------------+------+-----+
| Field | Type | Null | Key |
+----------+----------------------+------+-----+
| id | smallint(5) unsigned | NO | PRI |
| username | varchar(16) | NO | UNI |
+----------+----------------------+------+-----+



What I want:
============

I want to get a resultset where each row would represent one discussion
topic, with username of the user who created the topic (posted the
first post of the topic and specified a topic subject), with text of
this first post and in the same row username, text of the post and time
of the post that is the newest reply in the topic.

So I tried this self-join:

SELECT
b1.id,
bt.topic_subject,
b1.post_text,
b1.time_posted,
u1.username,
b2.id,
b2.post_text,
u2.username,
b2.time_posted,
b2.answer_to
FROM board b1
JOIN board_topics bt
ON b1.id = bt.id
JOIN users u1
ON b1.user_id = u1.id
LEFT JOIN board b2
ON b2.answer_to = b1.id
LEFT JOIN users u2
ON b2.user_id = u2.id
ORDER BY b2.time_posted DESC, b2.id DESC

....which gives this:
http://www.metallica2.com/stuff/query01.html

As you can see, I *almost* got what I wanted...but it contains ALL
replies joined with the first post, and I only wanted to list the
topics with the *most recent* reply LEFT JOIN-ed - each topic only
once.

So i tried to GROUP it BY the topic id [b1.id]:

SELECT
b1.id,
bt.topic_subject,
b1.post_text,
b1.time_posted,
u1.username,
b2.id,
b2.post_text,
u2.username,
b2.time_posted,
b2.answer_to
FROM board b1
JOIN board_topics bt
ON b1.id = bt.id
JOIN users u1
ON b1.user_id = u1.id
LEFT JOIN board b2
ON b2.answer_to = b1.id
LEFT JOIN users u2
ON b2.user_id = u2.id
GROUP BY b1.id /* <--- added this line */
ORDER BY b2.time_posted DESC, b2.id DESC

....which gave me this:
http://www.metallica2.com/stuff/query02.html

....which looks a bit better but isn't correct, because first posts end
up joined with the first reply (second post in the topic) insted of the
most recent reply (the last post in the topic).

I thought that some HAVING clause might help, but all I was getting
were errors.

Of course I could loop in the PHP code to get the most recent posts in
the topics or maybe use a subquery, but I hoped there MUST be a better
way to get these data. So...is this possible with a single statement?

Thanks for reading through all this. Happy New Year to everyone.

P.S.: If you know where I could read about some *real-world* examples
of joins and grouping, I'd be grateful for a link. Thank you.

Reply With Quote