
02-28-2008, 07:14 AM
|
| |
Re: Is self-join appropriate?
Why not add a where clause with MAX or MIN time_posted?
Michal Stankoviansky wrote:
> Please forgive me if this post is too long. I was trying to solve this
> (probably simple) issue for all day but I'm not as good in SQL as I
> thought I was.
>
> I'm trying to learn JOINs in MySQL 5.0.16 on Windows XP.
>
> Let's say we have a simple web discussion board database.
>
> The 2 kB SQL dump for testing is available here:
> http://www.metallica2.com/stuff/test_dump.sql
>
>
>
> Tables description:
> ===================
>
> --> board - (a better name would be board_posts) contains all the posts
> made by users in topics; first post in a topic has `answer_to` field
> set to zero; all replies in the topic have same value in `answer_to`
> field as the `id` of the first post in a topic
> --> board_topics - contains topics' subjects
> --> users - contains usernames
>
>
>
> Tables' structure:
> ==================
>
>
> --> table `board`:
>
> +-------------+----------------------+------+-----+
> | Field | Type | Null | Key |
> +-------------+----------------------+------+-----+
> | id | int(10) unsigned | NO | PRI |
> | user_id | smallint(5) unsigned | NO | |
> | time_posted | datetime | NO | MUL |
> | post_text | text | NO | |
> | answer_to | int(10) unsigned | NO | |
> +-------------+----------------------+------+-----+
>
> Fields explanation:
> - user_id - user who posted the post
> - answer_to - contains id of the first post in the topic this post
> belongs to
>
>
> --> table `board_topics`:
>
> +---------------+----------------------+------+-----+
> | Field | Type | Null | Key |
> +---------------+----------------------+------+-----+
> | id | int(10) unsigned | NO | PRI |
> | topic_subject | varchar(200) | NO | |
> | categ_id | tinyint(1) unsigned | NO | |
> +---------------+----------------------+------+-----+
>
> Fields explanation:
> - id - same value as the `id` of the first post in this topic in the
> `board` table
> - please ignore categ_id
>
>
> --> table `users`:
>
> +----------+----------------------+------+-----+
> | Field | Type | Null | Key |
> +----------+----------------------+------+-----+
> | id | smallint(5) unsigned | NO | PRI |
> | username | varchar(16) | NO | UNI |
> +----------+----------------------+------+-----+
>
>
>
> What I want:
> ============
>
> I want to get a resultset where each row would represent one discussion
> topic, with username of the user who created the topic (posted the
> first post of the topic and specified a topic subject), with text of
> this first post and in the same row username, text of the post and time
> of the post that is the newest reply in the topic.
>
> So I tried this self-join:
>
> SELECT
> b1.id,
> bt.topic_subject,
> b1.post_text,
> b1.time_posted,
> u1.username,
> b2.id,
> b2.post_text,
> u2.username,
> b2.time_posted,
> b2.answer_to
> FROM board b1
> JOIN board_topics bt
> ON b1.id = bt.id
> JOIN users u1
> ON b1.user_id = u1.id
> LEFT JOIN board b2
> ON b2.answer_to = b1.id
> LEFT JOIN users u2
> ON b2.user_id = u2.id
> ORDER BY b2.time_posted DESC, b2.id DESC
>
> ...which gives this:
> http://www.metallica2.com/stuff/query01.html
>
> As you can see, I *almost* got what I wanted...but it contains ALL
> replies joined with the first post, and I only wanted to list the
> topics with the *most recent* reply LEFT JOIN-ed - each topic only
> once.
>
> So i tried to GROUP it BY the topic id [b1.id]:
>
> SELECT
> b1.id,
> bt.topic_subject,
> b1.post_text,
> b1.time_posted,
> u1.username,
> b2.id,
> b2.post_text,
> u2.username,
> b2.time_posted,
> b2.answer_to
> FROM board b1
> JOIN board_topics bt
> ON b1.id = bt.id
> JOIN users u1
> ON b1.user_id = u1.id
> LEFT JOIN board b2
> ON b2.answer_to = b1.id
> LEFT JOIN users u2
> ON b2.user_id = u2.id
> GROUP BY b1.id /* <--- added this line */
> ORDER BY b2.time_posted DESC, b2.id DESC
>
> ...which gave me this:
> http://www.metallica2.com/stuff/query02.html
>
> ...which looks a bit better but isn't correct, because first posts end
> up joined with the first reply (second post in the topic) insted of the
> most recent reply (the last post in the topic).
>
> I thought that some HAVING clause might help, but all I was getting
> were errors.
>
> Of course I could loop in the PHP code to get the most recent posts in
> the topics or maybe use a subquery, but I hoped there MUST be a better
> way to get these data. So...is this possible with a single statement?
>
> Thanks for reading through all this. Happy New Year to everyone.
>
> P.S.: If you know where I could read about some *real-world* examples
> of joins and grouping, I'd be grateful for a link. Thank you.
> |