Unix Technical Forum

Optimizing a Search Query

This is a discussion on Optimizing a Search Query within the MySQL forums, part of the Database Server Software category; --> Hello. I have a really hard search query to perform and it's taking at least 4 seconds to get ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 09:10 AM
battle.chris@gmail.com
 
Posts: n/a
Default Optimizing a Search Query

Hello.

I have a really hard search query to perform and it's taking at least
4 seconds to get a result. Can anyone offer a better approach or
optimize my SQL?

We're searching for candidates with certain skills sets, software
experience and work experience from the following tables:

CREATE TABLE `candidates`
(
`id` integer (11) NOT NULL AUTO_INCREMENT ,
`first_name` varchar (127),
`surname` varchar (127),
`created_date` date,
`date_of_birth` date,
PRIMARY KEY (`id`)
) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;

CREATE TABLE `candidates_skills`
(
`id` integer (11) NOT NULL AUTO_INCREMENT ,
`candidate_id` integer (11) NOT NULL DEFAULT 0,
`skill_id` integer (11) NOT NULL DEFAULT 0,
`rating` tinyint (4) NOT NULL DEFAULT 0,
`years_experience` tinyint (4) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`)
) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;

CREATE TABLE `candidates_softwares`
(
`id` integer (11) NOT NULL AUTO_INCREMENT ,
`candidate_id` integer (11) NOT NULL DEFAULT 0,
`software_id` integer (11) NOT NULL DEFAULT 0,
`rating` tinyint (4) NOT NULL DEFAULT 0,
`years_experience` tinyint (4) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`)
) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;

CREATE TABLE `candidates_work_subcategories`
(
`id` integer (11) NOT NULL AUTO_INCREMENT ,
`candidate_id` integer (11),
`work_subcategory_id` integer (11),
PRIMARY KEY (`id`)
) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;

Both rating and years_experience in the candidates_skills and
candidates_softwares are scores out of 5.

When client's submit their search, sets of skills_id's, software_id's
and work_subcategory_id's are sent back to the server and the query is
performed in the following way:

A view is created to sum of the candidates rating and years_experience
scores for each related skill:

create view skill_view( candidate_id, skill_score) as
SELECT candidates.id, SUM( candidates_skills.rating +
candidates_skills.years_experience) as rating
FROM candidates, candidates_skills
WHERE candidates_skills.candidate_id = candidates.id AND
( candidates_skills.skill_id = 250 or candidates_skills.skill_id =
368)
GROUP by candidates.id

A second view is created to sum of the candidates rating and
years_experience scores for each related software:

create view software_view( candidate_id, software_score) as
SELECT candidates.id, SUM( candidates_softwares.rating +
candidates_softwares.years_experience) as rating
FROM candidates, candidates_softwares
WHERE candidates_softwares.candidate_id = candidates.id
AND ( candidates_softwares.software_id = 120 or
candidates_softwares.software_id = 112)

GROUP by candidates.id

A third view is created to sum of the number of work types applicable
to each candidates:

create view work_type_view( candidate_id, work_type_score) as
SELECT candidates.id, SUM( 1) as rating
FROM candidates, candidates_work_subcategories
WHERE candidates_work_subcategories.candidate_id = candidates.id
AND ( work_subcategory_id = 7 OR work_subcategory_id = 15) group by
candidates.id

And now the skills_view and software views are combined to create a
subtotal score; my understanding is that this needs to be done using a
Full Outer Join as candidates may be present in one view but not in
the other:

create view skill_software_view( candidate_id, sub_score) as
SELECT skill_view.candidate_id, ( if( software_score IS NULL, 0,
software_score) + if( skill_score IS NULL, 0, skill_score)) AS
total_score
FROM skill_view LEFT JOIN software_view ON skill_view.candidate_id =
software_view.candidate_id
UNION
SELECT software_view.candidate_id, ( if( software_score IS NULL, 0,
software_score) + if( skill_score IS NULL, 0, skill_score)) AS
total_score
FROM $skill_view RIGHT JOIN $software_view ON
$skill_view.candidate_id = $software_view.candidate_id

And finally the work type score is included by completing a full outer
join with the work_type_view and the skill_software_view:

create view best_candidates( candidate_id, total_score) as
SELECT work_type_view.candidate_id, ( if( sub_score IS NULL, 0,
sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS
total_score
FROM work_type_view LEFT JOIN skill_software_view ON
work_type_view.candidate_id = skill_software_view.candidate_id
UNION
SELECT skill_software_view.candidate_id, ( if( sub_score IS NULL, 0,
sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS
total_score
FROM work_type_view RIGHT JOIN skill_software_view ON
work_type_view.candidate_id = skill_software_view.candidate_id

And that's it! I can now perform queries on the best_candidates view.
Unfortunately these queries are taking at least 4 seconds for even a
few skills, software and work types.

Is there a way to perform the search more efficiently? I'd appreciate
any help or comment!

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 09:10 AM
Captain Paralytic
 
Posts: n/a
Default Re: Optimizing a Search Query

On 16 Feb, 07:32, battle.ch...@gmail.com wrote:
> Hello.
>
> I have a really hard search query to perform and it's taking at least
> 4 seconds to get a result. Can anyone offer a better approach or
> optimize my SQL?
>
> We're searching for candidates with certain skills sets, software
> experience and work experience from the following tables:
>
> CREATE TABLE `candidates`
> (
> `id` integer (11) NOT NULL AUTO_INCREMENT ,
> `first_name` varchar (127),
> `surname` varchar (127),
> `created_date` date,
> `date_of_birth` date,
> PRIMARY KEY (`id`)
> ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;
>
> CREATE TABLE `candidates_skills`
> (
> `id` integer (11) NOT NULL AUTO_INCREMENT ,
> `candidate_id` integer (11) NOT NULL DEFAULT 0,
> `skill_id` integer (11) NOT NULL DEFAULT 0,
> `rating` tinyint (4) NOT NULL DEFAULT 0,
> `years_experience` tinyint (4) NOT NULL DEFAULT 0,
> PRIMARY KEY (`id`)
> ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;
>
> CREATE TABLE `candidates_softwares`
> (
> `id` integer (11) NOT NULL AUTO_INCREMENT ,
> `candidate_id` integer (11) NOT NULL DEFAULT 0,
> `software_id` integer (11) NOT NULL DEFAULT 0,
> `rating` tinyint (4) NOT NULL DEFAULT 0,
> `years_experience` tinyint (4) NOT NULL DEFAULT 0,
> PRIMARY KEY (`id`)
> ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;
>
> CREATE TABLE `candidates_work_subcategories`
> (
> `id` integer (11) NOT NULL AUTO_INCREMENT ,
> `candidate_id` integer (11),
> `work_subcategory_id` integer (11),
> PRIMARY KEY (`id`)
> ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci;
>
> Both rating and years_experience in the candidates_skills and
> candidates_softwares are scores out of 5.
>
> When client's submit their search, sets of skills_id's, software_id's
> and work_subcategory_id's are sent back to the server and the query is
> performed in the following way:
>
> A view is created to sum of the candidates rating and years_experience
> scores for each related skill:
>
> create view skill_view( candidate_id, skill_score) as
> SELECT candidates.id, SUM( candidates_skills.rating +
> candidates_skills.years_experience) as rating
> FROM candidates, candidates_skills
> WHERE candidates_skills.candidate_id = candidates.id AND
> ( candidates_skills.skill_id = 250 or candidates_skills.skill_id =
> 368)
> GROUP by candidates.id
>
> A second view is created to sum of the candidates rating and
> years_experience scores for each related software:
>
> create view software_view( candidate_id, software_score) as
> SELECT candidates.id, SUM( candidates_softwares.rating +
> candidates_softwares.years_experience) as rating
> FROM candidates, candidates_softwares
> WHERE candidates_softwares.candidate_id = candidates.id
> AND ( candidates_softwares.software_id = 120 or
> candidates_softwares.software_id = 112)
>
> GROUP by candidates.id
>
> A third view is created to sum of the number of work types applicable
> to each candidates:
>
> create view work_type_view( candidate_id, work_type_score) as
> SELECT candidates.id, SUM( 1) as rating
> FROM candidates, candidates_work_subcategories
> WHERE candidates_work_subcategories.candidate_id = candidates.id
> AND ( work_subcategory_id = 7 OR work_subcategory_id = 15) group by
> candidates.id
>
> And now the skills_view and software views are combined to create a
> subtotal score; my understanding is that this needs to be done using a
> Full Outer Join as candidates may be present in one view but not in
> the other:
>
> create view skill_software_view( candidate_id, sub_score) as
> SELECT skill_view.candidate_id, ( if( software_score IS NULL, 0,
> software_score) + if( skill_score IS NULL, 0, skill_score)) AS
> total_score
> FROM skill_view LEFT JOIN software_view ON skill_view.candidate_id =
> software_view.candidate_id
> UNION
> SELECT software_view.candidate_id, ( if( software_score IS NULL, 0,
> software_score) + if( skill_score IS NULL, 0, skill_score)) AS
> total_score
> FROM $skill_view RIGHT JOIN $software_view ON
> $skill_view.candidate_id = $software_view.candidate_id
>
> And finally the work type score is included by completing a full outer
> join with the work_type_view and the skill_software_view:
>
> create view best_candidates( candidate_id, total_score) as
> SELECT work_type_view.candidate_id, ( if( sub_score IS NULL, 0,
> sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS
> total_score
> FROM work_type_view LEFT JOIN skill_software_view ON
> work_type_view.candidate_id = skill_software_view.candidate_id
> UNION
> SELECT skill_software_view.candidate_id, ( if( sub_score IS NULL, 0,
> sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS
> total_score
> FROM work_type_view RIGHT JOIN skill_software_view ON
> work_type_view.candidate_id = skill_software_view.candidate_id
>
> And that's it! I can now perform queries on the best_candidates view.
> Unfortunately these queries are taking at least 4 seconds for even a
> few skills, software and work types.
>
> Is there a way to perform the search more efficiently? I'd appreciate
> any help or comment!


Almost certainly the problem is the multiple nested subselects.
You will find many posts on this forum showing how to replace
subselects with LEFT JOINs and there is a specific chapter in the
MySQL manual about it, in which one paragraph states:

"A LEFT [OUTER] JOIN can be faster than an equivalent subquery because
the server might be able to optimize it better - a fact that is not
specific to MySQL Server alone. Prior to SQL-92, outer joins did not
exist, so subqueries were the only way to do certain things. Today,
MySQL Server and many other modern database systems offer a wide range
of outer join types."

So I would dump all the views and re-write the "and finally" as a
single query containing LEFT JOINs and possibly some INNER JOINs.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:00 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com