This is a discussion on Optimizing a Search Query within the MySQL forums, part of the Database Server Software category; --> Hello. I have a really hard search query to perform and it's taking at least 4 seconds to get ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello. I have a really hard search query to perform and it's taking at least 4 seconds to get a result. Can anyone offer a better approach or optimize my SQL? We're searching for candidates with certain skills sets, software experience and work experience from the following tables: CREATE TABLE `candidates` ( `id` integer (11) NOT NULL AUTO_INCREMENT , `first_name` varchar (127), `surname` varchar (127), `created_date` date, `date_of_birth` date, PRIMARY KEY (`id`) ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; CREATE TABLE `candidates_skills` ( `id` integer (11) NOT NULL AUTO_INCREMENT , `candidate_id` integer (11) NOT NULL DEFAULT 0, `skill_id` integer (11) NOT NULL DEFAULT 0, `rating` tinyint (4) NOT NULL DEFAULT 0, `years_experience` tinyint (4) NOT NULL DEFAULT 0, PRIMARY KEY (`id`) ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; CREATE TABLE `candidates_softwares` ( `id` integer (11) NOT NULL AUTO_INCREMENT , `candidate_id` integer (11) NOT NULL DEFAULT 0, `software_id` integer (11) NOT NULL DEFAULT 0, `rating` tinyint (4) NOT NULL DEFAULT 0, `years_experience` tinyint (4) NOT NULL DEFAULT 0, PRIMARY KEY (`id`) ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; CREATE TABLE `candidates_work_subcategories` ( `id` integer (11) NOT NULL AUTO_INCREMENT , `candidate_id` integer (11), `work_subcategory_id` integer (11), PRIMARY KEY (`id`) ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; Both rating and years_experience in the candidates_skills and candidates_softwares are scores out of 5. When client's submit their search, sets of skills_id's, software_id's and work_subcategory_id's are sent back to the server and the query is performed in the following way: A view is created to sum of the candidates rating and years_experience scores for each related skill: create view skill_view( candidate_id, skill_score) as SELECT candidates.id, SUM( candidates_skills.rating + candidates_skills.years_experience) as rating FROM candidates, candidates_skills WHERE candidates_skills.candidate_id = candidates.id AND ( candidates_skills.skill_id = 250 or candidates_skills.skill_id = 368) GROUP by candidates.id A second view is created to sum of the candidates rating and years_experience scores for each related software: create view software_view( candidate_id, software_score) as SELECT candidates.id, SUM( candidates_softwares.rating + candidates_softwares.years_experience) as rating FROM candidates, candidates_softwares WHERE candidates_softwares.candidate_id = candidates.id AND ( candidates_softwares.software_id = 120 or candidates_softwares.software_id = 112) GROUP by candidates.id A third view is created to sum of the number of work types applicable to each candidates: create view work_type_view( candidate_id, work_type_score) as SELECT candidates.id, SUM( 1) as rating FROM candidates, candidates_work_subcategories WHERE candidates_work_subcategories.candidate_id = candidates.id AND ( work_subcategory_id = 7 OR work_subcategory_id = 15) group by candidates.id And now the skills_view and software views are combined to create a subtotal score; my understanding is that this needs to be done using a Full Outer Join as candidates may be present in one view but not in the other: create view skill_software_view( candidate_id, sub_score) as SELECT skill_view.candidate_id, ( if( software_score IS NULL, 0, software_score) + if( skill_score IS NULL, 0, skill_score)) AS total_score FROM skill_view LEFT JOIN software_view ON skill_view.candidate_id = software_view.candidate_id UNION SELECT software_view.candidate_id, ( if( software_score IS NULL, 0, software_score) + if( skill_score IS NULL, 0, skill_score)) AS total_score FROM $skill_view RIGHT JOIN $software_view ON $skill_view.candidate_id = $software_view.candidate_id And finally the work type score is included by completing a full outer join with the work_type_view and the skill_software_view: create view best_candidates( candidate_id, total_score) as SELECT work_type_view.candidate_id, ( if( sub_score IS NULL, 0, sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS total_score FROM work_type_view LEFT JOIN skill_software_view ON work_type_view.candidate_id = skill_software_view.candidate_id UNION SELECT skill_software_view.candidate_id, ( if( sub_score IS NULL, 0, sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS total_score FROM work_type_view RIGHT JOIN skill_software_view ON work_type_view.candidate_id = skill_software_view.candidate_id And that's it! I can now perform queries on the best_candidates view. Unfortunately these queries are taking at least 4 seconds for even a few skills, software and work types. Is there a way to perform the search more efficiently? I'd appreciate any help or comment! |
| ||||
| On 16 Feb, 07:32, battle.ch...@gmail.com wrote: > Hello. > > I have a really hard search query to perform and it's taking at least > 4 seconds to get a result. Can anyone offer a better approach or > optimize my SQL? > > We're searching for candidates with certain skills sets, software > experience and work experience from the following tables: > > CREATE TABLE `candidates` > ( > `id` integer (11) NOT NULL AUTO_INCREMENT , > `first_name` varchar (127), > `surname` varchar (127), > `created_date` date, > `date_of_birth` date, > PRIMARY KEY (`id`) > ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; > > CREATE TABLE `candidates_skills` > ( > `id` integer (11) NOT NULL AUTO_INCREMENT , > `candidate_id` integer (11) NOT NULL DEFAULT 0, > `skill_id` integer (11) NOT NULL DEFAULT 0, > `rating` tinyint (4) NOT NULL DEFAULT 0, > `years_experience` tinyint (4) NOT NULL DEFAULT 0, > PRIMARY KEY (`id`) > ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; > > CREATE TABLE `candidates_softwares` > ( > `id` integer (11) NOT NULL AUTO_INCREMENT , > `candidate_id` integer (11) NOT NULL DEFAULT 0, > `software_id` integer (11) NOT NULL DEFAULT 0, > `rating` tinyint (4) NOT NULL DEFAULT 0, > `years_experience` tinyint (4) NOT NULL DEFAULT 0, > PRIMARY KEY (`id`) > ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; > > CREATE TABLE `candidates_work_subcategories` > ( > `id` integer (11) NOT NULL AUTO_INCREMENT , > `candidate_id` integer (11), > `work_subcategory_id` integer (11), > PRIMARY KEY (`id`) > ) TYPE=InnoDB CHARACTER SET latin1 COLLATE latin1_swedish_ci; > > Both rating and years_experience in the candidates_skills and > candidates_softwares are scores out of 5. > > When client's submit their search, sets of skills_id's, software_id's > and work_subcategory_id's are sent back to the server and the query is > performed in the following way: > > A view is created to sum of the candidates rating and years_experience > scores for each related skill: > > create view skill_view( candidate_id, skill_score) as > SELECT candidates.id, SUM( candidates_skills.rating + > candidates_skills.years_experience) as rating > FROM candidates, candidates_skills > WHERE candidates_skills.candidate_id = candidates.id AND > ( candidates_skills.skill_id = 250 or candidates_skills.skill_id = > 368) > GROUP by candidates.id > > A second view is created to sum of the candidates rating and > years_experience scores for each related software: > > create view software_view( candidate_id, software_score) as > SELECT candidates.id, SUM( candidates_softwares.rating + > candidates_softwares.years_experience) as rating > FROM candidates, candidates_softwares > WHERE candidates_softwares.candidate_id = candidates.id > AND ( candidates_softwares.software_id = 120 or > candidates_softwares.software_id = 112) > > GROUP by candidates.id > > A third view is created to sum of the number of work types applicable > to each candidates: > > create view work_type_view( candidate_id, work_type_score) as > SELECT candidates.id, SUM( 1) as rating > FROM candidates, candidates_work_subcategories > WHERE candidates_work_subcategories.candidate_id = candidates.id > AND ( work_subcategory_id = 7 OR work_subcategory_id = 15) group by > candidates.id > > And now the skills_view and software views are combined to create a > subtotal score; my understanding is that this needs to be done using a > Full Outer Join as candidates may be present in one view but not in > the other: > > create view skill_software_view( candidate_id, sub_score) as > SELECT skill_view.candidate_id, ( if( software_score IS NULL, 0, > software_score) + if( skill_score IS NULL, 0, skill_score)) AS > total_score > FROM skill_view LEFT JOIN software_view ON skill_view.candidate_id = > software_view.candidate_id > UNION > SELECT software_view.candidate_id, ( if( software_score IS NULL, 0, > software_score) + if( skill_score IS NULL, 0, skill_score)) AS > total_score > FROM $skill_view RIGHT JOIN $software_view ON > $skill_view.candidate_id = $software_view.candidate_id > > And finally the work type score is included by completing a full outer > join with the work_type_view and the skill_software_view: > > create view best_candidates( candidate_id, total_score) as > SELECT work_type_view.candidate_id, ( if( sub_score IS NULL, 0, > sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS > total_score > FROM work_type_view LEFT JOIN skill_software_view ON > work_type_view.candidate_id = skill_software_view.candidate_id > UNION > SELECT skill_software_view.candidate_id, ( if( sub_score IS NULL, 0, > sub_score) + if( work_type_score IS NULL, 0, work_type_score)) AS > total_score > FROM work_type_view RIGHT JOIN skill_software_view ON > work_type_view.candidate_id = skill_software_view.candidate_id > > And that's it! I can now perform queries on the best_candidates view. > Unfortunately these queries are taking at least 4 seconds for even a > few skills, software and work types. > > Is there a way to perform the search more efficiently? I'd appreciate > any help or comment! Almost certainly the problem is the multiple nested subselects. You will find many posts on this forum showing how to replace subselects with LEFT JOINs and there is a specific chapter in the MySQL manual about it, in which one paragraph states: "A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better - a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types." So I would dump all the views and re-write the "and finally" as a single query containing LEFT JOINs and possibly some INNER JOINs. |