Unix Technical Forum

How to speed up this "translation" query?

This is a discussion on How to speed up this "translation" query? within the Pgsql Performance forums, part of the PostgreSQL category; --> I need some expert advice on how to optimize a "translation" query (this word choice will become clear shortly, ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:11 AM
tlm
 
Posts: n/a
Default How to speed up this "translation" query?

I need some expert advice on how to optimize a "translation" query (this
word choice will become clear shortly, I hope).

Say I have a HUMONGOUS table of foreign language "translations" (call it
TRANS) with records like these:

meaning_id: 1
language_id: 5
translation: jidoosha

meaning_id: 1
language_id: 2
translation: voiture

meaning_id: 1
language_id: 5
translation: kuruma

meaning_id: 2
language_id: 2
translation: chat

meaning_id: 2
language_id: 5
translation: neko

meaning_id: 2
language_id: 3
translation: katz

meaning_id: 3
language_id: 4
translation: pesce

meaning_id: 3
language_id: 2
translation: poisson

meaning_id: 3
language_id: 5
translation: sakana

For the sake of this description, let's assume that the records above are
all the records in TRANS (though in fact the number of records in TRANS is
really about ten million times greater).

Now suppose I have a tiny table called INPUT consisting of single text field
(say, word). E.g. suppose that INPUT looks like this:

katz
voiture
pesce

Now, let's fix a language_id, say 5. This is the "target" language_id.
Given this target language_id, and this particular INPUT table, I want the
results of the query to be something like this:

neko
jidoosha
kuruma
sakana

I.e. for each word W in INPUT, the query must first find each record R in
TRANS that has W as its translation field; then find each record Q in
TRANS whose language_id is 5 (the target language_id) AND has the same
meaning_id as R does. E.g. if W is 'katz', then R is

meaning_id: 2
language_id: 3
translation: katz

and therefore the desired Q is

meaning_id: 2
language_id: 5
translation: neko

....and so on.

The only difficulty here is that performance is critical, and in real
life, TRANS has around 50M records (and growing), while INPUT has
typically between 500 and 1000 records.

Any advice on how to make this as fast as possible would be much
appreciated.

Thanks!

G.

P.S. Just to show that this post is not just from a college student trying
to get around doing homework, below I post my most successful query so far.
It works, but it's performance isn't great. And it is annoyingly complex,
to boot; I'm very much the SQL noob, and if nothing else, at least I'd like
to learn to write "better" (i.e. more elegant, more legible, more
clueful) SQL that this:

SELECT q3.translation, q2.otherstuff
FROM
(
SELECT INPUT.word, q1.meaning_id, INPUT.otherstuff
FROM
INPUT
INNER JOIN
(
SELECT translation, meaning_id
FROM TRANS
WHERE translation IN (SELECT word FROM INPUT)
) AS q1
ON INPUT.word = q1.translation
) AS q2
LEFT JOIN
(
SELECT translation, meaning_id
FROM TRANS
WHERE language_id=5
) AS q3
ON q2.meaning_id=q3.meaning_id;

As you can see, there are additional fields that I didn't mention in my
original description (e.g. INPUT.otherstuff). Also the above is actually a
subquery in a larger query, but it is by far, the worst bottleneck. Last,
there's an index on TRANS(translation).

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:11 AM
Niklas Johansson
 
Posts: n/a
Default Re: How to speed up this "translation" query?


On 1 aug 2006, at 20.09, tlm wrote:
> SELECT q3.translation, q2.otherstuff
> FROM
> (
> SELECT INPUT.word, q1.meaning_id, INPUT.otherstuff
> FROM
> INPUT
> INNER JOIN
> (
> SELECT translation, meaning_id
> FROM TRANS
> WHERE translation IN (SELECT word FROM INPUT)
> ) AS q1
> ON INPUT.word = q1.translation
> ) AS q2
> LEFT JOIN
> (
> SELECT translation, meaning_id
> FROM TRANS
> WHERE language_id=5
> ) AS q3
> ON q2.meaning_id=q3.meaning_id;


Maybe I'm not following you properly, but I think you've made things
a little bit more complicated than they need to be. The nested sub-
selects look a little nasty.

Now, you didn't provide any explain output but I think the following
SQL will achieve the same result, and hopefully produce a better plan:

SELECT t2.translation, i.otherstuff
FROM input i INNER JOIN trans t ON i.word=t.translation
INNER JOIN trans t2 ON t.meaning_id=t2.meaning_id
WHERE t2.language_id=5;

The query will also benefit from indices on trans.meaning_id and
trans.language_id. Also make sure the tables are vacuumed and
analyzed, to allow the planner to make good estimates.



Sincerely,

Niklas Johansson





---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:47 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com