Unix Technical Forum

Re: Regex performance issue

This is a discussion on Re: Regex performance issue within the Pgsql Performance forums, part of the PostgreSQL category; --> Hello... I cannot use LIKE , because the order of the match is reversed. The prefix column is containing ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:48 AM
Alexandru Coseru
 
Posts: n/a
Default Re: Regex performance issue

Hello...

I cannot use LIKE , because the order of the match is reversed.
The prefix column is containing telephone destinations.
IE: ^001 - US , ^0039 Italy , etc..

Here is a small sample:

asterisk=> select * from destlist LIMIT 10;
id | id_ent | dir | prefix | country | network | tip
----+--------+-----+------------+-------------+--------------------+-----
1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5


Now , I have to match a dialednumber (let's say 00213618833) and find
it's destination...(It's algeria mobile).
I tried to make with a query of using LIKE , but i was not able to get
something..


Regards
Alex





----- Original Message -----
From: "Dave Dutcher" <dave@tridecap.com>
To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>;
<pgsql-performance@postgresql.org>
Sent: Saturday, December 02, 2006 10:36 PM
Subject: RE: [PERFORM] Regex performance issue


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru Coseru
> asterisk=> explain analyze SELECT * FROM destlist WHERE
> '0039051248787' ~
> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>
>
> QUERY PLAN
> --------------------------------------------------------------
> ----------------------------------------------------------------------
> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
> time=857.715..857.716 rows=2 loops=1)
> Sort Key: length((prefix)::text)
> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
> rows=31 width=67)
> (actual time=2.156..857.686 rows=2 loops=1)
> Recheck Cond: ((id_ent = -2) AND (dir = 0))
> Filter: ('0039051248787'::text ~ (prefix)::text)
> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
> Index Cond: ((id_ent = -2) AND (dir = 0))
> Total runtime: 857.804 ms
> (8 rows)
>
>
> "mmumu" btree (prefix varchar_pattern_ops)
>


I'm surpised Postgres isn't using the index on prefix seeing as the index
uses the varchar_pattern_ops operator class. It could be that the index
isn't selective enough, or is Postgres not able to use an index with Posix
regular expressions? The docs seem to say that it can, but I'd be curious
to see what happens if you use LIKE instead of ~.

Dave





--
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:48 AM
Oleg Bartunov
 
Posts: n/a
Default Re: Regex performance issue

I may miss something but I'd use tsearch2. Check
intdict dictionary for basic idea - http://www.sai.msu.su/~megera/wiki/Gendict

Oleg
On Sat, 2 Dec 2006, Alexandru Coseru wrote:

> Hello...
>
> I cannot use LIKE , because the order of the match is reversed.
> The prefix column is containing telephone destinations.
> IE: ^001 - US , ^0039 Italy , etc..
>
> Here is a small sample:
>
> asterisk=> select * from destlist LIMIT 10;
> id | id_ent | dir | prefix | country | network | tip
> ----+--------+-----+------------+-------------+--------------------+-----
> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>
>
> Now , I have to match a dialednumber (let's say 00213618833) and find
> it's destination...(It's algeria mobile).
> I tried to make with a query of using LIKE , but i was not able to get
> something..
>
>
> Regards
> Alex
>
>
>
>
>
> ----- Original Message ----- From: "Dave Dutcher" <dave@tridecap.com>
> To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>;
> <pgsql-performance@postgresql.org>
> Sent: Saturday, December 02, 2006 10:36 PM
> Subject: RE: [PERFORM] Regex performance issue
>
>
>> -----Original Message-----
>> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru Coseru
>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>> '0039051248787' ~
>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>
>>
>> QUERY PLAN
>> --------------------------------------------------------------
>> ----------------------------------------------------------------------
>> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
>> time=857.715..857.716 rows=2 loops=1)
>> Sort Key: length((prefix)::text)
>> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
>> rows=31 width=67)
>> (actual time=2.156..857.686 rows=2 loops=1)
>> Recheck Cond: ((id_ent = -2) AND (dir = 0))
>> Filter: ('0039051248787'::text ~ (prefix)::text)
>> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>> Index Cond: ((id_ent = -2) AND (dir = 0))
>> Total runtime: 857.804 ms
>> (8 rows)
>>
>>
>> "mmumu" btree (prefix varchar_pattern_ops)
>>

>
> I'm surpised Postgres isn't using the index on prefix seeing as the index
> uses the varchar_pattern_ops operator class. It could be that the index
> isn't selective enough, or is Postgres not able to use an index with Posix
> regular expressions? The docs seem to say that it can, but I'd be curious
> to see what happens if you use LIKE instead of ~.
>
> Dave
>
>
>
>
>
>


Regards,
Oleg
__________________________________________________ ___________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:01 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com