Unix Technical Forum

Performance of REGEXP_LIKE vs LIKE?

This is a discussion on Performance of REGEXP_LIKE vs LIKE? within the Oracle Miscellaneous forums, part of the Oracle Database category; --> Hello fellow netters, I'm curious if anyone has done any performance testing of REGEXP_LIKE vs LIKE. There are certain ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database > Oracle Miscellaneous

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-08-2008, 12:37 PM
zstringer999@gmail.com
 
Posts: n/a
Default Performance of REGEXP_LIKE vs LIKE?

Hello fellow netters,

I'm curious if anyone has done any performance testing of REGEXP_LIKE
vs LIKE. There are certain situations where the syntax of REGEXP_LIKE
is cleaner and shorter than the comparable LIKE expression. A common
search request for us involves doing a case-insensitive, wildcard
search of a 3 million record table, for a series of text strings.

For example:

SELECT *
FROM big_table bt
WHERE UPPER(TRIM(bt.txt)) LIKE '%ABRASION%'
OR UPPER(TRIM(bt.txt)) LIKE '%DERMATOLOGICAL%'
OR UPPER(TRIM(bt.txt)) LIKE '%PSORIASIS%'

The equivalent regular expression search is:

SELECT *
FROM big_table bt
WHERE REGEXP_LIKE(UPPER(TRIM(bt.txt)),
'(ABRASION|DERMATOLOGICAL|PSORIASIS)')

The regular expression syntax is cleaner, especially when you have a
lot of strings to search for! However, the LIKE expression runs in 20
seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
else noticed this? Any way to speed it up?

BTW there's some kind of limit to the string you can pass to
REGEXP_LIKE. You'll get the error message: ORA-12733: regular
expression too long

Thanks,
Zstringer

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-08-2008, 12:37 PM
sybrandb@hccnet.nl
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

On 18 Apr 2007 13:52:41 -0700, "zstringer999@gmail.com"
<zstringer999@gmail.com> wrote:

>Hello fellow netters,
>
>I'm curious if anyone has done any performance testing of REGEXP_LIKE
>vs LIKE. There are certain situations where the syntax of REGEXP_LIKE
>is cleaner and shorter than the comparable LIKE expression. A common
>search request for us involves doing a case-insensitive, wildcard
>search of a 3 million record table, for a series of text strings.
>
>For example:
>
>SELECT *
> FROM big_table bt
> WHERE UPPER(TRIM(bt.txt)) LIKE '%ABRASION%'
> OR UPPER(TRIM(bt.txt)) LIKE '%DERMATOLOGICAL%'
> OR UPPER(TRIM(bt.txt)) LIKE '%PSORIASIS%'
>
>The equivalent regular expression search is:
>
>SELECT *
> FROM big_table bt
> WHERE REGEXP_LIKE(UPPER(TRIM(bt.txt)),
> '(ABRASION|DERMATOLOGICAL|PSORIASIS)')
>
>The regular expression syntax is cleaner, especially when you have a
>lot of strings to search for! However, the LIKE expression runs in 20
>seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
>else noticed this? Any way to speed it up?
>
>BTW there's some kind of limit to the string you can pass to
>REGEXP_LIKE. You'll get the error message: ORA-12733: regular
>expression too long
>
>Thanks,
>Zstringer


Regular expressions are implemented using Java. You force Oracle to
call Java, an interpreted language. The delay would be caused by the
architecture.

--
Sybrand Bakker
Senior Oracle DBA
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-08-2008, 12:37 PM
Thomas Kellerer
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

On 18.04.2007 22:52 zstringer999@gmail.com wrote:
> The regular expression syntax is cleaner, especially when you have a
> lot of strings to search for! However, the LIKE expression runs in 20
> seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
> else noticed this? Any way to speed it up?
>


I thought you should be able to create a function based index to support
the regex match but I'm not sure (could have been Postgres where I did
that the last time )

Thomas
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-08-2008, 12:37 PM
keithjgordon@hotmail.co.uk
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

Hello.

You write that "Regular expressions are implemented using Java. You
force Oracle to call Java, an interpreted language. The delay would be
caused by the architecture."

I don't think this is always correct. I am using Oracle XE at home and
it doesn't install Java and you can't write Java stored procedures.
You can still use the "regexp_like" function however. I think
therefore that in Oracle XE the regular expressions are not written in
Java. Someone else can confirm this is the same in Enterprise and
Standard Oracle versions.

Please correct me if I am wrong - I am still learning Oracle.

Thank you
John

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-08-2008, 12:37 PM
zstringer999@gmail.com
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

On Apr 19, 7:40 am, keithjgor...@hotmail.co.uk wrote:
> Hello.
>
> You write that "Regular expressions are implemented using Java. You
> force Oracle to call Java, an interpreted language. The delay would be
> caused by the architecture."
>
> I don't think this is always correct. I am using Oracle XE at home and
> it doesn't install Java and you can't write Java stored procedures.
> You can still use the "regexp_like" function however. I think
> therefore that in Oracle XE the regular expressions are not written in
> Java. Someone else can confirm this is the same in Enterprise and
> Standard Oracle versions.
>
> Please correct me if I am wrong - I am still learning Oracle.
>
> Thank you
> John


I forgot to mention that I'm using Oracle 10g. I've seen
implementations of regular expressions in earlier versions of Oracle
that were done in Java. However, in 10G I think that they're
implemented using PL/SQL. Can anyone confirm?

Peter

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-08-2008, 12:37 PM
Vladimir M. Zakharychev
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

On Apr 19, 6:02 pm, "zstringer...@gmail.com" <zstringer...@gmail.com>
wrote:
> On Apr 19, 7:40 am, keithjgor...@hotmail.co.uk wrote:
>
>
>
> > Hello.

>
> > You write that "Regular expressions are implemented using Java. You
> > force Oracle to call Java, an interpreted language. The delay would be
> > caused by the architecture."

>
> > I don't think this is always correct. I am using Oracle XE at home and
> > it doesn't install Java and you can't write Java stored procedures.
> > You can still use the "regexp_like" function however. I think
> > therefore that in Oracle XE the regular expressions are not written in
> > Java. Someone else can confirm this is the same in Enterprise and
> > Standard Oracle versions.

>
> > Please correct me if I am wrong - I am still learning Oracle.

>
> > Thank you
> > John

>
> I forgot to mention that I'm using Oracle 10g. I've seen
> implementations of regular expressions in earlier versions of Oracle
> that were done in Java. However, in 10G I think that they're
> implemented using PL/SQL. Can anyone confirm?
>
> Peter


Actually, they are implemented in C and built into the database kernel
in all editions. As for the origins of the implementation - hard to
tell, but very unlikely that it's Perl.

Regards,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-08-2008, 12:37 PM
Vladimir M. Zakharychev
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

On Apr 19, 12:52 am, "zstringer...@gmail.com" <zstringer...@gmail.com>
wrote:
> Hello fellow netters,
>
> I'm curious if anyone has done any performance testing of REGEXP_LIKE
> vs LIKE. There are certain situations where the syntax of REGEXP_LIKE
> is cleaner and shorter than the comparable LIKE expression. A common
> search request for us involves doing a case-insensitive, wildcard
> search of a 3 million record table, for a series of text strings.
>
> For example:
>
> SELECT *


(> FROM big_table bt
> WHERE UPPER(TRIM(bt.txt)) LIKE '%ABRASION%'
> OR UPPER(TRIM(bt.txt)) LIKE '%DERMATOLOGICAL%'
> OR UPPER(TRIM(bt.txt)) LIKE '%PSORIASIS%'
>
> The equivalent regular expression search is:
>
> SELECT *
> FROM big_table bt
> WHERE REGEXP_LIKE(UPPER(TRIM(bt.txt)),
> '(ABRASION|DERMATOLOGICAL|PSORIASIS)')
>
> The regular expression syntax is cleaner, especially when you have a
> lot of strings to search for! However, the LIKE expression runs in 20
> seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
> else noticed this? Any way to speed it up?
>
> BTW there's some kind of limit to the string you can pass to
> REGEXP_LIKE. You'll get the error message: ORA-12733: regular
> expression too long
>
> Thanks,
> Zstringer


How about Oracle Text? It's as simple as

CREATE INDEX IX$CTX$BIG_TABLE#TXT ON BIG_TABLE (TEXT) INDEXTYPE IS
CTXSYS.CONTEXT PARAMETERS ('FILTER NULL_FILTER SYNC ON COMMIT')
/

(be aware that SYNC ON COMMIT option is unsafe in 10.1 up to 10.1.0.5
- there are a couple of bugs with it that can corrupt the index on
subsequent optimization, fixed in 10.1.0.5 and, I believe, 10.2.0.2)

and then you query like this:

SELECT * FROM BIG_TABLE
WHERE CONTAINS(TXT,'abrasion OR dermatological OR psoriasis') > 0

or

.... CONTAINS(TXT,'about(abrasion) or about(psoriasis)') > 0

or ... well, I won't go into copying the Text Reference here, check it
out yourself... Using Text should reduce your query response times
dramatically while being much more flexible with regard to the way you
search for the right information in unstructured data in and outside
of the database. Just open the Text Reference and discover a whole new
world of powerful and efficient full text indexing and searching
supplied by Oracle out of the box. And it comes with all Oracle
editions for no extra cost.

Hth,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-08-2008, 12:38 PM
Rene Nyffenegger
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

On 2007-04-19, Thomas Kellerer <JUAXQOSZFGQQ@spammotel.com> wrote:
> On 18.04.2007 22:52 zstringer999@gmail.com wrote:
>> The regular expression syntax is cleaner, especially when you have a
>> lot of strings to search for! However, the LIKE expression runs in 20
>> seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
>> else noticed this? Any way to speed it up?
>>

>
> I thought you should be able to create a function based index to support
> the regex match but I'm not sure (could have been Postgres where I did
> that the last time )


You can't, at least on 10g XE:


SQL> create table foo_bar_t (a number, b varchar2(20));

Table created.

SQL> create index foo_bar_i1 on foo_bar_t (upper(b));

Index created.

SQL> create index foo_bar_i2 on foo_bar_t(regexp_like(b, '^[[:digit:]]+$'));
create index foo_bar_i2 on foo_bar_t(regexp_like(b, '^[[:digit:]]+$'))
*
ERROR at line 1:
ORA-00904: "REGEXP_LIKE": invalid identifier


Maybe, I am missing something, then please let me know.

Cheers, RenÚ

--
Rene Nyffenegger
http://www.adp-gmbh.ch
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-08-2008, 12:38 PM
Thomas Kellerer
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?



On 20.04.2007 08:52 Rene Nyffenegger wrote:
> On 2007-04-19, Thomas Kellerer <JUAXQOSZFGQQ@spammotel.com> wrote:
>> On 18.04.2007 22:52 zstringer999@gmail.com wrote:
>>> The regular expression syntax is cleaner, especially when you have a
>>> lot of strings to search for! However, the LIKE expression runs in 20
>>> seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
>>> else noticed this? Any way to speed it up?
>>>

>> I thought you should be able to create a function based index to support
>> the regex match but I'm not sure (could have been Postgres where I did
>> that the last time )

>
> You can't, at least on 10g XE:


OK, then it was indeed with PostgreSQL
Thanks for the clarification

Regards
Thomas
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-08-2008, 12:38 PM
Maxim Demenko
 
Posts: n/a
Default Re: Performance of REGEXP_LIKE vs LIKE?

Rene Nyffenegger schrieb:
> On 2007-04-19, Thomas Kellerer <JUAXQOSZFGQQ@spammotel.com> wrote:
>> On 18.04.2007 22:52 zstringer999@gmail.com wrote:
>>> The regular expression syntax is cleaner, especially when you have a
>>> lot of strings to search for! However, the LIKE expression runs in 20
>>> seconds, while the REGEXP_LIKE one runs in 60 seconds. Has anyone
>>> else noticed this? Any way to speed it up?
>>>

>> I thought you should be able to create a function based index to support
>> the regex match but I'm not sure (could have been Postgres where I did
>> that the last time )

>
> You can't, at least on 10g XE:
>
>
> SQL> create table foo_bar_t (a number, b varchar2(20));
>
> Table created.
>
> SQL> create index foo_bar_i1 on foo_bar_t (upper(b));
>
> Index created.
>
> SQL> create index foo_bar_i2 on foo_bar_t(regexp_like(b, '^[[:digit:]]+$'));
> create index foo_bar_i2 on foo_bar_t(regexp_like(b, '^[[:digit:]]+$'))
> *
> ERROR at line 1:
> ORA-00904: "REGEXP_LIKE": invalid identifier
>
>
> Maybe, I am missing something, then please let me know.
>
> Cheers, RenÚ
>


Because regexp_like is not a function but condition.
Here (on 10.2.0.3):


SQL> create index foo_bar_i2 on foo_bar_t(regexp_replace(b,
'^([[:digit:]]+)$','\1'));

Index created.


Best regards

Maxim
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:24 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com