Unix Technical Forum

Regular expressions: splitting with REGEXP_SUBSTR and "null"

This is a discussion on Regular expressions: splitting with REGEXP_SUBSTR and "null" within the Oracle Miscellaneous forums, part of the Oracle Database category; --> Hi, I have a problem, I found on internet a way to split strings separated by pipe | (for ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database > Oracle Miscellaneous

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-08-2008, 01:39 PM
citte
 
Posts: n/a
Default Regular expressions: splitting with REGEXP_SUBSTR and "null"

Hi,
I have a problem, I found on internet a way to split strings separated
by pipe | (for example) with regexp, something like that:

SELECT REGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
FROM dual;

which gives the result:
Three

the problem comes when I have one of the words separated by the pipe
is actually a Null, so the string becomes
'One|Two||Four|'

the query

SELECT REGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
FROM dual;

result in:
Four

I needed a Null (or empty string).
How is this possible?

Thank you in advance!
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-08-2008, 01:39 PM
Mark D Powell
 
Posts: n/a
Default Re: Regular expressions: splitting with REGEXP_SUBSTR and "null"

On Feb 28, 11:02*am, citte <nicola.ame...@gmail.com> wrote:
> Hi,
> I have a problem, I found on internet a way to split strings separated
> by pipe | (for example) with regexp, something like that:
>
> SELECT REGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
> FROM dual;
>
> which gives the result:
> Three
>
> the problem comes when I have one of the words separated by the pipe
> is actually a Null, so the string becomes
> 'One|Two||Four|'
>
> the query
>
> SELECT REGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
> FROM dual;
>
> result in:
> Four
>
> I needed a Null (or empty string).
> How is this possible?
>
> Thank you in advance!


Will this help?

SQL> l
1 select regexp_substr(replace('one|two||four','||','|NULL| '),
2* '[^|]+',1,3) from dual
SQL> /

REGE
----
NULL

Also to return an actual NULL value
SQL> select regexp_substr(replace('one|two||four','||',''),
2 '[^|]+',1,3) from dual;

R
-


HTH -- Mark D Powell --
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-08-2008, 01:39 PM
Doug Miller
 
Posts: n/a
Default Re: Regular expressions: splitting with REGEXP_SUBSTR and "null"

In article <248f35ac-1e8c-424a-80cd-1da3edc66966@e6g2000prf.googlegroups.com>, Mark D Powell <Mark.Powell@eds.com> wrote:
>On Feb 28, 11:02=A0am, citte <nicola.ame...@gmail.com> wrote:
>> Hi,
>> I have a problem, I found on internet a way to split strings separated
>> by pipe | (for example) with regexp, something like that:
>>
>> SELECT REGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
>> FROM dual;
>>
>> which gives the result:
>> Three
>>
>> the problem comes when I have one of the words separated by the pipe
>> is actually a Null, so the string becomes
>> 'One|Two||Four|'
>>
>> the query
>>
>> SELECT REGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
>> FROM dual;
>>
>> result in:
>> Four
>>
>> I needed a Null (or empty string).
>> How is this possible?
>>
>> Thank you in advance!

>
>Will this help?
>
>SQL> l
> 1 select regexp_substr(replace('one|two||four','||','|NULL| '),
> 2* '[^|]+',1,3) from dual
>SQL> /
>
>REGE
>----
>NULL


Doesn't work in a more general case, e.g. attempting to select the fourth
string from 'one|two|||five' -- the desired result is NULL, but this method
returns 'five'. Conversely, trying to retrieve the fifth string should return
'five' but instead returns NULL.
>
>Also to return an actual NULL value
>SQL> select regexp_substr(replace('one|two||four','||',''),
> 2 '[^|]+',1,3) from dual;
>
>R
>-


Also doesn't work for more general cases:
- specifying the second substring should return 'two' but instead returns
'twofour'
- specifying the fourth substring should return 'four' but instead returns a
null string
- specifying the third substring in 'one|two||four|five' should return a null
string, but instead returns 'five'

In general, when seeking the i-th substring, if the omitted substring is in
the n-th position, this method is guaranteed to produce correct results only
when 0 < i < (n - 1), or when i = n AND there are at most n non-null
substrings. If there are multiple omitted substrings, correct results may also
be obtained for other values of i, but only by coincidence.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-08-2008, 01:39 PM
Peter Nilsson
 
Posts: n/a
Default Re: Regular expressions: splitting with REGEXP_SUBSTR and "null"

citte <nicola.ame...@gmail.com> wrote:
> Hi,
> I have a problem, I found on internet a way to split
> strings separated by pipe | (for example) with regexp,
> something like that:
>
> SELECT REGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
> FROM dual;
>
> which gives the result:
> Three
>
> the problem comes when I have one of the words separated
> by the pipe is actually a Null, so the string becomes
> 'One|Two||Four|'
>
> the query
>
> SELECT REGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
> FROM dual;
>
> result in:
> Four
>
> I needed a Null (or empty string).
> How is this possible?


Here are some (not necessarily pretty) ways, assuming n > 0...

select
x,
n,
substr(
'|' || x || '|',
instr('|' || x || '|', '|', 1, n) + 1,
instr('|' || x || '|', '|', 1, n + 1)
- instr('|' || x || '|', '|', 1, n) - 1),
regexp_replace(
'|' || x || '|',
'^([^\|]*\|){' || n || ',' || n || '}([^\|]*)|.*',
'\2' ),
regexp_substr(
regexp_substr(x || '|', '[^\|]*\|', 1, n),
'[^\|]*'),
regexp_substr(
'|' || x,
'[^\|]*',
instr('|' || x, '|', 1, n) + 1)
from
(select 'one|two||four|five' x from dual),
(select level n from dual connect by level <= 7)

--
Peter
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-08-2008, 01:39 PM
Mark D Powell
 
Posts: n/a
Default Re: Regular expressions: splitting with REGEXP_SUBSTR and "null"

On Feb 29, 1:11*pm, spamb...@milmac.com (Doug Miller) wrote:
> In article <248f35ac-1e8c-424a-80cd-1da3edc66...@e6g2000prf.googlegroups.com>, Mark D Powell <Mark.Pow...@eds.com> wrote:
>
>
>
>
>
> >On Feb 28, 11:02=A0am, citte <nicola.ame...@gmail.com> wrote:
> >> Hi,
> >> I have a problem, I found on internet a way to split strings separated
> >> by pipe | (for example) with regexp, something like that:

>
> >> SELECT REGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
> >> FROM dual;

>
> >> which gives the result:
> >> Three

>
> >> the problem comes when I have one of the words separated by the pipe
> >> is actually a Null, so the string becomes
> >> 'One|Two||Four|'

>
> >> the query

>
> >> SELECT REGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
> >> FROM dual;

>
> >> result in:
> >> Four

>
> >> I needed a Null (or empty string).
> >> How is this possible?

>
> >> Thank you in advance!

>
> >Will this help?

>
> >SQL> l
> > *1 *select regexp_substr(replace('one|two||four','||','|NULL| '),
> > *2* *'[^|]+',1,3) from dual
> >SQL> /

>
> >REGE
> >----
> >NULL

>
> Doesn't work in a more general case, e.g. attempting to select the fourth
> string from 'one|two|||five' -- the desired result is NULL, but this method
> returns 'five'. Conversely, trying to retrieve the fifth string should return
> 'five' but instead returns NULL.
>
>
>
> >Also to return an actual NULL value
> >SQL> select regexp_substr(replace('one|two||four','||',''),
> > *2 * '[^|]+',1,3) from dual;

>
> >R
> >-

>
> Also doesn't work for more general cases:
> - specifying the second substring should return 'two' but instead returns
> 'twofour'
> - specifying the fourth substring should return 'four' but instead returnsa
> null string
> - specifying the third substring in 'one|two||four|five' should return a null
> string, but instead returns 'five'
>
> In general, when seeking the i-th substring, if the omitted substring is in
> the n-th position, this method is guaranteed to produce correct results only
> when 0 < i < (n - 1), or when i = n AND there are at most n non-null
> substrings. If there are multiple omitted substrings, correct results may also
> be obtained for other values of i, but only by coincidence.- Hide quoted text -
>
> - Show quoted text -


Doug, valid point. I was not thinking about general cases only the
posted one and one should always be aware of what will break what you
wrote.

-- Mark D Powell --
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-24-2008, 07:09 PM
citte
 
Posts: n/a
Default Re: Regular expressions: splitting with REGEXP_SUBSTR and "null"

On 29 Feb, 20:11, spamb...@milmac.com (Doug Miller) wrote:
> In article <248f35ac-1e8c-424a-80cd-1da3edc66...@e6g2000prf.googlegroups.com>, Mark D Powell <Mark.Pow...@eds.com> wrote:
>
>
>
> >On Feb 28, 11:02=A0am, citte <nicola.ame...@gmail.com> wrote:
> >> Hi,
> >> I have a problem, I found on internet a way to split strings separated
> >> by pipe | (for example) with regexp, something like that:

>
> >> SELECTREGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
> >> FROM dual;

>
> >> which gives the result:
> >> Three

>
> >> the problem comes when I have one of the words separated by the pipe
> >> is actually aNull, so the string becomes
> >> 'One|Two||Four|'

>
> >> the query

>
> >> SELECTREGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
> >> FROM dual;

>
> >> result in:
> >> Four

>
> >> I needed aNull(or empty string).
> >> How is this possible?

>
> >> Thank you in advance!

>
> >Will this help?

>
> >SQL> l
> > 1 selectregexp_substr(replace('one|two||four','||',' |NULL|'),
> > 2* '[^|]+',1,3) from dual
> >SQL> /

>
> >REGE
> >----
> >NULL

>
> Doesn't work in a more general case, e.g. attempting to select the fourth
> string from 'one|two|||five' -- the desired result isNULL, but this method
> returns 'five'. Conversely, trying to retrieve the fifth string should return
> 'five' but instead returnsNULL.
>
>
>
> >Also to return an actualNULLvalue
> >SQL> selectregexp_substr(replace('one|two||four','||',' '),
> > 2 '[^|]+',1,3) from dual;

>
> >R
> >-

>
> Also doesn't work for more general cases:
> - specifying the second substring should return 'two' but instead returns
> 'twofour'
> - specifying the fourth substring should return 'four' but instead returns anullstring
> - specifying the third substring in 'one|two||four|five' should return anull
> string, but instead returns 'five'
>
> In general, when seeking the i-th substring, if the omitted substring is in
> the n-th position, this method is guaranteed to produce correct results only
> when 0 < i < (n - 1), or when i = n AND there are at most n non-null
> substrings. If there are multiple omitted substrings, correct results may also
> be obtained for other values of i, but only by coincidence.


yes, you're right. maybe something more compact than that proposed by
Peter could be:

select
replace(
regexp_substr(replace('one|two||four','|','|\'),'[^|]+',1,4)
, '\', ''
) REG_RESULT
from dual;

where '\' should be a character that NEVER appears in "field" strings
(another one could be chosen)
it's an inelegant solution (or really ugly?), you should be really
sure it's never present

I wanted to learn regexp, but I think replace could be replaced
(sorry with a regular expression...

thank you!
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-24-2008, 07:09 PM
citte
 
Posts: n/a
Default Re: Regular expressions: splitting with REGEXP_SUBSTR and "null"

On 29 Feb, 20:11, spamb...@milmac.com (Doug Miller) wrote:
> In article <248f35ac-1e8c-424a-80cd-1da3edc66...@e6g2000prf.googlegroups.com>, Mark D Powell <Mark.Pow...@eds.com> wrote:
>
>
>
> >On Feb 28, 11:02=A0am, citte <nicola.ame...@gmail.com> wrote:
> >> Hi,
> >> I have a problem, I found on internet a way to split strings separated
> >> by pipe | (for example) with regexp, something like that:

>
> >> SELECTREGEXP_SUBSTR('One|Two|Three|Four','[^|]+', 1, 3)
> >> FROM dual;

>
> >> which gives the result:
> >> Three

>
> >> the problem comes when I have one of the words separated by the pipe
> >> is actually aNull, so the string becomes
> >> 'One|Two||Four|'

>
> >> the query

>
> >> SELECTREGEXP_SUBSTR('One|Two||Four','[^|]+', 1, 3)
> >> FROM dual;

>
> >> result in:
> >> Four

>
> >> I needed aNull(or empty string).
> >> How is this possible?

>
> >> Thank you in advance!

>
> >Will this help?

>
> >SQL> l
> > 1 selectregexp_substr(replace('one|two||four','||',' |NULL|'),
> > 2* '[^|]+',1,3) from dual
> >SQL> /

>
> >REGE
> >----
> >NULL

>
> Doesn't work in a more general case, e.g. attempting to select the fourth
> string from 'one|two|||five' -- the desired result isNULL, but this method
> returns 'five'. Conversely, trying to retrieve the fifth string should return
> 'five' but instead returnsNULL.
>
>
>
> >Also to return an actualNULLvalue
> >SQL> selectregexp_substr(replace('one|two||four','||',' '),
> > 2 '[^|]+',1,3) from dual;

>
> >R
> >-

>
> Also doesn't work for more general cases:
> - specifying the second substring should return 'two' but instead returns
> 'twofour'
> - specifying the fourth substring should return 'four' but instead returns anullstring
> - specifying the third substring in 'one|two||four|five' should return anull
> string, but instead returns 'five'
>
> In general, when seeking the i-th substring, if the omitted substring is in
> the n-th position, this method is guaranteed to produce correct results only
> when 0 < i < (n - 1), or when i = n AND there are at most n non-null
> substrings. If there are multiple omitted substrings, correct results may also
> be obtained for other values of i, but only by coincidence.


yes, you're right. maybe something more compact than that proposed by
Peter could be:

select
replace(
regexp_substr(replace('one|two||four','|','|\'),'[^|]+',1,4)
, '\', ''
) REG_RESULT
from dual;

where '\' should be a character that NEVER appears in "field" strings
(another one could be chosen)
it's an inelegant solution (or really ugly?), you should be really
sure it's never present

I wanted to learn regexp, but I think replace could be replaced
(sorry with a regular expression...

thank you!
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:12 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com