Unix Technical Forum

duplicate rows mystery

This is a discussion on duplicate rows mystery within the pgsql Novice forums, part of the PostgreSQL category; --> I'm trying to remove duplicate records from a table. The best way I could think of was to select ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Novice

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-17-2008, 08:42 PM
Gerry Jensen
 
Posts: n/a
Default duplicate rows mystery

I'm trying to remove duplicate records from a table. The best way I could
think of was to select into a new table using DISTINCT.

I issued the following command:

select distinct on (symbol, date) * into price from pricebackup;

It did indeed remove several rows that had duplicate items. I then tried
to build a unique index on the columns (symbol, date) in pricebackup with:

create unique index price_symbol_date on pricebackup (symbol, date);

and got the error:

ERROR: could not create unique index
DETAIL: Table contains duplicated values.

How is it possible that I could have duplicate values in that table when I
just selected them into the table as distinct? What am I missing?

Any help is appreciated.

Thanks,

Gerry

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-17-2008, 08:42 PM
Michael Fuhr
 
Posts: n/a
Default Re: duplicate rows mystery

On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
>
> select distinct on (symbol, date) * into price from pricebackup;
>
> It did indeed remove several rows that had duplicate items. I then tried
> to build a unique index on the columns (symbol, date) in pricebackup with:
>
> create unique index price_symbol_date on pricebackup (symbol, date);
>
> and got the error:
>
> ERROR: could not create unique index
> DETAIL: Table contains duplicated values.


If the SELECT and CREATE statements are what you really did, then
you tried to create an index on the original table (pricebackup),
not the table you SELECTed into (price). Are you sure you have the
right table names in the right places?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-17-2008, 08:42 PM
Tom Lane
 
Posts: n/a
Default Re: duplicate rows mystery

Gerry Jensen <gerry@xmission.com> writes:
> I issued the following command[s]:
> select distinct on (symbol, date) * into price from pricebackup;
> create unique index price_symbol_date on pricebackup (symbol, date);
> and got the error:
> ERROR: could not create unique index
> DETAIL: Table contains duplicated values.


Um, isn't pricebackup the one that still has the dup rows? You should
have created the index on the new table.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-17-2008, 08:42 PM
Gerry Jensen
 
Posts: n/a
Default Re: duplicate rows mystery

Sorry. That was a typo in my post. I switched the table names in my
SELECT INTO line in my post (I tested it again to make sure that's not
what I did in psql and my results were the same). price was the original
table with duplicate rows. What I really did was:

select distinct on (symbol, date) * into pricebackup from price;

then:

create unique index price_symbol_date on pricebackup (symbol, date);

Yet it still gave me the duplicate value errors. How is this possible?

Thanks,

Gerry

On Thu, 14 Apr 2005, Michael Fuhr wrote:

> On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
>>
>> select distinct on (symbol, date) * into price from pricebackup;
>>
>> It did indeed remove several rows that had duplicate items. I then tried
>> to build a unique index on the columns (symbol, date) in pricebackup with:
>>
>> create unique index price_symbol_date on pricebackup (symbol, date);
>>
>> and got the error:
>>
>> ERROR: could not create unique index
>> DETAIL: Table contains duplicated values.

>
> If the SELECT and CREATE statements are what you really did, then
> you tried to create an index on the original table (pricebackup),
> not the table you SELECTed into (price). Are you sure you have the
> right table names in the right places?
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-17-2008, 08:42 PM
Steve Tucknott
 
Posts: n/a
Default Re: duplicate rows mystery

How about the simple case that pricebackup is not empty before the
select? You say it is a new table, but did you do a test run or anything
like that before the first select? How about dropping or truncating the
table and starting again?


On Thu, 2005-04-14 at 18:09, Gerry Jensen wrote:
> Sorry. That was a typo in my post. I switched the table names in my
> SELECT INTO line in my post (I tested it again to make sure that's not
> what I did in psql and my results were the same). price was the original
> table with duplicate rows. What I really did was:
>
> select distinct on (symbol, date) * into pricebackup from price;
>
> then:
>
> create unique index price_symbol_date on pricebackup (symbol, date);
>
> Yet it still gave me the duplicate value errors. How is this possible?
>
> Thanks,
>
> Gerry
>
> On Thu, 14 Apr 2005, Michael Fuhr wrote:
>
> > On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
> >>
> >> select distinct on (symbol, date) * into price from pricebackup;
> >>
> >> It did indeed remove several rows that had duplicate items. I then tried
> >> to build a unique index on the columns (symbol, date) in pricebackup with:
> >>
> >> create unique index price_symbol_date on pricebackup (symbol, date);
> >>
> >> and got the error:
> >>
> >> ERROR: could not create unique index
> >> DETAIL: Table contains duplicated values.

> >
> > If the SELECT and CREATE statements are what you really did, then
> > you tried to create an index on the original table (pricebackup),
> > not the table you SELECTed into (price). Are you sure you have the
> > right table names in the right places?
> >
> > --
> > Michael Fuhr
> > http://www.fuhr.org/~mfuhr/
> >

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--


Regards,

Steve Tucknott
ReTSol Ltd

DDI 01903 828769
MOBILE 07736715772



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-17-2008, 08:42 PM
Gerry Jensen
 
Posts: n/a
Default Re: duplicate rows mystery

Yes. I tried it several times, dropping the newly created table each
time. I'm wondering if my database is just corrupt.

My indexes in my original table were definitely corrupt (I originally had
a unique (symbol, date) index). I discovered this when certain queries
failed to return certain rows (e.g. selecting on a specific date), but
different queries did return those rows (e.g. selecting a range of dates
that included the missing date). When I dropped the indexes, I was able
to again see the missing rows with the specific date query. Apparently
the index corruption had allowed the duplicate rows to be inserted in the
first place.

So perhaps there's something seriously wrong with my database and I need
to start over from scratch. The main reason for my post is that I wanted
to make sure I was understanding "DISTINCT ON" correctly and there was
nothing wrong with my logic. If those things look fine and logical to
everyone, then my posting here has been beneficial to me.

Anyway, thanks for the suggestion.

Gerry

On Thu, 14 Apr 2005, Steve Tucknott wrote:

> How about the simple case that pricebackup is not empty before the
> select? You say it is a new table, but did you do a test run or anything
> like that before the first select? How about dropping or truncating the
> table and starting again?
>
>
> On Thu, 2005-04-14 at 18:09, Gerry Jensen wrote:
>> Sorry. That was a typo in my post. I switched the table names in my
>> SELECT INTO line in my post (I tested it again to make sure that's not
>> what I did in psql and my results were the same). price was the original
>> table with duplicate rows. What I really did was:
>>
>> select distinct on (symbol, date) * into pricebackup from price;
>>
>> then:
>>
>> create unique index price_symbol_date on pricebackup (symbol, date);
>>
>> Yet it still gave me the duplicate value errors. How is this possible?
>>
>> Thanks,
>>
>> Gerry
>>
>> On Thu, 14 Apr 2005, Michael Fuhr wrote:
>>
>>> On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
>>>>
>>>> select distinct on (symbol, date) * into price from pricebackup;
>>>>
>>>> It did indeed remove several rows that had duplicate items. I then tried
>>>> to build a unique index on the columns (symbol, date) in pricebackup with:
>>>>
>>>> create unique index price_symbol_date on pricebackup (symbol, date);
>>>>
>>>> and got the error:
>>>>
>>>> ERROR: could not create unique index
>>>> DETAIL: Table contains duplicated values.
>>>
>>> If the SELECT and CREATE statements are what you really did, then
>>> you tried to create an index on the original table (pricebackup),
>>> not the table you SELECTed into (price). Are you sure you have the
>>> right table names in the right places?
>>>
>>> --
>>> Michael Fuhr
>>> http://www.fuhr.org/~mfuhr/
>>>

>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: Have you searched our list archives?
>>
>> http://archives.postgresql.org

> --
>
>
> Regards,
>
> Steve Tucknott
> ReTSol Ltd
>
> DDI 01903 828769
> MOBILE 07736715772
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:45 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com