This is a discussion on CHECKSUM to determine record changes within the SQL Server forums, part of the Microsoft SQL Server category; --> I've searched the forum for uses of CHECKSUM and havent found a satisfactory answer. I need to know when ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I've searched the forum for uses of CHECKSUM and havent found a satisfactory answer. I need to know when a row changes and I dont care what it was or is, I just need to know it changed. Detail: Hourly I select rows where the record as a create date or change date of the last 3 days. I am only interested when the address or name changes, not other columns. Currently, I might select 30,000 rows that are new or changed, but turns out that only 100 have address changes and 500 are new. I want to process the 600, not the 30,000. My main concern is if some columns change, but the resulting checksum doesnt. Then I would have missed processing that record. The column types that I will be tracking are: "forenames" VARCHAR(50) NULL, "surname" VARCHAR(51) NULL, "ADDRESS1" VARCHAR(60) NULL, "ADDRESS2" VARCHAR(60) NULL, "ADDRESS4" VARCHAR(50) NULL, "STATE" VARCHAR(10) NULL, "ZIP" VARCHAR(20) NULL, "email" VARCHAR(50) NULL, "telephone" VARCHAR(10) NULL (I left off the keys, these are the only fields that I will be using for the checksum()). Am I safe? I read about 32 bit CRC and some data changes would go unnoticed, but not sure if this layout would qualifiy. TIA Rob |
| |||
| "rcamarda" <robert.a.camarda@gmail.com> wrote in message news:2d88cd48-2853-4f01-b797-c5c166bd167d@k13g2000hse.googlegroups.com... > Im using SQL Server 2005 currently patched. CHECKSUM isn't a reliable way to detect change because it's quite common to find different rows with the same CHECKSUM value. You could use a ROWVERSION column instead. ROWVERSION is guaranteed to increment when the row data is updated. Another alternative is to use a hash. The HashBytes function will return a secure hash of a binary value with a very high probability of uniqueness. Duplicate hashes are theoretically possible but are incredibly unlikely to occur unintentionally. If you are extremely paranoid then you can use two different hashes. -- David Portas |
| ||||
| David Portas wrote: > > "rcamarda" <robert.a.camarda@gmail.com> wrote in message > news:2d88cd48-2853-4f01-b797-c5c166bd167d@k13g2000hse.googlegroups.com... > > Im using SQL Server 2005 currently patched. > > CHECKSUM isn't a reliable way to detect change because it's quite common to > find different rows with the same CHECKSUM value. You could use a ROWVERSION > column instead. ROWVERSION is guaranteed to increment when the row data is > updated. > > Another alternative is to use a hash. The HashBytes function will return a > secure hash of a binary value with a very high probability of uniqueness. > Duplicate hashes are theoretically possible but are incredibly unlikely to > occur unintentionally. If you are extremely paranoid then you can use two > different hashes. > > -- > David Portas I agree about the advice for use ROWVERSION. However, CHECKSUM is also just a hash. Whether the chance of an unnotices change is lower if you use HashBytes function instead of CHECKSUM depends on your data. Although adding a second hash value will lower the chance of a missed change, it would be silly to do so. In the end you would need as many hash bytes as there are bytes in the data. -- Gert-Jan |