Unix Technical Forum

Escaping the ARC patent

This is a discussion on Escaping the ARC patent within the pgsql Hackers forums, part of the PostgreSQL category; --> I've been doing a bit of research on $subj, and coming to the conclusion that the ARC patent is ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 03:35 AM
Tom Lane
 
Posts: n/a
Default Escaping the ARC patent

I've been doing a bit of research on $subj, and coming to the conclusion
that the ARC patent is a lot narrower than it might appear. In fact
most of the parts of the algorithm that we actually want have prior art.
I looked in particular at Johnson and Shasha's well-known "2Q" paper,
published in 1994 (http://citeseer.ist.psu.edu/63909.html). This paper
describes the use of two lists, which they call A1 and Am (as opposed to
ARC's T1 and T2) but the basic principle is the same: a page goes into
A1 on first use, and doesn't get to Am unless used a second time before
aging out of the cache. 2Q also includes a list of pages that have
recently been in the cache but no longer are. So the actually
patentable parts of ARC are just some rather narrow decisions about the
management of these lists, in particular the use of a target T1len to
dynamically adapt the sizes of the lists. The 2Q paper proposes using
fixed fractions of the total available space for each list --- and it
includes statistics showing that the algorithm isn't excessively
sensitive to the exact values used, so ARC's claimed "self tuning"
advantage isn't all that great after all.

These conclusions are borne out by a close reading of the patent
application (which is at
http://appft1.uspto.gov/netacgi/nph-...DN/20040098541
if you want to look for yourself). Claim 1 reads

1. A method for adaptively managing pages in a cache memory with a
variable workload, comprising: maintaining the cache memory into a first
list L1 and a second list L2; wherein the cache memory has a capacity to
store c pages; and adaptively distributing the workload between the
first list L1 and the second list L2, to a total capacity of c pages.

Given the prior art, the critical word in this sentence is "adaptively";
take that out and you have nothing that wasn't published long before.
If we remove the adaptivity --- ie, just use a fixed division of list
sizes --- we escape claim 1 and all the other claims that depend on it.

The only other claim that isn't dependent on claim 1 or a restatement of
it is

45. A method for adaptively managing pages in a memory, comprising:
defining a cache memory; defining a cache directory; organizing the
cache directory into fours disjoint lists of pages: list T1, list T2,
list B1, and list B2; and wherein the cache memory contains pages that
are members of any of the list T1 or the list T2.

So if we use non-variable sizes of T1/T2 and don't use the four-way
list structure to manage remembrance of pages-formerly-in-cache,
we escape the patent. But we still have scan resistance, which is the
main thing that ARC was going to buy us. Pages that are scanned only
once don't get out of A1 and so aren't able to swamp out pages
referenced multiple times.

After reading the 2Q paper my inclination is to use exactly Johnson and
Shasha's "simplified 2Q" algorithm, which uses just A1 and Am with no
remembrance of formerly cached pages. Their "full 2Q" algorithm strikes
me as a tad bizarre because it will only promote a page into Am after it
has fallen out of A1, ie, it takes two physical reads of the page to
get into Am. That's just weird. I think that pages should advance from
A1 into Am on second reference. Given that, you don't need any
remembrance of pages that were formerly in A1, which basically halves
the memory overhead of the ARC algorithm.

An advantage of heading in this direction (as opposed to, say, LRU/k
or other algorithms) is that this represents a direct simplification
of the ARC code we have now. We can probably implement it almost
entirely by deletions from freelist.c, with little newly written code.
That gives me a whole lot more confidence that the result will be
reliable enough to back-patch into 8.0.*.

Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 03:36 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Escaping the ARC patent

I found the reference I had seen. The engine was the Multicache
Simulation Environment written in C++. I can't find the code to it
anymore but I've contacted the author for a copy.

Jonah H. Harris wrote:

> I'll dive into my bookmarks and see if I can find it.
>
> Tom Lane wrote:
>
>> "Jonah H. Harris" <jharris@tvi.edu> writes:
>>
>>
>>> I'm familiar with the 2Q algorithm. I also remember seeing, I
>>> believe, a public domain 2Q implementation floating around somewhere.
>>>

>>
>>
>> No doubt, but I think the more conservative way to get there is to
>> proceed by trimming down the working code we already have. Adapting
>> someone else's code to fit into our backend infrastructure would involve
>> a fair amount of modifications (memory management and error handling at
>> the very least) with consequent risks of introducing bugs.
>>
>> Still, it wouldn't be bad to have someone else's implementation at hand
>> as a comparison point. Do you have a link handy?
>>
>> regards, tom lane
>>
>>

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)




---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-11-2008, 03:37 AM
Tom Lane
 
Posts: n/a
Default Re: Escaping the ARC patent

"Jonah H. Harris" <jharris@tvi.edu> writes:
> I'm familiar with the 2Q algorithm. I also remember seeing, I believe,
> a public domain 2Q implementation floating around somewhere.


No doubt, but I think the more conservative way to get there is to
proceed by trimming down the working code we already have. Adapting
someone else's code to fit into our backend infrastructure would involve
a fair amount of modifications (memory management and error handling at
the very least) with consequent risks of introducing bugs.

Still, it wouldn't be bad to have someone else's implementation at hand
as a comparison point. Do you have a link handy?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-11-2008, 03:37 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Escaping the ARC patent

I'll dive into my bookmarks and see if I can find it.

Tom Lane wrote:

>"Jonah H. Harris" <jharris@tvi.edu> writes:
>
>
>>I'm familiar with the 2Q algorithm. I also remember seeing, I believe,
>>a public domain 2Q implementation floating around somewhere.
>>
>>

>
>No doubt, but I think the more conservative way to get there is to
>proceed by trimming down the working code we already have. Adapting
>someone else's code to fit into our backend infrastructure would involve
>a fair amount of modifications (memory management and error handling at
>the very least) with consequent risks of introducing bugs.
>
>Still, it wouldn't be bad to have someone else's implementation at hand
>as a comparison point. Do you have a link handy?
>
> regards, tom lane
>
>



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-11-2008, 03:37 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Escaping the ARC patent

I'm familiar with the 2Q algorithm. I also remember seeing, I believe,
a public domain 2Q implementation floating around somewhere.

Tom Lane wrote:

>I've been doing a bit of research on $subj, and coming to the conclusion
>that the ARC patent is a lot narrower than it might appear. In fact
>most of the parts of the algorithm that we actually want have prior art.
>I looked in particular at Johnson and Shasha's well-known "2Q" paper,
>published in 1994 (http://citeseer.ist.psu.edu/63909.html). This paper
>describes the use of two lists, which they call A1 and Am (as opposed to
>ARC's T1 and T2) but the basic principle is the same: a page goes into
>A1 on first use, and doesn't get to Am unless used a second time before
>aging out of the cache. 2Q also includes a list of pages that have
>recently been in the cache but no longer are. So the actually
>patentable parts of ARC are just some rather narrow decisions about the
>management of these lists, in particular the use of a target T1len to
>dynamically adapt the sizes of the lists. The 2Q paper proposes using
>fixed fractions of the total available space for each list --- and it
>includes statistics showing that the algorithm isn't excessively
>sensitive to the exact values used, so ARC's claimed "self tuning"
>advantage isn't all that great after all.
>
>These conclusions are borne out by a close reading of the patent
>application (which is at
>http://appft1.uspto.gov/netacgi/nph-...DN/20040098541
>if you want to look for yourself). Claim 1 reads
>
>1. A method for adaptively managing pages in a cache memory with a
>variable workload, comprising: maintaining the cache memory into a first
>list L1 and a second list L2; wherein the cache memory has a capacity to
>store c pages; and adaptively distributing the workload between the
>first list L1 and the second list L2, to a total capacity of c pages.
>
>Given the prior art, the critical word in this sentence is "adaptively";
>take that out and you have nothing that wasn't published long before.
>If we remove the adaptivity --- ie, just use a fixed division of list
>sizes --- we escape claim 1 and all the other claims that depend on it.
>
>The only other claim that isn't dependent on claim 1 or a restatement of
>it is
>
>45. A method for adaptively managing pages in a memory, comprising:
>defining a cache memory; defining a cache directory; organizing the
>cache directory into fours disjoint lists of pages: list T1, list T2,
>list B1, and list B2; and wherein the cache memory contains pages that
>are members of any of the list T1 or the list T2.
>
>So if we use non-variable sizes of T1/T2 and don't use the four-way
>list structure to manage remembrance of pages-formerly-in-cache,
>we escape the patent. But we still have scan resistance, which is the
>main thing that ARC was going to buy us. Pages that are scanned only
>once don't get out of A1 and so aren't able to swamp out pages
>referenced multiple times.
>
>After reading the 2Q paper my inclination is to use exactly Johnson and
>Shasha's "simplified 2Q" algorithm, which uses just A1 and Am with no
>remembrance of formerly cached pages. Their "full 2Q" algorithm strikes
>me as a tad bizarre because it will only promote a page into Am after it
>has fallen out of A1, ie, it takes two physical reads of the page to
>get into Am. That's just weird. I think that pages should advance from
>A1 into Am on second reference. Given that, you don't need any
>remembrance of pages that were formerly in A1, which basically halves
>the memory overhead of the ARC algorithm.
>
>An advantage of heading in this direction (as opposed to, say, LRU/k
>or other algorithms) is that this represents a direct simplification
>of the ARC code we have now. We can probably implement it almost
>entirely by deletions from freelist.c, with little newly written code.
>That gives me a whole lot more confidence that the result will be
>reliable enough to back-patch into 8.0.*.
>
>Comments?
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:37 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com