Unix Technical Forum

arrays of floating point numbers / linear algebra operations into the DB

This is a discussion on arrays of floating point numbers / linear algebra operations into the DB within the Pgsql General forums, part of the PostgreSQL category.

#1
04-10-2008, 12:19 AM
Enrico Sirola

arrays of floating point numbers / linear algebra operations into the DB

Hello,
I'd like to perform linear algebra operations on float4/8 arrays.
These tasks are typically carried out using ad hoc optimized libraries
(e.g. BLAS). In order to do this, I studied a bit how arrays are
stored internally by the DB: from what I understood, arrays are
basically a vector of Datum, and floating point numbers are stored by
reference into Datums. At first glance, this seems to close the
discussion, because in order to perform fast linear algebra operations
you need to store array items in consecutive memory cells.
What are the alternatives? Create a new specialized data type for
floating point vectors?
Basically, the use case is to be able to rescale, add and multiply
(element-by-element) vectors.

Thanks for your help,
e.
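A minimal sketch of the three operations in the use case above, written in Python for illustration only. Python's `array('d')` keeps C doubles in one contiguous buffer, which stands in here for the "consecutive memory cells" a BLAS-style routine expects. The function names `rescale`, `add` and `multiply` are hypothetical, not an existing PostgreSQL or BLAS API.

```python
from array import array


def _check_lengths(u: array, v: array) -> None:
    # Element-by-element operations only make sense for equal-length vectors.
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")


def rescale(v: array, alpha: float) -> array:
    """Return alpha * v (the operation BLAS DSCAL performs in place)."""
    return array("d", (alpha * x for x in v))


def add(u: array, v: array) -> array:
    """Return u + v, element by element."""
    _check_lengths(u, v)
    return array("d", (a + b for a, b in zip(u, v)))


def multiply(u: array, v: array) -> array:
    """Return the element-by-element (Hadamard) product of u and v."""
    _check_lengths(u, v)
    return array("d", (a * b for a, b in zip(u, v)))


u = array("d", [1.0, 2.0, 3.0])
v = array("d", [4.0, 5.0, 6.0])
print(rescale(u, 2.0).tolist())  # [2.0, 4.0, 6.0]
print(add(u, v).tolist())        # [5.0, 7.0, 9.0]
print(multiply(u, v).tolist())   # [4.0, 10.0, 18.0]
```

In a real server-side implementation these loops would presumably be C code running over the array's element data, or a call into a BLAS library.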




---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#2
04-10-2008, 12:19 AM
Colin Wetherbee

Re: arrays of floating point numbers / linear algebra operations into the DB

Enrico Sirola wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays. These
> tasks are typically carried out using ad hoc optimized libraries (e.g.
> BLAS). In order to do this, I studied a bit how arrays are stored
> internally by the DB: from what I understood, arrays are basically a
> vector of Datum, and floating point numbers are stored by reference into
> Datums. At first glance, this seems to close the discussion, because in
> order to perform fast linear algebra operations you need to store array
> items in consecutive memory cells.
> What are the alternatives? Create a new specialized data type for
> floating point vectors?
> Basically, the use case is to be able to rescale, add and multiply
> (element-by-element) vectors.


I'm not sure about the internals of PostgreSQL (e.g. the Datum object(?)
you mention), but if you're just scaling vectors, consecutive memory
addresses shouldn't be absolutely necessary. Add and multiply
operations within a linked list (which is how I'm naively assuming Datum
storage for arrays in memory is implemented) will be "roughly" just as fast.

How many scaling operations are you planning to execute per second, and
how many elements do you scale per operation?

Colin

#3
04-10-2008, 12:19 AM
Martijn van Oosterhout

Re: arrays of floating point numbers / linear algebra operations into the DB

On Fri, Feb 01, 2008 at 11:31:37AM +0100, Enrico Sirola wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays.
> These tasks are typically carried out using ad hoc optimized libraries
> (e.g. BLAS). In order to do this, I studied a bit how arrays are
> stored internally by the DB: from what I understood, arrays are
> basically a vector of Datum, and floating point numbers are stored by
> reference into Datums.


Well, arrays are not vectors of Datum; they are vectors of the objects
they contain. When passed to a function, floats, arrays and other by-ref
types are passed by reference, but the array object itself does not
contain references; it contains the actual objects.

That doesn't necessarily make it the same as a C array, though: the
alignment considerations may be different. But at first glance it
certainly seems like an array would be in the right format for what
you're doing.
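To illustrate the point that an array holds its element values inline rather than references to them, here is a minimal Python sketch. It models a stored value as a fixed-size header followed by packed float8 elements; the 24-byte `HEADER_SIZE` and the helper names are hypothetical and do not reproduce PostgreSQL's actual `ArrayType` layout. The header size is chosen as a multiple of 8 so that, in this sketch, each double starts on an 8-byte boundary, which is the kind of alignment consideration mentioned above.

```python
import struct

# Hypothetical header size; NOT PostgreSQL's real ArrayType header layout.
HEADER_SIZE = 24
FLOAT8_SIZE = 8


def build_value(values):
    """Pack a placeholder header followed by the float8 elements, stored inline."""
    header = bytes(HEADER_SIZE)  # all-zero stand-in for real header fields
    data = struct.pack(f"={len(values)}d", *values)
    return bytearray(header + data)


def element_view(buf, n):
    """Zero-copy view of the n elements as C doubles (no per-element pointers)."""
    start = HEADER_SIZE
    end = start + n * FLOAT8_SIZE
    return memoryview(buf)[start:end].cast("d")


buf = build_value([1.0, 2.5, -3.0])
view = element_view(buf, 3)

# Scale in place through the view: the loop walks consecutive 8-byte slots
# of the same buffer, which is what a BLAS-style routine needs.
for i in range(len(view)):
    view[i] *= 2.0
```

Because the elements sit back to back, the data section can be handed to vectorized code directly instead of being gathered from scattered locations first.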

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy


#4
04-10-2008, 12:19 AM
Webb Sprague

Re: arrays of floating point numbers / linear algebra operations into the DB

On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays.
> These tasks are typically carried out using ad hoc optimized libraries
> (e.g. BLAS).


If there were a coherently designed, simple, and fast LAPACK/ MATLAB
style library and set of datatypes for matrices and vectors in
Postgres, I think that would be a HUGE plus for the project!

I would have used it on a project I am working on in mortality
forecasting (I would have been able to put all of my mathematics in
the database instead of using SciPy). It would tie in beautifully with
the GIS and imagery efforts, it would ease fancy statistics
calculations on database infrastructure, it would provide useful
libraries for the data-mining/knowledge-discovery types, etc., etc. If
we just had fast matrix arithmetic, eigen-stuff (including singular
value decomposition), convolution, random matrix generation, and
table <-> matrix functions, that would be amazing, and it would provide
the material for further library development, since a lot of complex
algorithms just fall out when you can do advanced linear algebra.

We need to be able to convert transparently between matrices/vectors
(which I think should be simple N by 1 matrices by default) and
arrays, but we would probably want to go for a separate datatype in
order to get speed, since scientifically important matrices can be
HUGE.
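A minimal Python sketch of the array <-> matrix conversion described above, where a plain flat array becomes an N by 1 column vector unless a shape is given. The `Matrix` class, its row-major storage and the `from_array`/`to_array` names are hypothetical illustrations, not an existing PostgreSQL type.

```python
from dataclasses import dataclass


@dataclass
class Matrix:
    rows: int
    cols: int
    data: list  # row-major: element (i, j) lives at data[i * cols + j]

    def __post_init__(self):
        if self.rows * self.cols != len(self.data):
            raise ValueError("data length does not match rows * cols")

    @classmethod
    def from_array(cls, values, rows=None, cols=1):
        """Build a matrix from a flat array; by default an N x 1 column vector."""
        data = [float(x) for x in values]
        if rows is None:
            rows = len(data) // cols
        return cls(rows, cols, data)

    def to_array(self):
        """Flatten back to a plain array in row-major order."""
        return list(self.data)

    def __getitem__(self, index):
        i, j = index
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError("matrix index out of range")
        return self.data[i * self.cols + j]

    def transpose(self):
        # New shape is cols x rows; element (j, i) of the result is self[i, j].
        data = [self[i, j] for j in range(self.cols) for i in range(self.rows)]
        return Matrix(self.cols, self.rows, data)
```

For example, `Matrix.from_array([1, 2, 3])` yields a 3 x 1 vector, and its `transpose()` is the corresponding 1 x 3 row vector; keeping the storage flat means the conversion back to an array is just a copy.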

Just my fairly worthless $0.02, as all I could provide would be to
act as a tester and member of the peanut gallery, but there you go.
Seems like a perfect Summer of Code project for someone better at
C-level programming than me.

-W

#5
04-10-2008, 12:19 AM
Ron Mayer

Re: arrays of floating point numbers / linear algebra operations into the DB

Webb Sprague wrote:
> On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
>> I'd like to perform linear algebra operations on float4/8 arrays...

>
> If there were a coherently designed, simple, and fast LAPACK/ MATLAB
> style library and set of datatypes for matrices and vectors in
> Postgres, I think that would be a HUGE plus for the project!


I'd also be very excited about this project.

Especially if some GIST or similar index could efficiently search
for vectors "close" to other vectors.

I assume something like "within an n-dimensional bounding box"
would be possible with GIST...
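The bounding-box idea can be sketched as a filter-and-refine search. The box of half-width r around a query point contains every point within Euclidean distance r of it, so filtering on the box (the part an index could answer) never drops a true match; an exact distance check then removes the points in the box's corners. This is a brute-force Python stand-in: the function names are hypothetical and nothing here uses the GIST API.

```python
import math


def query_box(q, r):
    """Axis-aligned box of half-width r centred on the query point q."""
    return [x - r for x in q], [x + r for x in q]


def in_box(p, lo, hi):
    """True if point p lies inside the box [lo, hi] in every dimension."""
    return all(low <= x <= high for x, low, high in zip(p, lo, hi))


def within_radius(points, q, r):
    """Return the points whose Euclidean distance to q is at most r."""
    lo, hi = query_box(q, r)
    # Filter: the cheap box test an index could answer.
    candidates = [p for p in points if in_box(p, lo, hi)]
    # Refine: exact distance on the (hopefully few) candidates.
    return [p for p in candidates if math.dist(p, q) <= r]
```

With an index, only the filter step would change: instead of scanning every point, the tree would return the points whose boxes overlap the query box.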

I'd be eager to help, test, debug, etc., but I'm probably not qualified
to take the lead on such a project.



#6
04-10-2008, 12:19 AM
Webb Sprague

Re: arrays of floating point numbers / linear algebra operations into the DB

(I had meant also to add that a linear algebra package would help
Postgres be the mediator for real-time data, from things like
temperature sensors, etc., and its relationship to not-so-scientific
data, say in a manufacturing environment.)

On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> Webb Sprague wrote:
> > On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> >> I'd like to perform linear algebra operations on float4/8 arrays...

> >
> > If there were a coherently designed, simple, and fast LAPACK/ MATLAB
> > style library and set of datatypes for matrices and vectors in
> > Postgres, I think that would be a HUGE plus for the project!

>
> I'd also be very excited about this project.
>
> Especially if some GIST or similar index could efficiently search
> for vectors "close" to other vectors.


That would be very interesting as we could play with a multitude of
different distance metrics from Analysis!!! Wow!
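As a small illustration of how many distance functions one could plug in, here are a few common ones in Python. These are textbook definitions written for this note, not functions from any PostgreSQL module. Cosine distance is not a true metric (it can violate the triangle inequality), which matters if an index relies on metric properties to prune its search.

```python
import math


def euclidean(u, v):
    return math.dist(u, v)


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def minkowski(u, v, p):
    """General L^p distance; p=1 is Manhattan, p=2 is Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def cosine_distance(u, v):
    """1 - cos(angle between u and v); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)
```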

> I'd be eager to help, test, debug, etc., but I'm probably not qualified
> to take the lead on such a project.


I almost think the hardest part would be to spec it out and design the
interface to the libraries. Once we had that, the libraries are
already there, though figuring out how we are going to handle gigabyte-size
elements (e.g. a satellite image) will require some finesse, and
perhaps some tiling...
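One way to read the tiling idea: cut a large row-major matrix (or image) into fixed-size tiles, each small enough to be stored and fetched as its own value, and reassemble on demand. A minimal Python sketch; the tile layout, the `(tile_row, tile_col)` keys and the function names are hypothetical choices for illustration, not an existing PostgreSQL or PostGIS format.

```python
def tile_matrix(data, rows, cols, tile):
    """Split a row-major rows x cols matrix into tile x tile blocks.

    Returns {(tile_row, tile_col): (height, width, row-major block)}; tiles on
    the bottom and right edges may be smaller than tile x tile.
    """
    if len(data) != rows * cols:
        raise ValueError("data length does not match rows * cols")
    tiles = {}
    for top in range(0, rows, tile):
        for left in range(0, cols, tile):
            h = min(tile, rows - top)
            w = min(tile, cols - left)
            block = [data[(top + i) * cols + left + j] for i in range(h) for j in range(w)]
            tiles[(top // tile, left // tile)] = (h, w, block)
    return tiles


def untile(tiles, rows, cols, tile):
    """Reassemble the row-major matrix from the output of tile_matrix."""
    data = [0.0] * (rows * cols)
    for (tile_row, tile_col), (h, w, block) in tiles.items():
        top, left = tile_row * tile, tile_col * tile
        for i in range(h):
            for j in range(w):
                data[(top + i) * cols + left + j] = block[i * w + j]
    return data
```

A query touching a small region would then only need to fetch the tiles that overlap it, rather than the whole gigabyte-size value.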

Hmm. If I get some more interest on this list (I need just one LAPACK
/ BLAS hacker...), I will apply for a pgFoundry project and appoint
myself head of the peanut gallery...

#7
04-10-2008, 12:19 AM
Joe Conway

Re: arrays of floating point numbers / linear algebra operations into the DB

Enrico Sirola wrote:
> Typically, arrays contain 1000 elements, and an operation is either
> multiplying one by a scalar or multiplying it element-by-element with
> another array. The time to rescale 1000 arrays, multiply them by another
> array and at the end sum all the 1000 resulting arrays should be short
> enough for an interactive application (let's say 0.5s). This is for the
> case when no disk access is required. Disk access will obviously
> degrade performance a bit at the beginning, but the workload is
> mostly read-only, so after a while the whole table will be cached anyway.
> The table containing the arrays would be truncated/repopulated every day
> and the number of arrays is expected to be more or less 150000 (at least
> this is what we have now). Nowadays, we have a C++ middleware layer
> between the calculations and an aggressive caching of the table contents
> (and we don't use arrays, just a row per element), but the application
> could be refactored (and simplified a lot) if we had a smart way to save
> data into the DB.


I don't know if the speed will meet your needs, but you might test to
see if PL/R will work for you:

http://www.joeconway.com/plr/

You could use pg.spi.exec() from within the R procedure to grab the
arrays, do all of your processing inside R (which uses whatever BLAS
you've set it up to use), and then return the result out to Postgres.

Joe
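For a rough sense of the workload Enrico describes (rescale 1000 arrays of 1000 elements, multiply each element-by-element by another array, sum the results), here is a plain-Python sketch of the arithmetic, whether it ends up in R, C or elsewhere. It assumes each array has its own scale factor and that the same second array `w` is used in every product; both are assumptions, since the post does not say. Because the element-by-element product distributes over the sum, the sketch accumulates the scaled arrays first (one `acc := s * a + acc` step per array, the same shape as BLAS DAXPY) and multiplies by `w` once at the end, which agrees with the direct form up to floating-point rounding. The input is 1000 x 1000 = 10^6 float8 values, about 8 MB, and roughly 2 x 10^6 multiply/add operations.

```python
def scaled_weighted_sum(arrays, scales, w):
    """Compute sum_i scales[i] * (arrays[i] * w), with * taken element-wise.

    Uses sum_i s_i * (a_i * w) == w * (sum_i s_i * a_i) (element-wise), so
    only one element-by-element multiplication is needed at the end.
    """
    if len(arrays) != len(scales):
        raise ValueError("need exactly one scale factor per array")
    n = len(w)
    acc = [0.0] * n
    for a, s in zip(arrays, scales):
        if len(a) != n:
            raise ValueError("all arrays must have the same length as w")
        for k in range(n):
            acc[k] += s * a[k]  # axpy-style update: acc := s * a + acc
    return [acc[k] * w[k] for k in range(n)]


result = scaled_weighted_sum([[1.0, 2.0], [3.0, 4.0]], [2.0, 0.5], [10.0, 1.0])
print(result)  # [35.0, 6.0]
```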


#8
04-10-2008, 12:19 AM
Enrico Sirola

Re: arrays of floating point numbers / linear algebra operations into the DB

Hi Joe,

> I don't know if the speed will meet your needs, but you might test
> to see if PL/R will work for you:
>
> http://www.joeconway.com/plr/
>
> You could use pg.spi.exec() from within the R procedure to grab the
> arrays, do all of your processing inside R (which uses whatever BLAS
> you've set it up to use), and then return the result out to Postgres.


Thanks a lot for the hint, I'll give it a try. It also will be much
easier to implement a prototype :-)
Bye,
e.


#9
04-10-2008, 12:19 AM
Ron Mayer

Re: arrays of floating point numbers / linear algebra operations into the DB

Webb Sprague wrote:
> On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
>> Webb Sprague wrote:
>>> On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
>>>> ...linear algebra ...
>>> ... matrices and vectors .

>> ...Especially if some GIST or similar index could efficiently search
>> for vectors "close" to other vectors...

>
> Hmm. If I get some more interest on this list (I need just one LAPACK
> / BLAS hacker...), I will apply for a pgFoundry project and appoint
> myself head of the peanut gallery...


I think you should start one. I'd be happy to help.

I'm rather proficient in C; somewhat literate about Postgres' GIST
stuff (I think a couple of my bugfix patches were accepted in PostGIS);
and I deal with a big database doing lots of similarity-based searches (a
6'2" guy with light brown hair being similar to a 6'1" guy with dark
blond hair), and am experimenting with modeling some of the data as
vectors in Postgres.
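The height/hair example can be sketched as a brute-force nearest-neighbour search over feature vectors. The encoding is entirely hypothetical: height in inches, hair colour as a made-up lightness score from 0 (black) to 1 (platinum blond), and a weight of 100 on the hair dimension so that a 0.1 difference in lightness counts the same as one inch of height. An index (GIST or otherwise) would replace the full scan.

```python
import heapq
import math


def weighted_distance(u, v, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (u_k - v_k)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def nearest(records, query, weights, k):
    """Return the k records whose vectors are closest to query (full scan)."""
    return heapq.nsmallest(k, records, key=lambda rec: weighted_distance(rec[1], query, weights))


# Hypothetical records: (label, (height_in_inches, hair_lightness)).
people = [
    ("A", (74.0, 0.45)),  # 6'2", light brown hair
    ("B", (73.0, 0.55)),  # 6'1", dark blond hair
    ("C", (66.0, 0.05)),  # 5'6", black hair
]
WEIGHTS = (1.0, 100.0)  # hypothetical: 0.1 of lightness ~ 1 inch of height

print([label for label, _ in nearest(people, (74.0, 0.45), WEIGHTS, 2)])  # ['A', 'B']
```

Choosing the weights is the real modeling decision here; a linear algebra library would make it easy to try alternatives such as normalizing each dimension by its standard deviation.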


#10
04-10-2008, 12:19 AM
Webb Sprague

Re: arrays of floating point numbers / linear algebra operations into the DB

> >>>> ...linear algebra ...
> >>> ... matrices and vectors .
> >> ...Especially if some GIST or similar index could efficiently search
> >> for vectors "close" to other vectors...

> >
> > Hmm. If I get some more interest on this list (I need just one LAPACK
> > / BLAS hacker...), I will apply for a pgFoundry project and appoint
> > myself head of the peanut gallery...

>
> I think you should start one. I'd be happy to help.


OK. You are on. I think designing an interface is the first step,
and I am inclined to use MATLAB syntax plus cool things I wish it
had (convolution matrices, recycling, etc.).
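A minimal Python sketch of the two wish-list items, under stated assumptions: "recycling" is read here as R's recycling rule (the shorter operand is reused cyclically), and a "convolution matrix" as the Toeplitz matrix T with T[i][j] = h[i - j], so that multiplying T by x gives the full 1-D convolution of x with kernel h. Unlike R, which only warns when the longer length is not a multiple of the shorter one, this sketch raises an error. All function names are hypothetical.

```python
import operator


def recycle(op, u, v):
    """Apply op element-wise, reusing the shorter operand cyclically (R-style)."""
    if not u or not v:
        raise ValueError("operands must be non-empty")
    n = max(len(u), len(v))
    if n % len(u) or n % len(v):
        raise ValueError("longer length is not a multiple of the shorter one")
    return [op(u[i % len(u)], v[i % len(v)]) for i in range(n)]


def convolution_matrix(h, n):
    """(n + len(h) - 1) x n Toeplitz matrix T with T[i][j] = h[i - j]."""
    m = len(h)
    return [[h[i - j] if 0 <= i - j < m else 0.0 for j in range(n)]
            for i in range(n + m - 1)]


def matvec(a, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def convolve(h, x):
    """Direct full convolution, used to check the matrix form."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


print(recycle(operator.mul, [1, 2, 3, 4], [10, 100]))  # [10, 200, 30, 400]
```

Expressing convolution as a matrix lets it compose with the rest of the linear algebra (products, inverses, decompositions), at the cost of storing a mostly-zero matrix.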

> I'm rather proficient in C; somewhat literate about postgres' GIST
> stuff (I think a couple of my bugfix patches were accepted in postgis);


Nifty! I am having trouble wrapping my head around how we can fit 10K
by 10K matrices into Datums, but if you have worked with PostGIS, then
your experience with a lot of those big geographic fields might help.
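Some back-of-the-envelope arithmetic on the 10K by 10K case, as a Python sketch. It assumes dense storage of float8 (8-byte) or float4 (4-byte) elements and compares only the raw element bytes, ignoring any array header or compression, against PostgreSQL's documented maximum field size of 1 GB (approximated here as 2^30 bytes). A dense 10,000 x 10,000 float8 matrix is 8 x 10^8 bytes (800 MB), just under that limit, while doubling each dimension quadruples the size and no longer fits even as float4.

```python
# PostgreSQL documents a 1 GB maximum field size; approximated here as 2**30 bytes.
MAX_FIELD_BYTES = 2 ** 30


def dense_matrix_bytes(rows, cols, elem_bytes=8):
    """Raw size of a dense rows x cols matrix (8 bytes per float8, 4 per float4)."""
    return rows * cols * elem_bytes


def fits_in_one_field(rows, cols, elem_bytes=8):
    """True if the raw element data alone is below the field-size limit."""
    return dense_matrix_bytes(rows, cols, elem_bytes) < MAX_FIELD_BYTES


for n in (10_000, 20_000):
    for name, size in (("float8", 8), ("float4", 4)):
        mb = dense_matrix_bytes(n, n, size) / 1e6
        print(f"{n} x {n} {name}: {mb:,.0f} MB, fits: {fits_in_one_field(n, n, size)}")
```

Matrices beyond that size would need something like the tiling approach discussed earlier in the thread, rather than a single value.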

> and deal with a big database doing lots of similarity-based searches (a
> 6'2" guy with light brown hair being similar to a 6'1" guy with dark
> blond hair) - and am experimenting with modeling some of the data as
> vectors in postgres.


Well, I bet a good linear algebra library would help. A lot.
