This is a discussion on arrays of floating point numbers / linear algebra operations into the DB within the Pgsql General forums, part of the PostgreSQL category.
| Hello, I'd like to perform linear algebra operations on float4/8 arrays. These tasks are typically carried out using ad hoc optimized libraries (e.g. BLAS). In order to do this, I studied a bit how arrays are stored internally by the DB: from what I understood, arrays are basically a vector of Datum, and floating point numbers are stored by reference into Datums. At first glance, this seems to close the discussion, because in order to perform fast linear algebra operations you need to store array items in consecutive memory cells. What are the alternatives? Create a new specialized data type for floating point vectors? Basically, the use case is to be able to rescale, add and multiply (element-by-element) vectors. Thanks for your help, e. |
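The three operations named in the post above (rescale, add, element-by-element multiply) can be sketched outside the database as follows. This is only an illustration in plain Python: the function names `rescale`, `add` and `multiply` are made up for this example, and the standard-library `array("d", ...)` type is used because it keeps 8-byte floats in consecutive memory cells, which is the property the post asks about.

```python
from array import array


def rescale(v, k):
    """Return a new float8 vector with every element of v multiplied by k."""
    return array("d", (k * x for x in v))


def add(a, b):
    """Return the element-by-element sum of two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return array("d", (x + y for x, y in zip(a, b)))


def multiply(a, b):
    """Return the element-by-element product of two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return array("d", (x * y for x, y in zip(a, b)))


a = array("d", [1.0, 2.0, 3.0])
b = array("d", [4.0, 5.0, 6.0])

print(list(rescale(a, 2.0)))   # [2.0, 4.0, 6.0]
print(list(add(a, b)))         # [5.0, 7.0, 9.0]
print(list(multiply(a, b)))    # [4.0, 10.0, 18.0]
```

An optimized library such as BLAS would run the same loops over the same kind of contiguous buffer, only much faster.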
| |||
| Enrico Sirola wrote: > Hello, > I'd like to perform linear algebra operations on float4/8 arrays. These > tasks are typically carried out using ad hoc optimized libraries (e.g. > BLAS). In order to do this, I studied a bit how arrays are stored > internally by the DB: from what I understood, arrays are basically a > vector of Datum, and floating point numbers are stored by reference into > Datums. At first glance, this seems to close the discussion because in > order to perform fast linear algebra operations, you need to store array > items in consecutive memory cells. > What are the alternatives? Create a new specialized data type for > floating point vectors? > Basically, the use-case is to be able to rescale, add and multiply > (element-by-element) > vectors. I'm not sure about the internals of PostgreSQL (e.g. the Datum object(?) you mention), but if you're just scaling vectors, consecutive memory addresses shouldn't be absolutely necessary. Add and multiply operations within a linked list (which is how I'm naively assuming Datum storage for arrays in memory is implemented) will be "roughly" just as fast. How many scaling operations are you planning to execute per second, and how many elements do you scale per operation? Colin |
| |||
| On Fri, Feb 01, 2008 at 11:31:37AM +0100, Enrico Sirola wrote: > Hello, > I'd like to perform linear algebra operations on float4/8 arrays. > These tasks are typically carried out using ad hoc optimized libraries > (e.g. BLAS). In order to do this, I studied a bit how arrays are > stored internally by the DB: from what I understood, arrays are > basically a vector of Datum, and floating point numbers are stored by > reference into Datums. Well, arrays are not vectors of Datum, they are a vector of the objects they contain. When passed to a function, floats, arrays and other by-ref types are passed by reference, but the array object itself does not contain references, it contains the actual objects. That doesn't necessarily make it the same as a C array though, as the alignment considerations may be different. But at first glance it certainly seems like an array would be in the right format for what you're doing. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy |
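The distinction made in the reply above (an array that holds the values themselves, not references to them) can be illustrated with a small sketch. It is not PostgreSQL's actual on-disk or in-memory layout: a real PostgreSQL array also carries a header in front of the data, which is ignored here, and the helper names `pack_float8` and `scale_in_place` are hypothetical. The point is only that once the float8 values sit back to back in one buffer, a numeric routine can walk them in place.

```python
import struct


def pack_float8(values):
    """Pack Python floats into one contiguous run of 8-byte doubles."""
    return struct.pack(f"={len(values)}d", *values)


def scale_in_place(buf, k):
    """Multiply every double in a writable byte buffer by k, without copying."""
    view = memoryview(buf).cast("d")  # reinterpret the raw bytes as doubles
    for i in range(len(view)):
        view[i] *= k


payload = bytearray(pack_float8([1.0, 2.5, -4.0]))
print(len(payload))                     # 24: three values, 8 bytes each

scale_in_place(payload, 2.0)
print(struct.unpack("=3d", payload))    # (2.0, 5.0, -8.0)
```

A library routine written in C would receive a pointer to such a buffer plus a length, which is why the alignment caveat in the reply matters.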
| |||
| On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote: > Hello, > I'd like to perform linear algebra operations on float4/8 arrays. > These tasks are typically carried out using ad hoc optimized libraries > (e.g. BLAS). If there were a coherently designed, simple, and fast LAPACK/MATLAB style library and set of datatypes for matrices and vectors in Postgres, I think that would be a HUGE plus for the project! I would have used it on a project I am working on in mortality forecasting (I would have been able to put all of my mathematics in the database instead of using scipy), it would tie in beautifully with the GIS and imagery efforts, it would ease fancy statistics calculation on database infrastructure, it would provide useful libraries for the datamining/knowledge discovery types, etc, etc. If we just had fast matrix arithmetic, eigen-stuff (including singular value decomposition), convolution, random matrix generation, and table <-> matrix functions, that would be amazing and would provide the material for further library development, since a lot of complex algorithms just fall out when you can do advanced linear algebra. We need to be able to convert transparently between matrices/vectors (which I think should be simple N by 1 matrices by default) and arrays, but we would probably want to go for a separate datatype in order to get speed, since scientifically important matrices can be HUGE. Just my fairly worthless $0.02, as all I would provide is to be a tester and member of the peanut gallery, but there you go. Seems like a perfect Summer Of Code project for someone better at C-level programming than me. -W |
| |||
| Webb Sprague wrote: > On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote: >> I'd like to perform linear algebra operations on float4/8 arrays... > > If there were a coherently designed, simple, and fast LAPACK/ MATLAB > style library and set of datatypes for matrices and vectors in > Postgres, I think that would be a HUGE plus for the project! I'd also be very excited about this project. Especially if some GIST or similar index could efficiently search for vectors "close" to other vectors. I assume something like "within an n-dimensional bounding box" would be possible with GIST... I'd be eager to help, test, debug, etc; but I'm probably not qualified to take the lead on such a project. |
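The "close to other vectors" search suggested above usually works in two steps: a cheap bounding-box test that an index can answer, followed by an exact distance check on the survivors. The sketch below shows that idea with a plain linear scan in Python. It says nothing about how a GiST operator class would actually be written, the function names are hypothetical, and Euclidean distance is just one possible metric.

```python
import math


def in_bounding_box(v, lo, hi):
    """True if every coordinate of v lies between the box corners lo and hi."""
    return all(l <= x <= h for x, l, h in zip(v, lo, hi))


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbours_within(query, candidates, radius):
    """Return the candidates no farther than radius from query.

    The bounding box is the cheap prefilter (the part an index could serve);
    the exact distance check removes the box corners that are too far away.
    """
    lo = [q - radius for q in query]
    hi = [q + radius for q in query]
    return [
        c for c in candidates
        if in_bounding_box(c, lo, hi) and euclidean(query, c) <= radius
    ]


points = [(0.5, 0.5), (0.9, 0.9), (2.0, 0.0)]
print(neighbours_within((0.0, 0.0), points, 1.0))   # [(0.5, 0.5)]
```

Here `(0.9, 0.9)` passes the box test but fails the distance check, and `(2.0, 0.0)` is rejected by the box alone.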
| |||
| (I had meant also to add that a linear algebra package would help Postgres to be the mediator for real-time data, from things like temperature sensors, etc, and their relationship to not-so-scientific data, say in a manufacturing environment). On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote: > Webb Sprague wrote: > > On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote: > >> I'd like to perform linear algebra operations on float4/8 arrays... > > > > If there were a coherently designed, simple, and fast LAPACK/ MATLAB > > style library and set of datatypes for matrices and vectors in > > Postgres, I think that would be a HUGE plus for the project! > > I'd also be very excited about this project. > > Especially if some GIST or similar index could efficiently search > for vectors "close" to other vectors. That would be very interesting as we could play with a multitude of different distance metrics from Analysis!!! Wow! > I'd be eager to help, test, debug, etc; but probably aren't qualified > to take the lead on such a project. I almost think the hardest part would be to spec it out and design the interface to the libraries. Once we had that, the libraries are already there, though figuring out how we are going to handle gigabyte-size elements (e.g. a satellite image) will require some finesse, and perhaps some tiling ... Hmm. If I get some more interest on this list (I need just one LAPACK / BLAS hacker...), I will apply for a pgFoundry project and appoint myself head of the peanut gallery... |
| |||
| Enrico Sirola wrote: > typically, arrays contain 1000 elements, and an operation is either > multiply it by a scalar or multiply it element-by-element with another > array. The time to rescale 1000 arrays, multiply them by another array > and at the end sum all the 1000 resulting arrays should be short enough > for an interactive application (let's say 0.5s). This, in the > case when no disk access is required. Disk access will obviously > degrade performance a bit at the beginning, but the workload is > mostly read-only so after a while the whole table will be cached anyway. > The table containing the arrays would be truncated/repopulated every day > and the number of arrays is expected to be more or less 150000 (at least > this is what we have now). Nowadays, we have a C++ middleware between > the calculations and an aggressive caching of the table contents (and we > don't use arrays, just a row per element) but the application could be > refactored (and simplified a lot) if we have a smart way to save data > into the DB. I don't know if the speed will meet your needs, but you might test to see if PL/R will work for you: http://www.joeconway.com/plr/ You could use pg.spi.exec() from within the R procedure to grab the arrays, do all of your processing inside R (which uses whatever BLAS you've set it up to use), and then return the result out to Postgres. Joe |
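To get a feel for the workload quoted above (rescale 1000 arrays of 1000 elements, multiply each by another array, then sum the results), the following sketch runs it in plain Python on random data. It is a rough baseline for experiments, not a statement about PostgreSQL or PL/R performance: the function name `combine` and the random inputs are assumptions made for the example, and a BLAS-backed implementation would be expected to be far faster.

```python
import random
import time
from array import array


def combine(vectors, scales, weights):
    """Return sum_i scales[i] * vectors[i] * weights, taken element by element."""
    if len(vectors) != len(scales):
        raise ValueError("need exactly one scale factor per vector")
    total = array("d", [0.0]) * len(weights)      # accumulator, all zeros
    for v, k in zip(vectors, scales):
        if len(v) != len(weights):
            raise ValueError("vector length does not match weights")
        for j, x in enumerate(v):
            total[j] += k * x * weights[j]        # rescale, multiply, accumulate
    return total


if __name__ == "__main__":
    n_vectors, n_elems = 1000, 1000               # sizes taken from the post
    rng = random.Random(0)                        # fixed seed, data is synthetic
    vectors = [array("d", (rng.random() for _ in range(n_elems)))
               for _ in range(n_vectors)]
    scales = [rng.random() for _ in range(n_vectors)]
    weights = array("d", (rng.random() for _ in range(n_elems)))

    start = time.perf_counter()
    result = combine(vectors, scales, weights)
    elapsed = time.perf_counter() - start
    print(f"combined {n_vectors} x {n_elems} elements in {elapsed:.3f} s")
```

Timing this against the 0.5 s budget on the target machine gives a first idea of how much headroom an optimized library has to provide.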
| |||
| Hi Joe, > I don't know if the speed will meet your needs, but you might test > to see if PL/R will work for you: > > http://www.joeconway.com/plr/ > > You could use pg.spi.exec() from within the R procedure to grab the > arrays, do all of your processing inside R (which uses whatever BLAS > you've set it up to use), and then return the result out to Postgres. Thanks a lot for the hint, I'll give it a try. It also will be much easier to implement a prototype :-) Bye, e. |
| |||
| Webb Sprague wrote: > On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote: >> Webb Sprague wrote: >>> On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote: >>>> ...linear algebra ... >>> ... matrices and vectors . >> ...Especially if some GIST or similar index could efficiently search >> for vectors "close" to other vectors... > > Hmm. If I get some more interest on this list (I need just one LAPACK > / BLAS hacker...), I will apply for a pgFoundry project and appoint > myself head of the peanut gallery... I think you should start one. I'd be happy to help. I'm rather proficient in C; somewhat literate about postgres' GIST stuff (I think a couple of my bugfix patches were accepted in postgis); and deal with a big database doing lots of similarity-based searches (a 6'2" guy with light brown hair being similar to a 6'1" guy with dark blond hair) - and am experimenting with modeling some of the data as vectors in postgres. |
| ||||
| > >>>> ...linear algebra ... > >>> ... matrices and vectors . > >> ...Especially if some GIST or similar index could efficiently search > >> for vectors "close" to other vectors... > > > > Hmm. If I get some more interest on this list (I need just one LAPACK > > / BLAS hacker...), I will apply for a pgFoundry project and appoint > > myself head of the peanut gallery... > > I think you should start one. I'd be happy to help. OK. You are on. I think designing an interface is the first step, and I am inclined to use MATLAB syntax plus cool things I wish they had (convolution matrices, recycling, etc). > I'm rather proficient in C; somewhat literate about postgres' GIST > stuff (I think a couple of my bugfix patches were accepted in postgis); Nifty! I am having trouble wrapping my head around how we can fit 10K by 10K matrices into Datums, but if you have worked with PostGIS, then your experience with those big geographic fields might help. > and deal with a big database doing lots of similarity-based searches (a > 6'2" guy with light brown hair being similar to a 6'1" guy with dark > blond hair) - and am experimenting with modeling some of the data as > vectors in postgres. Well, I bet a good linear algebra library would help. A lot. |