[–] polskibus link

Do you know if it is an in-memory solution or can it work with data that doesn't fit into RAM?

reply

[–] lima link

Yes, in fact, it's specifically designed for data that doesn't fit in RAM. Their storage engine optimizes data locality for fast retrieval from non-flash storage.

See: https://clickhouse.yandex/presentations/data_at_scale/

Their docs also talk about it a lot.

reply

[–] bsg75 link

Its a disk based database, so not RAM limited.

reply

[–] eis link

Have a look at the page. It can do both.

reply

[–] lima link

Have a look at ClickHouse for OLAP workloads:

https://clickhouse.yandex/

It's a recently open sourced database by Yandex. It powers their web analytics backend. CloudFlare is already using it for their DNS analytics.

reply

[–] pjungwir link

I've tried it a couple times but didn't see as dramatic a speedup as I'd hoped either. What has worked for me is storing a full array in a single row of a regular Postgres table. Then if you have to load e.g. 1 million FLOATs, you can just slurp an 8MB chunk off disk instead of looking in 1 million different places. Postgres doesn't have a lot of functions for working on that kind of data though, so I wrote an extension to provide a few at [0]. I will probably add some more functions soon around masking out values so you can work "R/Pandas-style" on these arrays.

My approach breaks down if you are heavily updating your arrays though. On the mailing list Tom Lane suggested I store that kind of data outside the database.[1] But of course all software development is an attempt to have your cake and eat it too, so I wrote an extension that is a bit like cstore_fdw, but maybe more extreme: you can store/load/append a float array as an ordinary file.[2] It has similar operational disadvantages to cstore_fdw (no replication, no pg_dump, etc.) but performs faster. On the other hand it is definitely not battle-tested and still likely to change a little. (I need to improve the crash story and may add compression.) If you decide to use it, let me know!

Oh another limitation of keeping everything in an array (whether with floatfile or not) is that Postgres has a 1 GB limit on most RAM allocations. For example you could try running array_agg on a giant table and you might get a message about `Exception invalid memory alloc request size 1073741824`. I don't think cstore_fdw has that problem.

[0] https://github.com/pjungwir/aggs_for_arrays/

[1] http://www.postgresql-archive.org/Performance-appending-to-a...

[2] https://github.com/pjungwir/floatfile/

reply

[–] mtuncer link

replication support is recently added to cstore_fdw. You can replicate a cstore_fdw table to another server using streaming replication.

reply

[–] metdos link

This actually depends on your use case and type of your queries. There is a blog post (2014) about a benchmark: https://www.citusdata.com/blog/2014/06/14/columnar-store-ben...

I also believe that the main use case is for the compression of data more than the performance benefits.

reply

[–] metdos link

If you have a common filter, it is a good idea to sort your data on that field and load into cstore_fdw. This will help to use skip indexes. https://github.com/citusdata/cstore_fdw#using-skip-indexes

reply

[–] drej link

Has anyone achieved a dramatic speedup using this extension? Last time I tried it, I got barely a 30% speedup in a generic aggregation across a few fairly normalised tables with ~10M rows. I don't know if I'm doing something wrong or if this is all I'm getting. Thanks!

reply