This seems to be mentioned in the very first sentence of the article:
> PostgreSQL supports Hash Index from a long time, but they are not much used in production mainly because they are not durable.
And there's a good reason for bringing them up anyway in the second sentence:
> Now, with the next version of PostgreSQL, they will be durable.
Saying "they are not durable" downplays the extent of how non-durable they are. That kind of "non-durability" needs flashing neon signs prominently featuring very large arrows, pointing at the specifics, not four vague words.
You get a warning when you try to create one in PostgreSQL 9.6. Maybe it should have been stronger worded, but since the issue is fixed in 10 anyway it is too late to do something about it.
# CREATE INDEX ON t USING hash (id);
WARNING: hash indexes are not WAL-logged and their use is discouraged
why are they not durable?
why shouldn't you use them for big tables? For example for PRIMARY KEY lookups, I never ask for a range of PRIMARY KEYs, so why not use a hash?
I think I'm missing the important point.
Durability (the D in ACID) means that if the power goes out or somebody kill -9s it, the database will be in a consistent state and anything that was committed was in fact committed. In current and previous versions of PostGres, pulling the plug at the wrong time could leave hash indexes in an inconsistent state. That's fixable by deleting and recreating the index but the time required to do so is proportional to the table size. In the next version of PostGres, it won't be a problem and hash indexes can be used.
In the case of a primary key, a row is inserted or deleted but the power goes out before the hash index is updated. When the power comes back, the hash index still points to rows were deleted and is missing rows that were inserted.
They are not durable because when durability was added to PostgreSQL a long time ago people were not sure how to add durability to hash indexes, so that task was left for later. But later did not happen, until very recently when some people worked hard on adding durability for them. Adding durability required quite major changes.
PostgreSQL 10 which will be released this autumn will include durable hash index, do not use hash indexes until then unless you really know what you are doing.
Collision probabilities are way more probably than people intuitively care to believe. The pigeonhole principle or the birthday Paradoxon exemplify this quiet well.
Hash tables are good, for small (relative to available space) static sets that don't change. Dynamic production environments will introduce an escalating probablity for performance degradation, and this can detrimental for critical systems.
Changes to hash indexes are not WAL-logged, so they are not crash safe. Wait for 10.0.
They're most useful with small memory-only tables that have very frequent lookups. For example, in a lot of systems, you might store the validation rules for each field in a table. Or you might have a set of autocomplete options. Or you may have a table with the set of currently logged-in sessions. These tables have row counts that measure in the hundreds, not millions, and the whole table can fit within a couple pages.
For something like this, the cost of reindexing is trivial, and the full data is usually in a .sql file somewhere, so even if there's data loss it takes all of about 10 seconds to restore it.
Why would you do this instead of hardcoding this in code, storing it in a local hashtable, or putting it in Redis/Memcached? Bunch of reasons. These tables can be shared across all clients of the database, so if you have multiple apps (in different languages!) that require common validation, they can all make a SQL query instead of recoding the logic. If you don't otherwise need Redis/Memcached, you can avoid adding that dependency to your stack. You can use foreign key constraints to ensure that the values in real data tables match the valid values in the validation table. You can throw an admin interface on top of the tables and let customers edit the set of available options, rather than forcing all changes to go through the developers.
This is buried in a few other comments, but it bears repeating as it's perhaps far more important than any performance gains you'd receive from hash indexes over btree... Hash indexes are not crash safe and not WAL logged. This means if you're running with high availability or even using something for disaster recovery like WAL-E and need to recover your database your index would not be in a working state after recovery. Instead you'd need to reindex that data.
For any real production database hash indexes should be used with extreme caution if used at all.
Something that the article does not mention: equality operations are cheaper with hash indexes than btree because less data pages need to be fetched. This is good particularly on columns with a high cardinality.
Hash table updates are quick until you have to rehash.
Rehashing can be done in the background, but then you had added code complexity, you'll be doing significantly more I/O during the rehashing (did you think about rehashing when sizing your machines?) and in pathological cases, you can run out of hash table space before the resize is complete. Now what?
Postgres hash indexes use linear hashing, which effectively means that rehashing/expansion is done incrementally: https://en.wikipedia.org/wiki/Linear_hashing
Rehashing can be incremental.
1: Allocate a new table of size at least 2X. 2: All accesses to the table hit both the old and new tables. 3: Inserts only go in new. 4: Any change (insert, update, delete) also moves 2 other elements.
You can prevent a O(n) rescan at 4 by keeping a single persistent cursor that scans through the old table from top to bottom. There, still O(1) throughout (C may go up by 2-3 but that's still small). And only uses 1.5x memory.
Not saying that's what postgres does here (it uses WAL, so probably not exactly).
So this is a great point that I forgot about. Yes, it's a problem for sure, but definitely not unaddressable. The databases community has been thinking about hashing schemes that address this issue since the 80s, like linear hashing, extendible hashing and consistent hashing. The trick is often around amortizing the cost of rehashing over insertions (no background tasks needed!), or somehow minimizing the amount of remapping and rehashing that needs to be done by using clever positioning. So yeah, like anything, it's a tradeoff and not a cure-all, but it can be done quite well.
Hash indexes have other benefits than just algorithmic complexity too: traversing and updating tree structures requires a lot of pointer chasing, so in cases where your btree index might not fit well in a reasonable level of cache (L3 etc), then this means paging stuff in and out of memory a lot, which can get very expensive very quick.
And also with updates to hash tables (e.g. transactional workloads), they're very quick, versus btree updates require traversing the tree structure.
Hash indexes are currently (9.6) a little bit larger than btree indexes . There seems to be no space improvement yet committed for postgres 10. But stay tuned, maybe somebody will do it soneday.
Hash indexes have not been recommended for years because they have not been crashsafe and therefore are not really that optimized in every term yet (size, query planner, ...).
I'm not a Postgres specialist but, if memory serves, when joining 2 tables the entire hash must be loaded on 1 side (typically the smaller side, of course).
You're mixing up hash joins with hash indexes.
I know they're different things, but isn't the main reason to do hash indexes in a relational db to then to a hash join?
More to the point: even if you're just talking indexes you can't load/use half a hash table, so wouldn't the point also apply to isolated use of a hash table.
I know it's not what was asked for, just a related point.
> I know they're different things, but isn't the main reason to do hash indexes in a relational db to then to a hash join?
I think the primary use case is O(1) lookup by key, i.e. similar to how key/value stores work.
I'm not aware of the planner being able to reuse an existing hash index to perform a hash join. AFAIK the hash buckets are created on the fly. That does sound like a cool idea though.
> More the point, even if you're just talking indexes you can't load/use half a hash table, so wouldn't the point also apply to isolated use of a hash table.
What do you mean by half a hash table?
Thanks for this clarification
>> What do you mean by half a hash table?
What I'm trying to get across is my impression that you don't have to load an entire B-tree index into memory to use it... you can traverse it. Whereas a hash index can't be "streamed" or traversed like that, you have to use the whole thing.
I read some papers on hash indexes and hash joins several months ago, and know little about the Postgres specifics... just wanting to wrap my head around it, so please help me understand if I'm off base with this.
>> That means you don't load it into memory at all, you just jump to Bucket[Hash(KEY)] and then check there for KEY (it may or may not be there due to false positives and the bucket being full).
Right, of course. You're making me realize this really is my confusion between hash indexes and hash joins. The issue I'm thinking of is all about dynamically constructed hash joins. Sorry, you called it right from the start but it took me a minute to see... thanks for helping talk through it.
A hash index allows you to determine which bucket an arbitrary key is mapped to without having to look anything else up.
That means you don't load it into memory at all, you just jump to Bucket[Hash(KEY)] and then check there for KEY (it may or may not be there due to false positives and the bucket being full).
Question for the Postgres developers who I know are lurking around here ...
What's the disk usage of hash indexes v.s. regular btrees?
Does it depend on the underlying type or is it a fixed size regardless of the key fields?
How's this compare to the poor man's hash index, i.e. BTREE(HASH(keys...))?
EDIT: I got curious so I ran a test of building btrees vs hash indexes. This is done on 9.6 (not HEAD with the new WAL logging supported hash indexes). Looks like the sweet spot is when the key size is around 20 bytes. After that the fixed size of the hash index gives it a win (on space usage) v.s. the btree which would have to repeatedly store the key values.
CREATE TABLE IF NOT EXISTS hash_test_results (
CREATE OR REPLACE FUNCTION pg_temp.test_hash_index(p_key_size int)
DROP TABLE IF EXISTS index_test;
CREATE TABLE index_test as select lpad('' || x, p_key_size, '0') AS a FROM generate_series(1,100000) x;
CREATE INDEX index_test_ix_btree ON index_test USING btree (a);
CREATE INDEX index_test_ix_hash ON index_test USING hash (a);
INSERT INTO hash_test_results
(key_size, btree_size, hash_size)
p_key_size AS key_size,
(SELECT pg_relation_size(c.oid) FROM pg_class c WHERE c.relname = 'index_test_ix_btree') AS btree_size,
(SELECT pg_relation_size(c.oid) FROM pg_class c WHERE c.relname = 'index_test_ix_hash') AS hash_size;
SELECT pg_temp.test_hash_index(x * 8) FROM generate_series(1, 16) x;
pg_size_pretty(btree_size) AS btree_size,
pg_size_pretty(hash_size) AS hash_size
key_size | btree_size | hash_size
8 | 3104 kB | 4112 kB
16 | 3992 kB | 4112 kB
24 | 4880 kB | 4112 kB
32 | 5792 kB | 4112 kB
40 | 6648 kB | 4112 kB
48 | 7592 kB | 4112 kB
56 | 8464 kB | 4112 kB
64 | 9352 kB | 4112 kB
72 | 10 MB | 4112 kB
80 | 11 MB | 4112 kB
88 | 12 MB | 4112 kB
96 | 13 MB | 4112 kB
104 | 14 MB | 4112 kB
112 | 15 MB | 4112 kB
120 | 15 MB | 4112 kB
128 | 16 MB | 4112 kB
Durable in the ACID sense, or specifically that write operations go through the WAL (write-ahead log, i.e. a journal), with all the consequences this brings (both durability and replication).
Before this work was done, hash indexes could literally be corrupted by power outages, etc.
Oh right, that's scary! In the case of corruption would the index just be rebuilt (I realise that's a big problem with big indexes) or would it just silently corrupt query output?
Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.
That is the warning in the current version of Postgres about hash indexes.
Hash indexes are not WAL logged in Postgres so they are not replicated and will not survive non-graceful shutdowns, i.e. if the server crashes you have to rebuild it from scratch. As such, they're not recommended unless you know exactly what you're doing.
Just to clarify, they are WAL logged starting in Postgres 10, which is what this article was talking about.
What does durable mean in the context of a Postgres index? I can sort of guess but from the way it's used here it seems like it has a well know definition. A quick google search (on my phone) returned this article a few times but not much to else that explains it.
No range lookups. No order by speedups. One thing that I can see Hash Indexes being really useful for is people who use UUID primary keys. Pretty much the only thing you do with a UUID is equality anyways, so if Hash Index is faster than btree, it makes sense to use that instead.
Until you hit the problem of needing to iterate over your table (batches).
As far as I can tell, the only real way to do efficient iteration over postgres tables is to continually bound ID at either end and then take an offset. (SELECT * from <foo table> where id > <largest id from last batch> limit 10). With btrees on UUID you get lexicographic ordering here I imagine, so it'll work. With hash indexes no dice.
Cursors don't end up working out because if you kill long-running queries or idle connections (which most websites should, in production) you'll kill applications using cursors. With pgbouncer, at least, reading from cursors doesn't seem to bump idle timeouts.
Surely hash indexes have some stable ordering you can order by, even if its meaning is obtuse.
Presumably, hash indexes are based on hash tables underneath, which don't generally have a stable ordering, even an obtuse one. (Any write could cause the underlying table to get resized, which will cause the contents to get re-ordered.) Perhaps if you can guarantee that writes aren't happening, they might, but I don't think this is generally applicable to most people's use case.
(but range queries / being able to iterate are generally why I prefer btrees, unless I have a good reason. Keep in mind that the O(log n) on B-trees is log with a huge base — it's not power of two. It takes very few disk reads for a b-tree to find its query; it's just not the O(1) of a hash table.)
Ah, yes I see now.
Hash indexes don't support any operations other than = as far as I can tell, so they won't even be considered for >.
UUIDs can be chosen to have at least a rough ordering to make them more efficient to use as primary keys, discussion about this use here: https://github.com/jmcvetta/guid/blob/master/README.md
Hash indexes only support the equality operator, they're not as versatile as btree indexes (see https://www.postgresql.org/docs/devel/static/indexes-types.h...).
How do they work for range retrievals ?
EDIT: Isn't that the point of btree ?
Did you read the article?
To start with let us see the impact of work done to improve the performance of hash indexes. Below is the performance data of the pgbench read-only workload to compare the performance difference of Hash indexes between 9.6 and HEAD on IBM POWER-8 having 24 cores, 192 hardware threads, 492GB RAM.
The workload is such that all the data fits in shared buffers (scale factor is 300 (~4.5GB) and shared_buffers is 8GB).
And chart itself says it is a median of 3 5 minute runs.
I believe I did.
Why would I assume that Concurrent runs == Seperate Runs or that other caching mechanisms aren't in place or really anything. Computers do really odd things trying to optimize and making assumptions that your system is the same over a period of 30 minutes when you don't even know it is the same 30 minutes is concurrent. There is all sorts of stuff that gets in the way of performance tests and I would like to know how it is mitigated. Again, I am sure there is proper process, but why wouldn't know want to know what that is?
These articles drive me crazy when they don't mention the testing process. Yes, he is using a "server class" machine, but did he disable the features on a hardware level that would skew the test. Answer: probably. I'll assume he did some sort of clear to makes sure that the data wasn't cached on the hard drive or assume that all the reads were in memory. I can't even tell if he ran the test over several iterations.
It isn't that I am doubting the results, but without some trail on how to get the results I can't make this useful information. For example if I were architecting a system and was using this as information I have no idea if I run this in a VM that I need enough memory to get these results.
Yes, they are durable now in the master branch.
Great news if they will really make hash indexes durable! That will allow some interesting new developments, such as O(1) efficient lookups regardless of table size. This in turn will make it easier and more appealing to build graph databases using Postgres.
Hash indexes are "the right way" to speed-up index lookups.
They aren't used in Postgres because their implementation there has many problems. One of them is that they are slower than the theoretically suboptimal B-trees. In the process of fixing those problems, people will compare both indexes many times.
I'm still dubious of any micro benchmark that says "x is better than y" since these are often not real world work loads.
Tuning/optimization starts with using the right schema.
Okay, maybe so, but this benchmarking was done against POWER8 architecture, and not many people are running that. How does it stack up on x86_64?
The problem with hashes is that they offer a likelihood of collision which in practice and for very large sets, does happen.
Trees don't and are easier to reason about and prove. Esp. Important for critical systems or projects on massive scales,ud want a tree ADT on that Mars rover.
That's the gist of my conversation with my algorithms tutor about why my use of hashing to solve problems, while correct, is flawed.
I did get the points for those exercises and my lower bounds were alot better than what was demanded, but the problem lies in the fact that it's not guaranteed.
Would be interesting if postgres started supporting sorted keys, like HBase.
How they do show sorted data when not having a tree, but hashes?
Which is a nice side effect if you can trust the order of your primary key.
Unless you need to fetch items in order, for some reason.