[–] peff link

You can already play with this using Git's "grafts" and "replace refs" features. Most of Git's graph algorithms are OK with this because they traverse the graph while setting one or more "SEEN" bits on each node. If you find one that loops infinitely, it would be worth making a bug report.
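Roughly, the shape of the idea in Python (not Git's actual C code):

  def walk(start, parents):
      # `parents` maps a commit id to its list of parent ids. Each commit
      # is marked SEEN before its parents are queued, so even a cycle in
      # the graph stops expanding instead of looping forever.
      seen = set()
      stack = [start]
      while stack:
          commit = stack.pop()
          if commit in seen:
              continue
          seen.add(commit)
          yield commit
          stack.extend(parents.get(commit, []))

  # A deliberate two-commit cycle still terminates:
  history = {"a": ["b"], "b": ["a"]}
  print(list(walk("a", history)))  # ['a', 'b']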

reply

[–] laumars link

Interesting idea. Continuing your theory, if a git utility then got locked in an infinite cycle trying to import the commit due to the recursive parent-child relationship, then such an attack might be exploitable for DOSing cloud-based git repositories.

reply

[–] makomk link

I don't think this is possible. While there are ways to use collisions to make a document contain its own hash, they rely on format-specific tricks to insert collisions between each character of the hash. There's no way I know of to do this with a bare, unformatted MD5 or SHA-1 hash.

reply

[–] kefka link

It seems an easy way to fix this would be to scan the repo for SHA-1s and put them on a stack, then compare each new one against the list. If a match is found to be a duplicate, refuse to load, or load in a degraded state with warning bells.

reply

[–] rmc link

A circular directory would be much worse. As soon as you check it out, your disk fills up.

reply

[–] maweki link

But git doesn't store directory information; it is implicit in the file information.

So symlink-shenanigans shouldn't be possible using just the git objects.

reply

[–] rbehrends link

Git doesn't store directory information per se, but the directory structure is still reflected in tree objects.

reply

[–] xg15 link

One thing I'm wondering about that I haven't seen discussed yet is the possibility to make circular git histories:

When a git commit hash is calculated, the pointers to the parent commits are part of the hashed data. That gives commits two properties:

1) A child commit's hash cannot be predicted without knowing the parent commits

2) Once a commit is created, its parent pointers are practically immutable. The only way to change them is to create a new commit with a different hash - which cannot carry over the old commit's children.

Those properties meant that it was impossible to construct a commit with a parent pointer that points to one of its own children. Therefore it was safe to assume that a git history is always a DAG.
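(Rough sketch of why, in Python; the real commit object has more fields, but the parent ids are part of the bytes that get hashed:)

  import hashlib

  def commit_id(tree, parents, message):
      # Simplified commit object: the parent ids are hashed along with
      # everything else, so a child's id depends on its parents' ids.
      body = "tree %s\n" % tree
      body += "".join("parent %s\n" % p for p in parents)
      body += "\n%s" % message
      data = body.encode()
      # Git hashes "commit <size>\0" followed by the object body.
      return hashlib.sha1(b"commit %d\x00" % len(data) + data).hexdigest()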

I think with deliberate SHA1 collisions now possible, someone could actually violate property (2) and introduce a cycle.

As the DAG assumption is pretty fundamental, I'd imagine such a cycle could break a lot of tools. Has this ever been tested?

reply

[–] unholiness link

Interesting that it's hosted on GitHub. Everyone in this thread is wondering about the 2^-90 chance that a random commit will fail, but I wonder how these guys prevented their repo with test files from being rejected.

reply

[–] koolba link

The files don't trigger a collision because git adds a header prior to hashing. The hash is the sha1 of the total bytes.
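Concretely, for a blob the hash covers a "blob <size>\0" header plus the contents, so a collision in the raw file bytes doesn't automatically carry over to the git object ids:

  import hashlib

  def git_blob_id(data):
      # Same result as `git hash-object` for a regular file.
      return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

  print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a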

reply

[–] theoh link

On that note, in the case of this kind of system, if there's a false positive (so a commit fails, but shouldn't), there's always the option of adding a salt or a NOP of some kind to the file being committed so as to alter its hash.

Filesystems like Venti could conceivably, on the one hand, check for collisions, but on the other, compare the stored block with the newly committed block. If there's a collision, but the blocks differ, just perturb the new block with a NOP, a nonce header of some kind...

reply

[–] collinmanderson link

You could also just zip the files and unzip when testing

reply

[–] indolering link

> Everyone in this thread is wondering about the 2^-90 chance that a random commit will fail

Considering that the Bitcoin network burns through ~2^90 hashes in a year ... I'm not worried : )

reply

[–] koolba link

The article links to this repo that actually does the work of finding possible collisions: https://github.com/cr-marcstevens/sha1collisiondetection

My understanding of it is that it runs SHA-1 and examines the internal state of the digest along the way to see if it matches up with known vectors that could be manipulated to cause a collision.

reply

[–] peff link

It is possible; the researchers estimate the likelihood of a false positive at 2^-90 (which puts us back in "Sun engulfs the Earth" territory).

There are metrics that will alert GitHub's infrastructure team if a collision is found (to confirm that we aren't seeing any false positives). Those metrics were quietly shipped (without the matching "die") for a week before flipping the final switch.

If you want to know more about the patterns, see the sha1collisiondetection project:

https://github.com/cr-marcstevens/sha1collisiondetection

There's a research paper linked in the README.

reply

[–] rspeer link

They say that the chance of a false positive is less than 2^-90.

reply

[–] 3JPLW link

Would it be possible to exploit the found bit-pattern in a "false positive" (that wasn't intentionally constructed to collide) to construct a second object with that preimage? Is a preimage attack in that situation more feasible?

reply

[–] orblivion link

Per commit, or over the expected lifetime of Git's utilization of sha-1? If it's the former it's not very useful unless we have an idea how many commits are made.

reply

[–] jcranmer link

You should expect to see a hash collision with probability >½ with about sqrt(2^n) hashed objects for a hash of length n. I believe hashes are made per changed file per commit.

A large project (like Firefox) might make a few hundred commits per day, or tens of thousands per year. So that comes out to 2^20-2^30 hashes per year. I don't know how many distinct repositories GitHub has, but I doubt it's larger than 2^30. So that means that GitHub has no more than 2^60 SHA-1 hashes related to git, and probably more like 2^40-2^45.

So the probability of a collision is <<2^-30 even at the high-end 2^60 estimate (the collision function is logistic, so assuming linearity between 1 and 2^90 is a wildly overoptimistic assumption), and probably something more like 2^-70 at the realistic counts. To put it in perspective, it's more likely that you will win the lottery tomorrow but fail to find out because you were killed by lightning than it is that GitHub will detect a chance SHA-1 collision.
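A quick sanity check with the standard birthday approximation p ≈ 1 - exp(-k(k-1)/2N), where N = 2^160 possible SHA-1 values (the hash counts above are estimates, so treat these as order-of-magnitude figures):

  import math

  N = 2 ** 160  # possible SHA-1 values

  def p_collision(k):
      # Probability of at least one accidental collision among k hashes.
      return -math.expm1(-k * (k - 1) / (2 * N))

  for k in (2 ** 45, 2 ** 60, 2 ** 80):
      print("k = 2^%d: p ~ %.1g" % (int(math.log2(k)), p_collision(k)))
  # k = 2^45: p ~ 4e-22
  # k = 2^60: p ~ 5e-13
  # k = 2^80: p ~ 0.4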

reply

[–] orblivion link

Wait, are we talking about chance SHA-1 collisions, or a false positive for this test for this vulnerability?

reply

[–] SamReidHughes link

It's not logistic, that's a specific kind of function.

reply

[–] semi-extrinsic link

There are ~3.1e9 seconds in a century, and ~7.5e9 people on Earth. So if every human makes a git commit every second for a century, the odds of a single false positive is ~5.3e-7 (i.e. less than one in a million).

reply

[–] sillysaurus3 link

if every human makes a git commit every second for a century

Will someone please write a dystopian novel around this premise? It almost writes itself. GitHub turns evil, forcing the world population into subservience from birth; using Git is all anyone is ever allowed to do, and is how people conceptualize the universe and access entertainment; various cults form with the belief that randomness can be influenced via the right ceremony...

You can source inspiration from https://twitter.com/DystopianYA

reply

[–] stevesearer link

I was recently thinking of another idea where humanity works for Google and everyone's job is to solve recaptchas. Everything is free for people who earn enough credits by solving the puzzles.

reply

[–] apearson link

Isn't this an episode of black mirror?

reply

[–] StavrosK link

It is.

reply

[–] stevesearer link

Oh funny, which episode? I've never seen the show before.

Someone here had mentioned that they try to incorrectly answer recaptchas as a way to spite them for trying to steal free labor from them. My mind took that to the extreme of everyone working for Google... :)

reply

[–] frei link

I believe the parent poster is referring to the second episode [0], where people must earn credits for cycling on exercise bikes.

[0] https://en.wikipedia.org/wiki/Fifteen_Million_Merits

reply

[–] StavrosK link

Yep, exactly. I didn't remember the title offhand, thanks.

reply

[–] MrRadar link

Sounds like the plot to For-Profit Online University[1]. "I'm a digital gardener."

[1] https://www.youtube.com/watch?v=XQLdhVpLBVE

reply

[–] tmsldd link

If it is not a movie yet.. you should definitely write one :D

reply

[–] paulddraper link

> various cults form

...each believing in a Supreme branching strategy

reply

[–] sillysaurus3 link

And your status in society is determined by the number of consecutive digits of the hash of your very first git commit.

reply

[–] ntumlin link

Alternative premise, tech interviews continue to become increasingly competitive, and if you want to be a true dev hirable by Google, etc. this is the output you need.

reply

[–] boyter link

I could even see a matrix like aspect where those who are unplugged do so by creating hash collisions to hide what they are really doing.

reply

[–] caf link
[–] semi-extrinsic link

You're right, I suck at probabilities and did it wrong: (3.1e9*7.5e9)/2^90.
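For what it's worth, that works out to roughly 2e-8:

  p = (3.1e9 * 7.5e9) / 2 ** 90  # seconds per century * people on Earth
  print("%.2g" % p)              # ~1.9e-08, i.e. about 1 in 50 million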

reply

[–] AnimalMuppet link

Well, I'd lay a fair amount of money that the number of commits is less than 10^20...

reply

[–] rlanday link

It's also possible for a SHA-1 collision to result from random chance, however it's extremely extremely unlikely to the point where it will almost certainly never happen.

reply

[–] tinus_hn link

This is true of any hash though. SHA-1 is broken because it is known how to create a collision intentionally.

reply

[–] paulddraper link

True of any good hash, at least.

So, to the question "Isn't it possible for a valid non-colliding object or commit to contain that pattern as well", the answer is "No, not really."

reply

[–] click170 link

> The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair.

> GitHub.com now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair.

Isn't it possible for a valid non-colliding object or commit to contain that pattern as well? It sounds like eventually, though possibly in the far distant future, someone will be unable to push a commit to Github because it matches the pattern but doesn't contain colliding objects.

Does anyone know what the pattern is they're looking for? I'm curious now.

reply

[–] yuhong link

I wonder how many chosen-prefix (not identical-prefix) attacks this detects. There are many ways to do it, ranging from 2^77 to 2^80 time. Of course, this is still much more expensive than identical prefix, and git would probably not be the best target for such attacks.

reply

[–] undefined link
[deleted]

reply

[–] taejo link

The team that found it is at least partly academic; they published a paper on a free-start collision at CRYPTO 2015, and made a prediction there about how long it would take them and how much it would cost to find a full collision, which was more or less accurate. They are hash function researchers, not spies or hackers. There's no reason to believe they were hiding anything.

reply

[–] runholm link

Weaknesses enabling theoretical attacks have been known for over 10 years, and most major companies started phasing it out of use years ago. The recent news that someone was able to find two colliding inputs is just yet another red flag against continued use, but there is no more need to panic now than a year ago. Remember that finding a collision for a specific input is much harder than finding two inputs which collide.

reply

[–] simias link

That doesn't make a lot of sense to me, if anything this type of attack only gets more practical as the cost to compute a collision dwindles. It's not like a 0-day vulnerability.

I'm also not aware of any significant recent attempt at mitigating SHA-1 collisions that would've suddenly made the attack impractical. And in general these collisions are pretty difficult to exploit since you can't control the collision exactly.

reply

[–] yeukhon link

Maybe. But SHA-1 is still widely used today, perhaps no longer for signing certificates, but a lot of developers and systems continue to use SHA-1 to produce checksums.

reply

[–] rmc link

We don't know what tools the NSA have in their arsenal.

reply

[–] IncRnd link

The SHAttered attack was made public shortly after browsers deprecated SHA-1 or started flagging SHA-1 in certs as insecure. This specific practical attack may have existed for a while before it was made public. In fact it may have been made public only when it no longer provided practical benefits.

Just some food for thought.

reply

[–] detaro link

Because Git only supports SHA-1 as of now, and files in Git, distributed to users, are what they worry about here. If it were some internal system, they could switch to a different hash more easily.

reply

[–] everlost link

Naive question - why can't GitHub use a SHA2 function for commit hashes?

reply

[–] Achshar link

I don't think github is in a place to change how git itself works. That's Linus and the team's job.

reply

[–] Buge link

That's a good idea, but only provides full protection if all git clients do it. The official git client itself doesn't support that, so people would still be vulnerable if they got a malicious half from github, or a signature on a non-malicious half from github and a malicious half somewhere else.

This protection github is implementing uses a heuristic to try to find collision attempts by just looking at a single file. So it prevents github from hosting any collision attempts.

reply

[–] tkremer link

Why not just store a second hash with a different algorithm, and check that both hashes match?
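Conceptually something like this (hypothetical helpers, not anything git actually does); a forgery would then need a simultaneous SHA-1 and SHA-256 collision:

  import hashlib

  def fingerprints(data):
      # Record two independent digests for the same object.
      return (hashlib.sha1(data).hexdigest(),
              hashlib.sha256(data).hexdigest())

  def verify(data, stored):
      # Both digests must match; colliding one hash is no longer enough.
      return fingerprints(data) == stored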

reply

[–] chrisseaton link

> rather than doing a direct comparison

A direct comparison between what two things? In the attack they are worried about, they only see one half of the attack.

> hashing using a more secure algorithm

They don't control how Git works.

reply

[–] gruez link

Why use that specific method to detect sha1 collisions rather than doing a direct comparison or hashing using a more secure algorithm?

reply

[–] peff link

No, it's better than that. It detects evidence of the cryptanalytic tricks that were used to generate that collision. So it will find any collision generated using their attack technique (which is ~2^63 operations, as opposed to 2^80 for brute force).

There may be new disturbance vectors discovered as the attack matures, but they can be added to the detector.

reply

[–] TorKlingberg link

If I understand right, they detect files that contain the specific collision that Google released. It will work fine until somebody else does the same work Google did to generate another collision pair.

reply

[–] uwu link

what is it?

reply

[–] fny link

Looks like a bug with Cyrillic character highlighting. I'm guessing this is probably an issue with Rouge/Pygments[0][1] since I doubt Github is rolling their own syntax highlighter.

[0]: https://github.com/jneen/rouge

[1]: http://rouge.jneen.net/pastes/6nDQ

reply

[–] antons_ghost link

>>> "params[:sandboх] = true".decode('ascii')

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 14: ordinal not in range(128)

reply

[–] AnimalMuppet link

One of two things will happen. Either the machines will be generating random commits, and nobody will care if they collide...

... or the machines will be able to generate real, useful code that fast, in which case it will be the machines' problem to solve.

reply

[–] charles-salvia link

And in the latter case, most human programmers will be unemployed, and thus unconcerned with hash collisions.

reply

[–] nojvek link

If you were generating hashes on the GPU for collisions, according to https://hashcat.net/forum/thread-4314.html

A single Titan X does ~2e9 SHA-1 hashes per second. Suppose you had 10,000 GPUs (expensive but not impossible); you could do 2e13 hashes per second.

In 100 years there are ~3e9 seconds, so ~6e22 hashes. Still orders of magnitude below 2^90.

SHA-1 still isn't crackable by brute force in a meaningful time.
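Spelled out (rough figures):

  per_gpu = 2e9      # ~2e9 SHA-1 hashes/sec per Titan X (rough figure)
  gpus = 10_000
  seconds = 3.15e9   # ~100 years
  total = per_gpu * gpus * seconds
  print("%.1e hashes, vs 2^90 = %.1e" % (total, 2.0 ** 90))
  # 6.3e+22 hashes, vs 2^90 = 1.2e+27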

reply

[–] paulddraper link

> What happens when machines are writing commits and there are far more than 5 million and 1 per second?

Then a collision will happen far sooner than the Sun engulfing the Earth.

Maybe in the next million years, instead of 7 billion.

reply

[–] pvg link

No. Let's say the commit rate increases by a factor of a million. You still have centuries to upgrade and fix the problem.

reply

[–] IncRnd link

Attacks always get better, not worse. You never have centuries to fix a broken security primitive, such as this.

reply

[–] pvg link

Why are you telling me this? We are talking about accidental collisions not a 'security primitive' or attacks.

reply

[–] IncRnd link

The github change was not put in place to handle accidental collisions but to detect an attack on the sha-1 security primitive.

reply

[–] pvg link

So? This subthread is about a line in the write-up that talks about accidental collisions. Your objection to 'centuries' is a complete non-sequitur.

reply

[–] IncRnd link

You apparently like to argue. Even if the sub-discussion is about accidental collisions, centuries do not remain to fix the problem, because there has been a recent practical demonstration of a mathematical speedup for certain types of collisions. Your conclusion doesn't follow from your premise.

The attack persists regardless of the presence or absence of accidental collisions, therefore there aren't centuries to fix the problem.

reply

[–] pvg link

You're the one who started this stupid argument. There is no 'attack'. This was the conversation:

Someone: "Huh, they say it would take billions of years for an accidental collision at checkin rate X"

Someone else: "What if the collision rate were much higher? Famous last words, 640k, herf glorp"

Me: "Even at a million times higher checkin rate you are unlikely to see collisions anytime soon"

You: "It's about ethics in gaming journalism!"

My conclusion doesn't follow from your premise because your premise is simply wrong. Enough.

reply

[–] CJefferson link

No, this is "640K ought to be enough for anybody while we fix the underlying problem and put a better hash function into git". This is more like living with a 4GB 32-bit memory space while we wait for 64-bit CPUs to become popular.

reply

[–] graton link

> Is this a "640K ought to be enough for anybody" quote of our time?

No.

1) That quote wasn't actually said by Bill Gates

2) Very few people will quote that 15-20 years from now.

reply

[–] jacquesm link

> Very few people will quote that 15-20 years from now.

But that didn't stop someone from quoting it today and that's the third time or so this week that I come across that quote. The fact that it is quoted today - for which there is really no good reason - makes me suspect it will still be quoted 15-20 years from now. Hopefully less frequently.

reply

[–] pvg link

One of the reasons it lives on is because it does represent what very quickly turned out to be a design mistake in the PC architecture - sticking 'reserved' memory at the top of a 20 bit address space. If it hadn't been for that, nobody would have ever had to opine whether 640k is or isn't enough, they'd have just added more memory and moved on.

reply

[–] lorenzhs link

Pretty sure the GP is referring to GitHub's statement with "that", not the 640k quote. Nobody will quote the probability of a false positive of a SHA1 collision detector in 15-20 years.

reply

[–] bbcbasic link

Updated to make it modern: 640k of Minified, Gzipped JS should be enough for anybody

reply

[–] nojvek link

Tell that to angular.

reply

[–] web007 link

5 billion years divided by (speed of computers / 5 million per second) is probably still high enough for most practical purposes.

reply

[–] markcerqueira link

I guess the pertinent question now is how close are we to 5 million and 1 commits per second!

reply

[–] a13n link

> Two objects colliding accidentally is exceedingly unlikely. If you had five million programmers each generating one commit per second, your chances of generating a single accidental collision before the Sun turns into a red giant and engulfs the Earth is about 50%.

What happens when machines are writing commits and there are far more than 5 million and 1 per second? Is this a "640K ought to be enough for anybody" quote of our time?

reply

[–] cyphar link

But commit signatures are actually just signing the SHA1 hash. So you still have the problem that you could just reuse the signature for the valid commit with the malicious one.

reply

[–] BrailleHunting link

Then git also needs to change, but the process as above for github and others is still correct.

0. Upgrade to SHA3/Keccak, eventually make SHA1 a read-only legacy feature.

1. Upgrade signing to sign blobs and sign signatures for non-blobs instead of signing refs.

reply

[–] homakov link

It's not them, it's GPG that's too ugly to be used even by programmers. If they enforce it, too many people will struggle.

reply

[–] FungalRaincloud link

I see this line of thought pretty frequently. I don't really understand why we collectively believe that GPG is so tough. I started using it as a teen, barely even understanding the software, let alone how the actual encryption worked. I've come to understand it, but even when I didn't, it didn't seem difficult at all to get passable security out of it.

Cyphar has the right answer for why this would be problematic, though - GPG would just prove that the initial commit was from a trusted committer. It would be trivial to use the same signature with the fake commit. That's not an unsolvable problem, but switching to a different hashing algorithm seems to me like the more reasonable solution.

reply

[–] mstade link

I set up GPG for my account a couple of months ago, and found it surprisingly simple. The setup has even survived a recent machine change with no issues whatsoever. Curious to know what makes this "too ugly" if you don't mind elaborating!

reply

[–] BrailleHunting link

Forcing committers to GPG-sign their commits and authorizing only verified commits from certain key ids would be a stronger, end-to-end chain-of-custody process. Right now, GPG verification badges on tags and commits are nice, but they don't have "teeth." This is a nice step, but GH needs to keep going and add stronger, mandatory defense-in-depth (optional security doesn't get used).

reply