[–] carvalho link

On Quora someone asked what the longest search query time was. I was able to craft a query that took multiple seconds to complete. It used wildcards and undocumented iteration, allowing one to stuff thousands of queries into a single query. Turns out it is someone's job to measure result response times, and he/she came into the thread to kindly ask us to stop messing up their statistics.

reply

[–] hoschicz link

This is his answer: "I work on search at Google, and I have to say, very clever answers! Now, please stop. :-p"

I don't think that he did it because it's his job to stop random people on the Internet from running slow queries. I think she was just surprised how creative people are and found it funny.

reply

[–] _ao789 link

Both a 'he' and a 'she'. Is that what it takes to pass the whiteboard b-tree reversal interviews these days?

reply

[–] slazaro link

It's Schrödinger's gender, it's both when unknown, until revealed.

reply

[–] _ao789 link

Aka: Schrödinger's Googler

reply

[–] undefined link
[deleted]

reply

[–] logicallee link

It's hard to believe, but years ago, back when Google had what were called "stop words" (words like 'the' that it ordinarily ignored), I was able to make Google perform a search that took over 30 seconds.

The reason stop words take so long is that millions of sites contain words like "the", so doing a join on all of those simply takes a long time.

My method for finding a long string consisting entirely of stop words was to download the Project Gutenberg text of the complete works of Shakespeare, find the longest run of stop words in there, and then search for it as a literal quote.
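
A rough sketch of that hunt (with a hypothetical, abbreviated stop-word list and file name; the real list was longer):

    import re

    # Hypothetical, abbreviated stop-word list; the list Google actually used was longer.
    STOP_WORDS = {"a", "an", "and", "as", "at", "be", "by", "for", "from",
                  "in", "is", "it", "of", "on", "the", "to", "what"}

    def longest_stopword_run(text):
        """Return the longest consecutive run of words that are all stop words."""
        best, current = [], []
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in STOP_WORDS:
                current.append(word)
                if len(current) > len(best):
                    best = current[:]
            else:
                current = []
        return " ".join(best)

    with open("shakespeare.txt") as f:   # e.g. the Project Gutenberg plain-text file
        print(longest_stopword_run(f.read()))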

The longest one I found was: "From what it is to a".

Let me see how long Google takes to do it now :)

2.04 seconds! Nice :) - http://i.imgur.com/IhPTpr6.png

That took 30+ seconds 'back in the day'.

reply

[–] dispo001 link

Also, long long ago... I was watching a video of a guy channeling aliens. Someone in the audience asked what would be the next big thing for humanity. I immediately typed the query into Google. It showed the page header.... 20-30 seconds later it came up with a single published paper about "nuclear magnetic resonance identification".

Today it resolves instantly and finds at least 10 publications from before 2000.

reply

[–] sova link

Is it really technically correct to say that Google was performing web-wide joins on data? Isn't it all about clever indexing?

reply

[–] logicallee link

There's nothing to index. How could it have found my Shakespeare quote via an index? The query consisted entirely of the words 'from what it is to a', yet it produced only the Shakespeare quote. I don't see how it could have indexed anything.... it must have done a join. (Which makes sense given the 30+ seconds I had to sit and wait before it returned its answer, while also reporting the time it took to produce it. What else could it have been doing?)

By the way I believe I wanted to know whether it would return the Shakespeare quote at all.

If you mean that it might have cached the results of the query, I doubt anyone else queried that exact phrase, other than me.

reply

[–] andrewstuart2 link

Google does index pages, in the database sense. An index in the database sense is nothing more than reorganizing data (or subsets of data) into structures optimized for searching and seeking, rather than full scans.

I'm guessing you're most familiar with btree indexes, which are present by default in many SQL solutions and are good for quickly answering exact and greater/less-than matches. There are dozens of data structures useful for indexing, some of which are built to index full-text documents. For an example, check out the GIN and GiST indexes in Postgres [1].
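
For a concrete toy feel of a full-text index in the database sense, here's a sketch using SQLite's FTS5 module from Python's standard library (assuming your sqlite3 build includes FTS5; Postgres GIN/GiST indexes play the analogous role at much larger scale):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # FTS5 builds an inverted index over the text automatically.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    conn.executemany("INSERT INTO docs(body) VALUES (?)", [
        ("the obsequious narwhal swam in tandem",),
        ("from what it is to a",),
    ])
    # MATCH consults the index rather than scanning every document.
    for row in conn.execute(
            "SELECT rowid, body FROM docs WHERE docs MATCH 'narwhal AND tandem'"):
        print(row)   # (1, 'the obsequious narwhal swam in tandem')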

It's my understanding that database indexing and index compression was a primary differentiator Google excelled at from the beginning. They could beat others at a fraction of the typical cost because they didn't need as much hardware to store and query huge quantities of documents.

Seriously, there's no way even Google could intersect the sets of all crawled web documents containing those individual words in 30 seconds, much less two seconds.

[1] https://www.postgresql.org/docs/current/static/textsearch-in...

reply

[–] logicallee link

>Seriously, there's no way even Google could intersect the sets of all crawled web documents containing those individual words in 30 seconds, much less two seconds.

I believe you're mistaken. What I've heard is that for every word, Google has a list of every web site that contains that word - they've flipped the database. So, I believe, if you search for (without quotes) neanderthal violet narwhal obsequious tandem -- I just did this query, which took 0.56 seconds, but Google decided to drop some of the words so it could get me results. When I used plus signs, making my query +neanderthal +violet +narwhal +obsequious +tandem, it said it took 0.7 seconds to determine that in the entirety of the Internet, there is not a single document that has those 5 words on it.

How do you think it determines in 700 ms that none of the sites it has indexed, across the entire Internet, contain those 5 words anywhere?

The answer is that it has a rather short list of sites that contain the word narwhal, which it then intersects with the somewhat larger list of sites that contain obsequious, and so on. 700 ms is plenty of time when you take that approach.

So this explains why joining stop words (whose lists contain billions of pages each) takes so very long.

Using stop words, it is easy to make queries that take one or two seconds each.
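
A toy sketch of that posting-list intersection (invented data and names; obviously nothing like Google's real pipeline):

    # Toy inverted index: for each word, the set of document ids containing it.
    # Rare words have short lists; stop words like "the" have enormous ones.
    postings = {
        "narwhal":    {3, 17},
        "obsequious": {17, 42, 99},
        "the":        set(range(1_000_000)),   # a stop word: present almost everywhere
    }

    def docs_matching_all(words, index):
        """Intersect posting lists, starting with the shortest (rarest word)."""
        lists = sorted((index.get(w, set()) for w in words), key=len)
        result = set(lists[0])
        for other in lists[1:]:
            result &= other
            if not result:       # bail out early once the intersection is empty
                break
        return result

    print(docs_matching_all(["narwhal", "obsequious"], postings))   # {17}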

reply

[–] Benjammer link

>There's nothing to index

Huh? What do you mean? Google indexes HTML web page content from the entire public internet using web crawlers...

reply

[–] logicallee link

I'm confused. By "clever indexing" I thought they meant, in the database sense of the word.

The reason my search took 30 seconds is that it started by getting a list of every site with "from" on it, every site with "what" on it, and so on, intersecting them all. That's how it ended up finding my quote. How else do you think it did it?

----- edit:

To find the string "from what it is to a", which occurs only hidden in the middle of Shakespeare's texts -- what do you think they do?

In my opinion they combine the lists of sites that have each word - starting with the least common ones. It's easier if you search for something that has a few uncommon words. Then you start with a small list, and only have to combine it with other small lists.

When every word in the phrase has billions of sites (there are billions of pages that have the word "to" on them, same for "from", "what", "it", "is", "a"), you have to combine them all. Then you have to do a string search within the resulting set, since I put it in quotation marks. There is no easy strategy. Hence the long search time.

what else could they be doing?

reply

[–] Benjammer link

I'm curious how else you think large-scale data is stored, other than in an index in the database sense of the word. You think Google has some kind of massive heap-like, unstructured data-store that they run search queries against? That doesn't make sense to me, but I've also never worked in global scale web search, soooo idk.

reply

[–] Benjammer link

You said "There's nothing to index," as if Google is making web requests to every domain in existence, parsing the document responses, and seeing which sites have these words on them, all at runtime when you type a search query. Google obviously indexes the web in the sense that they store their own cached versions of web pages "locally," on top of which they then build an insanely complicated, web-facing, search architecture.

reply

[–] logicallee link

We're talking past each other. sova referred to this meaning - https://en.wikipedia.org/wiki/Database_index - when they said "clever indexing."

The sense you mean is a different sense of the word index - meaning, to crawl. Yes, of course it does that too.

reply

[–] sova link

I was not referring to database indexes. That is not pertinent here. I was thinking about the index that Google creates, its locally cached version, that it queries. If you have a locally cached version, you are not going to rifle through them one by one until you find matches, nor are you going to rifle through them and find partial matches and then intersect them all to see if any overlap in your final product. Among other weird assumptions, that final method assumes there is a solution for every query.

Google, no doubt, has a very sophisticated way of querying against their cache of the WWW and it has probably evolved over time. However, it is inappropriate to say Google does a join over the entire internet for one query. It is much more reasonable to say that Google checked your query string against their gigantic index of terms, and it took a while to dig that deep into the pile. The performance hit such a complex query takes is more like unzipping a large archive to get a specific megabyte's worth of info, rather than saying it smashed all the files together and then searched for the exact term like notepad.

Anyway, think about it for a while, it's clearly a cool issue in search, and programs and algorithms do not have to visually search things as humans must.

reply

[–] falsedan link

> what else could they be doing?

I recommend reading the Stanford paper[0] (page 12), which spells out in a lot of detail exactly what they were doing.

In short, your pathological query would have searched for every document which contained one of your words, discarded those which didn't match all of them, and then sorted by word proximity. I expect for a literal phrase search, there would be a final pass to look for the exact phrase in order.

[0]: http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf
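
A toy sketch of that pipeline with a positional index (invented data; the real thing also weighs PageRank, anchor text, etc.):

    # Toy positional index: word -> {doc_id: [positions of that word in the doc]}.
    index = {
        "from": {1: [0, 40], 2: [7]},
        "what": {1: [1],     2: [3]},
        "it":   {1: [2],     2: [9]},
    }

    def phrase_hits(words, index):
        """Docs that contain every word, where the words also appear consecutively."""
        # Pass 1: keep only documents containing all of the words.
        docs = set.intersection(*(set(index.get(w, {})) for w in words))
        hits = []
        # Pass 2: check that the word positions line up in order (the phrase check).
        for d in sorted(docs):
            starts = index[words[0]][d]
            if any(all(p + i in index[w][d] for i, w in enumerate(words))
                   for p in starts):
                hits.append(d)
        return hits

    print(phrase_hits(["from", "what", "it"], index))   # [1]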

reply

[–] teen link

I think it indexes the entire string, no? Or would that be too combinatorially expensive, idk.

reply

[–] sova link

Yes, it must, let's be exhaustive because we can and because the problem is solvable and because someone must do it.

Have you accidentally searched Google for a long URL and seen it come up? It is actively caching stuff all the time, and that cache just grows and grows, and it must be a pretty beautiful megastructure that you can run queries against.

reply

[–] falsedan link

Google never had stop words. The original lexicon only included the most popular 14 million words (for fast bucketing), and rare words were processed specially.

reply

[–] logicallee link

It did - I had to use plus signs to force it to use them.

Normally it ignored those words. I am fairly certain of this detail. I must have found a list of those words - how else would I have found the string "from what it is to a"? I had a list of its stop words.

Edit: for proof, here's someone's screenshot of the same - http://farm3.static.flickr.com/2270/2201828252_45a32da7f4.jp...

As you can see, it is Google saying it is ignoring a word because it is too common. It has a list of every site that contains that word, but that list is huge and it doesn't usually use it.

reply

[–] falsedan link

Stop words are words that are ignored when indexing, not when querying. Since you did find a result, those words must have been indexed.

reply

[–] SamBam link

Interestingly, I wonder if it cached your query.

The same query as yours took 0.3s, but if I stripped out one word ("From what it is to") it took 2.2 seconds.

reply

[–] logicallee link

Of course it cached my query. :) Try it again in a few weeks.

reply

[–] RugnirViking link

I managed to beat that query with "The thing is what it is"

reply

[–] ClassyJacket link

>and he/she came into the thread to kindly ask us to stop messing up their statistics.

Shouldn't someone with a job in statistics know how to account for outliers?

reply

[–] annnnd link

And also, wouldn't they be interested in getting those queries so they could either fix their performance or block them?

reply

[–] developer2 link

The query was already published on Quora. I assume that Google did patch the issue. They were simply requesting that it not be posted in a public forum, resulting in the potential for denial of service attacks. It wouldn't have been a statistician making a fuss about their pretty graphs being ruined. The problem was the very real performance impact the queries were having on the service, as thousands of visitors copy/pasted from Quora to see for themselves.

This is why companies like Google have bounties for such things. "Please submit bugs and performance issues privately so we can patch them before you disclose the details publicly and hurt our services - we'll even pay you for your discretion!"

reply

[–] developer2 link

This particular issue was posted on Quora, where anyone could pick it up and participate in what is essentially a denial of service attack (whether or not performed intentionally). It wasn't submitted as a private bug report to Google so they could fix the issue. It was spread in a public forum. I think it's fair for Google to politely ask "a few of your own tests to validate an issue you will submit as a bug report is fine, but please don't disclose to the public until we patch it."

When you operate at the scale of Google, everything is expected to be airtight; outliers should not be possible. It wouldn't surprise me if their monitoring systems are built without the ability to "massage" (ie: manipulate) statistics, as it is a terrible practice. I don't think a statistician who relies on ignoring outliers would last long working for Google. They're not doing their job if the only thing they care about is silencing warnings to make pretty graphs that falsely show everything is running smoothly. Their job is to work with the truth - not manufacture little white lies to appease management.

reply

[–] londons_explore link

Boss: Median latency is 100ms and 99.9th percentile latency is 1 second.

Nobody ever asks about that 0.1%...

reply

[–] developer2 link

When that 0.1% - or even 0.001% - are 5-60 second requests, you have a bomb waiting to go off. There really is a massive difference when you are operating at the scale of Google. If the median is 100ms, the maximum acceptable time - 100th percentile - is likely below 200ms. A three nines percentile that is 10x the median isn't a good thing at large scale. Perfect consistency is more important than statistics. A small scale service deployed on my-little-unused-tool.com that receives a few requests/minute is an entirely different ballgame.
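
A back-of-the-envelope illustration, with synthetic numbers, of how a comfortable median can hide an ugly tail:

    import statistics

    # Synthetic latencies: 10,000 ordinary requests plus 10 pathological ones.
    latencies_ms = [100] * 10_000 + [30_000] * 10

    median = statistics.median(latencies_ms)
    p999 = statistics.quantiles(latencies_ms, n=1000)[-1]   # ~99.9th percentile

    print(f"median = {median} ms, p99.9 = {p999:.0f} ms, max = {max(latencies_ms)} ms")
    # At millions of requests per second, that "0.1%" is still thousands of slow
    # requests every second.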

reply

[–] Confusion link

When it's tried often enough by enough readers of the question, it's not an outlier any more.

reply

[–] dsacco link

Do you have a link to that? I'd be interested in reading his response, and I can't see it by searching Quora.

reply

[–] vochinch link

Nice! Do you still have the link to the Quora question or an example of the query?

reply

[–] kw71 link

Something to show for effort. This matters a bit.

One of my young-adulthood colleagues went on to be an early Googler who is influential in relevant policies. We built our relationship sharing bugs and analysis techniques. Quite a few years ago some scoundrels whose trust I had gained proudly showed me how they were using YouTube links to drop malware. Since my old mate worked there, I mentioned it and they were quite interested.

We hadn't shared anything in a while, both of us demonstrating loyalty to our employers and not talking about work details. I said that it would be really cool to have a one dollar check from Google for a bug report. I probably offered to send something cool from my workplace too.

They said, "We don't pay for bugs" Fifty cents? "We don't pay for bugs!"

I felt like I was simply after a piece of paper and the evildoers were a mildly useful source, but I could easily do without them and the souvenir would have been treasured.

I was unreasonably miffed that I couldn't get that piece of paper, though. So I reviewed the links I'd collected and passed some general information but withheld details that would be obviously unique to these attackers. They expressed disappointment with me the next time we spoke. It turns out that what I gave wasn't specific enough to easily identify the lame cross site exploit, despite my actual intent to lead them to the bug.

Interesting they have a bounty program now.

reply

[–] ChuckMcM link

Nice catch. A long time ago the services on the backend could be killed by a special URL. Someone found it, and it wasn't filtered by the front end. Of course someone tried to use it, but it never returned since it killed the service, and their client retried ... there was a lot of "what the heck is happening" going on until SRE figured it out; they immediately patched the front end and the anomalies stopped. It is too bad the person who caused it didn't file for a bug bounty like this person did; they probably would have had something to show for their efforts besides "hey look at this funny thing you can do, oh wait it doesn't do it any more."

reply

[–] developer2 link

That bug is extremely common, and the source is always the use of soft-deletes in the database. When you view the list of items (ex: inbox), the database query includes a "WHERE deleted = false" to exclude rows which have been soft-deleted. When viewing a single item (ex: message) the URL contains a unique identifier, whether an auto-increment integer, UID, etc. The query used to load one item is "WHERE id = :id" instead of the correct "WHERE id = :id AND deleted = false".

Managing soft-deletes on a database table requires an attention to detail, across every single query that ever touches that table, that many developers lack the discipline to maintain. Discipline aside, it's difficult for every developer on a team to remember which tables use soft-delete, and when checking that flag is or is not necessary. Finally, ORM abstractions often automate soft-delete in a way that makes it exhausting for developers to validate every query. I've seen this bug over and over again at every company I've worked for. It happens so often it's impossible to keep count.

reply

[–] elmigranto link

> Managing soft-deletes on a database table requires an attention to detail.

> Discipline aside, it's difficult for every developer on a team to remember which tables use soft-delete, and when checking that flag is or is not necessary.

That's the case where, instead of "try harder not to make mistakes", you design the system so it is not possible to make them. One way would be to rename the original table to `raw_messages` and `create view messages as select * from raw_messages where not deleted`.
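
A minimal sketch of that view-based approach (SQLite syntax via Python's sqlite3; the names are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_messages (
            id      INTEGER PRIMARY KEY,
            body    TEXT NOT NULL,
            deleted INTEGER NOT NULL DEFAULT 0
        );
        -- Application code only ever reads the view, so it cannot forget the filter.
        CREATE VIEW messages AS SELECT * FROM raw_messages WHERE NOT deleted;

        INSERT INTO raw_messages (body, deleted) VALUES ('hello', 0), ('whoops', 1);
    """)

    print(conn.execute("SELECT id, body FROM messages").fetchall())              # [(1, 'hello')]
    print(conn.execute("SELECT id, body FROM messages WHERE id = 2").fetchall()) # []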

reply

[–] endgame link

One of the problems with ORMs is that because they let people forget about the annoying details of their databases, they also make them forget the useful details of their databases.

reply

[–] developer2 link

Damn you! I wrote a bloody essay in a reply[1] to explain, in superfluous detail, what you summarized in one sentence. Anyone with basic knowledge of the topic would know what you mean. I need to figure out this magic people like you possess. I'm tired of rambling, when nobody will read it. Thank you for the incentive to improve.

[1] https://news.ycombinator.com/item?id=14374031

reply

[–] endgame link

That's the kindest spontaneous compliment I've received in a while. Thank you. But: while the pithy comment might farm more imaginary internet points, the essay may actually teach a lesson to the person who doesn't yet get it.

As for writing: it's not magic, but for me it's not consciously applied processes either. If I had to guess how my earlier comment came about, I'd suggest something like this as a generative process:

1. Find two effects with a common cause (provided upthread).

2. State each effect, sharing words and rhythm to bring out contrast.

3. Omit needless words. (Thanks, Strunk/White!)

HTH.

reply

[–] stickfigure link

I don't disagree that this is common, but it doesn't require special skills or extra attention to detail. Your code to load the 'thing' should always use a Loader (or whatever you choose to call it) - some abstraction that loads the thing and checks for permission. It's really not that hard, and it's the bare minimum of competence I expect from a junior web developer. If you have littered your code with SQL statements, you're almost certainly doing it wrong.
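
A sketch of what such a loader might look like (hypothetical names, assuming a soft-deleted `raw_messages` table like the one sketched above); the point is that the visibility check lives in exactly one place:

    class NotFound(Exception):
        pass

    def load_message(conn, message_id, *, include_deleted=False):
        """The single choke point for fetching a message by id."""
        sql = "SELECT id, body FROM raw_messages WHERE id = ?"
        if not include_deleted:
            sql += " AND NOT deleted"
        row = conn.execute(sql, (message_id,)).fetchone()
        if row is None:
            raise NotFound(f"message {message_id} not found")
        return row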

reply

[–] developer2 link

The majority of incidents I encounter with this bug occur specifically because of the complexity introduced with abstracted query building via ORMs and DBALs. Developers assume their configuration, such as appending a "deleted = false" clause, applies to every query without manually verifying each one. Yes, it's technically the developers' fault for not understanding how and when the abstraction kicks in, but that doesn't mean it's "simple", or that these cases are avoided.

I've seen abstraction layers where it's impossible to add a default clause to every SELECT query for a model. I've seen other abstractions where "AND deleted = false" can be automatically added to every SELECT query. I've also seen abstractions where that clause is added to all SELECT, UPDATE, and DELETE queries.

Here's a list of problems:

a) Developers bypassing the model, executing a complex JOIN that includes the table in question, and forgetting they need the "deleted = false". Most complex queries wind up being written as raw SQL or a parsed variant that never executes the model behavior to append the "AND deleted = false" clause. Is it "wrong" to bypass the model? Most of the time, yes! But it happens every day. We're talking about what happens in reality, not what should happen in an ideal fantasy world.

b) Developers missing the case where they should be including soft-deleted rows. When the abstraction layer enforces a "deleted = false" on every query, it can be difficult or impossible to force backtracking to include soft-deleted rows. Back in the MyISAM days (before foreign keys), I found an "account deletion" mechanism that executed a "DELETE FROM messages WHERE userid = :userid AND deleted = false" - soft-deleted rows were not deleted when required, because an abstraction layer excluded soft-deleted rows in a DELETE query and the original developer never noticed.

c) What happens with UPDATE and DELETE queries? I've seen abstractions that only append the soft-delete mechanism to SELECTs, and others that also affect UPDATEs and DELETEs. Again, should every developer on a codebase understand in which situations the abstraction kicks in? Yes. The fact is they don't, because abstractions inherently make developers not inspect the behavior of their code as deeply as they should.

I don't remember soft-deletes being an issue at all - literally non-existent - 10 years ago, when all SQL queries were typed out by hand. When you're forced to write the query yourself, you have time to think about what you are doing. When you delegate the majority of the task to an abstraction layer that magically modifies your queries on the fly, bad things happen. The most stable and maintainable code base I ever worked on had every single query in XML files. It sounds tedious and bloated, as if it's a joke about the "old days", but every query was located somewhere where it could be analyzed, and you actually had to use your brain when writing a new query. I've seen nothing but misery since the introduction of abstracted ORMs and DBALs, where the only way you ever see the queries being executed is in debug dumps and logs.

>> competence I expect from a junior web developer

Sadly, more than half of the senior developers I've met can't handle soft-deletes properly. So no, in the real world, this cannot be expected of junior developers.

reply

[–] RandomBK link

It's issues like this that really highlight the benefits of shuffling deleted data to a separate archive table through triggers, or leveraging temporal tables. It may not necessarily be as efficient as maintaining a flag, but it drastically reduces the mental overhead placed on users of the database.

reply

[–] a_imho link

OTOH, from a pro-user POV, deleted means removed, not just made inaccessible. If someone requests stuff to be removed, IMO it should be removed from archives as well.

reply

[–] annnnd link

You should also publish it. Not out of revenge, but to alert customers to the quality of the software they are using and to put additional pressure on Wickr to fix it. Project Zero does that too (as do most other security researchers). Of course, you should give them enough time (90 days?) but after that it's publish time...

reply

[–] idonotknowwhy link

I found a bug in Wickr where I can re-read "deleted" messages. I submitted it and answered their team's questions about reproducing it. A couple of weeks later, they said they can't fix it and didn't pay me :(

I got all my wickr contacts to switch to signal, which is much less buggy...

reply

[–] jcims link

Having worked on a large bounty program myself, and having at least one thing blow up because I dropped the ball on a response, I'll just say that the front-end aspect of it can be extremely chaotic. This guy seems like he's pretty polite and patient, which you generally try to reward with a rapid response and high touch, but sometimes you can get overwhelmed with a burst of reports, distracted by problematic reporters and bogged down by working the bug through the pipeline.

There are systems and processes to help with all of this of course, but at the end of the day it's still a pretty tricky job to get perfect all the time.

reply

[–] daddyo link

What's the general signal to noise ratio for bug reports?

reply

[–] dsacco link

About 10:1 noise:signal.

This comes from a variety of experiences: I used to manage a bug bounty for a mid-size company on Bugcrowd; in 2014 I surveyed people managing a bunch of programs across different sizes; I've participated in bug bounty programs for companies of different sizes.

The more you offer for rewards and the more recognizable your company name, the more you will be spammed by people submitting reports like (I kid you not): "You have the OPTIONS method allowed on your site this is really serious." The last time I looked at the numbers, Google had over 80,000 bug bounty reports per year, with about 10% of them being valid and maybe an order of magnitude fewer being high severity (I'm fuzzy on the last bit). It's probably over 100,000 per year at this point. It's not uncommon for recognizable but smaller companies to receive one or more per day.

I'm aware of full-time security engineers at Facebook and Google who do almost nothing but respond to bug bounty reports. It's a lot like resumes - people who have essentially no qualifications, experience or (most importantly) a real vulnerability finding will nevertheless spam boilerplate bug reports to as many companies as they can. Take a look at the list of exclusions on a given program - you'll see that many of them explicitly call out common invalid findings that are so ridiculous it's kafkaesque.

HackerOne and Bugcrowd provide a lot of technical sophistication to prime companies for success, but there is an organizational component that is very difficult. If your program is very active, it requires dedication to tune it so you're not flushing engineer-hours away responding to nonsense. This is not to say they're bad - quite the opposite, I think they're fantastic. But I generally recommend smaller companies set up a vulnerability disclosure program through a solid third party, and do so without a monetary reward until they can commit to dealing with a reasonable deluge of reports.

reply

[–] thaumasiotes link

My favorite bug bounty report so far read, in its entirety, "try it ASAP".

reply

[–] arkadiyt link

I've received reports for things like "source code disclosure" where they link to our jQuery.

reply

[–] illumin8 link

LOL - I'd like to report that I was able to download the entire source code of your website by right-clicking and selecting "View Page Source..."

reply

[–] thaumasiotes link

If only that were true... modern web pages frequently have basically nothing of any value in the page source; it's all dynamically loaded.

reply

[–] sundvor link

~10% valid submissions still sounds like a fantastic number to me. Sure you have to sort out the bad ones, but it's still a solid stream of valid reports.

reply

[–] jcims link

It's a pain when you're in the thick of it, but it really is a great way to round out your security program. There's an astonishing number of incredibly skilled and motivated folks out there, and a well-run bounty program can create a nice symbiotic relationship that benefits both.

One other thing that never really gets any press is the fact that a good chunk of the folks sending in reports are young people in impoverished nations. Some of them can be pretty tricky to deal with, but if you hold a hard line on professional expectations you can see them flourish in pretty short order to be some of the best reporters out there.

I only spent a short amount of time on the program I was with, but it was very rewarding. A+++, highly recommended.

reply

[–] sundvor link

That's great about the young people! Thanks for relating this.

reply

[–] jcims link

Whoever runs this definitely works in a bug bounty program:

https://twitter.com/cluelesssec

reply

[–] martenmickos link

Generally in the world of bug bounty programs, the signal-to-noise ratio (SNR) is around 10-20%.

Even at this low rate, it is not too bad. Let's say you receive 10 reports. You can relatively quickly identify the 8-9 noisy reports to find the 1-2 valid ones. Of course, a higher SNR is always better. It saves you time and effort.

On HackerOne, the average SNR across all programs is over 30%. The platform can automatically filter out certain reports that are duplicates or out of scope.

The platform maintains an average signal rating for each hacker (aka security researcher). Companies can limit access to their programs to hackers with a certain signal or higher. This will significantly increase SNR for the program.

Companies can also opt for a HackerOne program with triage included, in which case the SNR rises close to 100%.

reply

[–] txutxu link

> The platform maintains an average signal rating for each hacker (aka security researcher). Companies can limit access to their programs to hackers with a certain signal or higher. This will significantly increase SNR for the program.

So if a new user of the platform finds a valid or high-impact bug, they will be unable to report it... less noise, but a high-value bug goes unreported in that case...

reply

[–] robbiemitchell link

Is there a ticketing system of some kind in play there? I imagine there would be steps like "Respond to user" before resolving/closing.

reply

[–] dingaling link

I'll don my corporate hat and say: those are completely unacceptable excuses for poor client communication.

I'd perhaps accept 'I was so deep in the code!' once from a very junior developer, as a learning experience.

To the person that reported a bug, that one report makes or breaks their entire opinion of your organisation. We lost customers because of poor communication, and on the other hand made some very happy repeat-customers even when we had to say 'we can't fix that yet' - but they were in the loop for the whole process and understood why.

reply

[–] ArlenBales link

> 10/02/2017 – Google already fixed the issue but forgot to tell me … I contacted them asking for an update

> 19/02/2017 – Got a response, they implemented a short-term fix and forgot to sent my report to the VRP panel …

I hope Google forgetting to follow up on bug bounties and needing to be reminded isn't a common occurrence.

reply

[–] komali2 link

I'm similarly surprised we haven't heard of an AI-augmented fuzzer that's been unleashed on random domains to just "try shit out." Seems like a good way to find weird little bugs. Then again, the scope of the "problem" is so massive, and the "rewards" (shit to flag as "yea check this out more") so vague, I don't even know how you'd begin.

reply

[–] undefined link
[deleted]

reply

[–] em3rgent0rdr link

If the good people don't do it soon, the bad people will...

reply

[–] TeMPOraL link

Or the curious ones. Just make a point&click version of such a vulnerability scanner and post it on Reddit; you'll have half of the Internet scanned in no time.

reply

[–] chrismarlow9 link

Not really; there are enough bad schemes that already work that nobody needs to bother with fancy technical exploits.

reply

[–] asperous link

I think that exists! It's called a vulnerability scanner. Maybe they could be smarter.

reply

[–] wingerlang link

Wouldn't those simply scan and try for already known vulnerabilities? I think the point of the AI would be to look for unknown ones.

reply

[–] chrismarlow9 link

It's called a fuzzer. Many of them have plugin frameworks where you can TensorFlow your heart out.

reply

[–] xyzzyz link

Sure, people thought of it -- Google even sells it as a product, Cloud Security Scanner[1]. The internal version has been running on internal sites for a long time now.

[1] - https://cloud.google.com/security-scanner/

reply

[–] Macuyiko link

Very interesting. Does this really implement some intelligence/learning, however? Or is it just going over a list of known vulns like most scanners do?

reply

[–] wslh link

It is an interesting subject to research, but not easy. Finding and then exploiting a bug is art and science.

Augmenting fuzzing with AI is an interesting approach.

reply

[–] Macuyiko link

So I was thinking recently... with Google (amongst others, of course) themselves pushing towards AI applications, it seems to me that many of these less-advanced* bounty hunts might perhaps be automated with a fuzzer+scraper+AI based approach. The fact that bug bounties are still being awarded does suggest that this is not that trivial, but it might still be fun to explore nonetheless. I.e. can one train an agent that goes off and tries this sort of thing autonomously? Might be fun to translate the HTTP intrusion domain into a deep learning architecture.

Similar things are being applied on the "defensive" side of things already anyway (i.e. Iranian, Turkish, Chinese firewall systems using machine learning to identify and block new patterns), so why not apply this on the offensive side.

*: Not to demean the author in any way; I understand that putting the time in to explore these things is easier said than done in hindsight.

reply

[–] terminalcommand link

I am a bit jealous :).

I also did a subdomain search on Google a few weeks ago. I stumbled upon a lot of login sites.

A subdomain search led to 95 subdomains under corp.google.com.

There is some strange JavaScript on those pages; there is a function called riskMi.

I don't want to get sucked into it; I'm also closing the tab and going back to my terminal :).

reply

[–] rprime link

Indeed, or sometimes I want to try certain attack vectors and the next second I am thinking I am fooling myself - they're smarter than me, they wouldn't leave such bugs in. Cue a few weeks later: someone gets a few $k because they let themselves get sucked into it :D.

I guess it's as much mindset as it's skill.

reply

[–] Ajedi32 link

I got a bug bounty once because I reported a bug in Chrome that someone else was complaining about in the comments section of a tech blog.

If, instead of just complaining, that commenter had taken the time to fill out a bug report, they could easily have gotten the bounty.

Sometimes it just takes a tiny bit of extra effort to go from noticing something's amiss to actually doing something to get it fixed.

reply

[–] rootsudo link

Good idea. Imagine if you can do one bug report a month. 5K is nice income.

reply

[–] dsacco link

What was the security issue?

reply

[–] Ajedi32 link

CVE-2015-1274

Basically, Chrome allowed users to use the "Always open files of this type" option with executable files. So if anyone was ever foolish enough to set that option after downloading a `.exe` on Windows, any future site they visited could take over their machine just by initiating a download for a malicious executable.

reply

[–] throw9912 link

How did you subdomain search? Was it a brute force / dict search?

reply

[–] schwag09 link

Shameless self-plug: You can use Fierce! A DNS reconnaissance tool - https://github.com/mschwager/fierce

reply

[–] yeukhon link

A DNS recon tool should be able to do it. If you look around for an online DNS tool, a couple dozen Google subdomains will be revealed. With Certificate Transparency this kind of information is not as secret as it used to be. Last year there was a vulnerability - I forget which one, it was quite big and had something to do with legacy software - that led me to look at what domains were using my company's cert. Qualys made a tool out of this.

reply

[–] dsacco link

DNS search. You can use a tool like fierce or subbrute if you're lazy.

reply

[–] microcolonel link

riskMi is probably from CA Technologies RiskMinder™.

reply

[–] Buge link

A few weeks ago? It says a temporary fix was done by February 10.

reply

[–] rprime link

I discovered the same error/bug a few weeks ago when a co-worker linked "this weird page" to me. I just looked around and thought it was pretty cool to see that part of Google, didn't think too much of it, closed the tab and went back to my terminal. :)

reply

[–] arnioxux link

In at least two other companies I've worked at, we also use query params to enable debug information on live production sites. At one of those companies the only requirement was that you be on a corporate IP address, but it actually still works if you're on our guest wifi.

reply

[–] joatmon-snoo link

It's a very low percentage, somewhere in the low single digits (if not an even lower order of magnitude). Still high enough for it to be worth paying engineers to maintain the backwards compatibility :)

reply

[–] carvalho link

Good catch! I also studied Google's 404 pages. Seems like they have unified all but a few of them. One I found was vulnerable to an old UTF-7 injection (a customizable page title appeared before the character encoding was specified) and another was vulnerable to XSS. Got a bounty for the XSS one; the UTF-7 one only affected browsers too old to be in scope for the program (I do wonder how many IE6 users Google sees).

reply

[–] eslaught link

The page appears to have about 70 characters on a line:

    >>> len("which is nothing more than a simple login page (seems to be for Google")
    70
This is within the generally accepted guidelines for line length:

https://www.google.com/#q=characters+per+line

The bigger issue is, I think, font size. I could imagine that on certain displays this font might look rather small.

reply

[–] derefr link

Looks like a mobile layout that is "responsive" at higher resolutions only by adding some extra elements to surround the fixed-width central container.

reply

[–] microcolonel link

I prefer narrow columns for reading personally. Snap the article window to the side and scale it down for maximum enjoyment.

reply

[–] lmm link

You can make any site have a narrow column like that - the site should give those of us who prefer wider text a way to get that too.

reply

[–] komali2 link

Offtopic: What's with the hyper narrow width on this page? Looks like this on a 1440p monitor (ubuntu, chrome) http://i.imgur.com/m9YWcNj.png

reply

[–] lucb1e link

Haven't read one of those in years. Did one come by on HN recently?

reply

[–] frenchie14 link

This one comes to mind: https://news.ycombinator.com/item?id=14166966

> Raneri questioned my motivation and I said that I want to give the vendor ample time to resolve the issue and then I want to publish academically. He was very threatened by this and made thinly veiled threats that the FBI or other institutions would "protect him". Then he continued with statements including "we want to hire you but you must sign this NDA first." He also recommended that I only make disclosure through FINRA, SDI, NCTFA and other private fraud threat sharing organizations for financial institutions.

reply

[–] netheril96 link

Such a refreshing story, after countless security researchers have been threatened or sued for reporting security vulnerabilities to companies that should have thanked them instead.

reply

[–] nickcw link

I found a bug in Go which turned into this CVE

https://www.cvedetails.com/cve/CVE-2015-8618/

I applied for a bug bounty, but alas was turned down as Go isn't a Google service and it wasn't in scope for the Patch Reward Program.

I did get into the hall of fame though!

reply

[–] louprado link

Here's a link to the Google Security Rewards Program

https://www.google.com/about/appsecurity/programs-home/

reply

[–] Artemis2 link

Can you tell what SFFE means? GFE is the Google Front End but I can't find much about SFFE.

reply

[–] mh- link

Fairly sure it just stands for "staticfile frontend"... nothing too exciting.

edit: some support for that at https://bugs.chromium.org/p/chromium/issues/detail?id=548688...

  /bns/pe/borg/pe/bns/static-on-bigtable/staticfile.frontend.sffe/93

reply

[–] rachelbythebay link

So if you just want to serve 5 TB... there you go.

reply

[–] slashcrypto link

Would be very interesting to know what SFFE means ;)

reply

[–] SparkyMcUnicorn link

I'm pretty certain it's a server name.

It's a similar convention as some of their other names such as GRFE (Groups), bsfe (Blog Search Front-End).

https://googlesystem.blogspot.com/2007/09/googles-server-nam...

reply

[–] CiPHPerCoder link

Static File Front-End?

reply

[–] puzzle link

The hostname is static.corp and the service is static-on-bigtable, so...

reply

[–] GauntletWizard link

As a Xoogler who misses the debug stack, this was some fun nostalgia. Good catch!

reply

[–] undefined link
[deleted]

reply

[–] roemerb link

Am I the only person who had to enlarge the page to read the article? Nice catch though.

reply

[–] dmead link

thefacebook.com redirected to facebook.com/intern/vod up until last week.

reply

[–] dsacco link

I wish I could flag this.

There's invariably one comment like this on bug bounty stories here: one that isn't happy with the bug bounty result even when the researcher is, and that goes off the rails with a weird anti-large-company bias and some conspiracy theory.

This has absolutely nothing to do with your Nexus RMA story, or your cloud SLA story from downthread, or whatever other agenda you have against these large companies.

Do you have any concept of what it's like to run a bug bounty at Google's size? Have you ever been involved in managing one? Have you ever participated in one? Can you qualify any of your opinion with something aside from these irrelevant grievances you're throwing out?

You're not contributing to the discussion at all, you're just hijacking the thread so you can perpetuate your soapbox. Human beings make mistakes and bug bounties are an easy place to drop the ball. No one is trying to cheat security researchers out of their rewards.

reply

[–] Buge link

Some companies do worse than attempt to cheat people out of rewards: they threaten to sue them. But Google doesn't do that.

reply

[–] foota link

I find it much more likely that they just dropped the ball on paying them, rather than maliciously trying to deny them a bounty.

reply

[–] mickrussom link

Try arguing with them about an SLA violation they had with their cloud infrastructure. The game is set up so that they can chisel and have these guffaw and "aw garsh" moments, but every AdWord and every billable second on their cloud will get paid, hell or high water.

reply

[–] dsacco link

Did you seriously come to hijack a story about a Google bug bounty so you could complain about an irrelevant issue?

reply

[–] lucb1e link

If I'm being completely fair, how often does HN as a whole discuss related topics or cases in the comment thread? The best example is probably that people kept mentioning Reader shutting down whenever they launched something new, a la "not going to use this, see what they did to Reader".

Not sure it's entirely the same as this case, but hijacking threads with "remember when this happened?" is not that unique.

reply

[–] richforrester link

Sounds like everyone got what they wanted.

Also sounds like you were dealing with humans, who (before escalating) followed company protocols.

Also seems not too far-fetched that those protocols are in place to weed out the whatever% that make bogus claims.

All in all, nothing I'd call "truly sad", "evil" and whatnot. Just big businesses being big businesses.

reply

[–] afpx link

Curious why you have such high tolerance for this type of behavior, since it appears that you don't actually approve of it.

reply

[–] dsacco link

I can't speak for the parent, but I have a tolerance for it because I've actually managed bug bounty programs; I also know Google receives on the order of 100,000 bug bounty reports annually, while a full order of magnitude fewer are actually valid.

reply

[–] richforrester link

Similar. Experience taught me that having patience with bad situations leads to more understanding and easier solving of issues.

The fact is that the more people are involved in a process, the slower the cogs turn.

Having patience is more of a defense mechanism than it is anything else. It has served me well and made life easier!

reply

[–] undefined link
[deleted]

reply

[–] mickrussom link

Nice, glad he got paid. What's truly sad is that they did try to chisel him out of his bounty if you read the timeline; he had to prod them to get it. Can any of these top-ten companies that make like 200 million per day (Google/Alphabet, Amazon, Microsoft) ever do anything in good faith?

One time Google initially stiffed me on an RMA for a Nexus phone, until I stamped my feet a bit complaining that a company that makes 200M/day was stiffing me, the loyal Android lover, for a $200 hunk of Chinese-made plastic. (And just a short while later, with things like SafetyNet, I can no longer have root on an $800 Pixel that isn't even waterproof.) So much for "don't be evil." Well, they dropped that in favor of "do the right thing - follow the law, act honorably, and treat each other with respect". Yeah, they can respectably be cheap, chiseling money grubbers as long as it's legal - like helping locate Falun Gong members in China. And as for corporate stewardship in the hands of the employees - they get blinded by the huge pay and instantly fail to see the bad things that go on.

Anyways, glad he got paid. Amazing that a company with 57,000 employees (and TONS of contractors - they don't want more employees who get benefits and stock when they can string contractors along) needs some random guy to find holes in the back door.

reply

[–] Buge link

What do you think the motivation would be to intentionally "forget"? To save money? If that's the motivation, why not pay the reporter $500 instead of $5,000 for a small info leak? It's illogical that they would intentionally "forget" and then pay such a high bounty.

reply

[–] ncal link

"forgot"

reply

[–] nodesocket link

Where is this resentment and skepticism coming from? The facts say otherwise. Google is known to be receptive to bounties and to pay out.

reply

[–] to3m link

Most of their consumer-facing services are not, shall we say, famous for being high-touch...

But presumably different people look after the security side of things.

reply

[–] kristianp link

This story from yesterday raised some resentment about Google's support:

A startup’s Firebase bill suddenly increased from $25 to $1750 per month https://news.ycombinator.com/item?id=14356409

reply

[–] joatmon-snoo link

Their customer/enterprise support is notoriously sketchy, at least from the outside (a la HN) looking in.

That's not the same at all as their bug bounty program, which is generally one of the best out there.

reply

[–] sushid link

Google's known for their fast triage/response time. They're also fairly generous with their bounties.

reply

[–] carvalho link

Depending on the severity of your finding, your report could wake up a senior security engineer.

When your report is out of scope, Google will not ignore it. When there is a non-serious bug, you get acknowledged in the bug report they file internally. Finally, when they cannot replicate your finding, they will communicate that with you and stay patient until they can either replicate it or close your report.

Edit: forgot to add that they raised the bounty by another 2k ("we updated our payouts") and invited me to their Black Hat booth a year later.

reply

[–] i336_ link

Interesting. What'd you find? :)

reply

[–] carvalho link

Apparently I was part of a test where input sanitization was turned off. Reported and fixed before they could push it live.

Very "monkey on a typewriter". I was not even looking for security bugs, but studying usage of maia.css.

http://aster.or.jp/conference/icst2017/program/jmicco-keynot...

reply

[–] ensiferum link

I'm surprised that anyone at the big corp actually bothered to even reply to this guy reporting the bug, much less give him a bounty!

reply