On Quora someone asked what the longest search query time was. I was able to craft a query that took multiple seconds to complete. It used wildcards and undocumented iteration allowing one to stuff thousands of queries into a single query. Turns out it is someone's job to measure result response times, and he/she came into the thread to kindly ask us to stop messing up their statistics.
This is his answer: "I work on search at Google, and I have to say, very clever answers! Now, please stop. :-p"
I don't think that he did it because it's his job to stop random people on the Internet from running slow queries. I think she was just surprised at how creative people are and found it funny.
Both a 'he' and a 'she'. Is that what it takes to pass the whiteboard b-tree reversal interviews these days?
It's Schrödinger's gender, it's both when unknown, until revealed.
Aka: Schrödinger's Googler
It's hard to believe, but years ago, back when Google had what was called "stop words" (like 'the', that it ordinarily ignored) I was able to make Google perform a search that took over 30 seconds.
The reason stop words take such a long time is that millions of sites have words like "the" on them, so doing a join on all those simply takes a long time.
My method for finding a long string consisting entirely of stop words was to download the Project Gutenberg edition of the complete works of Shakespeare, find the longest run of stop words in it, and then search for that run as a literal quote.
The longest one I found was: "From what it is to a".
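The search described above can be sketched in a few lines. This is a minimal Python sketch, assuming a small hypothetical stop-word list (`STOP_WORDS` is made up for illustration; the list Google actually used is not given here) and a tokenizer that simply ignores punctuation.

```python
import re

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "how", "i", "in", "is", "it", "of", "on", "or", "that", "the",
    "this", "to", "was", "what", "when", "where", "who", "will", "with",
}

def longest_stop_word_run(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Return the longest run of consecutive words that are all stop words."""
    # Lowercase and keep only letters/apostrophes, so punctuation is ignored.
    words = re.findall(r"[a-z']+", text.lower())
    best: list[str] = []
    current: list[str] = []
    for word in words:
        if word in stop_words:
            current.append(word)
            if len(current) > len(best):
                best = current.copy()
        else:
            current = []  # a non-stop word breaks the run
    return best

# Made-up sample sentence; in practice you would pass the whole Gutenberg text.
sample = "He spoke from what it is to a stranger, and that is all."
print(" ".join(longest_stop_word_run(sample)))  # from what it is to a
```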
Let me see how long Google takes to do it now :)
2.04 seconds! Nice :) - http://i.imgur.com/IhPTpr6.png
that took 30+ seconds 'back in the day'.
Also long long ago... I was watching a video of a guy channeling aliens. Someone in the audience asked what would be the next big thing for humanity. I immediately typed the query into Google. It showed the page head.... 20-30 seconds later it came up with a single published paper about "nuclear magnetic resonance identification"
Today it resolves instantly and finds at least 10 publications from before 2000.
Is it really technically correct to say that Google was performing web-wide joins on data? Isn't it all about clever indexing?
There's nothing to index. How could it have found my Shakespeare quote via an index? It consisted entirely of the words 'from what it is to a' but produced only the Shakespeare quote. I don't see how it could have indexed anything.... it must have done a join. (Which makes sense given the 30+ seconds I had to sit and wait before it returned its answer, while also reporting the time it took to produce it. What else could it have been doing?)
By the way I believe I wanted to know whether it would return the Shakespeare quote at all.
If you mean that it might have cached the results of the query, I doubt anyone else queried that exact phrase, other than me.
Google does index pages, in the database sense.
An index in the database sense is nothing more than reorganizing data (or subsets of data) into structures optimized for searching and seeking, rather than full scans.
I'm guessing you're most familiar with btree indexes as present and default in many SQL solutions, which are good for quickly answering exact, greater/less matches. There are dozens of data structures useful for indexing, some of which are built to index full text documents. For an example, check out the GIN and GiST indexes in Postgres.
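To make "reorganizing data into structures optimized for seeking" concrete, here is a minimal Python sketch that uses a sorted list plus `bisect` as a stand-in for a B-tree (a real B-tree keeps the same sorted order spread across disk pages). The table and its rows are made up.

```python
import bisect

# Toy table: (user_id, name) rows in arbitrary insertion order.
rows = [(42, "carol"), (7, "alice"), (19, "bob"), (88, "dave"), (3, "eve")]

def full_scan(rows, lo, hi):
    """No index: inspect every row, O(n) per query."""
    return sorted(r for r in rows if lo <= r[0] <= hi)

# The "index": the same rows reorganized into key order, built once up front.
index = sorted(rows)
keys = [r[0] for r in index]

def index_range(lo, hi):
    """With the index: binary-search the sorted keys, O(log n + matches)."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return index[start:end]

print(index_range(5, 40))  # [(7, 'alice'), (19, 'bob')]
```

Both functions return the same answer; the difference is only how much data each one has to touch to get there.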
It's my understanding that database indexing and index compression was a primary differentiator Google excelled at from the beginning. They could beat others at fractions of the typical cost because they didn't need data centers to store and query huge quantities of documents.
Seriously, there's no way even Google could intersect the sets of all crawled web documents containing those individual words in 30 seconds, much less two seconds.
>Seriously, there's no way even Google could intersect the sets of all crawled web documents containing those individual words in 30 seconds, much less two seconds.
I believe you're mistaken. What I've heard is that for every word, Google has a list of every web site that contains that word: they've flipped the database. So, for example, I just searched (without quotes) for neanderthal violet narwhal obsequious tandem. That query took 0.56 seconds, but Google decided to drop some of the words so that it could get me results. When I added plus signs, making my query +neanderthal +violet +narwhal +obsequious +tandem, it said it took 0.7 seconds to determine that in the entirety of the Internet there is not a single document with all 5 of those words on it.
How do you think it determines in 700 ms that none of the sites it has indexed, anywhere on the Internet, contains all 5 of those words?
The answer is that it has a rather short list of sites that contain the word narwhal, which it then intersects with the somewhat larger list of sites that contain obsequious, and so on. 700 milliseconds is plenty of time when you take that approach.
So this explains why joining stop words (each of which appears on billions of pages) takes so long.
Using stop words, it is easy to make queries that take one or two seconds each.
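The narwhal-first strategy can be sketched as an AND query over an inverted index. This is a minimal Python sketch, assuming each term maps to an in-memory set of document IDs; production engines store sorted, compressed posting lists and merge them instead, and the toy index below is invented.

```python
def intersect_postings(index: dict[str, set[int]], terms: list[str]) -> set[int]:
    """Return the IDs of documents containing every term (an AND query)."""
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set()
    # Start from the rarest term so the working set is as small as possible.
    postings.sort(key=len)
    result = set(postings[0])  # copy, so the index itself is never modified
    for plist in postings[1:]:
        if not result:
            break  # nothing can match anymore; stop early
        result &= plist
    return result

# Invented toy index: rare words have short lists, "the" is on everything.
toy_index = {
    "narwhal": {3, 17},
    "obsequious": {17, 40, 41},
    "the": set(range(100)),
}
print(intersect_postings(toy_index, ["the", "narwhal", "obsequious"]))  # {17}
```

If every term is a stop word, even the "smallest" list is enormous, which matches the slow queries described in this thread.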
>There's nothing to index
Huh? What do you mean? Google indexes HTML web page content from the entire public internet using web crawlers...
I'm confused. By "clever indexing" I thought they meant, in the database sense of the word.
The reason my search took 30 seconds is that it started by getting a list of every site with "from" on it, every site with "what" on it, and so on, and intersecting them all. That's how it ended up finding my quote. How else do you think it did it?
To find the string "from what it is to a", which occurs only hidden in the middle of Shakespeare's texts, what do you think they do?
In my opinion they combine the list of sites that have every word - starting with the least common ones. It's easier if you search for something that has a few uncommon words. Then you start with a small list, and have to combine it with other small lists.
When every word in the phrase has billions of sites (there are billions of pages that have the word "to" on them, same for "from", "what", "it", "is", "a"), you have to combine them all. Then you have to do a string search within the resulting set, since I put it in quotation marks. There is no easy strategy. Hence the long search time.
what else could they be doing?
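The two steps described above (intersect the per-word document lists, then check that the words appear in order) can be sketched with a positional index, which records where in each document each word occurs. This is a simplified Python sketch, assuming whitespace tokenization and tiny made-up documents; it is not a description of how Google actually evaluated quoted queries.

```python
from collections import defaultdict

def build_positional_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map term -> {doc_id: [positions where the term occurs]}."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_search(index, phrase: str) -> set[int]:
    terms = phrase.lower().split()
    if not terms:
        return set()
    # Step 1: AND-intersect the document sets. This is the expensive part
    # when every term is a stop word with an enormous posting list.
    candidates = set(index.get(terms[0], {}))
    for t in terms[1:]:
        candidates &= set(index.get(t, {}))
    # Step 2: keep only documents where the terms occur at consecutive positions.
    hits = set()
    for doc in candidates:
        for start in index[terms[0]][doc]:
            if all(start + i in index[t][doc] for i, t in enumerate(terms)):
                hits.add(doc)
                break
    return hits

docs = {1: "what it is to a man", 2: "from what it is to a king", 3: "a is to it what from"}
idx = build_positional_index(docs)
print(phrase_search(idx, "from what it is to a"))  # {2}
```

Document 3 contains every word and survives step 1, but fails step 2 because the words are out of order.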
I'm curious how else you think large-scale data is stored other than in an index in the database sense of the word as well. You think Google has some kind of massive heap-like, unstructured data-store that they run search queries against? That doesn't make sense to me, but I've also never worked in global scale web search, soooo idk.
You said "There's nothing to index," as if Google is making web requests to every domain in existence, parsing the document responses, and seeing which sites have these words on them, all at runtime when you type a search query. Google obviously indexes the web in the sense that they store their own cached versions of web pages "locally," on top of which they then build an insanely complicated, web-facing, search architecture.
we're talking past each other. sova referred to this meaning - https://en.wikipedia.org/wiki/Database_index when they said "clever indexing."
the sense you mean is a different sense of the word index - meaning, to crawl. Yes, of course it does that too.
I was not referring to database indexes. That is not pertinent here. I was thinking about the index that Google creates, its locally cached version, that it queries. If you have a locally cached version, you are not going to rifle through them one by one until you find matches, nor are you going to rifle through them and find partial matches and then intersect them all to see if any overlap in your final product. Among other weird assumptions, that final method assumes there is a solution for every query.
Google, no doubt, has a very sophisticated way of querying against their cache of the WWW and it has probably evolved over time. However, it is inappropriate to say Google does a join over the entire internet for one query. It is much more reasonable to say that Google checked your query string against their gigantic index of terms, and it took a while to dig that deep into the pile. The performance hit such a complex query takes is more like unzipping a large archive to get a specific megabyte's worth of info, rather than saying it smashed all the files together and then searched for the exact term like notepad.
Anyway, think about it for a while, it's clearly a cool issue in search, and programs and algorithms do not have to visually search things as humans must.
> what else could they be doing?
I recommend reading the Stanford paper (page 12), which spells out in a lot of detail exactly what they were doing.
In short, your pathological query would have searched for every document which contained one of your words, discarded those which didn't match all, and then sorted by word proximity. I expect for a literal phrase search, there would be a final pass to look for the exact phrase in order.
I think it indexes the entire string, no? or would that be too many combinatorics, idk.
Yes, it must, let's be exhaustive because we can and because the problem is solvable and because someone must do it.
Have you accidentally searched Google for a long URL and seen it come up? It is actively caching stuff all the time, and that cache just grows and grows, and it must be a pretty beautiful megastructure that you can run queries against.
Google never had stop words. The original lexicon only included the most popular 14 million words (for fast bucketing), and rare words were processed specially.
It did - I had to use plus signs to force them to use them.
Normally it ignored those words. I am fairly certain of this detail. I must have found a list of those words - how else would I have found the string "from what it is to a"? I had a list of its stop words.
Edit: for proof, here's someone's screenshot of the same - http://farm3.static.flickr.com/2270/2201828252_45a32da7f4.jp...
As you can see, it is Google saying it is ignoring a word because it is too common. It has a list of every site that has that, but that list is huge and it doesn't usually use it.
Stop words are words that are ignored when indexing, not when querying. Since you did find a result, those words must have been indexed.
Interestingly, I wonder if it cached your query.
My same query as you took 0.3s, but if I stripped out one word ("From what it is to") it took 2.2 seconds.
of course it cached my query. :) try it again in a few weeks.
I managed to beat that query with "The thing is what it is"
>and he/she came into the thread to kindly ask us to stop messing up their statistics.
Shouldn't someone with a job in statistics know how to account for outliers?
And also, wouldn't they be interested in getting those queries so they could either fix their performance or block them?
The query was already published on Quora. I assume that Google did patch the issue. They were simply requesting that it not be posted in a public forum, resulting in the potential for denial of service attacks. It wouldn't have been a statistician making a fuss about their pretty graphs being ruined. The problem was the very real performance impact the queries were having on the service, as thousands of visitors copy/pasted from Quora to see for themselves.
This is why companies like Google have bounties for such things. "Please submit bugs and performance issues privately so we can patch them before you disclose the details publicly and hurt our services - we'll even pay you for your discretion!"
This particular issue was posted on Quora, where anyone could pick it up and participate in what is essentially a denial of service attack (whether or not performed intentionally). It wasn't submitted as a private bug report to Google so they could fix the issue. It was spread in a public forum. I think it's fair for Google to politely ask "a few of your own tests to validate an issue you will submit as a bug report is fine, but please don't disclose to the public until we patch it."
When you operate at the scale of Google, everything is expected to be airtight; outliers should not be possible. It wouldn't surprise me if their monitoring systems are built without the ability to "massage" (ie: manipulate) statistics, as it is a terrible practice. I don't think a statistician who relies on ignoring outliers would last long working for Google. They're not doing their job if the only thing they care about is silencing warnings to make pretty graphs that falsely show everything is running smoothly. Their job is to work with the truth - not manufacture little white lies to appease management.
Boss: Median latency is 100ms and 99.9th percentile latency is 1 second.
Nobody ever asks about that 0.1%...
When that 0.1% - or even 0.001% - are 5-60 second requests, you have a bomb waiting to go off. There really is a massive difference when you are operating at the scale of Google. If the median is 100ms, the maximum acceptable time - 100th percentile - is likely below 200ms. A three nines percentile that is 10x the median isn't a good thing at large scale. Perfect consistency is more important than statistics. A small scale service deployed on my-little-unused-tool.com that receives a few requests/minute is an entirely different ballgame.
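A quick way to see how a tail can hide behind headline numbers is to compute percentiles directly. This Python sketch uses the nearest-rank definition and a made-up sample of 10,000 requests in which 20 (0.2%) take 30 seconds.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Made-up latencies in milliseconds: 9,980 fast requests and 20 very slow ones.
latencies = [100] * 9980 + [30000] * 20
for p in (50, 99, 99.9, 100):
    print(f"p{p}: {percentile(latencies, p)} ms")
```

Here the median and even the 99th percentile report 100 ms, while the 99.9th percentile and the maximum expose the 30-second requests.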
When it's tried often enough by enough readers of the question, it's not an outlier any more.
Do you have a link to that? I'd be interested in reading his response and I can't see it by searching Quora.
Nice! Do you still have the link to the Quora question or an example of the query?
linked in https://news.ycombinator.com/item?id=14372977
Something to show for effort. This matters a bit.
One of my young-adulthood colleagues went on to be an early googler who is influential in relevant policies. We built our relationship sharing bugs and analysis techniques. Quite a few years ago some scoundrels whose trust I gained proudly showed me how they were using youtube links to drop malware. Since my old mate worked there, I mentioned it and they were quite interested.
We hadn't shared anything in a while, both of us demonstrating loyalty to our employers and not talking about work details. I said that it would be really cool to have a one dollar check from Google for a bug report. I probably offered to send something cool from my workplace too.
They said, "We don't pay for bugs" Fifty cents? "We don't pay for bugs!"
I felt like I was simply after a piece of paper and the evildoers were a mildly useful source, but I could easily do without them and the souvenir would have been treasured.
I was unreasonably miffed that I couldn't get that piece of paper, though. So I reviewed the links I'd collected and passed some general information but withheld details that would be obviously unique to these attackers. They expressed disappointment with me the next time we spoke. It turns out that what I gave wasn't specific enough to easily identify the lame cross site exploit, despite my actual intent to lead them to the bug.
Interesting they have a bounty program now.
Nice catch. A long time ago the services on the backend were killed by a special URL. And someone found it, and it wasn't filtered by the front end. And of course someone tried to use it, but it never returns since it kills the service, but their client retried ... it was a lot of "what the heck is happening" going on until SRE figured it out and then they immediately patched the front end and the anomalies stopped. It is too bad the person who caused it didn't file for a bug bounty like this person did, they probably would have had something to show for their efforts besides "hey look at this funny thing you can do, oh wait it doesn't do it any more."
That bug is extremely common, and the source is always the use of soft-deletes in the database. When you view the list of items (ex: inbox), the database query includes a "WHERE deleted = false" to exclude rows which have been soft-deleted. When viewing a single item (ex: message) the URL contains a unique identifier, whether an auto-increment integer, UID, etc. The query used to load one item is "WHERE id = :id" instead of the correct "WHERE id = :id AND deleted = false".
Managing soft-deletes on a database table requires an attention to detail, with every single query ever touching that table, that many developers lack the discipline to handle. Discipline aside, it's difficult for every developer on a team to remember which tables use soft-delete, and when checking that flag is or is not necessary. Finally, ORM abstractions often automate soft-delete in such a way that makes it exhausting for developers to validate every query. I've seen this bug over and over again at every company I've worked for. Happens so often it's impossible to keep count.
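The soft-delete bug described above is easy to reproduce. This is a minimal Python/`sqlite3` sketch with a made-up `messages` table; SQLite stores booleans as integers, so `deleted = 0` plays the role of `deleted = false`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO messages (id, body, deleted) VALUES (?, ?, ?)",
    [(1, "hello", 0), (2, "secret", 1)],  # message 2 has been soft-deleted
)

def list_inbox():
    # The list view remembers the flag...
    return conn.execute("SELECT id FROM messages WHERE deleted = 0 ORDER BY id").fetchall()

def load_message_buggy(msg_id):
    # ...but the single-item view forgets it, so a guessed ID leaks deleted rows.
    return conn.execute("SELECT body FROM messages WHERE id = ?", (msg_id,)).fetchone()

def load_message_fixed(msg_id):
    return conn.execute(
        "SELECT body FROM messages WHERE id = ? AND deleted = 0", (msg_id,)
    ).fetchone()

print(list_inbox())           # [(1,)]
print(load_message_buggy(2))  # ('secret',)
print(load_message_fixed(2))  # None
```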
> Managing soft-deletes on a database table requires an attention to detail.
> Discipline aside, it's difficult for every developer on a team to remember which tables use soft-delete, and when checking that flag is or is not necessary.
That's the case where instead of "try harder not to make mistakes", you design a system so it is not possible to make them. One way would be to rename original table `raw_messages` and `create view messages as select * from raw_messages where not deleted`.
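The view-based design can be sketched with Python's built-in `sqlite3` module. The table and view names follow the comment above; the column layout and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_messages (
        id INTEGER PRIMARY KEY,
        body TEXT,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Application code queries only this view, so the filter cannot be forgotten.
    CREATE VIEW messages AS SELECT * FROM raw_messages WHERE NOT deleted;
    INSERT INTO raw_messages (id, body, deleted) VALUES (1, 'hello', 0), (2, 'secret', 1);
""")

def load_message(msg_id):
    # A plain lookup by ID is now safe: the view already excludes deleted rows.
    return conn.execute("SELECT body FROM messages WHERE id = ?", (msg_id,)).fetchone()

print(load_message(1))  # ('hello',)
print(load_message(2))  # None
```

Code that genuinely needs deleted rows (purges, audits) has to name `raw_messages` explicitly, which makes that intent visible in review.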
One of the problems with ORMs is that because they let people forget about the annoying details of their databases, they also make them forget the useful details of their databases.
Damn you! I wrote a bloody essay in a reply to explain, in superfluous detail, what you summarized in one sentence. Anyone with basic knowledge of the topic would know what you mean. I need to figure out this magic people like you possess. I'm tired of rambling, when nobody will read it. Thank you for the incentive to improve.
That's the kindest spontaneous compliment I've received in a while. Thank you. But: while the pithy comment might farm more imaginary internet points, the essay may actually teach a lesson to the person who doesn't yet get it.
As for writing: it's not magic, but for me it's not consciously applied processes either. If I had to guess how my earlier comment came about, I'd suggest something like this as a generative process:
1. Find two effects with a common cause (provided upthread).
2. State each effect, sharing words and rhythm to bring out contrast.
3. Omit needless words. (Thanks, Strunk/White!)
I don't disagree that this is common, but it doesn't require special skills or extra attention to detail. Your code to load the 'thing' should always use a Loader (or whatever you choose to call it) - some abstraction that loads the thing and checks for permission. It's really not that hard, and it's the bare minimum of competence I expect from a junior web developer. If you have littered your code with SQL statements, you're almost certainly doing it wrong.
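A minimal sketch of such a Loader, assuming a hypothetical `MessageLoader` class over a made-up `messages` table with an `owner_id` column. The point is that the soft-delete filter and the permission check live in exactly one place.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Message:
    id: int
    owner_id: int
    body: str

class NotFound(Exception):
    pass

class MessageLoader:
    """Hypothetical loader: the only code path allowed to fetch a message by ID."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def load(self, msg_id: int, viewer_id: int) -> Message:
        row = self.conn.execute(
            "SELECT id, owner_id, body FROM messages WHERE id = ? AND deleted = 0",
            (msg_id,),
        ).fetchone()
        # Deleted, missing, and someone else's messages all look identical to
        # the caller, so IDs cannot be probed to learn what exists.
        if row is None or row[1] != viewer_id:
            raise NotFound(msg_id)
        return Message(*row)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, owner_id INTEGER, body TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [(1, 10, "hi", 0), (2, 10, "old", 1), (3, 20, "theirs", 0)],
)
loader = MessageLoader(conn)
print(loader.load(1, viewer_id=10))  # Message(id=1, owner_id=10, body='hi')
```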
The majority of incidents I encounter with this bug occur specifically because of the complexity introduced with abstracted query building via ORMs and DBALs. Developers assume their configuration, such as appending a "deleted = false" clause, applies to every query without manually verifying each one. Yes, it's technically the developers' fault for not understanding how and when the abstraction kicks in, but that doesn't mean it's "simple", or that these cases are avoided.
I've seen abstraction layers where it's impossible to add a default clause to every SELECT query for a model. I've seen other abstractions where "AND deleted = false" can be automatically added to every SELECT query. I've also seen abstractions where that clause is added to all SELECT, UPDATE, and DELETE queries.
Here's a list of problems:
a) Developers bypassing the model, executing a complex JOIN that includes the table in question, and forgetting they need the "deleted = false". Most complex queries wind up being written as raw SQL or a parsed variant, that never executes the model behavior to append the "AND deleted = false" clause. Is it "wrong" to bypass the model? Most of the time, yes! But it happens every day. We're talking about what happens in reality, not what should happen in an ideal fantasy world.
b) Developers missing the case where they should be including soft-deleted rows. When the abstraction layer enforces a "deleted = false" on every query, it can be difficult or impossible to force backtracking to include soft-deleted rows. Back in the MyISAM days (before foreign keys), I found an "account deletion" mechanism that executed a "DELETE FROM messages WHERE userid = :userid AND deleted = false" - soft-deleted rows were not deleted when required, because an abstraction layer excluded soft-deleted rows in a DELETE query and the original developer never noticed.
c) What happens with UPDATE and DELETE queries? I've seen abstractions that only append the soft-delete mechanism to SELECTs, and others that also affect UPDATEs and DELETEs. Again, should every developer on a codebase understand in which situations the abstraction kicks in? Yes. The fact is they don't, because abstractions inherently make developers not inspect the behavior of their code as deeply as they should.
I don't remember soft-deletes being an issue at all - literally non-existent - 10 years ago, when all SQL queries were typed out by hand. When you're forced to write the query yourself, you have time to think about what you are doing. When you delegate the majority of the task to an abstraction layer that magically modifies your queries on the fly, bad things happen. The most stable and maintainable code base I ever worked on had every single query in XML files. It sounds tedious and bloated, as if it's a joke about the "old days", but every query was located somewhere where it could be analyzed, and you actually had to use your brain when writing a new query. I've seen nothing but misery since the introduction of abstracted ORMs and DBALs, where the only way you ever see the queries being executed is in debug dumps and logs.
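Problem (b) above can be reproduced with a deliberately naive stand-in for an ORM hook. `scoped()` is hypothetical and simply appends the filter to whatever SQL it is given; real ORMs are more sophisticated, but the failure mode is the same.

```python
import sqlite3

def scoped(sql: str) -> str:
    """Hypothetical ORM hook that silently appends the soft-delete filter.
    Naive on purpose: it assumes the statement already ends in a WHERE clause."""
    return sql + " AND deleted = 0"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, userid INTEGER, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO messages (id, userid, deleted) VALUES (?, ?, ?)",
    [(1, 5, 0), (2, 5, 1), (3, 6, 0)],  # user 5 has one live and one soft-deleted message
)

def delete_account_messages(userid: int) -> None:
    # The developer wrote "DELETE ... WHERE userid = ?" and never saw the
    # hook turn it into "... AND deleted = 0".
    conn.execute(scoped("DELETE FROM messages WHERE userid = ?"), (userid,))

def delete_account_messages_unscoped(userid: int) -> None:
    # Account deletion must bypass the filter and remove every row.
    conn.execute("DELETE FROM messages WHERE userid = ?", (userid,))

delete_account_messages(5)
leftover = conn.execute("SELECT id FROM messages WHERE userid = 5").fetchall()
print(leftover)  # [(2,)]: the soft-deleted row survived the "account deletion"
```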
>> competence I expect from a junior web developer
Sadly, more than half of the senior developers I've met can't handle soft-deletes properly. So no, in the real world, this cannot be expected of junior developers.
It's issues like this that really highlight the benefits of shuffling deleted data to a separate archive table through triggers, or leveraging temporal tables. It may not necessarily be as efficient as maintaining a flag, but it drastically reduces the mental overhead placed on users of the database.
OTOH, from a pro-user POV, deleted means removed, not made inaccessible. If someone requests that stuff be removed, IMO it should be removed from archives as well.
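The archive-table approach can be sketched with an SQLite trigger via Python's `sqlite3` module. The schema and trigger name are made up; the live table has no `deleted` flag at all, so ordinary queries need no extra filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE messages_archive (
        id INTEGER PRIMARY KEY,
        body TEXT,
        deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- A hard DELETE on the live table copies the old row into the archive.
    CREATE TRIGGER archive_deleted_message AFTER DELETE ON messages
    BEGIN
        INSERT INTO messages_archive (id, body) VALUES (OLD.id, OLD.body);
    END;
    INSERT INTO messages (id, body) VALUES (1, 'hello'), (2, 'secret');
""")

conn.execute("DELETE FROM messages WHERE id = 2")
live = conn.execute("SELECT id FROM messages").fetchall()
archived = conn.execute("SELECT id, body FROM messages_archive").fetchall()
print(live)      # [(1,)]
print(archived)  # [(2, 'secret')]
```

Note that the deleted row still exists in `messages_archive`, so honoring a request for true removal means purging the archive as well.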
You should also publish it. Not out of revenge, but to alert customers of the quality of the software they are using and to put additional pressure on Wickr to fix it. Project Zero does that too (as do most of other security researchers). Of course, you should give them enough time (90 days?) but after that it's publish time...
I found a bug in wickr where I can re-read "deleted" messages.
I submitted it, answered their teams questions about reproducing it.
A couple of weeks later, they said they can't fix it and didn't pay me :(
I got all my wickr contacts to switch to signal, which is much less buggy...
Having worked on a large bounty program myself, and having at least one thing blow up because I dropped the ball on a response, I'll just say that the front-end aspect of it can be extremely chaotic. This guy seems like he's pretty polite and patient, which you generally try to reward with a rapid response and high touch, but sometimes you can get overwhelmed with a burst of reports, distracted by problematic reporters and bogged down by working the bug through the pipeline.
There are systems and processes to help with all of this of course, but at the end of the day it's still a pretty tricky job to get perfect all the time.
What's the general signal to noise ratio for bug reports?
About 10:1 noise:signal.
This comes from a variety of experiences: I used to manage a bug bounty for a mid-size company on Bugcrowd; in 2014 I surveyed people managing a bunch of programs across different sizes; I've participated in bug bounty programs for companies of different sizes.
The more you offer for rewards and the more recognizable your company name, the more you will be spammed by people submitting reports like (I kid you not): "You have the OPTIONS method allowed on your site this is really serious." The last time I looked at the numbers, Google had over 80,000 bug bounty reports per year, with about 10% of them being valid and maybe another order of magnitude being high severity (I'm fuzzy on the last bit). It's probably over 100,000 per year at this point. It's not uncommon for recognizable but smaller companies to receive one or more per day.
I'm aware of full-time security engineers at Facebook and Google who do almost nothing but respond to bug bounty reports. It's a lot like resumes - people who have essentially no qualifications, experience or (most importantly) a real vulnerability finding will nevertheless spam boilerplate bug reports to as many companies as they can. Take a look at the list of exclusions on a given program - you'll see that many of them explicitly call out common invalid findings that are so ridiculous it's kafkaesque.
HackerOne and Bugcrowd provide a lot of technical sophistication to prime companies for success, but there is an organizational component that is very difficult. If your program is very active, it requires dedication to tune it so you're not flushing engineer-hours away responding to nonsense. This is not to say they're bad - quite the opposite, I think they're fantastic. But I generally recommend smaller companies set up a vulnerability disclosure program through a solid third party, and do so without a monetary reward until they can commit to dealing with a reasonable deluge of reports.
My favorite bug bounty report so far read, in its entirety, "try it ASAP".
I've received reports for things like "source code disclosure" where they link to our jQuery.
LOL - I'd like to report that I was able to download the entire source code of your website by right-clicking and selecting "View Page Source..."
If only that were true... modern web pages frequently have basically nothing of any value in the page source; it's all dynamically loaded.
~10% valid submissions still sounds like a fantastic number to me. Sure you have to sort out the bad ones, but it's still a solid stream of valid reports.
It's a pain when you're in the thick of it, but it really is a great way to round out your security program. There's an astonishing number of incredibly skilled and motivated folks out there, and a well-run bounty program can create a nice symbiotic relationship that benefits both.
One other thing that never really gets any press is the fact that a good chunk of the folks sending in reports are young people in impoverished nations. Some of them can be pretty tricky to deal with, but if you hold a hard line on professional expectations you can see them flourish in pretty short order to be some of the best reporters out there.
I only spent a short amount of time on the program I was with, but it was very rewarding. A+++, highly recommended.
That's great about the young people! Thanks for relating this.
Whoever runs this definitely works in a bug bounty program:
Generally in the world of bug bounty programs, the signal-to-noise ratio (SNR) is around 10-20%.
Even at this low rate, it is not too bad. Let's say you receive 10 reports. You can relatively quickly identify the 8-9 noisy reports to find the 1-2 valid ones. Of course, a higher SNR is always better. It saves you time and effort.
On HackerOne, the average SNR across all programs is over 30%. The platform can automatically filter out certain reports that are duplicates or out of scope.
The platform maintains an average signal rating for each hacker (aka security researcher). Companies can limit access to their programs to hackers with a certain signal or higher. This will significantly increase SNR for the program.
Companies can also opt for a HackerOne program with triage included, in which case the SNR rises close to 100%.
> The platform maintains an average signal rating for each hacker (aka security researcher). Companies can limit access to their programs to hackers with a certain signal or higher. This will significantly increase SNR for the program.
So if a new user of the platform finds a valid or high-impact bug, they will be unable to report it... less noise, but a high-value bug goes unreported in that case...
Is there a ticketing system of some kind in play there? I imagine there would be steps like "Respond to user" before resolving/closing.
I'll don my corporate hat and say: those are completely unacceptable excuses for poor client communication.
I'd perhaps accept 'I was so deep in the code!' once from a very junior developer, as a learning experience.
To the person that reported a bug, that one report makes or breaks their entire opinion of your organisation. We lost customers because of poor communication, and on the other hand made some very happy repeat-customers even when we had to say 'we can't fix that yet' - but they were in the loop for the whole process and understood why.
> 10/02/2017 – Google already fixed the issue but forgot to tell me … I contacted them asking for an update
> 19/02/2017 – Got a response, they implemented a short-term fix and forgot to send my report to the VRP panel …
I hope Google forgetting to follow up on bug bounties and needing to be reminded isn't a common occurrence.
I'm similarly surprised we haven't heard of an AI-augmented fuzzer that's been unleashed on random domains to just "try shit out." Seems like a good way to find weird little bugs. Then again, the scope of the "problem" is so massive, and the "rewards" (shit to flag as "yea check this out more") so vague, I don't even know how you'd begin.
If the good people don't do it soon, the bad people will...
Or the curious one. Just make a point&click version of such a vulnerability scanner and post it on Reddit; you'll have half of the Internet scanned in no time.
not really, there's enough bad schemes that already work to bother with fancy technical exploits
I think that exists! It's called a vulnerability scanner. Maybe they could be smarter.
Wouldn't those simply scan and try for already known vulnerabilities? I think the point of the AI would be to look for unknown ones.
It's called a fuzzer. Many of them have plugin frameworks where you can tensorflow your heart out.
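For anyone unfamiliar with the idea, a bare-bones mutation fuzzer fits in a few lines. This Python sketch has no learning component at all: it randomly flips, inserts, and deletes bytes in a seed input and records unexpected exceptions. The target `parse_record` and its planted missing bounds check are invented for illustration.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Toy target: [length][payload...][checksum]. Planted bug: no bounds check."""
    if len(data) < 2:
        raise ValueError("too short")
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]  # BUG: IndexError if length overstates the data
    if sum(payload) % 256 != checksum:
        raise ValueError("bad checksum")
    return payload

SEED = b"\x02ab\xc3"  # valid record: 0xC3 == (97 + 98) % 256

def mutate(seed: bytes, rng: random.Random) -> bytes:
    data = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        choice = rng.random()
        if choice < 0.5 and data:
            data[rng.randrange(len(data))] = rng.randrange(256)  # overwrite a byte
        elif choice < 0.8:
            data.insert(rng.randrange(len(data) + 1), rng.randrange(256))  # insert a byte
        elif data:
            del data[rng.randrange(len(data))]  # delete a byte
    return bytes(data)

def fuzz(target, seeds, iterations=2000, rng_seed=0):
    rng = random.Random(rng_seed)  # fixed seed keeps runs reproducible
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(seeds), rng)
        try:
            target(candidate)
        except ValueError:
            pass  # documented input rejection, not a bug
        except Exception as exc:
            crashes.append((candidate, exc))
    return crashes

crashes = fuzz(parse_record, [SEED])
print(len(crashes), "crashing inputs; first:", crashes[0] if crashes else None)
```

The "AI" ideas in this thread would slot in where `mutate` picks its changes, steering mutations toward inputs that reach new behavior instead of choosing them uniformly at random.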
Sure, people thought of it -- Google even sells it as a product, Cloud Security Scanner. The internal version has been running on internal sites for a long time now.
 - https://cloud.google.com/security-scanner/
Very interesting. Does this really implement some intelligence/learning, however? Or is it just going over a list of known vulns like most scanners do?
It is an interesting subject to research, but not an easy one. Finding and then exploiting a bug is both art and science.
Augmenting fuzzing with AI is an interesting approach.
So I was thinking recently... with Google (amongst others, of course) itself pushing towards AI applications, it seems to me that many of these less-advanced* bounty hunts could perhaps be automated with a fuzzer+scraper+AI based approach. The fact that bug bounties are still being awarded does suggest that this is not trivial, but it might still be fun to explore. I.e. can one train an agent that goes off and tries this sort of thing autonomously? Might be fun to translate the HTTP intrusion domain into a deep learning architecture.
Similar things are being applied on the "defensive" side of things already anyway (i.e. Iranian, Turkish, Chinese firewall systems using machine learning to identify and block new patterns), so why not apply this on the offensive side.
*: Not to demean the author in any way; I understand that putting the time in to explore these things is easier said than done, and everything looks easy in hindsight.
I am a bit jealous :).
I also did a subdomain search on google a few weeks ago. I stumbled upon a lot of login sites.
A subdomain search led to 95 subdomains under corp.google.com.
I don't want to get sucked into it, I'm also closing the tab and going back to my terminal :).
Indeed, or sometimes I want to try certain attack vectors and the next second I think I'm fooling myself: they're smarter than me, they wouldn't leave such bugs in. Cue a few weeks later, someone gets a few $k because they let themselves get sucked into it :D.
I guess it's as much mindset as it is skill.
I got a bug bounty once because I reported a bug in Chrome that someone else was complaining about in the comments section of a tech blog.
If, instead of just complaining, that commenter had taken the time to file a bug report, they could easily have gotten the bounty instead.
Sometimes it just takes a tiny bit of extra effort to go from noticing something's amiss to actually doing something to get it fixed.
Good idea. Imagine if you could do one bug report a month. 5K a month is a nice income.
What was the security issue?
Basically, Chrome allowed users to use the "Always open files of this type" option with executable files. So if anyone was ever foolish enough to set that option after downloading a `.exe` on Windows, any future site they visited could take over their machine just by initiating a download for a malicious executable.
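The missing piece was a guard that refuses to honor "Always open files of this type" for executable types, no matter what the user chose earlier. A hypothetical sketch of such a guard; Chrome's real download-danger logic is far more involved, and the `may_auto_open` helper and the short extension list here are illustrative assumptions, not Chrome's actual list:

```python
import os

# Hypothetical, deliberately short list of executable file types; a real
# browser maintains a much longer, platform-specific list.
DANGEROUS_EXTENSIONS = {".exe", ".msi", ".bat", ".cmd", ".scr", ".js", ".vbs"}


def may_auto_open(filename: str, user_opted_in: bool) -> bool:
    """Honor the user's auto-open preference only for non-executable types."""
    _, ext = os.path.splitext(filename.lower())  # case-insensitive match
    if ext in DANGEROUS_EXTENSIONS:
        return False  # never auto-open executables, whatever was set before
    return user_opted_in
```

Without the extension check, the preference alone decides, which is exactly the hole described: one opt-in for `.exe` and every later site can launch its own executable.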
How did you subdomain search? Was it a brute force / dict search?
Shameless self-plug: You can use Fierce! A DNS reconnaissance tool - https://github.com/mschwager/fierce
A DNS recon tool should be able to do it. If you look around with online DNS tools, a couple dozen Google subdomains will be revealed. With Certificate Transparency, this kind of information is not as secret as it used to be. Last year there was a fairly big vulnerability (I forget which one) that had something to do with legacy software, and it led me to look at which domains were using my company's cert. Qualys made a tool out of this.
DNS search. You can use a tool like fierce or subbrute if you're lazy.
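The brute-force/dictionary approach asked about above boils down to gluing candidate labels onto the domain and keeping whatever resolves. A minimal sketch using only the standard `socket` module; the `enumerate_subdomains` helper is hypothetical, and real tools layer more on top (for example, detecting wildcard DNS records that make every name resolve):

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve(hostname: str) -> Optional[str]:
    """Return an IPv4 address for `hostname`, or None if it doesn't resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):  # gaierror is an OSError; bad labels raise UnicodeError
        return None


def enumerate_subdomains(domain: str, wordlist: Iterable[str],
                         resolver: Callable[[str], Optional[str]] = resolve
                         ) -> Dict[str, str]:
    """Try word.domain for every word and keep the names that resolve."""
    found = {}
    for word in wordlist:
        host = f"{word.strip()}.{domain}"
        ip = resolver(host)
        if ip is not None:
            found[host] = ip
    return found
```

Usage would be something like `enumerate_subdomains("example.com", ["www", "mail", "login"])`; the `resolver` parameter exists so a fake lookup table can stand in for live DNS when testing.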
riskMi is probably from CA Technologies RiskMinder™.
A few weeks ago? It says a temporary fix was done by February 10.
I discovered the same error/bug a few weeks ago when a co-worker sent me a link to "this weird page". I just looked around, thought it was pretty cool to see that part of Google, didn't think too much of it, closed the tab and went back to my Terminal. :)
At least two other companies I've worked at also use query params to enable debug information on live production sites. At one of them, the only requirement was that you be on a corporate IP address, but it actually still works if you're on our guest Wi-Fi.
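A check like that typically looks only at the request's source address, which is why guest Wi-Fi can slip through: if guest traffic leaves through the same corporate address block, the server can't tell it apart. A hypothetical sketch using the standard `ipaddress` module; the `debug` parameter name, the `debug_enabled` helper, and the address range (an RFC 5737 documentation block) are all assumptions:

```python
import ipaddress
from typing import Mapping

# Hypothetical "corporate" range; 203.0.113.0/24 is reserved for documentation.
CORPORATE_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def debug_enabled(query: Mapping[str, str], client_ip: str) -> bool:
    """Show debug output only if ?debug=1 is set AND the client IP looks corporate."""
    if query.get("debug") != "1":
        return False
    addr = ipaddress.ip_address(client_ip)
    # Only the source address is checked: any device behind the same NAT,
    # guest Wi-Fi included, passes this test.
    return any(addr in net for net in CORPORATE_NETWORKS)
```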
It's a very low percentage, somewhere in the low single digits (if not an even lower order of magnitude). Still high enough for it to be worth paying engineers to maintain the backwards compatibility :)
Good catch! I also studied Google's 404 pages. It seems they have unified all but a few of them. One I found was vulnerable to the old UTF-7 injection (a customizable page title emitted before the character encoding was specified) and another was vulnerable to XSS. I got a bounty for the XSS one; the UTF-7 one only worked in browsers too old to be in scope for the program (I do wonder how many IE6 users Google sees).
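The UTF-7 trick works because, with no charset declared before attacker-controlled text, some old browsers could guess the page was UTF-7, where `<` and `>` can be spelled as the shift sequences `+ADw-` and `+AD4-`. This shows the general shape of the classic payload, not the one actually reported; the `payload` string is illustrative:

```python
# Illustrative attacker-controlled title: '<' and '>' are written as UTF-7
# shift sequences ("+ADw-" and "+AD4-") instead of literal ASCII.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# A filter that only escapes literal angle brackets finds nothing to do.
print("<" in payload or ">" in payload)  # False

# A browser that decodes the page as UTF-7 turns the same bytes into markup.
decoded = payload.encode("ascii").decode("utf-7")
print(decoded)  # <script>alert(1)</script>
```

Declaring the charset (in the `Content-Type` header or a `<meta charset>` before any user-controlled text) removes the guesswork that this relies on.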
The page appears to have about 70 characters on a line:
>>> len("which is nothing more than a simple login page (seems to be for Google")
70
The bigger issue is, I think, font size. I could imagine that on certain displays this font might look rather small.
Looks like a mobile layout that is "responsive" in higher resolutions only by adding some extra elements to surround the fixed-width central container.
I prefer narrow columns for reading personally. Snap the article window to the side and scale it down for maximum enjoyment.
You can make any site have a narrow column like that - the site should give those of us who prefer wider text a way to get that too.
Offtopic: What's with the hyper narrow width on this page? Looks like this on a 1440p monitor (ubuntu, chrome) http://i.imgur.com/m9YWcNj.png
Haven't read one of those in years. Did one come by on HN recently?
This one comes to mind: https://news.ycombinator.com/item?id=14166966
> Raneri questioned my motivation and I said that I want to give the vendor ample time to resolve the issue and then I want to publish academically. He was very threatened by this and made thinly veiled threats that the FBI or other institutions would "protect him". Then he continued with statements including "we want to hire you but you must sign this NDA first." He also recommended that I only make disclosure through FINRA, SDI, NCTFA and other private fraud threat sharing organizations for financial institutions.
Such a refreshing story, after countless security researchers have been threatened or sued for reporting vulnerabilities to the companies that should have thanked them instead.
I found a bug in Go which turned into this CVE
I applied for a bug bounty, but alas was turned down as Go isn't a Google service and it wasn't in scope for the Patch Reward Program.
I did get into the hall of fame though!
Here's a link to the Google Security Rewards Program
Can you tell what SFFE means? GFE is the Google Front End but I can't find much about SFFE.
fairly sure it just stands for "staticfile frontend".. nothing too exciting.
edit: some support for that at https://bugs.chromium.org/p/chromium/issues/detail?id=548688...
So if you just want to serve 5 TB... there you go.
Would be very interesting to know what SFFE means ;)
I'm pretty certain it's a server name.
It follows a similar convention to some of their other names, such as GRFE (Groups) and bsfe (Blog Search Front-End).
Static File Front-End?
The hostname is static.corp and the service is static-on-bigtable, so...
As a Xoogler who misses the debug stack, this was some fun nostalgia. Good catch!
Am I the only person who had to enlarge the page to read the article? Nice catch though.
thefacebook.com redirected to facebook.com/intern/vod up until last week.
I wish I could flag this.
There's invariably one comment like this on bug bounty stories here: one comment that isn't happy with the bug bounty result even when the researcher is, and that goes off the rails with a weird anti-large-company bias and some conspiracy theory.
This has absolutely nothing to do with your Nexus RMA story, or your cloud SLA story from downthread, or whatever other agenda you have against these large companies.
Do you have any concept of what it's like to run a bug bounty at Google's size? Have you ever been involved in managing one? Have you ever participated in one? Can you qualify any of your opinion with something aside from these irrelevant grievances you're throwing out?
You're not contributing to the discussion at all, you're just hijacking the thread so you can perpetuate your soapbox. Human beings make mistakes and bug bounties are an easy place to drop the ball. No one is trying to cheat security researchers out of their rewards.
Some companies do worse than attempt to cheat people out of rewards: they threaten to sue them. But Google doesn't do that.
I find it much more likely that they just dropped the ball on paying them, rather than maliciously trying to deny them a bounty.
Try arguing with them about an SLA violation they had with their cloud infrastructure. The game is set up so that they can chisel and have these guffaw-and-aw-garsh moments, but every AdWords charge and every billable second on their cloud will get paid, come hell or high water.
Did you seriously come to hijack a story about a Google bug bounty so you could complain about an irrelevant issue?
If I'm being completely fair, how often does HN as a whole discuss related topics or cases in the comment thread? The best example is probably that people kept mentioning Reader shutting down whenever they launched something new, a la "not going to use this, see what they did to Reader".
Not sure it's entirely the same as this case, but hijacking threads with "remember when this happened?" is not that unique.
Sounds like everyone got what they wanted.
Also sounds like you were dealing with humans, who (before escalating) followed company protocols.
Also seems not too far-fetched that those protocols are in place to weed out the whatever% that make bogus claims.
All in all, nothing I'd call "truly sad", "evil" and whatnot. Just big businesses being big businesses.
Curious why you have such high tolerance for this type of behavior, since it appears that you don't actually approve of it.
I can't speak for the parent, but I have a tolerance for it because I've actually managed bug bounty programs; I also know Google receives on the order of 100,000 bug bounty reports annually, while a full order of magnitude less are actually valid.
Similar. Experience taught me that having patience for bad situations leads to more understanding and easier solving of issues.
The fact is that the more people are involved in a process, the slower the cogs turn.
Having patience is more of a defense mechanism than it is anything else. It has served me well and made life easier!
Nice, glad he got paid. What's truly sad is that they did try to chisel him out of his bounty, if you read the timeline; he had to prod them to get it. Can any of these top-ten companies that make something like 200 million per day (Google/Alphabet, Amazon, Microsoft) ever do anything in good faith? One time Google initially stiffed me on an RMA for a Nexus phone until I stamped my feet a bit, complaining that a company that makes 200M/day was stiffing me, the loyal Android lover, over a $200 hunk of Chinese-made plastic. (And just a short while later, with things like SafetyNet, I can no longer have root on an $800 Pixel that isn't even waterproof.) So much for "don't be evil". Well, they dropped that in favor of "do the right thing: follow the law, act honorably, and treat each other with respect". Yeah, they can respectably be cheap, chiseling money-grubbers as long as it's legal, like helping locate Falun Gong members in China. And the corporate stewardship is in the hands of the employees, who get blinded by the huge pay and instantly fail to see the bad things that go on. Anyways, glad he got paid. Amazing that a company with 57,000 employees (and TONS of contractors; they don't want more employees who get benefits and stock when they can string contractors along) needs some random guy to find holes in the back door.
What do you think the motivation would be to intentionally "forget"? To save money? If that's the motivation why not pay the reporter $500 instead of $5000 for a small info leak. It's illogical that they would intentionally "forget" then pay such a high bounty.
Where is this resentment and skepticism coming from? The facts say otherwise. Google is known to be receptive to bounties and payout.
Most of their consumer-facing services are not, shall we say, famous for being high-touch...
But presumably different people look after the security side of things.
This story from yesterday has raised some resentment about google's support:
A startup’s Firebase bill suddenly increased from $25 to $1750 per month
Their customer/enterprise support is notoriously sketchy, at least from the outside (a la HN) looking in.
That's not the same at all as their bug bounty program, which is generally one of the best out there.
Google's known for their fast triage/response time. They're also fairly generous with their bounties.
Depending on the severity of your finding, your report could wake up a senior security engineer.
When your report is out of scope, Google will not ignore it. When there is a non-serious bug, you get acknowledged in the bug report they file internally. Finally, when they cannot replicate your finding, they will communicate that with you and stay patient until they can either replicate or close your report.
Edit: forgot to add that they raised the bounty by another 2k ("we updated our payouts") and invited me to their Black Hat booth a year later.
Interesting. What'd you find? :)
Apparently I was part of a test where input sanitization was turned off. Reported and fixed before they could push it live.
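Turning sanitization off is an instant XSS: whatever the user typed goes straight into the page as markup. A hypothetical sketch, where the `sanitize` flag stands in for whatever test switch was flipped and `html.escape` from the standard library does the escaping:

```python
import html


def render_comment(user_text: str, sanitize: bool = True) -> str:
    """Wrap user-supplied text in a <p>, escaping it unless sanitizing is off."""
    # html.escape turns &, <, > (and quotes, by default) into HTML entities.
    body = html.escape(user_text) if sanitize else user_text
    return f"<p>{body}</p>"
```

With `sanitize=True`, `<script>` arrives in the browser as inert text; with it off, the same input becomes a live script tag.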
Very "monkey on a typewriter". I was not even looking for security bugs, but studying usage of maia.css.
I'm surprised that anyone at the big Corp actually bothered to even reply to this guy reporting the bug much less actually give him a bounty!