I think it has more to do with the way that researchers describe their projects to journalists. Whatever you're working on is too complicated to explain properly, but you can at least say what it might eventually be useful for. So you tell that to the journalist, and (if they are good) the journalist publishes "A step towards X" (if they're a bad journalist you'll see "Scientists discover X").
Tip: When you read articles that say "researchers might have..." or "researchers get closer to...", read them as follows:
These folks ran out of funding and need to renew their grants. For that purpose, here's their progress report, and no, the tech is still quite far away. Watch for the same article title at the same time next year.
To be clear, there is nothing wrong with this. Some progress takes time and funders just need to know that folks are working hard at it.
Without a non-linear activation function, it wouldn't be an ANN, because multiple linear layers are equivalent to a single layer applying the composition of their transforms.
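The collapse of stacked linear layers is easy to demonstrate numerically; a minimal NumPy sketch (the layer sizes here are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Two "layers" with no activation function in between
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((3, 5))
two_layers = W2 @ (W1 @ x)

# A single layer applying the composed transform gives the same output,
# so the extra layer adds no representational power
W = W2 @ W1
one_layer = W @ x

assert np.allclose(two_layers, one_layer)
```

The same argument extends to any depth, which is why some non-linearity between layers is essential.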
The article gives no clue about this part. I have no idea if the optical domain can compute non-linear functions.
If you want a non-linear function that can be produced optically, consider an evanescent wave, as used in TIRF microscopy. Whether this is suitable in practice as an activation function I couldn't say, though the curve does to my eye look like something that could be used in place of relu.
Can it compute anything non linear? I.e. can it actually have any activation function?
Isn't optical Kerr effect (separate from Kerr electro-optic and Magneto-optic Kerr effect) pure optic-optic nonlinearity without electrons?
"If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong." - Arthur C. Clarke
"A platitude is a trite, meaningless, or prosaic statement, often used as a thought-terminating cliché, aimed at quelling social, emotional, or cognitive unease" – Wikipedia
"Touche." - Me.
Optical computation will never become relevant at scale. There are fundamental reasons for this: first, particle size. A photon at usable wavelengths is extremely large, much larger than any modern electron-based _devices_. This makes it impossible to scale to usable density. Second, optic-optic (as opposed to electro-optic) non-linear effects are based on interaction with electrons, in particular on electron decay from one energy state to another, which is typically extremely slow.
By performing these transformations optically, they primarily get data parallelism (like [GTV]PUs). I expect this to happen. NVIDIA’s ACDC paper provides an FFT-accelerated neural network layer (similar to deep-fried convnets), with an offhand remark that the transformations could be performed optically. I wonder what kind of information bandwidth they can get, though.
Physicists were using optical lenses to do approximate FFTs over a hundred years ago.
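The reason a cheap (or, optically, essentially free) Fourier transform matters for neural network layers is the convolution theorem: convolution in the signal domain is pointwise multiplication in the Fourier domain. A minimal NumPy sketch of the equivalence (circular convolution, toy sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
signal = rng.standard_normal(n)
kernel = rng.standard_normal(n)

# Direct circular convolution: O(n^2) multiply-adds
direct = np.array([
    sum(signal[k] * kernel[(i - k) % n] for k in range(n))
    for i in range(n)
])

# Via the FFT: transform, multiply pointwise, transform back.
# O(n log n) electronically; the transforms themselves are what a
# lens could in principle do "for free".
via_fft = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)).real

assert np.allclose(direct, via_fft)
```

This is the same structure ACDC-style and deep-fried layers exploit: replace a dense matrix multiply with transforms plus cheap diagonal/pointwise operations.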
Can you use something like wavelength division multiplexing to get different data streams and achieve parallelism there?
I can't read the article as it's behind a paywall, but if you can make a chip that's 100% optical, then it means that when you beam your input data at the input end of the chip, you _instantly_ get the output at the end. No need for cycles for multiplying, adding and so on. Plus it wouldn't heat up like silicon does.
you _instantly_ get the output
That's not how physics works, unfortunately.
It may be my lack of knowledge about optics, but from an ML perspective this seems rather mundane if not useless. Model training involves high levels of parallelism on a large scale for difficult tasks, something I can't see these optical chips doing. Does anyone have any further information that might enlighten me otherwise?
Makes me think of the positronic brains from Asimov's books.
related: "All-optical machine learning using diffractive deep neural networks" https://news.ycombinator.com/item?id=17698135