Research in machine learning and AI, now a key technology in virtually every industry and business, is far too voluminous for anyone to read it all. This column, Perceptron, aims to bring together some of the most relevant recent discoveries and papers, particularly in (but not limited to) artificial intelligence, and explain why they matter.
Over the past few weeks, Google researchers have demonstrated an AI system, PaLI, that can perform numerous tasks in more than 100 languages. Elsewhere, a Berlin-based group has started a project called Source+, designed to let artists, including visual artists, musicians and writers, opt into or out of having their work used as training data for AI.
AI systems like OpenAI’s GPT-3 can generate fairly sensible text or summarize existing text from the web, e-books, and other information sources. But they have historically been limited to a single language, which restricts both their usefulness and their reach.
Fortunately, in recent months, research into multilingual systems has accelerated, thanks in part to community efforts like Hugging Face’s Bloom. To take advantage of these advances in multilingualism, a Google team created PaLI, which was trained on both images and text to perform tasks such as image captioning, object detection and optical character recognition.
Google claims that PaLI can understand 109 languages and the relationships between words in those languages and images, allowing it, for example, to caption a postcard image in French. While the work remains firmly in the research phase, the creators say it illustrates the important interplay between language and imagery, and could establish a foundation for a commercial product down the line.
Speech is another aspect of language in which AI is constantly improving. Play.ht recently introduced a new text-to-speech model that puts a remarkable amount of emotion and range into its results. The clips it released last week sound fantastic, although of course they are carefully selected.
We generated our own clip using the intro from this article, and the results are still solid:
It’s still unclear exactly what this type of voice generation will be most useful for. We’re not quite at the stage where these models narrate whole books. Or rather, they can, but they might not be anyone’s first choice yet. But as the quality increases, the applications multiply.
Mat Dryhurst and Holly Herndon, an academic and a musician respectively, have teamed up with the organization Spawning to launch Source+, a standard they hope will draw attention to the issue of image-generating AI systems built using works by artists who were neither informed nor asked for permission. Source+, which costs nothing, aims to allow artists to opt out of the use of their work for AI training purposes if they so choose.
Image generation systems such as Stable Diffusion and DALL-E 2 have been trained on billions of images pulled from the web to “learn” how to translate text prompts into art. Some of this imagery came from public art communities like ArtStation and DeviantArt, not necessarily with the artists’ knowledge, and imbued the systems with the ability to emulate particular creators, including artists like Greg Rutkowski.
Because the systems have a knack for mimicking art styles, some creators fear the systems could threaten their livelihoods. According to Dryhurst and Herndon, Source+, while voluntary, could be a step towards giving artists greater influence over how their art is used, assuming it’s widely adopted (a big if).
At DeepMind, a research team is trying to solve another long-standing problematic aspect of AI: its tendency to spit out toxic and misleading information. Focusing on text, the team developed a chatbot called Sparrow that can answer common questions by searching the web using Google. Other leading systems like Google’s LaMDA can do the same, but DeepMind says Sparrow provides plausible, non-toxic answers to questions more often than its counterparts.
The trick was to align the system with people’s expectations of it. DeepMind recruited people to use Sparrow, then had them provide feedback to train a model of how useful the answers were, showing participants multiple answers to the same question and asking which answer they liked the most. The researchers also set out rules for Sparrow such as “don’t make threatening statements” and “don’t make hateful or insulting comments,” which they had participants test by trying to trick the system into breaking the rules.
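To make the idea of learning from "which answer did you prefer" feedback concrete, here is a minimal sketch that fits one score per answer from pairwise judgments. It assumes a Bradley-Terry style preference model, which is a common choice for this kind of data but is not something this column confirms DeepMind used. The answer labels, the comparison data and the function name `fit_preference_scores` are all hypothetical.

```python
import math

# Hypothetical pairwise judgments: (preferred_answer, rejected_answer).
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]

def fit_preference_scores(pairs, steps=500, lr=0.1):
    """Fit one score per answer so that preferred answers end up with higher scores."""
    scores = {name: 0.0 for pair in pairs for name in pair}
    for _ in range(steps):
        grads = {name: 0.0 for name in scores}
        for winner, loser in pairs:
            # Probability the current scores assign to the observed preference.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Gradient of the log-likelihood: push the winner up, the loser down.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        for name in scores:
            scores[name] += lr * grads[name]
    return scores

scores = fit_preference_scores(comparisons)
best = max(scores, key=scores.get)
print(best, scores)
```

In a real system the per-answer scores would be replaced by a learned model that scores unseen answers; the toy version only shows how pairwise choices become a ranking signal.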
DeepMind acknowledges that Sparrow has room for improvement. But in one study, the team found that the chatbot provided a “plausible” answer backed by evidence 78% of the time when asked a factual question and only broke the aforementioned rules 8% of the time. This is better than DeepMind’s original dialogue system, the researchers note, which broke the rules about three times more often when tricked.
A separate DeepMind team recently tackled a very different area: video games, which historically have been difficult for AI to master quickly. Their system, cheekily named MEME, is said to have achieved “human level” performance on 57 different Atari games 200 times faster than the previous best system.
According to DeepMind’s paper detailing MEME, the system can learn to play games by observing around 390 million frames, “frames” referring to the still images that refresh very quickly to give the impression of motion. That may seem like a lot, but the previous state-of-the-art technique required 80 billion frames across the same number of Atari games.
Playing Atari well may not seem like a desirable skill. And indeed, some critics argue that the games are a flawed benchmark for AI due to their abstractness and relative simplicity. But research labs like DeepMind think the approaches could be applied to other, more useful areas in the future, like robots that learn to perform tasks more efficiently by watching videos, or self-improving, self-driving cars.
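As a quick sanity check, the two frame counts quoted here line up with the "200 times faster" claim. The sketch below uses the rounded figures from this column rather than exact values from the paper, and the variable names are only illustrative.

```python
# Frame counts as quoted in this column (rounded figures, not exact paper values).
meme_frames = 390e6      # roughly 390 million frames for MEME
previous_frames = 80e9   # roughly 80 billion frames for the previous best system

# How many times fewer frames MEME needs across the same Atari games.
speedup = previous_frames / meme_frames
print(f"MEME needs roughly {speedup:.0f}x fewer frames")
```

The ratio comes out slightly above 200, consistent with the headline figure.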
Nvidia had a field day on the 20th, announcing dozens of products and services, including several interesting AI efforts. Self-driving cars are one of the company’s focus areas, both for powering AI and for training it. For the latter, simulators are crucial, and it is just as important that the virtual roads resemble real ones. The company describes a new and improved content pipeline that speeds up bringing data collected by cameras and sensors on real cars into the digital realm.
Elements such as real-world vehicles and irregularities in the road or tree canopy can be accurately replicated, so the autonomous AI doesn’t learn in a sanitized version of the street. And this allows for larger and more variable simulation parameters in general, which contributes to robustness. (Another image of this is at the top.)
Nvidia also showed off its IGX system for autonomous platforms in industrial situations, meaning human-machine collaboration like you might find on a factory floor. There’s no shortage of such systems, of course, but as the complexity of tasks and operating environments grows, the old ways are no longer good enough, and companies looking to improve their automation are looking to the future.
“Proactive” and “predictive” safety is what IGX is supposed to help with, i.e., detecting safety issues before they cause failures or injuries. A bot may have its own emergency stop mechanism, but if a camera monitoring the area could tell it to divert before a forklift gets in its way, everything goes a little more smoothly. Exactly which company or software accomplishes this (and on what hardware, and how it’s all paid for) is still a work in progress, with companies like Nvidia and startups like Veo Robotics feeling their way through.
Another interesting step forward came on Nvidia’s home turf of gaming. The company’s latest and greatest GPUs are designed not just to push triangles and shaders, but to quickly accomplish AI-powered tasks, like its own DLSS technology for upscaling and adding frames.
The problem they are trying to solve is that game engines are so demanding that generating over 120 frames per second (to keep up with the latest monitors) while maintaining visual fidelity is a Herculean task that even powerful GPUs can barely manage. But DLSS is kind of like a smart frame blender that can upscale the source image without aliasing or artifacts, so the game doesn’t have to push as many pixels.
In DLSS 3, Nvidia claims it can generate entire additional frames at a 1:1 ratio, so you could render 60 frames conventionally and produce the other 60 via AI. I can think of several reasons that might make things look weird in a high-performance gaming environment, but Nvidia is probably well aware of them. Either way, you’ll have to pay around a grand for the privilege of using the new system, as it will only run on RTX 40-series cards. But if graphical fidelity is your top priority, go for it.
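The frame-rate arithmetic behind that 1:1 claim is simple enough to sketch. The helper below is purely illustrative: `displayed_fps` and its parameters are hypothetical names, not part of any Nvidia API, and the calculation assumes every generated frame is actually shown.

```python
def displayed_fps(rendered_fps: float, generated_per_rendered: float = 1.0) -> float:
    """Frames shown per second when AI-generated frames are interleaved with rendered ones."""
    return rendered_fps * (1.0 + generated_per_rendered)

# 60 conventionally rendered frames plus one generated frame for each of them.
print(displayed_fps(60))
```

With the default 1:1 ratio, 60 rendered frames per second would yield 120 displayed frames per second, matching the example in the text.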
Last up today is a drone-based 3D printing technique from Imperial College London that could be used for autonomous building processes sometime in the distant future. Right now it’s definitely not practical for creating anything bigger than a trash can, but it’s still early days. Eventually, they hope it will be more like the above, and it does look cool, but watch the video below to set your expectations straight.