Most of the artificial intelligence models we’ve seen in recent months, since the subject entered the spotlight, are based on language and use words as their starting point. Meta is trying something different: a technology called ImageBind, which combines six types of data.

The new AI model uses text too, but goes further, also encompassing audio, visual data, thermal (temperature) readings, depth and motion data.
Meta believes the work could, in the future, power generative artificial intelligence capable of multisensory, immersive experiences. If you read this and thought of the metaverse, know that you are not the only one.
The project is at an early research stage and has no practical applications yet. Even so, the code is open source, so other experts can take a closer look at how it works.
This point is interesting: as The Verge observes, OpenAI and Google share very little of their technology, while Meta has been doing the opposite and opening up its research.
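For those who want to poke around, the public repository (github.com/facebookresearch/ImageBind) documents a usage pattern along the lines of the sketch below. The module paths follow the README as published and the sample file names are placeholders; both may change between versions:

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind model (weights download on first use).
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Hypothetical sample inputs; replace with your own files.
text_list = ["a dog", "a car", "a bird"]
image_paths = ["dog.jpg", "car.jpg", "bird.jpg"]
audio_paths = ["dog.wav", "car.wav", "bird.wav"]

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

# All modalities land in the same embedding space, so they can be
# compared directly, e.g. image-to-text similarity scores:
print(torch.softmax(
    embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1
))
```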
Talking about six types of data may sound complicated, but it’s simpler than it seems. What ImageBind does is bind them all together, just as other generative AIs already do with fewer types.
Image-generation tools, for example, were trained on large sets of paired text and images. In doing so, they learned to relate descriptions to photos, drawings, works of art and more, which is how they understand what you want to create when you type a prompt.
ImageBind goes further and tries to relate text, images (both still and video), sound, temperature, depth and motion.
One of the examples shared by Meta links the sound of a train horn, videos of trains arriving at a station, depth data showing an object approaching, and descriptions such as “train stops at a busy station” and “the wind blows as the train moves through a grassy landscape”.
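What “relating” these signals means in practice is that each one is mapped to a vector in a shared embedding space, so matching a sound to a video becomes a distance calculation. The snippet below illustrates the idea with random vectors standing in for real ImageBind outputs; the dimensions and gallery are made up for the example:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for real embeddings: one audio clip (say, a train horn)
# and a small gallery of candidate videos, all in the same space.
horn_audio = torch.randn(1, 1024)
video_gallery = torch.randn(5, 1024)  # e.g. 5 station videos

# Cosine similarity: normalize each vector, then take dot products.
audio_n = F.normalize(horn_audio, dim=-1)
videos_n = F.normalize(video_gallery, dim=-1)
scores = audio_n @ videos_n.T  # shape (1, 5)

best = scores.argmax(dim=-1).item()
print(f"Closest video to the horn: index {best}, score {scores[0, best]:.3f}")
```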
Other hypothetical cases illustrate where Meta wants to go. Combining an image of pigeons with engine noise, for example, should produce an image of the birds taking flight as a motorbike approaches. The calls of penguins could generate an image of the animals.
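These compositions plausibly come down to simple arithmetic on the embeddings: add the two vectors, renormalize, and look for the nearest neighbor. The sketch below shows that idea with placeholder vectors; it illustrates the general technique, not Meta’s published code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)

dim = 1024
pigeon_image = F.normalize(torch.randn(1, dim), dim=-1)  # image embedding
engine_audio = F.normalize(torch.randn(1, dim), dim=-1)  # audio embedding

# Compose the two modalities by summing and renormalizing.
query = F.normalize(pigeon_image + engine_audio, dim=-1)

# Retrieve the best match from a hypothetical gallery of image embeddings.
gallery = F.normalize(torch.randn(8, dim), dim=-1)
best = (query @ gallery.T).argmax(dim=-1).item()
print(f"Best composite match: gallery index {best}")
```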
Meta does not want to stop there. In the blog post announcing the project, the company says future models could include touch, speech and brain signals obtained through functional MRI.
The idea is for ImageBind to reach virtual reality, where it could generate digital environments that go beyond audio and video, complete with movement and ambience.
It seems that, even as it invests more in artificial intelligence, Meta has not given up on the metaverse.