AI converts sound into street-view images using new generative technology

Jay Hartzell, President, The University of Texas at Austin

A team of researchers at The University of Texas at Austin has developed a method that uses generative artificial intelligence to convert sounds from audio recordings into street-view images. The result suggests that machines can replicate the human ability to connect auditory and visual perceptions of an environment.

The research, published in Computers, Environment and Urban Systems, involved training a soundscape-to-image AI model with data from urban and rural streetscapes. The model then generated images based on audio recordings. Yuhao Kang, assistant professor of geography and the environment at UT and co-author of the study, explained: “Our study found that acoustic environments contain enough visual cues to generate highly recognizable streetscape images that accurately depict different places.”

To train their AI model, the researchers used YouTube video and audio from cities across North America, Asia, and Europe, pairing 10-second audio clips with image stills from various locations. The AI-generated images were then compared with real-world photos using both human and computer evaluations. Human judges matched generated images to their source audio samples with an average accuracy of 80%.

Kang highlighted the significance of this development: “Traditionally, the ability to envision a scene from sounds is a uniquely human capability... Our use of advanced AI techniques supported by large language models (LLMs) demonstrates that machines have the potential to approximate this human sensory experience.”

The study also found that the generated images often preserved architectural styles and accurately reflected lighting conditions, such as sunny or cloudy weather. These observations help explain how multisensory factors shape our perception of a place.

Kang further elaborated: “When you close your eyes and listen, the sounds around you paint pictures in your mind... Each sound weaves a vivid tapestry of scenes, as if by magic, in the theater of your imagination.” His work focuses on the role of geospatial AI in studying human-environment interactions. In a recent paper published in Nature, Kang explored AI's potential to capture the unique characteristics of cities.