AI converts sound into street-view images using new generative technology

Jay Hartzell President - University of Texas at Austin

A team of researchers at The University of Texas at Austin has developed a method to convert sounds from audio recordings into street-view images using generative artificial intelligence. This breakthrough suggests that machines can replicate the human ability to connect audio and visual perceptions of environments.

The research, published in Computers, Environment and Urban Systems, involved training a soundscape-to-image AI model with data from urban and rural streetscapes. The model then generated images based on audio recordings. Yuhao Kang, assistant professor of geography and the environment at UT and co-author of the study, explained: “Our study found that acoustic environments contain enough visual cues to generate highly recognizable streetscape images that accurately depict different places.”

To train their AI model, the researchers used YouTube video and audio from cities across North America, Asia, and Europe. They created pairs of 10-second audio clips with image stills from various locations. The AI-generated images were compared to real-world photos using both human and computer evaluations. Human judges achieved an average accuracy rate of 80% when matching generated images to source audio samples.
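
The pairing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' pipeline: the article does not say how the still image was chosen within each clip or whether clips overlapped, so the sketch assumes non-overlapping 10-second windows and takes the still from the midpoint of each window. The names `AudioImagePair` and `make_pairs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioImagePair:
    start_s: float  # start of the audio clip, in seconds
    end_s: float    # end of the audio clip, in seconds
    frame_s: float  # timestamp of the paired still image, in seconds


def make_pairs(video_length_s: float, clip_length_s: float = 10.0) -> list[AudioImagePair]:
    """Split a video into non-overlapping clips and pick one still per clip."""
    pairs = []
    start = 0.0
    # Keep only full-length clips; a trailing partial clip is dropped.
    while start + clip_length_s <= video_length_s:
        end = start + clip_length_s
        # Assumption: the still comes from the midpoint of the clip.
        pairs.append(AudioImagePair(start, end, (start + end) / 2))
        start = end
    return pairs


if __name__ == "__main__":
    for pair in make_pairs(35.0):
        print(pair)
```

For a 35-second video this yields three pairs covering the first 30 seconds, with the last 5 seconds discarded because they do not fill a complete clip.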

Kang highlighted the significance of this development: “Traditionally, the ability to envision a scene from sounds is a uniquely human capability… Our use of advanced AI techniques supported by large language models (LLMs) demonstrates that machines have the potential to approximate this human sensory experience.”

The study also found that the generated images often preserved architectural styles and accurately reflected lighting conditions, such as sunny or cloudy weather. These observations improve understanding of how multisensory factors shape our perception of a place.

Kang further elaborated: “When you close your eyes and listen, the sounds around you paint pictures in your mind… Each sound weaves a vivid tapestry of scenes, as if by magic, in the theater of your imagination.” His work focuses on geospatial AI’s role in studying human-environment interactions. A recent paper published in Nature by Kang explored AI’s potential to capture unique city characteristics.


