Saturday, November 11, 2023

Music From the Machine

The ghost in the machine of the moment is artificial intelligence, most notably as it is applied to text to create human-sounding responses. The idea, best exemplified by ChatGPT, is that computers sift through millions of samples of man-made writing and learn to mimic it when prompted. Ask the system how to build a bench or to plan an itinerary for a vacation in Mexico, and rather than get a list of websites you get a dissertation that seems as though it came from a carpenter or a world traveler.

While the initial efforts were focused on words, it didn't take long for developers to widen their focus. The next iteration involved pictures. It was the same idea: scour the internet for images of anything and everything, then use those examples to create new images that mimic real ones. So type "dog with a bone on Mars in the style of Dali" into DreamStudio or Midjourney, and you'll get a picture that looks as if Salvador were taking his pooch for a walk on the red planet.

These image generators are also why the alarm bells are ringing over the creation of so-called "deepfakes." While you can easily type in "picture of fruit floating in water in the style of Picasso," it's just as easy to enter "picture of Joe Biden at a bar doing shots as if taken by a paparazzo." At this point the result might not be perfect or fool anyone, but as the systems get better it will be harder and harder to tell the fakes from the real.

The latest frontier is sound. We're already seeing it with voice samples: in New York City they're using artificial intelligence to reach city residents through robocalls in a number of languages. But it's not just a random foreign speaker. They took Mayor Eric Adams' voice, sampled it and recreated it with him speaking in different tongues. So depending on your location in the five boroughs you might hear Hizzoner in Spanish, Yiddish or Mandarin, none of which he actually speaks.

On the melodic side they are using the same approach as ChatGPT, just with music. The developers set their systems to scrape millions and millions of samples of songs available online, from classical to jazz, from pop to rock, from rap to country. They deliberately do not associate the songs with groups or artists for copyright reasons, but rather with a specific genre. And once they have that database, the building begins.

The process is the same as with pictures or text: describe some kind of music, press a button, and sit back to watch the machines build you a riff. Type "meditative song, calming and soothing, with flutes and guitars" into Google's MusicLM program. The computer thinks for a bit, and out comes a 20-second or so track that sounds like it could be from a group like We Dream of Eden or Phil France. Or try "up-tempo jazz that you can dance to in a smooth style" and out comes a Kenny G-esque sample.

At this point the tracks sound artificial and half formed. But it's important to understand that what you are hearing is not a sample of existing music that fits your description, but rather a newly composed tune never played nor heard by anyone anywhere. It's only a matter of time before the programs improve to the point that when you type in "danceable power pop that has a positive vibe as if sung by a former Disney princess" what comes out sounds like the backing track to a Demi Lovato top ten hit.

If you're counting, that's words, pictures and sound that can be created by computers and passed off convincingly as crafted by flesh and blood humans. That leaves touch, smell and taste as the last frontiers for machine-generated senses. One wonders if in some garage somewhere there is a tech toying around with his computer connected not to a keyboard, brush or instrument, but to a refrigerator and an oven, and typing in "hot food that blends tomatoes, cheese and spices in an irresistible package similar to but not as greasy as pizza." I bet we'll be eating the result before the end of the decade.

-END-  

Marc Wollin of Bedford has tried creating computer columns, photos and songs. None are that good. Yet. His column appears regularly in The Record-Review, The Scarsdale Inquirer and online at http://www.glancingaskance.blogspot.com/, as well as via Facebook, LinkedIn and Twitter.

