KT Corp. makes the impossible possible

April 19, 2022

What if KT Corp. could make the impossible possible?

Saying goodbye to someone we loved very much is not easy. Their death marks the beginning of a long process of forgetting. The memory of the deceased fades day by day, and soon we no longer remember what their voice sounded like. That makes the death of our loved ones even more painful to endure and overcome.

But what if artificial intelligence allowed us to hear their voice, at least, once again? Not only that: recent advances suggest that, given a sufficient amount of data, AI has the potential to learn the speaking traits or personality of a specific person, such as a beloved celebrity.

The Return of the Legendary Singer

Shin Hae-chul, a famous Korean singer who passed away in 2014 after a cardiac arrest, made a “comeback” on his old radio show in September. It was not the actual Shin who showed up at the radio station, however. Instead, it was AI software talking in Shin’s voice.

“It has been almost 10 years (since the last episode), right?” AI DJ Shin said at the beginning of his radio show “Shin Hae-chul’s Ghost Station”, where Shin worked as the main DJ from 2001 to 2012. “It feels great to be back to talk to you all,” he continued.

Although the show was initially available only to a limited audience through a virtual assistant product built by KT Corp., the second-largest wireless carrier in South Korea, many of his longtime fans welcomed his return in AI form.

“This show will warm the hearts of Shin’s family and friends because they must miss Shin very much,” a person under the username “nay” commented on a seven-minute clip of the show uploaded to Naver, the biggest web portal in the country.

“It breaks my heart to hear him saying ‘it has been almost 10 years’, but good to hear his voice again at least,” another user named “Won Chan-woong” commented on the same video.

Some fans doubted whether AI DJ Shin managed to talk like the “actual” Shin, arguing that the AI could not imitate his particular viewpoints or personality.

“If it were Shin, he wouldn’t have introduced himself as a musician,” read a comment left by “KANG” on the same video. “He would have rather said ‘Hey, I’m back,’” another user named “Googit” replied, agreeing that several of AI DJ Shin’s remarks bore little resemblance to the late singer.

KT Corp. – Power of Deep Learning

Park Jae-han, project manager of the Text-to-Speech Project at KT Research Centre (KT Corp.) and one of the developers of AI DJ Shin, says that AI DJ Shin is not the first attempt to generate a specific individual’s voice using AI.

AI-generated voice software is often powered by text-to-speech, or TTS, a technology that turns text into natural-sounding speech. Conventional TTS had its limitations, however: a human voice actor first had to read the text aloud, and developers then fed the recorded voice files into AI tools. Those tools could do no more than repeat or rearrange the recorded files.
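The older approach described above, in which a tool can only repeat or rearrange recorded files, can be sketched roughly as follows. This is a minimal illustration and not KT Corp.’s actual system: the clip dictionary, the sample values, and the function name `concatenative_tts` are all hypothetical stand-ins.

```python
# Hypothetical stand-ins for a voice actor's recordings:
# each word maps to a short list of made-up audio samples.
RECORDED_CLIPS = {
    "hello": [0.1, 0.3, 0.2],
    "it": [0.4, 0.1],
    "is": [0.2, 0.2],
    "shin": [0.5, 0.6, 0.4],
}


def concatenative_tts(text, clips, pause=(0.0,)):
    """Build 'speech' by stitching pre-recorded clips together in word order."""
    samples = []
    for word in text.lower().split():
        if word not in clips:
            # The tool cannot say anything that was never recorded.
            raise KeyError(f"no recording for {word!r}")
        samples.extend(clips[word])
        samples.extend(pause)  # a fixed silence between words
    return samples


audio = concatenative_tts("Hello it is Shin", RECORDED_CLIPS)
```

The sketch makes the limitation visible: any word missing from the recordings raises an error, and the pauses between words are always the same.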

Park says the introduction of deep learning, which is a subset of machine learning, was a game-changer in improving AI-generated voice software.

Deep learning uses a network of multiple layers that makes predictions, optimises them, and so makes its output more accurate. This network enables AI tools to learn from data, much as human brains do. By using large amounts of data (11 years of radio show recordings, in AI DJ Shin’s case), deep learning can make AI-generated voices sound more “human”.
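The idea of a layered network that predicts, measures its error, and adjusts itself can be shown with a toy example. The sketch below is an assumption-laden illustration, far smaller than any real speech model: it has two layers with one weight each, the data is invented, and the names `predict`, `loss`, and `train` are hypothetical.

```python
import math

# Invented toy data: inputs and the outputs the network should learn.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.5 * math.tanh(2.0 * x) for x in xs]


def predict(w1, w2, x):
    hidden = math.tanh(w1 * x)  # layer 1
    return w2 * hidden          # layer 2


def loss(w1, w2):
    """Mean squared error between predictions and targets."""
    return sum((predict(w1, w2, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(w1=0.5, w2=0.1, lr=0.1, steps=500):
    """Repeatedly nudge both weights in the direction that lowers the error."""
    for _ in range(steps):
        g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            h = math.tanh(w1 * x)
            err = w2 * h - y
            g2 += 2 * err * h                     # gradient for layer 2
            g1 += 2 * err * w2 * (1 - h * h) * x  # gradient for layer 1
        w1 -= lr * g1 / len(xs)
        w2 -= lr * g2 / len(xs)
    return w1, w2


initial_loss = loss(0.5, 0.1)
w1, w2 = train()
final_loss = loss(w1, w2)
```

Comparing `initial_loss` with `final_loss` shows the error shrinking as the weights are adjusted, which is the "optimising predictions" step in miniature.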

“With the use of deep learning, AI knows how to connect similar sentences to form a context — like us humans do,” Park told 4i Magazine. “It makes AI’s speech more natural and human-like, making predictions of flows or rhythms of conversations and synthesising those in-voice fragments.”

Unlike conventional AI-generated voice engines, such as those we hear on the other end of automated calls, recent TTS engines can imitate human-like expression, including pauses and intonation, by leveraging deep learning.

In 2018, for example, Google introduced Duplex, a feature of its virtual assistant that can make real-world calls to local businesses on behalf of users. Thanks to deep learning, its artificial voice sounded so realistic that some people were unsure whether they were talking to a human or to Duplex.

“With the deep learning network, the developers don’t have to cut or insert fragments of recorded voice files (in AI software) anymore. Deep learning networks synthesise those files based on context,” Park said.

KT Corp. – When We Want to Hear Their Voice

KT Corp. previously used TTS technology to give “voices” to people with hearing difficulties, as part of its corporate goal of making philanthropic contributions to society.

“We chose 20 people to give them a new voice to communicate with their family and friends,” Park said. “The team collected voice data from their siblings and other family members and created a new artificial voice. The people could talk in their own voice through a voice app we built, ‘Ma-eum Talk (마음톡)’.”

Park explains that recreating Shin’s radio show is in line with the same philanthropic goal. “This time, we wanted to give something to the public in memory of Shin Hae-chul,” Park said.

The script for the recreated show, three episodes in total, was written by Bae Soon-tak, a radio scriptwriter who previously worked with Shin. The AI-generated voice read it out as Shin would have.

During the show, AI DJ Shin talked about issues that Shin would have thought about if he were alive. For example, he talked about the government’s policies for indie bands and how the bands are suffering amid the pandemic. He also offered advice and suggested goals that musicians should aim for.

“We received good feedback from the audience — some people said that they shed a tear while listening to the recreated shows,” Park said.

KT Corp. said the full episodes would be uploaded to the company’s YouTube channel soon.

AI-Generated Voice to Take One Step Further

Park says that AI-generated voice will keep advancing now that deep learning has shown what it can do. “AI-generated voice will meet more various kinds of needs of people than before,” he said.

“Countries across the world, including South Korea, compete to improve the quality of AI-generated voice and discover new uses of the technology,” he added.

However, many people are still unfamiliar with the concept of AI-generated voice and do not understand how it can be applied in the real world.

“AI is still a difficult concept for the public,” Park said. “It does not seem to be applied in many fields of industries yet, but it will soon be seen more and more every day.”

Just as the Industrial Revolution changed our lives, AI offers many promising possibilities for more convenient and efficient lifestyles. Perhaps someday, with the help of AI, we might be able to talk to our loved ones over the phone again, preserving our cherished memories of them forever.