The Robots are Coming to Media Production!

It’s the catch-cry across numerous industries. Humans replaced by robots, by software, by machines. But, artificial intelligence (AI) in content production?

It may be closer than you think, according to Bea Alonso-Martinez, Business Development Director, Media Logistics at Ooyala. The Telstra-owned company has been focusing on automation, collaboration, artificial intelligence, and data mining in its quest to bring efficiencies to bear on media production.

“We’ve actually done a bit of a proof of concept together with Microsoft,” says Alonso-Martinez, “to see how cognitive services can index video in real time and analyse faces, sentiments, transcribed speech, text, even translated text, and how that can be leveraged not only through the production process but also for delivery, including recommendations and advertising insertion.”

C+T: How far away do you think that is from being implemented?

“From the proof of concept that we’ve seen, I believe that we can start seeing some useful application of this probably within about 6-8 months from now. It is in its infancy today.”

C+T: As quickly as that?

“We can already deploy artificial intelligence to make decisions around the way that the infrastructure is used, an example being the growth of cloud storage. We can use elastic deployment, increasing the amount of computing power or the amount of storage based on your production needs. So, with Media Logistics, whenever you enter a new production in the system you can say, we’re going to have a peak of storage requirements sometime in the middle of this production. So artificial intelligence, as we can call it, or just technology orchestration, could already help do that today in a kind of dynamic way.

“Collating metadata based on the transcription of the speech, the sentiment, the face recognition, and objects within the image still needs a bit of work. But machines are learning fast, and one of the abilities that the Video Indexer from Microsoft has is, if there is a face that it doesn’t recognise, you can actually tell it, this is Al Pacino, and next time it will recognise Al Pacino. In fact, we are already talking to a number of customers about deploying this, at least as a proof of concept. We think within the next 6-8 months, it’ll be useful to some extent. The uses will mostly be for productions with a lot of content, like reality TV or talent shows, baking and chef shows, with a lot of footage that needs a lot of speech translations so that someone can easily locate a specific quote. That probably would be one of the earlier applications, saving quite a lot of time for somebody who would otherwise have to watch the footage and log metadata and tracks throughout.”

C+T: Wouldn’t an AI system need ongoing instruction from a human?

“There may be a future within 10 years from now where the machine learning is so good that it can recognise most faces by itself, but, at least today and for the next five years, we will still have to teach the cognitive services to recognise specific faces or sentiments.

“Sentiment is an area where I see that there’s still an awful lot of learning to do. Sentiment is a combination of the tone of the voice, the words that are being used and the face. Sometimes it may say that the sentiment is positive when the face is actually quite serious or sad, because it could be going by the actual words or the tone that is being used. I think there’s still a bit of a gateway where, at some point, a human will perhaps have to review some of that content to validate its usability.”


Subscribe to Content+Technology magazine here