Making sense of artificial intelligence in audio-visual applications
June 12, 2024 - AI — artificial intelligence — is a hot topic. Media coverage of its possibilities has ranged from positive (“It can act as a wonderful virtual assistant.”) to negative (“Kids are using it to do their homework for them.”) to downright terrifying (“After the robots take our jobs, they’ll kill us all.”).
While the technology holds great potential for both good and bad, it’s best to remember that AI is a tool, and one that can be put to very good use.
And that’s especially true in the audio-visual systems we use in the modern hybrid workplace.
For our part, Crestron’s AI solutions feature a line of 1 Beyond intelligent cameras and the Crestron Automate VX voice-activated speaker tracking solution. These products deliver an outstanding videoconferencing experience through their application of what’s called “Visual AI,” and they work brilliantly with platforms such as Microsoft Teams® Rooms and Zoom Rooms® software, leveraging all the AI solutions they bring to the table.
What’s all that mean, exactly? Let’s break it down by answering the three most common questions we hear:
What is “Visual AI,” and is it different from intelligent video?
You may have seen the terms “intelligent video” and “Visual AI” used interchangeably. A more accurate way to frame the two concepts: Visual AI enables the experiences we call intelligent video. The result is a system that can automatically track and frame a presenter in a room based on facial and motion detection — which is incredibly important when a meeting includes remote participants. You want those virtual attendees to see the gestures and expressions of those in the room where the meeting is based. Remote workers remain much more engaged when they can receive all those non-verbal cues.
Crestron’s Rony Sebok, director of product management, intelligent video, explained the power of this technology in an article for the online publication AI for All:
Visual AI can be used to create a variety of experiences, including “group framing” (adjusting the frame to show all participants), “auto-framing” (adjusting the frame as one person speaks), and “presenter tracking” (following a moving presenter around a space). It can further automatically switch between active talkers in the room (“speaker tracking”), provide a composite of more than one view of the room into a single video feed, and more.
Just like other examples of AI, Visual AI is getting better. “AI has been built into unified communications for a while now, but even more effective ‘robot director in a box’ solutions are being developed,” says Crestron’s Senior Director of Product Marketing Sam Kennedy. AI is being applied to audio solutions, too, gaining the ability to block extraneous noise and even identify people by their voices.
Soon, AI will help these systems “read the room” — in other words, gather a lot more info on the space. “These programs are learning to see if a room has a whiteboard and how the system’s cameras need to adjust to make that board visible for everyone joining remotely,” says Kennedy. “Soon, AI will notice if that board — or even the room itself — needs to be cleaned up for the next meeting.”
These systems will soon be able to gather more environmental info, says Kennedy: “Do the shades need to come down for a presentation? Does the room need to be cooled better when the system senses that the space is full of people?” Ultimately, AI impacts both the remote and the in-room experience.
What do I need hardware-wise?
There are several options. The most basic solutions are often found in video bars — some of which are outfitted with multiple cameras that can cut between speakers. Larger systems — those built for your most impactful meeting spaces — can be driven by cameras with intelligent video capability or combined with a speaker-tracking solution that keys on signals from microphones to follow a presenter or a conversation.
Crestron offers all these options, including our 1 Beyond intelligent PTZ cameras with optical zoom designed to capture every participant in the room — even those up to 60 feet away from the lens. Optical zoom occurs within the physical lens of a camera, while digital zoom enlarges and crops an image in close-up. Digital zoom reduces the pixel density of an image, decreasing its clarity as distances increase, thereby reducing the camera's ability to pick up those critical nonverbal cues.
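To put rough numbers on the digital zoom trade-off, here is a minimal Python sketch. It assumes a 3840 x 2160 sensor and a simple center crop; both the resolution and the function name effective_pixels are illustrative assumptions, not specifications of any Crestron camera.

```python
def effective_pixels(width: int, height: int, digital_zoom: float) -> int:
    """Count the sensor pixels left in frame after a digital zoom crop.

    Assumes a plain center crop: a zoom factor of 2 keeps half the width
    and half the height, which is then enlarged to fill the output frame.
    """
    if digital_zoom < 1:
        raise ValueError("digital_zoom must be >= 1")
    cropped_width = int(width / digital_zoom)
    cropped_height = int(height / digital_zoom)
    return cropped_width * cropped_height


# Hypothetical 4K sensor, used only for illustration.
full_frame = effective_pixels(3840, 2160, 1.0)
zoomed_2x = effective_pixels(3840, 2160, 2.0)
print(full_frame, zoomed_2x, zoomed_2x / full_frame)
```

Under these assumptions, a 2x digital zoom leaves only a quarter of the original pixels to describe the subject, whereas an optical zoom keeps the full sensor resolution on the tighter view.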
Another option is the Crestron Automate VX voice-activated speaker tracking solution. This system is best for larger spaces, as you can configure up to 12 cameras to handle high-impact rooms.
The goal is to achieve smooth Visual AI tracking and framing that delivers clear close-ups and multiple angles, creating a superior, broadcast-quality video image for remote participants. The Automate VX solution auto-frames the speaker, keeping them centered even if they move away from the position a microphone last reported as their location. Participants can move around freely without worrying about “staying in the frame.”
The Automate VX solution also has a “reframing” function that centers people in the shot. AI plays an important role here, as it can discern between large and small movements. “If someone shifts slightly in their seat, for example, the AI doesn’t read that as a need to reframe the shot,” says Kennedy. “Reducing all those unnecessary camera movements keeps people from getting queasy from the constant motion.”
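One common way to ignore small movements is a simple “dead zone” around the current framing. The Python sketch below illustrates that general idea only; it is not Crestron’s algorithm, and the Framer class, its normalized coordinates, and the 0.15 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Framer:
    """Tracks where the shot is centered, in normalized 0..1 frame coordinates."""

    center_x: float
    center_y: float
    threshold: float = 0.15  # hypothetical dead zone, as a fraction of the frame

    def update(self, subject_x: float, subject_y: float) -> bool:
        """Recenter only if the subject moved farther than the threshold.

        Returns True when the shot was reframed, False when the movement
        was small enough to ignore.
        """
        dx = subject_x - self.center_x
        dy = subject_y - self.center_y
        distance = (dx * dx + dy * dy) ** 0.5
        if distance <= self.threshold:
            return False  # small shift, such as leaning in a seat
        self.center_x, self.center_y = subject_x, subject_y
        return True


framer = Framer(center_x=0.5, center_y=0.5)
print(framer.update(0.52, 0.5))  # slight shift in the seat
print(framer.update(0.8, 0.5))   # walking across the room
```

In this sketch the first update leaves the shot alone and the second recenters it. A production system would likely also smooth the camera motion and weigh how long the subject has stayed in the new position.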
What do I need to be concerned about with these systems?
Short answer: Privacy and security, and they’re both moving targets.
On the privacy front, Visual AI doesn’t start to raise red flags until it begins to recognize individual people. Those functions branch into other aspects beyond visually tracking people: When transcripts and summaries are generated, questions arise. For example, if an AI program identifies your face, is that a violation of privacy? What about the ethics of a program reporting on the “mood” of a meeting? Does AI “get” sarcasm — and can it tell the difference between a joke and a comment that’s meant to be truly negative?
...