I was chatting with my daughter about the challenges of doing hybrid Zoom or Teams meetings. She was not allowed to go to school for a few days and had to follow lessons online, with the teacher and most students in the class. And I was still stuck in my attic, organizing my own university teaching and meetings remotely. Recently I went to work a few times for meetings, but only a few people came to work in person, and most attended online through Zoom. This is similar to the current school situation for my daughter, where most kids attend in person but some attend online on Teams. I expect that we will have these hybrid online/in-person meetings for quite some time to come; perhaps they might even become the new “normal”.
The challenge with hybrid in-person and online meetings is mainly in the physical room where multiple people are attending in person. The online attendees simply connect to the online meeting the same way as if it were a 100% online meeting. The people present in the room also have their laptops in front of them with the webcam on, but with the speakers and microphone muted. This allows online attendees to see everyone, including the people in the physical room. Only one person in the physical room unmutes the speakers and microphone. This allows the noise and feedback suppression of the video conferencing system to do its work and not amplify the voice of the local attendees through the speakers. If multiple laptops have their speakers and microphones on, you will hear echoes, and the sound will start feeding back, creating lots of noise.
Amplifying the audio from the online attendees to the people in the physical room is easy, e.g., using some external speakers connected to the laptop. The problem, however, is picking up the voices of the attendees in the physical room. In smaller meeting rooms at the university we use table microphones, like the USB Samson UB1 or the analog Philips LFH 9172, which can be daisy-chained. We also have one room with a Polycom video conferencing setup, and are experimenting with microphone arrays for the larger meeting rooms. However, these microphone systems are still quite expensive, not very portable due to the required cabling, and they work best when placed in the middle of a round table at an equal distance from all speakers. In other words, these systems are OK for traditional conference rooms, but not for classrooms or more ad-hoc meeting setups with multiple people in complex spatial arrangements, or when people have to keep a distance from each other.
What if we could give everyone in the room a wireless clip-on lapel microphone? Companies like Sennheiser have stage performance wireless microphone systems, but these don’t scale well to a large number of in-person attendees in a classroom or meeting, at least not financially: imagine equipping all kids in the classroom with a €300 microphone.
Pondering this problem, I identified some requirements:
- it has to scale to a classroom with 20 or 30 attendees
- it has to be cheap per microphone, rather in the range of €10 than €100
- it has to be simple to use, as there is no sound technician to control a mixing console
- it has to integrate with online meeting software as if it were a regular microphone
- it has to be portable, so that I can take it to any class or meeting room
- it has to be DIY and easy to build with already available components
I came up with a design that consists of rechargeable clip-on microphones that transmit audio wirelessly to a base station. The base or docking station is connected to the central laptop/computer as if it were a single external microphone. Bluetooth lapel microphones exist, but Bluetooth does not allow connecting a lot of microphones to the same computer. Proprietary radio systems, such as those used by audio companies like Sennheiser, are not DIY friendly. There are easy-to-use RF modules, but those are more suited for IoT applications than for streaming audio. This actually sounds like an ideal application for a 5G device-to-device network, but components for those are not easily available yet.
I think wifi would have enough bandwidth and would be able to support a large number of clip-on microphones: a dedicated wifi access point has no problems dealing with 50 to 100 connected clients. It can be a dedicated/closed network since there is no reason to have the microphones connected to the internet, except perhaps to receive software updates. Also, other devices such as laptops don’t have to connect to this wifi network, except when a web interface is considered for configuration and audio mixing (see below).
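As a rough check on the bandwidth claim, the raw audio payload can be estimated with a few lines of Python. This is only a back-of-the-envelope sketch: the 16 kHz sample rate and 16-bit mono samples are assumptions (the post does not fix an audio format), and the function names are made up for illustration. The figures cover the uncompressed audio payload only and ignore UDP/IP and wifi protocol overhead.

```python
def stream_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM payload bitrate of one microphone, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def total_bitrate_mbps(num_mics, sample_rate_hz=16_000, bits_per_sample=16):
    """Worst-case payload bitrate (Mbit/s) if every microphone transmits at once.

    The 16 kHz / 16-bit defaults are assumptions, not values from the design.
    """
    return num_mics * stream_bitrate_bps(sample_rate_hz, bits_per_sample) / 1e6


if __name__ == "__main__":
    for n in (8, 30):
        print(f"{n} microphones: {total_bitrate_mbps(n):.2f} Mbit/s payload")
```

Under these assumptions, 30 microphones that all transmit continuously produce a little under 8 Mbit/s of audio payload. If the microphones only transmit when someone is actually speaking, the typical load should be well below that worst case.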
For the clip-on microphones, I am considering using an ESP32 connected to an I2S MEMS microphone and a small (e.g., 500 mAh) LiPo battery. These can be housed in a custom 3D printed case with a clip to attach it to the clothing, and a hidden connector at the bottom for charging in the docking bay. The ESP32 needs firmware that sets up and maintains the wifi connection, processes the I2S audio, does threshold detection and, when loud enough, transmits the audio over wifi.
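The threshold detection step could look roughly like the sketch below. It is written in plain Python to illustrate the logic only; actual ESP32 firmware would more likely be written in C/C++ or MicroPython. The frame format (16-bit little-endian mono PCM), the function names, and the threshold value of 0.02 are all assumptions that would need tuning with real hardware.

```python
import math
import struct


def rms_level(frame: bytes) -> float:
    """RMS of a frame of 16-bit little-endian mono PCM, scaled to 0.0..1.0."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def should_transmit(frame: bytes, threshold: float = 0.02) -> bool:
    """Return True when the frame is loud enough to be worth sending.

    The default threshold is a placeholder, not a measured value.
    """
    return rms_level(frame) >= threshold
```

A practical version would probably also keep transmitting for a short time after the level drops below the threshold, so that the quiet ends of words are not cut off.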
For the base or docking station and wifi access point, I am considering a Raspberry Pi Zero W combined with a HifiBerry DAC+ Zero. The line-level output of the HifiBerry would be provided on a standard 3.5 mm female jack, such that a standard TRS or TRRS jack-to-jack cable can be used to connect the base station output to the laptop microphone input. The base station requires software that receives the (UDP?) wifi input streams of all ESP32 modules, normalizes them, and mixes them into a single audio output.
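The mixing step on the base station can be sketched as follows. This is a minimal illustration that assumes one frame of 16-bit samples per microphone has already been received and decoded; the network transport is left out because it is still undecided. Normalization is reduced here to a fixed linear gain per microphone, which is a simplification, and `mix_frames` is a hypothetical name.

```python
def mix_frames(frames, gains=None):
    """Mix frames of int16 samples into one clipped int16 frame.

    frames: dict mapping a microphone id to a list of int16 samples.
    gains:  optional dict mapping a microphone id to a linear gain
            (microphones without an entry use a gain of 1.0).
    """
    gains = gains or {}
    if not frames:
        return []
    # Only mix as many samples as the shortest frame provides.
    length = min(len(f) for f in frames.values())
    mixed = []
    for i in range(length):
        total = sum(f[i] * gains.get(mic, 1.0) for mic, f in frames.items())
        # Clip to the int16 range so loud, overlapping speakers do not wrap around.
        mixed.append(max(-32768, min(32767, int(round(total)))))
    return mixed
```

Summing and hard clipping is the simplest possible mixer. With many people talking at once, a limiter or an automatic gain stage would likely sound better, but that is a refinement for after the first prototype works.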
Some additional features I was thinking of for the base station are a volume indicator, e.g., a Neopixel that turns green-orange-red. Furthermore, there could be a mute button for every microphone, a solo button (muting all but the one that has been selected), and knobs to adjust the volume level for each channel. These could be implemented using physical buttons placed next to the charging bays in the dock, but also through a web interface.
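The logic behind the volume indicator and the mute and solo buttons is simple enough to sketch in a few lines. The level thresholds (0.5 for orange, 0.9 for red) and the rule that solo takes precedence over mute are assumptions made for this example, as are the function names. Driving the actual Neopixel and reading the buttons is not shown.

```python
def level_color(level, orange_at=0.5, red_at=0.9):
    """Map a level between 0.0 and 1.0 to an indicator color name.

    The threshold defaults are placeholders, not tuned values.
    """
    if level >= red_at:
        return "red"
    if level >= orange_at:
        return "orange"
    return "green"


def active_channels(channels, muted=(), solo=None):
    """Return the channels that should end up in the mix.

    If a solo channel is selected, only that channel is kept (assumed to
    override any mute settings); otherwise all channels that are not muted.
    """
    if solo is not None:
        return [c for c in channels if c == solo]
    return [c for c in channels if c not in muted]
```

For example, `active_channels([1, 2, 3], muted={2})` keeps channels 1 and 3, while `active_channels([1, 2, 3], muted={2}, solo=2)` keeps only channel 2.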
Usability in a classroom or meeting room by people who have no technical understanding of the system is also crucial. If the base station has physical mute buttons and/or volume knobs for each of the channels, the clip-on microphone modules must be clearly labeled/numbered. Possibly they could each be 3D printed in a different color, and the corresponding charging bay (with the knob/button next to it) in the base station would then have the same color. The individual microphones don’t have to be recognizable if audio mixing or muting is not needed.
Considering that this system might be used at the same time in multiple neighboring classrooms, the wifi signal should be strong enough for reliable reception within the classroom, but as weak as possible so that it does not interfere with neighboring classrooms or with the regular internet wifi.
I think that the base station could be made for about €50-100 (hardware costs only, and mainly depending on whether it has buttons and knobs for each channel) and that each clip-on microphone can be made for about €10-15. For a system consisting of 30 clip-on microphones to accommodate a complete classroom, that would amount to €350-550. For a smaller meeting room system with 8 clip-on microphones, it would be around €150.
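The cost estimate is easy to reproduce for other room sizes. The sketch below uses the price ranges from the paragraph above; `system_cost` is just an illustrative helper name.

```python
def system_cost(num_mics, base_cost=(50, 100), mic_cost=(10, 15)):
    """Return the (low, high) hardware cost in euro for a complete system.

    base_cost and mic_cost are (low, high) estimates for the base station
    and for a single clip-on microphone.
    """
    low = base_cost[0] + num_mics * mic_cost[0]
    high = base_cost[1] + num_mics * mic_cost[1]
    return low, high


if __name__ == "__main__":
    for n in (8, 30):
        low, high = system_cost(n)
        print(f"{n} microphones: EUR {low}-{high}")
```

With these ranges, a 30-microphone system comes out at €350-550 and an 8-microphone system at €130-220, so the figure quoted for the smaller system sits near the lower end of its range, i.e., with a simple base station.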
Quite some development and testing is needed for this. For prototyping I have ordered an Adafruit HUZZAH32 and an I2S microphone breakout board from a local and fast (and also more expensive) supplier and some comparable but cheaper components from Aliexpress. Let’s start with a single microphone, similar to this or this baby monitor. If I can get that to work with a Raspberry Pi, the next step would be to check how well that scales to a larger number of ESP32 microphones.