Content provided by Matt Mullenweg. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Matt Mullenweg or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

Episode 25: Davit Baghdasaryan on the Science of Sound in a Distributed Work World

42:53
 
Subscribe to Distributed at Pocket Casts, Apple Podcasts, Spotify, RSS, or wherever you like to listen.

Trying to sound your best as you work away from an office more than ever before?

As audio and video conferencing surge worldwide, Matt talks about the science of sound with Davit Baghdasaryan, the CEO of Krisp, a fast-growing company offering an AI-powered noise cancellation app for removing background noise on any conferencing platform. Krisp’s technology, including its proprietary deep neural network krispNet DNN, processes audio securely on the user’s computer.

Find out how Krisp started, why Davit foresees his company returning to a hybrid work model, and what it means to Work from Forest.

With employees in the United States and Armenia who shifted to working from home in 2020, Krisp surged this challenging year, announcing a $5M Series A round in August and growing to 600 Enterprise customers despite continuing to focus on consumer users. Check out this demo of how Krisp works in a meeting room.

A native of Armenia, Davit spends time in both countries leading Krisp. Prior to co-founding Krisp, Davit was a Security Product Lead at Twilio in San Francisco, among other security-focused technology leadership roles.

The full episode transcript is below.

***

(Intro Music)

MATT MULLENWEG: Howdy everybody. Today we are going to talk to the Co-founder and CEO of a company whose technology makes it easier for those of us working from home to hear each other, even with all of life’s noisy distractions going on in the background behind us.

At Automattic we say, “Communication is oxygen.” We are advocates of anything that makes communication easier and more effective. And one of the tools I find myself recommending over and over again is Krisp, which is an app that uses machine learning to mute background noise in just about any communication apps you use.

For Krisp’s Davit Baghdasaryan, there is even more to the story. He is leading a young and fast-growing company through the challenges and opportunities of this year, balancing his own company’s transition to a remote workforce and a surge in demand for Krisp. He is a native of Armenia and also a global citizen and experienced technology leader at great companies like Twilio, and he has made his own adjustments to working and leading from a distributed point of view. So today we are going to chat. And thank you so much for being here.

DAVIT BAGHDASARYAN: Thank you, Matt. Thanks for the intro. Hi, everyone. I’m Davit, CEO and Cofounder at Krisp, as Matt mentioned. I’m so happy to join this podcast.

MATT: Was there any key biographical detail that was missed that you’d love to share, things that people usually don’t know?

DAVIT: Absolutely. I think that was a great introduction. I was born in Armenia, I’ve lived ten years in the U.S. Right now I’m back in Armenia. I’m sure we will go deeper on my background and biography. I’m happy to share as much as needed.

MATT: In 2018, when you started Krisp, what was the thing that you were seeing? Because people weren’t on calls or Zoom nearly as much back then. What was the need you were seeing?

DAVIT: Yes, absolutely. Well the story behind Krisp is very personal. I was actually working at Twilio, which is a big communication platform, and actually at Squadcast. I just figured that Squadcast is powered by Twilio. But because my family and my friends were in Armenia, I was traveling a lot to Armenia at every chance, I guess.

And because of the time difference, almost 12 hours of difference, when I was connecting to meetings, it was evening time here. And in the evenings you want to go out with friends and family but that was the time that I needed to join meetings, like my daily meetings. I was heading the Product Security at Twilio so that means I have many meetings with different teams. And I always wished there was a button I clicked and get some privacy, like people don’t know where I am, [laughs] they don’t know that I’m joining from bars. Not necessarily bars of course, but still.

MATT: So almost like a virtual Zoom background but for audio.

DAVIT: Exactly. So I had the need but I had no idea how to build the technology. And I knew it must be done with machine learning. I knew about voice but not machine learning. But I mean that’s where I met my cofounder and that’s how things have started.

MATT: I think I first came across Krisp actually on the NVIDIA machine learning blog. It was very early on, it felt like the company was.. I think it was all still free at that time.

DAVIT: Yes. Well actually Krisp wasn’t released at that time yet, or maybe just launched. And then that blog post was very important for us. We worked on it for a very long time and that was the first exposure that our company received. And the blog post got actually a lot of visibility. So it was at some point I believe the most shared and visited blog post on NVIDIA developer AI section. So yeah, it brought us a lot of visibility.

MATT: I actually made a mistake early on when I was advocating for Krisp. I told people it was from NVIDIA, or spun out of NVIDIA, I was so.. Because the post had seemed so great I couldn’t imagine that it was a guest post.

DAVIT: Yeah. Well there is a fun story actually behind that. When we did that post and it was successful, we thought that we needed to put that post on Hacker News. And we put a title which sort of implied that it was from NVIDIA so that people open it more. It was a small hack from us and it worked out because Krisp, that blog post was in the top five of Hacker News that day. Yeah, exciting times.

MATT: That might’ve been where I saw it too. [laughter] I don’t recall exactly but that would certainly be plausible. So I imagine you’re able to kind of turn Krisp on and off on your set up right now. Can you demonstrate how it works?

DAVIT: Yes, absolutely. So Krisp is on right now. I’m going to clap. I’m clapping right now. And when I do this with video it’s much more impressive. And now I’m going to go, it’s a single button, when I turn it off and then I clap [clapping] you hear the clap. Right?

MATT: Yeah.

DAVIT: Yeah, that’s the easiest way to demonstrate it. But Krisp is.. with Covid and with everything that happened lately, people moving to home, Krisp was very handy with kids at home, with dogs barking at home. So it does a great job at removing noise. And I’m happy to actually dig more into how that works and where Krisp is going.

MATT: It reminds me of the Zen Koan, what’s the sound of one hand clapping. I guess it’s like Krisp. [laughter] Oh, one reason I have been advocating for it a lot is that for a good meeting you don’t need video, you could turn video off if it’s not working, we’re not using any video now obviously, but if audio doesn’t work, the meeting stops. A meeting with video.. unless I guess you’re really good at American Sign Language or something, you really do need great audio.

And I find it so distracting when folks have just a ton going on in the background. But I also feel for them because we are all home, we have kids working from home, all sorts of things. What sort of Covid boost have you all seen?

DAVIT: Yes, absolutely. Well voice is, we believe that voice is going to continue being a key means of communication and it’s going to grow, actually, way bigger than it is now. With Covid we saw a very large boost in increased downloads and usage. I believe it’s now like.. It’s been 7X growth for Krisp.

MATT: Wow.

DAVIT: Because – yeah – there was no technology like this in the world. And when we were just starting, people didn’t really.. Every person that was seeing the download, they could relate so much to the app, to the problem. But they didn’t really know that the problem existed because we are so used to what we have. So it took us a while to market this. And early on, we were having a lot of struggles to explain that there is actually a pain here.

But with Covid things have changed because all of a sudden this has become a big problem because everyone is home and their kids are crying and there is just a lot of noise coming from the kitchen and everything. So yeah, people have gradually started spreading the word and most of the growth has been done by word of mouth. So yeah, from a business perspective there was a lot of growth during this time.

MATT: Let’s dive a little bit into how Krisp works. It uses machine learning and what sort of a learning technique does it use?

DAVIT: Let me do a short intro into noise cancellation in general, the state of the art before machine learning. People usually use multiple microphones to try to remove or cancel noise. Our phones have multiple microphones on them. One of the microphones is close to your mouth, the way you hold the phone, and the other microphones are very far from that microphone, from your mouth.

MATT: Like there’s one on the back of most phones, right?

DAVIT: Which ones?

MATT: There is usually a microphone on the back, like where the camera is.

DAVIT: Yes, exactly, exactly. It must be as far as possible so that you can.. by subtracting the two audios from each other, I’m just simplifying it, you can isolate the human voice. And this technology is deployed on every phone out there, I guess, like more or less expensive phone. And that technology also exists on our laptops but it just doesn’t work because your mouth, the person is very far from the laptop.

So it has two problems. One problem is that it requires multiple microphones, so it requires specific hardware. And the second is it has limitations on how much noise it can remove. Usually it’s great with removing stationary noise, like static noise, but when the noise comes and goes, like clapping, barking, it’s just not possible to adapt to these sorts of noises.

And then in the last five years, as machine learning has started to grow, people have started, like in academia they started machine learning for noise cancelation. And we were very early on in this problem. So when I met my to be co-founder and we started talking about this, we knew that we needed to solve this with machine learning just by intuition, right? And we started looking at this, what’s out there.

As a technology company, we were the first to actually design and implement such technology which purely uses machine learning for this problem. So the way it works is we have a very large data set of background noises, which we had to find from somewhere. It was tough to do that. [laughter] But we were clever I think with that.

We tried some interesting.. we found the right sources for that. And these are very different types of noises, like 10,000+ types of noises. And then we also have collected a lot of clean studio recordings where there is no noise at all, so we have a lot of such data. And when we mix them together with different signal to noise ratios, we get pretty much an infinite data set of noisy speech for which we have the clean speech because we used that data set.

And then what we do, we have designed this special neural net for which during the training we say well this is noisy speech, this is clean speech, noisy speech, clean speech, noisy speech, clean and we do it for all this artificially generated noisy speech. And then it starts to learn what is human speech, what’s clean speech, what’s noise. And then during the inference, like when you start using it, even if it sees noise types that it never saw before, it is able to recognize them and separate them from each other.
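To make the mixing step concrete, here is a minimal sketch of how noisy/clean training pairs could be generated at chosen signal to noise ratios. It is an illustration only, not Krisp's pipeline: the function names, the synthetic tone standing in for speech, the random values standing in for noise, and the list of SNR values are all assumptions.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise level ratio equals `snr_db`, then add it to `clean`."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must be the same length")
    # SNR_dB = 20 * log10(rms_clean / rms_noise), solved for the target noise RMS.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [c + gain * n for c, n in zip(clean, noise)]


def make_training_pairs(clean, noise, snrs_db):
    """Return one (noisy input, clean target) pair per requested SNR."""
    return [(mix_at_snr(clean, noise, snr), clean) for snr in snrs_db]


# Synthetic stand-ins (assumptions): a 220 Hz tone as "speech", random values as "noise".
SAMPLE_RATE = 16000
random.seed(0)
clean = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [random.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

pairs = make_training_pairs(clean, noise, snrs_db=[0, 5, 10, 20])
```

Because every noisy example is built from a known clean recording, the clean target comes for free, and one clean clip combined with many noises and many ratios yields many distinct examples.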

So this is a very simplified explanation of how it works. Obviously there is a lot of IP. Audio is very difficult, it turned out. If we knew what.. I mean we were not audio experts. Our team is very strong at math but we didn’t have any experience in audio. And I would say, I always say, if we knew how difficult audio is we would be just scared of it and we wouldn’t start this. And yeah, we were lucky that we didn’t know that because many teams who have prior audio experience, they are still struggling with this.

MATT: What do you think it was about not knowing audio that allowed you to take a different approach or succeed where others haven’t been able to yet?

DAVIT: This is sort of the classical approach to the audio problems, like to digital signal processing problems. Like, DSP, digital signal processing, the theory and like everything is there for thirty years, or thirty or forty years, it has been out there. And if you are a DSP engineer or audio engineer, building microphones and speakers, you are trained to think from those constrained perspectives, from this classical theory perspective, from this classical algorithm perspective. If you need to solve something, that’s where your brain goes by default.

And from our team perspective, like when we started the company we were seven people and six of them have PhDs in math and physics, I am the only one who doesn’t hold that. So we have a lot of experience in math. We understand the math required for dealing with audio and machine learning, but we didn’t know the existing theories. So that was easier for us to start doing new things, which was required because when you do.. with a machine learning approach, you don’t necessarily need to use a lot of the old stuff that has been developed for forty years. And I think that was a key difference.

MATT: It sounds like you’re compressing the audio a bit and maybe doing some low pass filter?

DAVIT: We are not. What we do in Krisp is.. Well, even today, yes, Krisp is only working with wide band audio, up to wide band audio, which means like 16 kilohertz of sampling. Great. That’s great for.. Well, I am currently using a Bluetooth headset.

MATT: And Bluetooth compresses a ton, right?

DAVIT: Yes, exactly. So it does it by default. But even if I use a full band microphone, Krisp today would down sample to wide band before doing the processing. And the reason for that is we have spent a lot of time on optimizing our technology for CPUs. There is no such technology running on CPUs. People can (run those algorithms?) on GPUs easily but for CPU it’s very hard to squeeze that. So we have spent a lot of time on doing this. That’s one of the reasons why we decided to stick with wide band.

At the same time though we are in a week –

MATT: Stick with wide band as opposed to what?

DAVIT: To full band. So, down sample to wide band. In a week’s time frame we are going to.. I believe it’s in a week.. we are shipping a new version of Krisp that is going to support full band as well. That has been a very long effort for us to squeeze these neural nets to understand the higher frequencies of voice as well and then but at the same time be able to run on CPU.

MATT: Do you use a GPU if it’s available?

DAVIT: No, we only support CPUs.

MATT: Why is that?

DAVIT: Well, two reasons. We could support NVIDIA GPUs and they are very powerful, it’s very easy to run neural nets for instance on these GPUs. And when you do that, the CPU is offloaded, that’s great. But Krisp is used in enterprise by a lot of professional users who don’t have GPUs. So most of our population of users, they have just CPUs. And the GPUs they have, like the [00:17:22.06] GPUs I have on my Mac, is just not capable of running this neural net. It’s just too small for that. So we decided to spend this extra effort, a lot of effort actually to support everything out there rather than just focus on one type of hardware deployment.

MATT: And to also be clear when you’re talking about squeezing the neural nets, Krisp all runs locally on your computer, right?

DAVIT: Yes, absolutely.

MATT: Which is awesome so there’s not the latency of going to the network and the audio data is not being sent anywhere else, it’s all happening locally. What does it mean to squeeze it down? Are you worried about download size or the runtime or how much GPU it’s going to use…?

DAVIT: Privacy has been very important for us and we are very, very happy that we were able to actually run this locally. We don’t think the audio should go through a server, especially in this world, privacy is very important.

So to explain what it means, this squeeze for the CPU, like as Krisp is a virtual microphone, it sits between the actual microphone and the app. In this case, it’s the browser app running Squadcast. So Krisp is between them. And it needs to run its neural net on every other frame in real time and without introducing too much latency, so that means that it keeps receiving these frames and it needs to not only not look too much forward in the audio so that it doesn’t introduce this artificial latency but also not spend too much CPU power so that it can keep up with our speech.

So that is very constraining from an engineering perspective. And that means that you need to squeeze and make your neural net smaller and more efficient and use, I don’t know, the right library that fits best for this kind of mathematical problem so that it runs properly on the target CPU. Does that make sense?
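The "keep up with our speech" constraint can be expressed as a real-time factor: processing time divided by the duration of the audio processed, which has to stay below 1.0. The sketch below is a hedged illustration. The 10 ms frame length and the placeholder denoiser are assumptions for the example, not Krisp's actual frame size or model.

```python
import time

SAMPLE_RATE_HZ = 16000  # wide band, as mentioned in the conversation
FRAME_MS = 10           # hypothetical frame length, not Krisp's actual value


def frame_samples(sample_rate_hz, frame_ms):
    """Number of samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def real_time_factor(process_frame, frames, frame_ms):
    """Processing time divided by audio duration; below 1.0 means the processor keeps up."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed_s = time.perf_counter() - start
    audio_s = len(frames) * frame_ms / 1000
    return elapsed_s / audio_s


def placeholder_denoiser(frame):
    """Stand-in for a neural net: simply halves every sample."""
    return [0.5 * s for s in frame]


n = frame_samples(SAMPLE_RATE_HZ, FRAME_MS)
frames = [[0.0] * n for _ in range(100)]  # one second of silence
rtf = real_time_factor(placeholder_denoiser, frames, FRAME_MS)
print(f"{n} samples per frame, real-time factor {rtf:.4f}")
```

A heavier model pushes the factor toward 1.0, which is why shrinking the network matters on a CPU that is also running the call itself.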

MATT: Yes. And it’s only about 70 megabytes now. When it has this new full band neural network will the download get larger?

DAVIT: Yes, it will get a bit larger. Even in the 70 megabytes we have multiple neural nets. So we have neural nets that work for like the eight kilohertz sampling rate, we have neural net that supports larger. And then you know that with Krisp, Krisp works bidirectionally, so if I have Krisp, I can remove the noise coming from you and that has a different neural net.

So there’s a lot of engineering actually in this simple app. I believe we have like three or four models, like neural net models shipped today and with this new version we are going to have, like, six or seven models shipped. So a lot of… yeah.

MATT: Wow, that’s actually a fun feature a lot of people don’t know about Krisp is that if someone is annoying you with bad audio, you can actually filter them as well so it sounds good and you would never even know that they have a dog barking in the background or something.

DAVIT: Yes.

MATT: Was that in the original version or did that come up when you were talking to people who weren’t using Krisp yet?

DAVIT: No, that was in the original version. And in the very original version, one of the challenges we had, we didn’t know how to structure Krisp. We didn’t know whether they will be using more of this inbound noise cancellation or outbound. Like, what is more important for people? And that was such an interesting question. Like, do you worry more about your noise or do you.. are you willing to pay for canceling your noise or are you just.. you don’t care about that and you are willing to pay for other’s noise.

So when we shipped in the very original version, inbound was entirely free and then the outbound was a pro feature. [laughter] But then we changed. Now it’s a freemium product. Krisp comes with two hours free every week and then if you go to pro it becomes unlimited. And the pro is going to get some more very cool things very soon.

MATT: It is such a good deal. What is the latest pricing on it?

DAVIT: Right now the pricing is $40 a year. That’s going to change soon because this was a Covid pricing. When Covid started we started a program with which all the students (in universities work?), universities, government workers, hospital workers, would get Krisp for free for six months. And we also dropped the price by 20 percent, 30 percent. Actually we went with that for like seven months now and we are bringing back the price, it’s going to be $60 per year.

MATT: Even at $60, when you look at how much money I’ve had to spend to make the room quieter.. you’re basically getting a full studio and a really great microphone and everything.

DAVIT: Yeah, absolutely. We plan to keep that price but we are going to add some very cool things in the near future around virtual backgrounds and more even greater noise cancellation. So yeah, we are working hard on this.

MATT: Cool. I can’t wait. I will be a top customer. How much latency does that introduce right now?

DAVIT: On the algorithmic side the latency is between 20 and 30 milliseconds. On the app side, the application introduces an additional 20 to 30 milliseconds. So overall it’s around 60 milliseconds.
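As a quick check of the arithmetic, the two ranges quoted above add up to 40 to 60 milliseconds, so "around 60" is the upper end of the budget. The sketch below only restates those figures; the helper names are made up for the example.

```python
# Figures quoted in the conversation, in milliseconds (low, high).
ALGORITHM_MS = (20, 30)
APP_MS = (20, 30)


def total_range(*stages):
    """Add up the low ends and the high ends of each latency stage."""
    return (sum(s[0] for s in stages), sum(s[1] for s in stages))


def ms_to_samples(ms, sample_rate_hz):
    """How many audio samples fit in a delay of `ms` milliseconds."""
    return ms * sample_rate_hz // 1000


low, high = total_range(ALGORITHM_MS, APP_MS)
print(f"total latency: {low} to {high} ms")
print(f"{high} ms at 16 kHz is {ms_to_samples(high, 16000)} samples")
```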

MATT: I did do a video where I posted.. I think I recorded just using QuickTime video and used Krisp to take out some background noise and people could tell that my mouth was just a little bit off. So if you had a way to also introduce the delay to the video so it synched up, I think that would be pretty nice.

DAVIT: I’m not sure that was Krisp. Usually with Krisp you wouldn’t notice the difference. Video is doing a lot of things which might contribute to that. Like, we are using Krisp everyday with video, obviously, with Zoom, we have never noticed that. There are a lot of reasons why you might have latency but I.. I mean, everything adds up, obviously, and this 50 to 60 milliseconds might contribute at the end if there is enough latency but that is just not enough to be noticeable.

MATT: Yeah, on Zoom I’ve never noticed it, it was only when I was recording this video. So you’re right, there might have been something else maybe in the HDMI conversion or something where it just felt a little bit out of synch.

DAVIT: Yes.

MATT: I actually didn’t notice it at all and then I started getting some comments about it and I was like, ohh I kind of see it. Kind of like when you use a sound bar with a TV, sometimes it can be just a little bit off.

DAVIT: Yeah, yeah, it’s interesting. We won’t see that, like our eyes usually are.. get adapted. But for example, when you use a virtual background in Zoom or Microsoft, you start noticing it. It’s there. You start seeing that. And even when you move there is latency, yeah.

MATT: What’s the latest going on with hardware? So for example, I know that old MacBooks had a terrible, terrible built in microphone and the latest 16 actually sounds pretty good. I think John Gruber on Daring Fireball posted an audio file just straight off the Mac 16 microphone and it sounds like they’re doing something that’s better. So what are they changing and what do you think about it?

DAVIT: You mean from an audio perspective?

MATT: Yes.

DAVIT: Oh, I’m not really sure. I think Mac is a high end, like usually using a high end microphone and speaker, although we are not.. I don’t think people are very happy with what that high end means. So yeah, I’m actually using it.. I mean, I don’t do podcasts, obviously, that’s a more important question when you do podcasts, but yeah, for everyday communication what they have works great. But I’m not really following what’s happening on the platform.

MATT: Check it out, I’m kind of curious. And I’m also curious more broadly how much do you think some of this gets built in by the phone makers, the laptop makers versus [00:26:52.07] software?

DAVIT: Yeah, I have no doubt that this kind of technology is going to be there in every device in the next three to four years. Typically phone vendors, they don’t like to make changes like these kind of changes, they are a bit slow on that. Because as you can imagine, they already know how to build these multi-microphone systems on the phone and they have everything, like the lines set up for that, they know what the yield is and everything. So any change there is going to take time.

But I have no doubt it’s going to happen. So but it also depends what is that that’s going to happen. Like noise cancellation, even today, Krisp is not perfect. So we are spending a lot of time on improving what’s out there and we.. In the next year, hopefully in the next six months, we are going to ship something inside Krisp that is going to be just revolutionary in terms of noise cancellation because it’s just going to take this to the next level. And I don’t think something like that is going to come to hardware soon enough. It’s going to take some more time.

But in terms of when these devices will have noise cancelation, I have no doubt that in the next two or three years everyone is going to have some version of noise.. like ML based, machine learning based, noise cancellation, no doubt.

MATT: And how do beam forming mics work? I know some of the new headphones, like the Bose and also the Facebook Portal and or the Alexa devices have these mics that seem to be able to pick you up from all the way across the room.

DAVIT: Yeah, the way they work, they have multiple microphones on the device. And when you turn on the device and you start talking to the device, it starts to calibrate and it starts to sort of.. given that these microphones are far from each other, they start to understand where the direction is coming from, where the voice direction is coming from, and they start to focus only on that direction. And again, like using the same technique that I explained, they are trying to ignore anything else that is not coming from that direction. That’s beam forming.
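The idea of focusing on one direction can be illustrated with the simplest textbook beamformer, delay-and-sum: shift each microphone's signal so the target voice lines up, then average. This is a toy sketch under loud assumptions, not how any particular device works. The steering delays are given rather than estimated by calibration, the geometry is invented, and the noise frequency and delay are chosen so that the off-axis noise cancels exactly, whereas a real array would only attenuate it.

```python
import math


def delay(signal, n):
    """Delay `signal` by n samples, padding the start with zeros and keeping the length."""
    return [0.0] * n + list(signal[:len(signal) - n])


def advance(signal, n):
    """Shift `signal` earlier by n samples, padding the end with zeros."""
    return list(signal[n:]) + [0.0] * n


def delay_and_sum(mics, steering_delays):
    """Undo each microphone's expected delay for the target direction, then average."""
    aligned = [advance(m, d) for m, d in zip(mics, steering_delays)]
    return [sum(column) / len(aligned) for column in zip(*aligned)]


def power(signal):
    """Mean squared value of the signal."""
    return sum(s * s for s in signal) / len(signal)


RATE = 16000
N = 1600
voice = [math.sin(2 * math.pi * 300 * t / RATE) for t in range(N)]   # stand-in for speech
noise = [math.sin(2 * math.pi * 2000 * t / RATE) for t in range(N)]  # stand-in for off-axis noise

# Hypothetical geometry: the voice reaches mic 2 two samples late, the noise six samples late.
mic1 = [v + n for v, n in zip(voice, noise)]
mic2 = [v + n for v, n in zip(delay(voice, 2), delay(noise, 6))]

output = delay_and_sum([mic1, mic2], steering_delays=[0, 2])
leftover = [o - v for o, v in zip(output, voice)]
print(f"noise power at one mic: {power(noise):.3f}, after beamforming: {power(leftover):.5f}")
```

If the talker moves, the true delays change and the fixed steering delays no longer line the voice up, which is the recalibration problem raised next.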

The problem with that is when you keep moving around, it needs to recalibrate again and again and I’m not sure that it’s.. the technology is able to fix this problem just by its own. It might be very useful for far field, like for Portal or Alexa, which is in a big room and they need to fix the noise problem, but I’m not sure how efficient it is. We are not dealing with this problem. Actually we are not dealing with far field as well, that is a very, very different problem. Although it might be similar but in audio every problem is so unique and we are not dealing with that.

MATT: That’s interesting. And one more technical question. You had mentioned full band and wide band, how should people think about their Bluetooth headset versus a USB headset versus other things and what type of audio is being captured by the computer that Krisp is receiving?

DAVIT: Yeah.. By the way, I am not an audio expert, I should.. [laughs] I should say that.

MATT: Oh, sorry.

DAVIT: We have a lot of audio experts in the company. But I know as much as I know. In terms of Krisp, it doesn’t really matter to Krisp where the audio is coming from and that’s one of the beauties of these machine learning based algorithms. You can even, what we have, you can even run it in the cloud because it really has no hardware dependency.

Let me give the example of inbound. So imagine I have Krisp here, you talk and I can cancel the noise coming from your audio. So when you talk there is so much transformation happening to your voice starting from the microphone, that the microphone has its own transformations, including noise cancellation and then the browser gets it and sends it over WebRTC with all the codec and everything. And then it receives here on [00:32:09.24] and gives to Krisp and then Krisp runs its technology on it.

So pretty much it doesn’t matter where you run. You could easily run it in the cloud. So from that perspective it doesn’t matter whether it’s a USB microphone or a Bluetooth microphone or just a wired microphone, the (ordinary?) microphones. It doesn’t matter. Obviously we have to add the support for all of these but from an audio perspective it doesn’t matter.

Usually Bluetooth audio has more latency, it’s just there with the Bluetooth transport. You might notice that with AirPods. If you have AirPods usually the.. I don’t know why but something doesn’t work very well with them when it comes to latency. Sometimes it’s just too much latency with AirPods and without Krisp.

MATT: Maybe because they have to connect to each other as well as.. as far as [00:33:17.27].

DAVIT: Yeah, I mean, the connection is one time. You connect and then there is a connection. But sometimes the latency adds up. I don’t know what they did wrong there but I hope they will fix it. But with USB, it’s usually more powerful, less convenient. But yeah, I mean, in general that’s the.. Krisp doesn’t care really where the audio is coming from.

There is one more thing, when you use a Bluetooth headset, if you just listen to music, it’s using its highest frequency, like it uses all the frequencies possible –

MATT: Yes, a higher codec, right?

DAVIT: Exactly, yes. But when you turn on, when you start a call, when you start using the microphone of your headset, it brings everything down to wide band typically. Like some –

MATT: Wide band is 16 kilohertz?

DAVIT: Yes, like 16 kilohertz, exactly. And the prior version of these Bluetooth headsets, they would bring your voice to eight kilohertz. You know, that’s not great. But –

MATT: And it’s kind of like using fewer colors to paint a picture, right?

DAVIT: Yeah, yeah, exactly.

MATT: So it doesn’t build sound as full or as natural.

DAVIT: Yeah, yeah. Every time we use a telephone, when you call a phone number, most of the world is still using eight kilohertz codec because they just.. to transmit less data. So that’s, we are used to that. But when you hear full band and then narrow band, which is eight kilohertz, you will see the difference. It’s a big difference.

But so if you are using a Bluetooth headset, there is a big, big chance that it will down sample it to wide band in the calls. And this is because they need to use.. from an energy perspective, from a processing power perspective, they need to keep it efficient.
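For reference, a sampling rate can only represent frequencies up to half its value (the Nyquist limit), which is why the lower rates sound less full. The sketch below tabulates the bands discussed above. The 8 and 16 kilohertz figures come from the conversation; the 48 kilohertz figure for full band is an assumption based on a commonly used rate and is not stated in the episode.

```python
BANDS_HZ = {
    "narrow band": 8000,  # from the conversation: traditional telephony
    "wide band": 16000,   # from the conversation
    "full band": 48000,   # assumption: a common full-band rate, not stated in the episode
}


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


for name, rate in BANDS_HZ.items():
    print(f"{name}: {rate} Hz sampling captures content up to {nyquist_hz(rate):.0f} Hz")
```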

MATT: It’s actually amazing. I recommend folks.. If you have an iPhone, try calling a friend who also has an iPhone using FaceTime audio versus just a normal phone call. It is astounding how much better you can hear them and understand them. It actually makes phone calls pleasant again.

DAVIT: Yeah, yeah, yeah. Obviously it’s using VoIP and like the best codecs out there. I mean telephony still wins I think because of the network, like the service providers, like the AT&Ts of the world, they dedicate the bandwidth to voice channels, like telephony voice channel, and then the rest is used by everyone else, like the data channels. So your VoIP is going to impact it if the signal is not strong enough but the telephony voice will still be there.

So I guess that’s more important than the higher frequencies of the voice because you can at least hear each other. But that’s going to fix, it will be fixed with 5G. So once 5G is deployed, I’m sure those problems will go away. And everyone will switch to VoIP.

MATT: Cool. Actually I think even now they use voice over LTE by default for a lot of.. by default on the new iPhones.

DAVIT: Do they? I don’t know if that technology is even live. I don’t know if there has been any VoLTE deployment. Maybe I’m wrong but I thought…

MATT: It definitely is on like AT&T and Verizon here in the U.S. but probably not internationally.

DAVIT: Oh, okay.

MATT: It does sound a lot better if both sides are on it. But well I feel like cell phone calls drop so much anyway…

DAVIT: Yes.

MATT: It’s funny, when I was a kid, I remember spending lots of time, hours on the phone. And then your parents would pick up the phone and you’re like, Mom, I’m on the phone. But now I feel like people don’t do phone calls as much anymore partially because the quality is so bad, it’s very frustrating.

DAVIT: Yes, I think it depends on which part of the world you are in. As far as I know, like in Japan and South Korea, like, I guess nobody is using telephony anymore. Everyone is on VoIP. And it also depends on the network connection. That’s why like if 5G is there, if there is enough bandwidth, why would people use the telephony.

There is one use case though that I really believe is going to still thrive is phone numbers. Phone numbers are such a cool concept. We don’t appreciate them that much but they are the most deployed, known, understood, like handles that we have. And everyone can reach out to you, although it’s spam of course with the promo, spam. But I think that technology is not going to go away. I thought about that a very long time. And I think it’s going to stay around. I think there’s a lot of things that you can build on top of phone numbers and it’s going to thrive.

MATT: I can tell some of your Twilio days coming through.

DAVIT: I know, I know it’s definitely coming from there. [laughs] Yes.

MATT: I really love that we’re able to do a deep dive into Krisp. I can’t wait to see what you’re launching in a week or two. I’m looking forward to the update. I’m going to do a ton of audio tests and record things with different mics and try it all out. So thank you so much.

You are also running a company. And I know that you all were mostly in person I think in Armenia before. How have you adapted and how has it been and what are you planning to do once we can be safely in offices again?

DAVIT: So our company is distributed between the U.S. and Armenia and we have a team member in Germany as well. So we are distributed. We have a big presence in Armenia. I spend a lot of time in Armenia and my co-founder as well. So before Covid, we were spending a lot of time in the Armenia office, although everyone else was remote. And after Covid obviously we are working remotely.

And right after I think Covid started, I guess two weeks in, we decided that Krisp must become a remote-first company. And it’s not because of Covid, we always had that idea because it just makes so much sense especially for us because we are focused on building a tool that helps remote folks. So we always had that idea but Covid catalyzed that.

And yeah so I think at the end we are going to become a hybrid company because we have the office and a lot of people actually enjoy being in the office a couple days a week. They might not have the right set up in the house, they might have kids. And so just like I personally like to come sometimes to the office because that’s how my brain works, it needs that environment.

But at the end of the day, 80-85 percent of our workforce today just doesn’t come to the office because they enjoy working from home. And actually one fun fact – we also have a program called Work From Forest. So every week [laughing] we have 10-15 people taken to some nice place outside, you know, countryside, and they have internet, power, and they do hikes. And they actually work as well and it’s very efficient and productive work happening.

MATT: That’s so cool.

DAVIT: Yes, I’m in love with this program. So yeah, so like –

MATT: So this is happening right now? You just are able to do it in a distanced fashion, Work From Forest?

DAVIT: Yeah, yeah, yeah.

MATT: Well that’s a new one for us, I haven’t heard that one before. I think it’s a great idea.

DAVIT: Yeah, I highly recommend that. It’s quite popular now in our office. And I think it’s going to evolve. One problem with that is the weather, if the weather is good or not. But I think we will find a solution for that as well. It’s a great way for people to still gather together and see.. when you see each other in person, and that’s a very important part of building a culture and relationships, but they don’t have to come and sit in the office for that.

And yeah, we are determined to continue growing as a remote-first, global company. So we are very excited about that.

MATT: Awesome. Well I can’t wait to see what’s next. Davit, thank you so much for coming on. If you’re listening, please get Krisp.ai as soon as possible so your calls sound better. If you get in soon, you can get their $40 pricing but it’s a good deal even at $60. And yeah, as you go distributed, make sure to check out the other episodes of the Distributed Podcast, there might be some good tips for you or your managers there. So thank you again for coming on.

DAVIT: Thanks a lot, Matt.

MATT: All right, you have been listening to Distributed with Matt Mullenweg. Please subscribe or tell your friends or rate it and we’ll keep doing this. See you next time. Bye-bye.

The full episode transcript is below.

***

(Intro Music)

MATT MULLENWEG: Howdy everybody. Today we are going to talk to the Co-founder and CEO of a company whose technology makes it easier for those of us working from home to hear each other, even with all of life’s noisy distractions going on in the background behind us.

At Automattic we say, “Communication is oxygen.” We are advocates of anything that makes communication easier and more effective. And one of the tools I find myself recommending over and over again is Krisp, which is an app that uses machine learning to mute background noise in just about any communication apps you use.

For Krisp’s Davit Baghdasaryan, there is even more to the story. He is leading a young and fast-growing company through the challenges and opportunities of this year, balancing his own company’s transition to a remote workforce and a surge in demand for Krisp. He is a native of Armenia and also a global citizen and experienced technology leader at great companies like Twilio, and he has made his own adjustments to working and leading from a distributed point of view. So today we are going to chat. And thank you so much for being here.

DAVIT BAGHDASARYAN: Thank you, Matt. Thanks for the intro. Hi, everyone. I’m Davit, CEO and Cofounder at Krisp, as Matt mentioned. I’m so happy to join this podcast.

MATT: Was there any key biographical detail that was missed that you’d love to share, things that people usually don’t know?

DAVIT: Absolutely. I think that was a great introduction. I was born in Armenia, I’ve lived ten years in the U.S. Right now I’m back in Armenia. I’m sure we will go deeper on my background and biography. I’m happy to share as much as needed.

MATT: In 2018, when you started Krisp, what was the thing that you were seeing? Because people weren’t on calls or Zoom nearly as much back then. What was the need you were seeing?

DAVIT: Yes, absolutely. Well the story behind Krisp is very personal. I was actually working at Twilio, which is a big communication platform, and actually, Squadcast.. I just figured out that Squadcast is powered by Twilio. But because my family and my friends were in Armenia, I was traveling a lot to Armenia at every chance, I guess.

And because of the time difference, almost 12 hours of difference, when I was connecting to meetings, it was evening time here. And in the evenings you want to go out with friends and family but that was the time that I needed to join meetings, like my daily meetings. I was heading the Product Security at Twilio so that means I have many meetings with different teams. And I always wished there was a button I clicked and get some privacy, like people don’t know where I am, [laughs] they don’t know that I’m joining from bars. Not necessarily bars of course, but still.

MATT: So almost like a virtual Zoom background but for audio.

DAVIT: Exactly. So I had the need but I had no idea how to build the technology. And I knew it must be done with machine learning. I knew about voice but not machine learning. But I mean that’s where I met my cofounder and that’s how things have started.

MATT: I think I first came across Krisp actually on the NVIDIA machine learning blog. It was very early on, it felt like the company was.. I think it was all still free at that time.

DAVIT: Yes. Well actually Krisp wasn’t released at that time yet, or maybe just launched. And then that blog post was very important for us. We worked on it for a very long time and that was the first exposure that our company received. And the blog post got actually a lot of visibility. So it was at some point I believe the most shared and visited blog post on NVIDIA developer AI section. So yeah, it brought us a lot of visibility.

MATT: I actually made a mistake early on when I was advocating for Krisp. I told people it was from NVIDIA, or spun out of NVIDIA, I was so.. Because the post had seemed so great I couldn’t imagine that it was a guest post.

DAVIT: Yeah. Well there is a fun story actually behind that. When we did that post and it was successful, we thought that we needed to put that post on Hacker News. And we put a title which sort of implied that it was from NVIDIA so that people open it more. It was a small hack from us and it worked out because Krisp, that blog post was in the top five of Hacker News that day. Yeah, exciting times.

MATT: That might’ve been where I saw it too. [laughter] I don’t recall exactly but that would certainly be plausible. So I imagine you’re able to kind of turn Krisp on and off on your set up right now. Can you demonstrate how it works?

DAVIT: Yes, absolutely. So Krisp is on right now. I’m going to clap. I’m clapping right now. And when I do this with video it’s much more impressive. And now I’m going to go, it’s a single button, when I turn it off and then I clap [clapping] you hear the clap. Right?

MATT: Yeah.

DAVIT: Yeah, that’s the easiest way to demonstrate it. But Krisp is.. with Covid and with everything that happened lately, people moving to home, Krisp was very handy with kids at home, with dogs barking at home. So it does a great job at removing noise. And I’m happy to actually dig more into how that works and where Krisp is going.

MATT: It reminds me of the Zen Koan, what’s the sound of one hand clapping. I guess it’s like Krisp. [laughter] Oh, one reason I have been advocating for it a lot is that for a good meeting you don’t need video, you could turn video off if it’s not working, we’re not using any video now obviously, but if audio doesn’t work, the meeting stops. A meeting with video.. unless I guess you’re really good at American Sign Language or something, you really do need great audio.

And I find it so distracting when folks have just a ton going on in the background. But I also feel for them because we are all home, we have kids working from home, all sorts of things. What sort of Covid boost have you all seen?

DAVIT: Yes, absolutely. Well voice is, we believe that voice is going to continue being a key means of communication and it’s going to grow, actually, way bigger than it is now. With Covid we saw a very large boost in increased downloads and usage. I believe it’s now like.. It’s been 7X growth for Krisp.

MATT: Wow.

DAVIT: Because – yeah – there was no technology like this in the world. And when we were just starting, people didn’t really.. Every person that was seeing the download, they could relate so much to the app, to the problem. But they didn’t really know that the problem existed because we are so used to what we have. So it took us a while to market this. And early on, we were having a lot of struggles to explain that there is actually a pain here.

But with Covid things have changed because all of a sudden this has become a big problem because everyone is home and their kids are crying and there is just a lot of noise coming from the kitchen and everything. So yeah, people have gradually started spreading the word and most of the growth has been done by word of mouth. So yeah, from a business perspective there was a lot of growth during this time.

MATT: Let’s dive a little bit into how Krisp works. It uses machine learning and what sort of a learning technique does it use?

DAVIT: Let me do a short intro into noise cancellation in general, the state of the art before machine learning. People usually use multiple microphones to try to remove or cancel noise. Our phones have multiple microphones on them. One of the microphones is close to your mouth, the way you hold the phone, and the other microphones are very far from that microphone, from your mouth.

MATT: Like there’s one on the back of most phones, right?

DAVIT: Which ones?

MATT: There is usually a microphone on the back, like where the camera is.

DAVIT: Yes, exactly, exactly. It must be as far as possible so that you can.. by subtracting the two audios from each other, I’m just simplifying it, you can isolate the human voice. And this technology is deployed on every phone out there, I guess, like more or less expensive phone. And that technology also exists on our laptops but it just doesn’t work because your mouth, the person is very far from the laptop.

So it has two problems. One problem is that it requires multiple microphones, so it requires specific hardware. And the second is it has limitations on how much noise it can remove. Usually it’s great with removing stationary noise, like static noise, but when the noise comes and goes, like clapping, barking, it’s just not possible to adapt to these sorts of noises.
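
The two-microphone subtraction Davit describes can be sketched in a few lines of Python. This is an idealized illustration, not how any phone actually implements it: it assumes the noise reaches both microphones identically and that a fixed, known fraction of the voice (the hypothetical VOICE_LEAK) reaches the far microphone. The signal names and frequencies are made up for the example.

```python
import math

RATE = 16000   # samples per second
COUNT = 160    # 10 ms of audio

# Made-up stand-ins: a 200 Hz "voice" and a steady 50 Hz hum as the noise.
voice = [math.sin(2 * math.pi * 200 * t / RATE) for t in range(COUNT)]
hum = [0.5 * math.sin(2 * math.pi * 50 * t / RATE) for t in range(COUNT)]

VOICE_LEAK = 0.1  # assumed fraction of the voice that reaches the far mic

# The near mic hears the full voice plus noise; the far mic hears mostly noise.
near_mic = [v + h for v, h in zip(voice, hum)]
far_mic = [VOICE_LEAK * v + h for v, h in zip(voice, hum)]

# Subtracting removes whatever is common to both mics (the hum),
# leaving the voice scaled by (1 - VOICE_LEAK).
difference = [a - b for a, b in zip(near_mic, far_mic)]
estimate = [d / (1 - VOICE_LEAK) for d in difference]
```

In this toy case the hum cancels exactly because it is identical at both microphones. Real noise that comes and goes, or that differs between the microphones, does not cancel this cleanly, which is the limitation described above.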

And then in the last five years, as machine learning has started to grow, people have started, like in academia they started using machine learning for noise cancellation. And we were very early on in this problem. So when I met my to-be co-founder and we started talking about this, we knew that we needed to solve this with machine learning just by intuition, right? And we started looking at this, what’s out there.

As a technology company, we were the first to actually design and implement such technology which purely uses machine learning for this problem. So the way it works is we have a very large data set of background noises, which we had to find from somewhere. It was tough to do that. [laughter] But we were clever I think with that.

We tried some interesting.. we found the right sources for that. And these are very different types of noises, like 10,000+ types of noises. And then we also have collected a lot of clean studio recordings where there is no noise at all, so we have a lot of such data. And when we mix them together with different signal-to-noise ratios, we get pretty much an infinite data set of noisy speech for which we have the clean speech because we used that data set.

And then what we do, we have designed this special neural net for which during the training we say well this is noisy speech, this is clean speech, noisy speech, clean speech, noisy speech, clean, and we do it for all this artificially generated noisy speech. And then it starts to learn what is human speech, what’s clean speech, what’s noise. And then during the inference, like when you start using it, even if it sees noise types that it never saw before, it is able to recognize them and separate them from each other.

So this is a very simplified explanation of how it works. Obviously there is a lot of IP. Audio is very difficult, it turned out. If we knew what.. I mean we were not audio experts. Our team is very strong at math but we didn’t have any experience in audio. And I would say, I always say, if we knew how difficult audio is we would be just scared of it and we wouldn’t start this. And yeah, we were lucky that we didn’t know that because many teams who have prior audio experience, they are still struggling with this.
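
The data-generation step described above, mixing clean recordings with noise at different signal-to-noise ratios, can be sketched as follows. This is a minimal illustration under stated assumptions, not Krisp's pipeline: the sine wave and random samples stand in for a studio recording and a noise clip, and the function names and SNR values are invented for the example.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the speech-to-noise level ratio is snr_db, then add it."""
    # The gain g satisfies 20 * log10(rms(clean) / (g * rms(noise))) == snr_db.
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


RATE = 16000
# Hypothetical stand-ins for a clean studio recording and a noise clip.
clean = [math.sin(2 * math.pi * 220 * t / RATE) for t in range(RATE)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(RATE)]

# One clean clip and one noise clip already yield several (noisy, clean)
# training pairs; more clips and more SNR values multiply the count.
SNRS_DB = (0, 5, 10, 20)
pairs = [(mix_at_snr(clean, noise, snr), clean) for snr in SNRS_DB]
```

Because every noisy example is built from a known clean recording, the clean target for training comes for free, which is the point Davit makes about having the clean speech for each noisy sample.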

MATT: What do you think it was about not knowing audio that allowed you to take a different approach or succeed where others haven’t been able to yet?

DAVIT: This is sort of the classical approach to the audio problems, like to digital signal processing problems. Like, DSP, digital signal processing, the theory and like everything is there for thirty years or thirty or forty years, it has been out there. And if you are a DSP engineer or audio engineer, building microphones and speakers, you are trained to think from those constrained perspectives, from this classical theory perspective, from this classical algorithm perspective. If you need to solve something, that’s where your brain goes by default.

And from our team perspective, like when we started the company we were seven people and six of them have PhDs in math and physics, I am the only one who doesn’t hold that. So we have a lot of experience in math. We understand the math required for dealing with audio and machine learning, but we didn’t know the existing theories. So that was easier for us to start doing new things, which was required because when you do.. with a machine learning approach, you don’t necessarily need to use a lot of the old stuff that has been developed for forty years. And I think that was a key difference.

MATT: It sounds like you’re compressing the audio a bit and maybe doing some low pass filter?

DAVIT: We are not. What we do in Krisp is.. Well, even today, yes, Krisp is only working with wide band audio, up to wide band audio, which means like 16 kilohertz of sampling. Great. That’s great for.. Well, I am currently using a Bluetooth headset.

MATT: And Bluetooth compresses a ton, right?

DAVIT: Yes, exactly. So it does it by default. But even if I use a full band microphone, Krisp today would down sample to wide band before doing the processing. And the reason for that is we have spent a lot of time on optimizing our technology for CPUs. There is no such technology running on CPUs. People can (run those algorithms?) on GPUs easily but for CPU it’s very hard to squeeze that. So we have spent a lot of time on doing this. That’s one of the reasons why we decided to stick with wide band.

At the same time though we are in a week –

MATT: Stick with wide band as opposed to what?

DAVIT: To full band. So, down sample to wide band. In a week’s time frame we are going to.. I believe it’s in a week.. we are shipping a new version of Krisp that is going to support full band as well. That has been a very long effort for us to squeeze these neural nets to understand the higher frequencies of voice as well and then but at the same time be able to run on CPU.

MATT: Do you use a GPU if it’s available?

DAVIT: No, we only support CPUs.

MATT: Why is that?

DAVIT: Well, two reasons. We could support NVIDIA GPUs and they are very powerful, it’s very easy to run neural nets for instance on these GPUs. And when you do that, the CPU is offloaded, that’s great. But Krisp is used in enterprise by a lot of professional users who don’t have GPUs. So most of our population of users, they have just CPUs. And the GPUs they have, like the [00:17:22.06] GPUs I have on my Mac, is just not capable of running this neural net. It’s just too small for that. So we decided to spend this extra effort, a lot of effort actually to support everything out there rather than just focus on one type of hardware deployment.

MATT: And to also be clear when you’re talking about squeezing the neural nets, Krisp all runs locally on your computer, right?

DAVIT: Yes, absolutely.

MATT: Which is awesome so there’s not the latency of going to the network and the audio data is not being sent anywhere else, it’s all happening locally. What does it mean to squeeze it down? Are you worried about download size or the runtime or how much GPU it’s going to use…?

DAVIT: Privacy has been very important for us and we are very, very happy that we were able to actually run this locally. We don’t think the audio should go through a server, especially in this world, privacy is very important.

So to explain what it means, this squeeze for the CPU, like as Krisp is a virtual microphone, it sits between the actual microphone and the app. In this case, it’s the browser app running Squadcast. So Krisp is between them. And it needs to run its neural net on every other frame in real time and without introducing too much latency, so that means that it keeps receiving these frames and it needs to not only not look too much forward in the audio so that it doesn’t introduce this artificial latency but also not spend too much CPU power so that it can keep up with our speech.

So that is very constraining from an engineering perspective. And that means that you need to squeeze and make your neural net smaller and more efficient and use, I don’t know, the right library that fits best for this kind of mathematical problem so that it runs properly on the target CPU. Does that make sense?
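
The real-time constraint described here can be made concrete with a small sketch: audio arrives in fixed-size frames, and each frame has to be processed in less time than it takes to play. Everything specific below is an assumption for illustration. The 10 ms frame length is not stated in the episode, and denoise_frame is a placeholder that only copies the audio, standing in for the neural net.

```python
import time

SAMPLE_RATE = 16000  # wide band, as discussed in the conversation
FRAME_MS = 10        # assumed frame length; not stated in the episode
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000


def denoise_frame(frame):
    """Placeholder for the neural net; it only copies the audio through."""
    return list(frame)


def process_stream(frames):
    """Process frames in order and report the worst real-time factor seen."""
    budget = FRAME_MS / 1000.0  # seconds available per frame
    worst = 0.0
    output = []
    for frame in frames:
        start = time.perf_counter()
        output.append(denoise_frame(frame))
        elapsed = time.perf_counter() - start
        worst = max(worst, elapsed / budget)
    return output, worst


# One second of silent audio split into 10 ms frames.
frames = [[0.0] * FRAME_SAMPLES for _ in range(100)]
cleaned, worst_rtf = process_stream(frames)
# A worst_rtf below 1.0 means processing kept up with real time.
```

A heavier model raises the real-time factor, and any look-ahead into future frames adds directly to latency, which is why shrinking the network matters on a CPU.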

MATT: Yes. And it’s only about 70 megabytes now. When it has this new full band neural network will the download get larger?

DAVIT: Yes, it will get a bit larger. Even in the 70 megabytes we have multiple neural nets. So we have neural nets that work for like the eight kilohertz sampling rate, we have a neural net that supports larger. And then you know that with Krisp, Krisp works bidirectionally, so if I have Krisp, I can remove the noise coming from you and that has a different neural net.

So there’s a lot of engineering actually in this simple app. I believe we have like three or four models, like neural net models shipped today and with this new version we are going to have, like, six or seven models shipped. So a lot of… yeah.

MATT: Wow, that’s actually a fun feature a lot of people don’t know about Krisp is that if someone is annoying you with bad audio, you can actually filter them as well so it sounds good and you would never even know that they have a dog barking in the background or something.

DAVIT: Yes.

MATT: Was that in the original version or did that come up when you were talking to people who weren’t using Krisp yet?

DAVIT: No, that was in the original version. And in the very original version, one of the challenges we had, we didn’t know how to structure Krisp. We didn’t know whether they will be using more of this inbound noise cancellation or outbound. Like, what is more important for people? And that was such an interesting question. Like, do you worry more about your noise or do you.. are you willing to pay for canceling your noise or are you just.. you don’t care about that and you are willing to pay for other’s noise.

So when we shipped in the very original version, inbound was entirely free and then the outbound was a pro feature. [laughter] But then we changed. Now it’s a freemium product. Krisp comes with two hours free every week and then if you go to pro it becomes unlimited. And the pro is going to get some more very cool things very soon.

MATT: It is such a good deal. What is the latest pricing on it?

DAVIT: Right now the pricing is $40 a year. That’s going to change soon because this was a Covid pricing. When Covid started we started a program with which all the students (in universities work?), universities, government workers, hospital workers, would get Krisp for free for six months. And we also dropped the price by 20 percent, 30 percent. Actually we went with that for like seven months now and we are bringing back the price, it’s going to be $60 per year.

MATT: Even at $60, when you look at how much money I’ve had to spend to make the room quieter.. you’re basically getting a full studio and a really great microphone and everything.

DAVIT: Yeah, absolutely. We plan to keep that price but we are going to add some very cool things in the near future around virtual backgrounds and more even greater noise cancellation. So yeah, we are working hard on this.

MATT: Cool. I can’t wait. I will be a top customer. How much latency does that introduce right now?

DAVIT: On the algorithmic side the latency is between 20 and 30 milliseconds. On the app side, the application introduces an additional 20 to 30 milliseconds. So overall it’s around 60 milliseconds.

MATT: I did do a video where I posted.. I think I recorded just using QuickTime video and used Krisp to take out some background noise and people could tell that my mouth was just a little bit off. So if you had a way to also introduce the delay to the video so it synched up, I think that would be pretty nice.

DAVIT: I’m not sure that was Krisp. Usually with Krisp you wouldn’t notice the difference. Video is doing a lot of things which might contribute to that. Like, we are using Krisp everyday with video, obviously, with Zoom, we have never noticed that. There are a lot of reasons why you might have latency but I.. I mean, everything adds up, obviously, and this 50 to 60 milliseconds might contribute at the end if there is enough latency but that is just not enough to be noticeable.

MATT: Yeah, on Zoom I’ve never noticed it, it was only when I was recording this video. So you’re right, there might have been something else maybe in the HDMI conversion or something where it just felt a little bit out of sync.

DAVIT: Yes.

MATT: I actually didn’t notice it at all and then I started getting some comments about it and I was like, ohh I kind of see it. Kind of like when you use a sound bar with a TV, sometimes it can be just a little bit off.

DAVIT: Yeah, yeah, it’s interesting. We won’t see that, like our eyes usually are.. get adapted. But for example, when you use a virtual background in Zoom or Microsoft, you start noticing it. It’s there. You start seeing that. And even when you move there is latency, yeah.

MATT: What’s the latest going on with hardware? So for example, I know that old MacBooks had a terrible, terrible built in microphone and the latest 16 actually sounds pretty good. I think John Gruber on Daring Fireball posted an audio file just straight off the Mac 16 microphone and it sounds like they’re doing something that’s better. So what are they changing and what do you think about it?

DAVIT: You mean from an audio perspective?

MATT: Yes.

DAVIT: Oh, I’m not really sure. I think Mac is a high end, like usually using a high end microphone and speaker, although we are not.. I don’t think people are very happy with what that high end means. So yeah, I’m actually using it.. I mean, I don’t do podcasts, obviously, that’s a more important question when you do podcasts, but yeah, for everyday communication what they have works great. But I’m not really following what’s happening on the platform.

MATT: Check it out, I’m kind of curious. And I’m also curious more broadly how much do you think some of this gets built in by the phone makers, the laptop makers versus [00:26:52.07] software?

DAVIT: Yeah, I have no doubt that this kind of technology is going to be there in every device in the next three to four years. Typically phone vendors, they don’t like to make changes like these kind of changes, they are a bit slow on that. Because as you can imagine, they already know how to build these multi-microphone systems on the phone and they have everything, like the lines set up for that, they know what the yield is and everything. So any change there is going to take time.

But I have no doubt it’s going to happen. So but it also depends what it is that’s going to happen. Like noise cancellation, even today, Krisp is not perfect. So we are spending a lot of time on improving what’s out there and we.. In the next year, hopefully in the next six months, we are going to ship something inside Krisp that is going to be just revolutionary in terms of noise cancellation because it’s just going to take this to the next level. And I don’t think something like that is going to come to hardware soon enough. It’s going to take some more time.

But in terms of when these devices will have noise cancellation, I have no doubt that in the next two or three years everyone is going to have some version of noise.. like ML based, machine learning based, noise cancellation, no doubt.

MATT: And how do beam forming mics work? I know some of the new headphones, like the Bose and also the Facebook Portal and or the Alexa devices have these mics that seem to be able to pick you up from all the way across the room.

DAVIT: Yeah, the way they work, they have multiple microphones on the device. And when you turn on the device and you start talking to the device, it starts to calibrate and it starts to sort of.. given that these microphones are far from each other, they start to understand where the direction is coming from, where the voice direction is coming from, and they start to focus only on that direction. And again, like using the same technique that I explained, they are trying to ignore anything else that is not coming from that direction. That’s beam forming.

The problem with that is when you keep moving around, it needs to recalibrate again and again and I’m not sure that it’s.. the technology is able to fix this problem just by its own. It might be very useful for far field, like for Portal or Alexa, which is in a big room and they need to fix the noise problem, but I’m not sure how efficient it is. We are not dealing with this problem. Actually we are not dealing with far field as well, that is a very, very different problem. Although it might be similar but in audio every problem is so unique and we are not dealing with that.
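
The focusing step in beam forming is often illustrated with a technique called delay-and-sum, sketched below for two microphones. This is a toy example under loud assumptions, not the method any of the devices mentioned actually use: the delays are whole samples, the geometry is invented, and the noise tone is chosen at 2 kHz so that it cancels completely when the array is steered toward the voice.

```python
import math


def delay(signal, k):
    """Delay a signal by k samples, padding the start with zeros."""
    return [0.0] * k + signal[: len(signal) - k]


def delay_and_sum(mic_a, mic_b, steer):
    """Advance mic_b by `steer` samples, then average it with mic_a."""
    advanced = mic_b[steer:] + [0.0] * steer
    return [(a + b) / 2 for a, b in zip(mic_a, advanced)]


def residual_power(output, target, margin=4):
    """Mean squared error against the target, ignoring the padded edges."""
    errors = [o - t for o, t in zip(output, target)][margin:-margin]
    return sum(e * e for e in errors) / len(errors)


RATE, COUNT = 16000, 1600
voice = [math.sin(2 * math.pi * 500 * t / RATE) for t in range(COUNT)]
noise = [math.sin(2 * math.pi * 2000 * t / RATE) for t in range(COUNT)]

# Assumed geometry: the voice reaches mic B two samples after mic A,
# while the noise comes from the other side and reaches mic A two samples late.
mic_a = [v + n for v, n in zip(voice, delay(noise, 2))]
mic_b = [v + n for v, n in zip(delay(voice, 2), noise)]

focused = delay_and_sum(mic_a, mic_b, steer=2)    # aimed at the voice
unfocused = delay_and_sum(mic_a, mic_b, steer=0)  # not aimed anywhere

focused_error = residual_power(focused, voice)
unfocused_error = residual_power(unfocused, voice)
```

When the steering delay matches the direction of the voice, the two copies of the voice line up and add, while sound from other directions arrives out of step and partly cancels. If the speaker moves, the right delay changes, which is the recalibration problem Davit points to.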

MATT: That’s interesting. And one more technical question. You had mentioned full band and wide band, how should people think about their Bluetooth headset versus a USB headset versus other things and what type of audio is being captured by the computer that Krisp is receiving?

DAVIT: Yeah.. By the way, I am not an audio expert, I should.. [laughs] I should say that.

MATT: Oh, sorry.

DAVIT: We have a lot of audio experts in the company. But I know as much as I know. In terms of Krisp, to Krisp it doesn’t really matter where the audio is coming from and that’s one of the beauties of these machine learning based algorithms. You can even, what we have, you can even run it in the cloud because it really has no hardware dependency.

Let me give the example of inbound. So imagine I have Krisp here, you talk and I can cancel the noise coming from your audio. So when you talk there is so much transformation happening to your voice starting from the microphone, that the microphone has its own transformations, including noise cancellation and then the browser gets it and sends it over WebRTC with all the codecs and everything. And then it receives here on [00:32:09.24] and gives to Krisp and then Krisp runs its technology on it.

So pretty much it doesn’t matter where you run. You could easily run it in the cloud. So from that perspective it doesn’t matter whether it’s a USB microphone or a Bluetooth microphone or just a wired microphone, the (ordinary?) microphones. It doesn’t matter. Obviously we have to add the support for all of these but from an audio perspective it doesn’t matter.

Usually Bluetooth audio has more latency, it’s just there with the Bluetooth transport. You might notice that with AirPods. If you have AirPods usually the.. I don’t know why but something doesn’t work very well with them when it comes to latency. Sometimes it’s just too much latency with AirPods and without Krisp.

MATT: Maybe because they have to connect to each other as well as.. as far as [00:33:17.27].

DAVIT: Yeah, I mean, the connection is one time. You connect and then there is a connection. But sometimes the latency adds up. I don’t know what they did wrong there but I hope they will fix it. But with USB, it’s usually more powerful, less convenient. But yeah, I mean, in general that’s the.. Krisp doesn’t care really where the audio is coming from.

There is one more thing, when you use a Bluetooth headset, if you just listen to music, it’s using its highest frequency, like it uses all the frequencies possible –

MATT: Yes, a higher codec, right?

DAVIT: Exactly, yes. But when you turn on, when you start a call, when you start using the microphone of your headset, it brings everything down to wide band typically. Like some –

MATT: Wide band is 16 kilohertz?

DAVIT: Yes, like 16 kilohertz, exactly. And the prior version of these Bluetooth headsets, they would bring your voice to eight kilohertz. You know, that’s not great. But –

MATT: And it’s kind of like using fewer colors to paint a picture, right?

DAVIT: Yeah, yeah, exactly.

MATT: So it doesn’t build sound as full or as natural.

DAVIT: Yeah, yeah. Every time we use a telephone, when you call a phone number, most of the world is still using eight kilohertz codec because they just.. to transmit less data. So that’s, we are used to that. But when you hear full band and then narrow band, which is eight kilohertz, you will see the difference. It’s a big difference.

But so if you are using a Bluetooth headset, there is a big, big chance that it will down sample it to wide band in the calls. And this is because they need to use.. from an energy perspective, from a processing power perspective, they need to keep it efficient.
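The band names in this exchange refer to sampling rates, and the sampling rate caps the highest frequency a call can carry (half the rate, by the Nyquist limit). As a rough illustration only, not anything from Krisp: the 8 and 16 kilohertz figures come from the conversation, while the 48 kilohertz rate for full band, the 16-bit sample size, and the function names are assumptions made for this sketch.

```python
# Sampling rates in Hz. Narrow band and wide band are the figures quoted
# in the conversation; 48000 for full band is an assumed, commonly used value.
BANDS = {"narrow band": 8000, "wide band": 16000, "full band": 48000}


def max_audio_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (Nyquist limit)."""
    return sample_rate_hz / 2


def tone_survives(tone_hz, sample_rate_hz):
    """True if a pure tone lies below the Nyquist limit for this rate."""
    return tone_hz < max_audio_frequency(sample_rate_hz)


def raw_bitrate_kbps(sample_rate_hz, bits_per_sample=16):
    """Uncompressed mono data rate in kilobits per second (16-bit assumed)."""
    return sample_rate_hz * bits_per_sample / 1000


if __name__ == "__main__":
    for name, rate in BANDS.items():
        print(
            f"{name}: sampled at {rate} Hz, "
            f"carries sound up to {max_audio_frequency(rate):.0f} Hz, "
            f"about {raw_bitrate_kbps(rate):.0f} kbit/s before compression"
        )
    # A 6 kHz component, roughly where much of the energy of an "s" sound sits
    print("6 kHz kept in wide band:", tone_survives(6000, BANDS["wide band"]))
    print("6 kHz kept in narrow band:", tone_survives(6000, BANDS["narrow band"]))
```

Halving the sampling rate halves both the data to transmit and the range of frequencies that survive, which is the trade-off described here: less data and less processing, at the cost of a thinner-sounding voice.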

MATT: It’s actually amazing. I recommend folks.. If you have an iPhone, try calling a friend who also has an iPhone using FaceTime audio versus just a normal phone call. It is astounding how much better you can hear them and understand them. It actually makes phone calls pleasant again.

DAVIT: Yeah, yeah, yeah. Obviously it’s using VoIP and like the best codecs out there. I mean telephony still wins I think because of the network, like the service providers, like the AT&Ts of the world, they dedicate the bandwidth to voice channels, like telephony voice channel, and then the rest is used by everyone else, like the data channels. So your VoIP is going to impact it if the signal is not strong enough but the telephony voice will still be there.

So I guess that’s more important than the higher frequencies of the voice because you can at least hear each other. But that’s going to fix, it will be fixed with 5G. So once 5G is deployed, I’m sure those problems will go away. And everyone will switch to VoIP.

MATT: Cool. Actually I think even now they use voice over LTE by default for a lot of.. by default on the new iPhones.

DAVIT: Do they? I don’t know if that technology is even live. I don’t know if there has been any VoLTE deployment. Maybe I’m wrong but I thought…

MATT: It definitely is on like AT&T and Verizon here in the U.S. but probably not internationally.

DAVIT: Oh, okay.

MATT: It does sound a lot better if both sides are on it. But well I feel like cell phone calls drop so much anyway…

DAVIT: Yes.

MATT: It’s funny, when I was a kid, I remember spending lots of time, hours on the phone. And then your parents would pick up the phone and you’re like, Mom, I’m on the phone. But now I feel like people don’t do phone calls as much anymore partially because the quality is so bad, it’s very frustrating.

DAVIT: Yes, I think it depends on which part of the world you are in. As I know, like in Japan and South Korea, like, I guess nobody is using telephony anymore. Everyone is on VoIP. And it also depends on the network connection. That’s why like if 5G is there, if there is enough bandwidth, why would people use the telephony.

There is one use case though that I really believe is going to still thrive is phone numbers. Phone numbers are such a cool concept. We don’t appreciate them that much but they are the most deployed, known, understood, like handles that we have. And everyone can reach out to you, although it’s spam of course with the promo, spam. But I think that technology is not going to go away. I thought about that a very long time. And I think it’s going to stay around. I think there’s a lot of things that you can build on top of phone numbers and it’s going to thrive.

MATT: I can tell some of your Twilio days coming through.

DAVIT: I know, I know it’s definitely coming from there. [laughs] Yes.

MATT: I really love that we’re able to do a deep dive into Krisp. I can’t wait to see what you’re launching in a week or two. I’m looking forward to the update. I’m going to do a ton of audio tests and record things with different mics and try it all out. So thank you so much.

You are also running a company. And I know that you all were mostly in person I think in Armenia before. How have you adapted and how has it been and what are you planning to do once we can be safely in offices again?

DAVIT: So our company is distributed between the U.S. and Armenia and we have a team member in Germany as well. So we are distributed. We have a big presence in Armenia. I spend a lot of time in Armenia and my co-founder as well. So before Covid, we were spending a lot of time in the Armenia office, although everyone else was remote. And after Covid obviously we are working remotely.

And right after I think Covid started, I guess two weeks in, we decided that Krisp must become a remote-first company. And it’s not because of Covid, we always had that idea because it just makes so much sense especially for us because we are focused on building a tool that helps remote folks. So we always had that idea but Covid catalyzed that.

And yeah so I think at the end we are going to become a hybrid company because we have the office and a lot of people actually enjoy being in the office a couple days a week. They might not have the right set up in the house, they might have kids. And so just like I personally like to come sometimes to the office because that’s how my brain works, it needs that environment.

But at the end of the day, 80-85 percent of our workforce today just doesn’t come to the office because they enjoy working from home. And actually one fun fact – we also have a program called Work From Forest. So every week [laughing] we have 10-15 people taken to some nice place outside, you know, countryside, and they have internet, power, and they do hikes. And they actually work as well and it’s very efficient and productive work happening.

MATT: That’s so cool.

DAVIT: Yes, I’m in love with this program. So yeah, so like –

MATT: So this is happening right now? You just are able to do it in a distanced fashion, Work From Forest?

DAVIT: Yeah, yeah, yeah.

MATT: Well that’s a new one for us, I haven’t heard that one before. I think it’s a great idea.

DAVIT: Yeah, I highly recommend that. It’s quite popular now in our office. And I think it’s going to evolve. One problem with that is the weather, if the weather is good or not. But I think we will find a solution for that as well. It’s a great way for people to still gather together and see.. when you see each other in person, and that’s a very important part of building a culture and relationships, but they don’t have to come and sit in the office for that.

And yeah, we are determined to continue growing as a remote-first, global company. So we are very excited about that.

MATT: Awesome. Well I can’t wait to see what’s next. Davit, thank you so much for coming on. If you’re listening, please get Krisp.ai as soon as possible so your calls sound better. If you get in soon, you can get their $40 pricing but it’s a good deal even at $60. And yeah, as you go distributed, make sure to check out the other episodes of the Distributed Podcast, there might be some good tips for you or your managers there. So thank you again for coming on.

DAVIT: Thanks a lot, Matt.

MATT: All right, you have been listening to Distributed with Matt Mullenweg. Please subscribe or tell your friends or rate it and we’ll keep doing this. See you next time. Bye-bye.
