Episode: 118

Amir Haramaty - Evolution of speech technology in transforming enterprise operations

Posted on: 28 Dec 2023

Amir Haramaty is the CEO and co-founder of aiOla, an AI speech technology that helps businesses streamline inspections.

In this episode, we discuss how the evolution in AI speech technology is transforming enterprise operations. We talk about the real-life applications and benefits of streamlined speech technology, and Amir shares the story of aiOla as well as some success stories showing how it compares to existing off-the-shelf solutions.

 


Transcript

"The biggest challenge, more than 50% of the vocabulary used in business, it's not language, it's jargon. Of a specific industry, of a specific company, of a specific location, etc, etc. And there's no solution off the shelf, and as I told you in the beginning, I love big problems."

Intro:
Welcome to the Agile Digital Transformation Podcast, where we explore different aspects of digital transformation and digital experience with your host, Tim Butara, Content and Community Manager at Agiledrop.

Tim Butara: Hello everyone, thanks for tuning in. Our guest today is Amir Haramaty, CEO and co-founder of aiOla, an AI speech technology that helps businesses streamline inspections. Today, we'll be talking about how the evolution in AI speech technology is transforming enterprise operations. And Amir will be sharing some success stories of their AI tech, aiOla.

So, Amir, welcome to the show. Thank you for joining us today. Do you want to add anything here before we dive into the questions? 

Amir Haramaty: Thank you, Tim. It's a pleasure to be here. One thing that's important to mention is that I'm a so-called serial entrepreneur, although I don't like that name anymore; I think it's been diluted quite a bit. I actually prefer to call it a serial problem solver.

Therefore, I'm constantly searching for massive problems, then trying to figure out the best way to solve them, building an incredible team around them, and then just focusing on bringing value to our clients. That's what I've done for many, many years in different companies. I've made plenty of mistakes, don't get me wrong, but I did a few things right.

And I had a chance to lead several tech companies to significant outcomes; they were sold to Microsoft, Facebook, Stingfield and others. And everything I've done in my entire career is actually coming together now. There's a very famous saying in Latin: every step I took in my life brought me to the here and now.

And it could not have happened at a better time, because the technology around us has finally put us in a position to leverage the two most used and abused letters in the English language right now: A and I, artificial intelligence, which has been around for ages. But now we can do it right, because we have the right ingredients. It's up to us to connect them the right way and execute properly. And that's exactly what we're trying to do here.

Tim Butara: Oh, I love everything about that intro, Amir. From you being a serial problem solver to every step and every failure and every mistake, if we can even call them mistakes, right? Their purpose was to bring you to this exact point where you're succeeding, so each one was an integral part of what's going on right now. I love that and it resonates with me deeply.

But to your point about being a problem solver: in the context of our conversation today, the evolution of speech technology and how it's transforming enterprise operations, what kinds of problems, use cases or industries are we talking about?

Amir Haramaty: Actually, with your permission, I'll take a few steps back. AI and ML, machine learning, are incredible tools. But just tools. Which means that everything starts and ends with data. And the open secret is that the majority of data in the world is still not part of the game, because it is still unstructured and uncaptured.

So, first, we need to recognize that most of the data is still not part of the game. What are we doing in this conversation, this podcast? We're talking. We're using speech. Speech is the most common communication tool, and yet speech was the unbreached frontier, okay? The impenetrable frontier, you know? Try to speak with Siri with an Israeli accent, as I do. Good luck.

Okay, so at the end of the day, we realized three things. One, data is the key component, the key ingredient, and most of the data is still not part of the game. Two, speech is the number one source of that unstructured and uncaptured data. Three, that had not been solved until now, because yes, you can get some very good translation, but when you go into business, you find other challenges. It's not just the language or the accent. It's the acoustic environment, because we're not sitting in a podcast studio; we're working on a production floor or somewhere else. But then we learned that all of those are solvable.

The biggest challenge is that more than 50% of the vocabulary used in business is not language, it's jargon: of a specific industry, of a specific company, of a specific location, etc., etc. And there's no solution off the shelf, and as I told you in the beginning, I love big problems. We realized that's a challenge we needed to attack, and if we could break into it and get something very reliable, it would be a game changer. And that's exactly what we have now.

Tim Butara: So what would be the real-life benefits of these streamlined speech technologies? Maybe on the one hand for workers themselves individually, and on the other hand for, you know, businesses, companies, enterprises.

Amir Haramaty: Absolutely. So actually, I started aiOla with a concept: every time we go to a client, by definition, we are exposed to a tremendous amount of critical human intelligence. We learn what the client needs, competitors, pricing, the client's DNA. Arguably, there's no other point in time when we're exposed to so much critical information. Agreed?

Then, you know, ask yourself how much of that information is actually being captured. You scratch your head and you say, actually, very little. Now, CRM has been around since 1999, so that's almost 24 years. Commercially, a phenomenal success. Operationally, if I may say so, and I apologize in advance, it's a colossal failure. I haven't found a single person who said, oh my God, Tim, I'm so excited, I'm going to log into Salesforce or any other CRM. Nobody likes it. You do it because you have to, not because you want to. You treat it as a chore, and that's reflected in the quality, quantity and timing of the data.

So I said, you know what, when I finish a conversation with you and I've learned so many new things I didn't know before, I'm going to walk to the elevators and tell aiOla, in natural language, any language, what I just learned. It captures it and automatically integrates it into your CRM of choice.

And in the name aiOla, AI-O-LA, which actually stands for AI Operating Layer, the most important letter is not the A or the I, it's actually the O. And if you look at the O in the logo, it's not an O, it's a fat intelligence loop. Evergreen, ever learning; that was the concept. And with that in mind, to be honest and very modestly, it took off very quickly.

And then, you know, unfortunately, about a year and a half ago, the war in Europe started. I started to think, and I said, you know what, arguably, collectively, as citizens of the world, we are facing one of the biggest perfect storms in history, geopolitically and macroeconomically. And that's forcing organizations to make a clear separation between what's perceived to be nice to have versus must have. Whatever is nice to have, forget about it. I don't have time for cosmetic surgery; I have to stay alive.

So, one thing is recognizing what's going on in the world. And the second thing that occurred to me was: oh my God, this technology I just developed and I'm very proud of, maybe rightly so, could be considered a nice to have. And I don't want to be in the nice-to-have category.

So, you know, in psychology there's Maslow's pyramid of needs. So I created MaiOla: Maslow's pyramid from aiOla's perspective. Which verticals are immune? It doesn't matter what's going on in the world, we still have to eat: the food industry, beverage companies. Unfortunately, we have to take our medication, so pharmaceuticals. Energy, supply chain... you can easily find four, five, six verticals that remain must-haves no matter what's going on in the world.

So that's where I decided to take the technology and apply it to must-have verticals with critical processes, which are sometimes highly regulated, where you can demonstrate quantifiable ROI at scale. That was the path; that was the blueprint.

Tim Butara: So can you maybe now share some success stories of aiOla, you know, beyond just kind of abstract benefits, what were like some of your favorite client stories and things that you worked on?

Amir Haramaty: Absolutely. And I do that with some... because one of the things I can tell you as a person and as a company, I'm always in a quest for learning. Okay. I'm a self declared infomaniac. Yes, that loop. I read everything that I see. I don't care what it is. I read everything. I read five books in parallel. Etc. Etc. Because I'm hungry for data. Okay. So it's a constant learning. 

So with the concept I just shared with you, we said food. I picked up the phone and got on a call with a global Fortune 60 food company. And I said, when you start your day, do you just flip a switch and start production? He said, of course not, I have to complete a highly detailed, mandatory, regulated pre-op inspection.

So, okay. What is a pre-op inspection? At 4:30 in the morning, a group of inspectors goes out with 20 pages and checks every single piece of equipment. Until they complete it and give their thumbs up, they cannot start production. So, wow. And how long does it take? Two hours, and for those two hours, production is idle.

He said yes. And I said, ooh, what if I take these 20 pages and make them speech based? Hands free. They're going to walk and talk, and the forms are going to be filled in. First, it's going to be way more efficient. Second, it's going to be safer, because you can spend more time looking at the equipment instead of at the clipboard.

Third, very importantly, we're going to generate a tremendous amount of previously uncaptured and unstructured data. And that data will allow you to connect the dots beyond what the human eye can see or what the human brain can process, and bring you evergreen, ever-learning insights and intelligence.

And fourth, it's collaborative. It's no longer just a production line; it's connected to the entire factory, to an entire country, to an entire region, and you can identify trends before they form. We said, great, why don't we do a pilot, and we'll start in Australia. Okay.
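To make the "walk and talk" form-filling idea concrete, here is a minimal sketch that turns already-transcribed inspection utterances into structured form entries. It assumes a hypothetical checklist (`CHECKLIST`), a hypothetical spoken status vocabulary (`STATUS_WORDS`) and a simple "item, optional unit number, status, free-text note" phrasing; it is an illustration only, not aiOla's implementation, and a real deployment would work from the client's own 20-page form.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical checklist items; a real pre-op form would supply these.
CHECKLIST = ["conveyor belt", "mixer", "slicer", "floor drain"]
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
# Hypothetical spoken status words mapped to form values.
STATUS_WORDS = {"pass": "PASS", "ok": "PASS", "fail": "FAIL", "failed": "FAIL"}


@dataclass
class FormEntry:
    item: str
    unit: Optional[int]
    status: str
    note: str = ""


def parse_utterance(text: str) -> Optional[FormEntry]:
    """Map one spoken inspection line to a form entry, or None if it doesn't fit."""
    words = text.lower().split()
    # Find a checklist item whose words open the utterance.
    item = next((c for c in CHECKLIST if words[:len(c.split())] == c.split()), None)
    if item is None:
        return None
    rest = words[len(item.split()):]
    unit = None
    # Optional unit number, spoken as a word ("three") or as digits ("3").
    if rest and (rest[0] in NUMBER_WORDS or rest[0].isdigit()):
        unit = NUMBER_WORDS[rest[0]] if rest[0] in NUMBER_WORDS else int(rest[0])
        rest = rest[1:]
    if not rest or rest[0] not in STATUS_WORDS:
        return None
    # Anything after the status word is kept as a free-text note.
    return FormEntry(item, unit, STATUS_WORDS[rest[0]], " ".join(rest[1:]))


for line in ["Conveyor belt three pass", "mixer 2 fail guard missing"]:
    print(parse_utterance(line))
```

Utterances that don't fit the expected shape return `None`; in a real hands-free flow, that is the point where the system might ask the inspector to repeat.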

And we got the recording from Australia. I always believe there's no need to reinvent the wheel, so I took one of the leading automatic speech recognition platforms off the shelf, a very well-known one, and we ran that recording through it. I was shocked to find out that the accuracy was only 52%. First, I thought maybe it was the Aussie accent. Well, it didn't help, but that was not the reason.

Second, maybe it was the acoustic environment. What it turned out to be is what I said in the beginning: more than 50% is jargon, and no automatic speech recognition can handle that. So we had no choice but to build the language model and a jargon model, and we took it to 99% accuracy.
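The episode doesn't say how the 52% and 99% accuracy figures were measured. A common way to score speech recognition is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words, with accuracy often reported as 1 - WER. The sketch below assumes that convention; the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def word_accuracy(reference: str, hypothesis: str) -> float:
    # Can go below zero when the hypothesis contains many insertions.
    return 1.0 - word_error_rate(reference, hypothesis)


ref = "check the auger flight guard"
hyp = "check the augur flight card"  # two jargon words misrecognized
print(word_error_rate(ref, hyp), word_accuracy(ref, hyp))
```

Here two of the five reference words are wrong, so WER is 0.4 and accuracy 0.6. When jargon makes up much of a process's vocabulary, every missed term costs a full word, which is how a general-purpose recognizer can end up near the figures described above.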

And that's when the penny dropped, because I realized we don't need to boil the ocean. We don't need to talk about football or the weather or politics. Every process has up to 2,000 words, and more than 50% of them are jargon. If I can handle that with this accuracy, it's a game changer. They did a pilot, and I came back after two months: instead of two hours, the inspection had dropped to one hour.

So without adding a single person or a single piece of equipment, in this razor-thin-margin industry, we just gave them another hour of production per day. You can imagine how valuable that is.

Second, we took a 100% paper process whose data went nowhere to a 100% paperless, green ESG solution, which is also an environmental solution. But not only that, that data now becomes the raw material. And that really led us to realize we can do it for any process and any industry. But we also understood the limitations of the technology available off the shelf. And that's when we took it to the next level.

Tim Butara: Well, I'll ask you about limitations and challenges in just a little bit. But first, you said that you built the language model and the jargon model. Can you tell us a little bit more about this jargon model? How were you able to go from 52% with off-the-shelf enterprise tech to 99% with your tech?

Amir Haramaty: Absolutely. So, several things again. We did a lot of manual work here to be able to identify the jargon. We developed unique technology, and we actually registered three patents on it. It's the first time ever that automatic speech recognition, like what's available off the shelf, is combined with natural language understanding.

Which means that in everything we say right now, the understanding is part of the recognition. Most solutions right now, if not all of them, do automatic speech recognition off the shelf with error rates around 50%, or 60% accuracy at best, and then manually try to fix the errors.

We felt this was the wrong approach; it's like trying to scratch your left ear with your right arm behind your neck. There is an easier way to do that. So we started from the very beginning and built something new: the first ASRU platform. It's not just automatic speech recognition, it's automatic speech recognition fused with natural language understanding, built from scratch to support more than 100 languages.

We built the language model, the acoustic model and the jargon model in the cloud. We filed two patents, again: one specifically on the jargon, and a second one on on-device keyword spotting, because coverage, whether cellular or Wi-Fi, is not always perfect. The combination of both gives you such accuracy that it can jump from language to language and from industry to industry very quickly.
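aiOla's patented fusion of recognition and understanding isn't described in detail here, so the following is only a generic illustration of a related, well-known idea: biasing recognition toward domain terms by re-ranking a recognizer's n-best hypotheses with a bonus for words found in a jargon lexicon. The `Hypothesis` type, the scores, the lexicon and the `bonus` weight are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # hypothetical log-probability from a generic recognizer (higher is better)


def rescore(nbest: list[Hypothesis], jargon: set[str], bonus: float = 2.0) -> list[Hypothesis]:
    """Re-rank n-best hypotheses, adding `bonus` for each word found in the domain lexicon."""
    def biased_score(h: Hypothesis) -> float:
        hits = sum(1 for w in h.text.lower().split() if w in jargon)
        return h.asr_score + bonus * hits

    return sorted(nbest, key=biased_score, reverse=True)


# Hypothetical jargon lexicon for a food-plant inspection.
JARGON = {"auger", "cip", "sanitizer", "hopper"}
nbest = [
    Hypothesis("check the augur flight", -4.1),  # the generic model's top guess
    Hypothesis("check the auger flight", -4.6),
]
print(rescore(nbest, JARGON)[0].text)
```

With the bonus, the in-domain spelling moves to the top. Production systems usually apply this kind of bias inside the decoder (often called contextual biasing or shallow fusion) rather than after the fact, and tune the weight on held-out data.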

And actually, everybody talks today about LLMs, you know, large language models, which is amazing, because now you can use ChatGPT and you can use Bard. But you know, I'm a simple man. What I know best is milking cows on a kibbutz. I was thinking about it while drinking coffee, and I said, I don't need to boil the ocean in order to make coffee.

So I don't want to go to a large language model. I don't care; say, you know, you're based in Slovenia, you have a plastics factory, and you want to tackle a specific process in Ljubljana right now.

I say, great, send me that form in your language, which I don't speak. Okay. I can take that; that's the only data I need. We built a special platform called Jargonic. Basically, I take this little sample of data and, behind the scenes, use generative AI, GenAI, to generate hundreds of thousands of synthetic samples very quickly.

And actually, within several hours, I'm going to understand your language, your process, your jargon, your acoustic environment. And in about 30 hours, I'm going to have a language model that speaks your language, your factory, your process. So we moved from LLM, large language model: you can scratch the L for "large" and put a D in its place, for domain-specific language model. And that's a revolution; that's what we have done.
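Jargonic's internals aren't described beyond using generative AI to expand a small sample into many synthetic examples. As a much simpler stand-in, the sketch below expands a hypothetical form (`FORM_FIELDS`) through a few hypothetical phrasing templates (`TEMPLATES`) into labeled text samples. A generative model would produce far more varied phrasings, and a speech system would also need audio (for example, text-to-speech with added noise), which is out of scope here.

```python
import itertools
import random
from typing import Optional

# Hypothetical fields and allowed values taken from a client's inspection form.
FORM_FIELDS = {
    "extruder": ["pass", "fail"],
    "hopper": ["clean", "dirty"],
}
# Hypothetical phrasing templates standing in for a generative model.
TEMPLATES = [
    "{item} is {value}",
    "{item} {value}",
    "mark {item} as {value}",
    "okay {item} looks {value}",
]


def synthesize(fields: dict, templates: list, limit: Optional[int] = None, seed: int = 0) -> list:
    """Expand every (field, value, template) combination into a labeled sample."""
    samples = []
    for (item, values), template in itertools.product(fields.items(), templates):
        for value in values:
            samples.append({
                "text": template.format(item=item, value=value),
                "label": {"item": item, "value": value},
            })
    random.Random(seed).shuffle(samples)  # seeded shuffle keeps runs reproducible
    return samples if limit is None else samples[:limit]


data = synthesize(FORM_FIELDS, TEMPLATES)
print(len(data), data[0])
```

Two fields with two values each and four templates already give 16 labeled samples; this combinatorial growth is why even a short form can seed a much larger training set.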

Tim Butara: Nice. Wow. That sounds super cool and definitely super advanced. You said that it just kind of exploded and started growing super fast, and it makes perfect sense that it did, right?

Amir Haramaty: Absolutely. And one of the things I learned, again, because I've been a disruptor my entire career and I enjoy disrupting very much, is that most people in the world don't like to be disrupted. It maybe took me too long to realize that.

That's when you need to eat a little humble pie and listen more than you think. You know, my mother always told me, you have two ears and one mouth; try to keep that proportion. Listen twice as much as you talk. Okay. And it's a good lesson. I'm still failing, but I'm trying.

But I realized the best way to transform... Again, I worked six years with McKinsey as their platform of choice, and if I hear those two words, digital transformation, one more time, I'm going to get an allergic reaction all over my body, okay? Because organizations, especially in traditional industries, have been doing things their way for many, many years, and we have to respect that. Don't tell them to change everything.

But I found out the best way to transform is not to disrupt, but rather to enhance or augment an existing process. This is the least threatening way. I say, I respect the way you conduct business; what if I help you do it a little more efficiently, safely, smartly and collaboratively? And most of all, to be able to demonstrate results you can measure with those three magic letters, ROI, return on investment. Because at the end of the day, it doesn't matter how small or big you are, or how successful you are. Everybody still needs to do more with less. And that's what we're doing.

Tim Butara: That makes a lot of sense, yeah. So maybe now let's take a look at the other side, which we've already started talking about. What were some of the main challenges or limitations that you encountered or spotted in all this?

Amir Haramaty: So first, again, make sure you know who you're talking to, because I'm a geek and many of us are geeks, and sometimes we get confused and develop technologies for people like us. They are not our users. Our users are drivers, technicians, you know. So when you address your audience, keep it simple, keep it easy, make it something where they can see the benefit. So one is to make sure you're talking to the right audience.

Second, make sure that you are addressing the needs, limitations and capabilities of that specific audience. Third, I'm a firm believer that success breeds success, meaning you don't want to do a science project. You want to take a real, big, painful problem that, if you're able to solve it, will move the needle, with the ability to quantify the impact within several months. Because once you see the bottom line, nobody can argue with it.

Those are three main principles which are very, very important when you start. In addition, going back to the phrase I used before, don't try to boil the ocean. I know the technology we built, and I'm very proud of it; it's proprietary, it's unique, it's different. But if you tell me, oh Amir, let's use it for a call center, I say please don't, because a call center is wide open. I like to go to very well-defined processes where I'm dealing with thousands of words. You know, a funny project started, well, not funny, I never saw that coming: I got a call from McDonald's.

And I don't know if you've stepped into a McDonald's, but when you step into a McDonald's today, you do all the ordering on a screen. You touch the screen. But if you go to a drive-thru, it's the same old drive-thru, where you're talking to a human. You or the person inside can have an accent, there can be noise, you can have kids in the car that you're talking to in parallel, or yelling, whatever.

And I got a call from McDonald's. They asked, can you deal with background noise? Yes. Can you separate different voices? Of course. And I said, we have a technology; it's called keywords. By the way, how many words do you have in the entire McDonald's menu, across all the combinations? And it was something like 772. That's easy. Okay.

So, the engine can easily learn thousands of words, so with 772 words, I don't care what the combination is, I don't care what you say to the kids, I can get you close to 100% accuracy every single time.
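A closed vocabulary is what makes the drive-thru case tractable. As a rough, text-level illustration only (the episode doesn't describe how aiOla constrains its engine, and production systems typically constrain the recognizer itself), the sketch below snaps each recognized word to the closest entry in a small hypothetical menu vocabulary using Python's standard `difflib.get_close_matches`, and drops words that match nothing, such as chatter with the kids. `MENU_VOCAB` and the 0.8 cutoff are assumptions.

```python
import difflib

# Hypothetical slice of a menu vocabulary (the episode cites about 772 words in total).
MENU_VOCAB = ["big", "mac", "mcnuggets", "fries", "coke", "mcflurry", "large", "small", "two"]


def snap_to_vocabulary(transcript: str, vocab: list[str] = MENU_VOCAB, cutoff: float = 0.8) -> list[str]:
    """Keep only words that closely match the closed vocabulary, replaced by their match."""
    snapped = []
    for word in transcript.lower().split():
        # Best single match with similarity ratio >= cutoff, or an empty list.
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        if match:
            snapped.append(match[0])
    return snapped


print(snap_to_vocabulary("uh can i get a big mack and two mcnugget and a coke please"))
```

Misheard words like "mack" and "mcnugget" snap to menu entries while filler words are dropped: with only a few hundred allowed words, there are far fewer ways to be wrong. The cutoff trades recall against false matches; for example, "to" scores exactly 0.8 against "two" and would be accepted here.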

Okay, so then I learned: it doesn't matter if you're dealing with cruise ships, shipments and delivery, or manufacturing. By the way, right now we're trying to help a healthcare system. During a crisis like the one we're facing, you have to support mental health; every day, thousands of social workers and psychologists are talking to patients. How do you capture that data?

So you just finish a call or conversation, you take 20 seconds to record a WhatsApp message, and automatically the full summary with all the critical information is ready for review, and you can upload it. That ability to automate data capture by speech, in any language, any accent, any acoustic environment, can be used across industries, verticals and processes.

Tim Butara: And this is what makes your tech so streamlined and so capable compared to off-the-shelf solutions, which, as you said before, get up to 60% accuracy or so, but usually not more than that if there's jargon.

Amir Haramaty: Yeah, so accuracy is absolutely a key component, because if you don't get north of 90%, they cannot use it. And the other is to keep it simple. You know, we come in with a very simple approach, and as I described before, I can start with a new client in a new country. You know, we're doing a project in Hong Kong, China, in the jewelry industry, for quality control. We don't speak Mandarin, but the system doesn't care what language it is; we get 97% accuracy.

The other day, I did a presentation for one of the largest tire manufacturers in Korea. And he told me, listen, we understand the uniqueness of speech and how critical it is, but since 2018 we've been trying to find a technology that can deal with Korean, our accent and our noise, and we cannot get anything north of 60% accuracy.

And I said, great. So I don't need to convince you about the need; you know it. I don't need to convince you how difficult it is; you tested it. Let me show you something, because as preparation for this call, we just ran a random recording in Korean, and I showed him, I can even show you right now, it was 94% accuracy the first time we played with it. I rest my case.

So, I love the fact... and, you know, I was actually talking to one of the largest VCs in the world. And they mentioned, you know, Amir, we made a ton of investments in AI in the last five years. Most, if not all, were tech startups selling to tech startups. That's gone.

And what we love about what you do is that you take the bleeding-edge technology you developed and broke through with, and you offer it to the most traditional industries. In their language, they said, oh my God, Amir, you found this blue ocean.

I said, one, thank you for the compliment; I don't think I deserve it. Second, I don't think it's a blue ocean. A blue ocean is a place all of us are running to. Here, nobody was even looking at those industries, which means it's huge, but it's not a blue ocean. It's a yellow desert that's desperately looking for drip irrigation to turn the yellow into green. And that's exactly what we're trying to do here, in a simple, efficient way.

Tim Butara: I really love that point about design: it has to be designed with the actual users and practicality in mind; you're not designing it for yourself. We just recently recorded a great episode where one of the key takeaways was that, you know, sometimes UX isn't as effective as it could be in a tool, a website, a platform, whatever, exactly because the people creating it aren't designing it with their users in mind; they're doing what they themselves would like in a tool. And that's something completely different.

Amir Haramaty: Exactly right. And I can give you an example, because most of the industries we're going after, keep in mind, are very cost-sensitive or price-sensitive organizations. It's always, wow, this is amazing, but at the same time, how much does it cost?

And then I smile and say, I don't care. Of course I do care, but I'm not chasing price, I'm chasing value. If there's no value to be generated, don't waste your money; I don't want to waste my time. But if there's value to be generated, we're going to take a small piece of the value generated. And first of all, that switches everything. It's not me versus you; it's you and me together chasing value. So this is a value-based solution, it's all focused on value, and it's a different dialogue. AI is just the instrument we use to help us generate value.

Tim Butara: I love that, Amir. I loved the whole conversation; I really enjoyed speaking with you today. Just before we wrap things up, if our listeners would like to learn more about you, connect with you or, you know, discuss something more in depth with you, where can they find you?

Amir Haramaty: Absolutely. That's something I encourage, by the way. I always love these conversations. I'm always learning and always like to share my mistakes first. I'm sure I'm going to make plenty of new ones, but if somebody can avoid the ones I made, I'm already a richer person. So again, it's very easy: amir@aiola.com, A-M-I-R at aiola.com. You can find me on LinkedIn. You know, I have 22,000 close friends, and now I can have some more.

But one of the things I enjoy very much is talking to young entrepreneurs and to traditional industries. And I believe, again, it's not an individual sport. It's a team sport, and the team is all of us. And if we can learn from each other, empower each other, and I can contribute a little bit to the process, I'm a very happy man.

Tim Butara: Amir, that was the perfect note to finish on. Thank you so much for joining us. It was great. 

Amir Haramaty: My pleasure. Thank you so much. 

Tim Butara: And to our listeners, that's all for this episode. Have a great day, everyone, and stay safe. 

Outro:
Thanks for tuning in. If you'd like to check out our other episodes, you can find all of them at agiledrop.com/podcast, as well as on all the most popular podcasting platforms. Make sure to subscribe so you don't miss any new episodes, and don't forget to share the podcast with your friends and colleagues.