We view language learning right now as an unsolved problem. If you want to learn how to speak English, it’s impossible unless you move to the US for 20 years. Language studying, especially when it comes to speaking, has been a privilege for people who can afford to pay for another person’s time. It was very clear to us that machine learning would change everything.
We built an entire speech recognition system that could not only understand what people were saying, but also understand the accents they were speaking with. That is actually better than if you hired a human tutor. We got it out to the app store, and I think like three people paid. I think we made $18 the first day and we celebrated.
I’m Connor and I am the co-founder and CEO of Speak. Speak is a mobile app that helps people practice speaking English. We use AI to replace a human, but build an experience that feels like you’re talking to a real-life human tutor. Hey, I was just about to call you. What’s up? Not much. How’s it going? Pretty good.
I was thinking of going out this Friday night. Want to come? Yeah, what are you thinking? I was thinking of getting some dinner at the new Italian place in town. We actually grew so fast, we became the number one education app in Korea. We have now grown to over 30 people. We have an office in San Francisco, and we have an office in Seoul, and then we have an office in Europe where we do all of our engineering.
I started my first company in high school. It was called Flash Cards Plus. I’ve always been really interested in computers. I got my first iPhone and I realized the iPhone was kind of the same size as the index cards that I was studying and memorizing for tests. And so I built a mobile app to solve my own problem, and that became super popular.
I went to school at Harvard for one year before dropping out for the Thiel Fellowship, which eventually led me to sell that first company and have an exit when I was around 21. Yeah, we lived in a dorm together, you know, three bedrooms, five guys, and Connor was one of the guys. I mean, a very, very first impression.
He was a very tall guy. He’s about 6’3″. That was my first first impression. Connor was like a different type of a student because he already had this product and this business out before he even came to school. He wasn’t always busy, but there were times where we would be hanging out, like, talking about random things, and then he would just give us a heads up.
He’d be like, okay, by the way, I have a business call I have to make in five minutes, so I’m gonna go to my room. There were times where he would just fly out to San Francisco. Yeah, he seemed very ahead of the curve in terms of like knowing how things worked in society and like how the whole Silicon Valley scene worked.
Connor would be telling us like, you know, what happened in his last meeting, what kind of like deals are happening in Silicon Valley right now, and then like, you know, we would all kind of like, you know, listen in because although we’re all college students, you know, we haven’t had the experience to look at things in that perspective, and now Connor was like the window to that world.
Yeah, I think with the experience of selling Flash Cards Plus, my first company, at a young age, the biggest impact that it had for me was that it allowed me to not have to worry or optimize around making money, but allowed me the freedom, I guess, to really just like focus on thinking about what I wanted to pursue from like a passion perspective.
And that’s ultimately what led me to take a year off of doing anything and just pursue AI research with my eventual co-founder. A 16-year-old young man is racking up college degrees faster than most kids his age can collect video games. Andrew Hsu isn’t even old enough to vote yet, but he already has three degrees from the University of Washington and he’s working on a doctorate from Stanford right now.
I had actually a very unusual educational path. I was in public school for fourth grade. I was basically just racing through the curriculum and I finished everything early and then started annoying other kids and causing problems in the class. My parents found out about this one day, they decided to homeschool me because I clearly wasn’t a good fit for the public school system.
It kind of unlocked very rapid growth and I ended up going through middle school and high school curriculum extremely fast. When I was 12, I actually finished everything, so the next step was college. So I ended up actually going to the University of Washington when I was 12. It was obviously a very unusual situation. I spent four years in college, studied biochemistry and neurobiology and ended up going afterwards to do a PhD in neuroscience at Stanford.
I did three and a half years of my PhD and then decided to drop out to pursue startups. So I met my co-founder actually through the Thiel Fellowship, and we were actually roommates for many years before we even started a company together. We didn’t really know what we wanted to do with it, we were just interested in AI.
And the first step we realized was to actually take a year and do an extremely deep dive into machine learning. We both were reading everything and it was very clear to us that machine learning would change everything. And actually spending a whole year learning, doing research, taking classes, really getting deep into machine learning. We built a ton of different algorithms to solve various problems.
And one area that we became super, super excited about was actually speech recognition. We built an entire speech recognition system that could not only understand what people were saying but also understand the accents they were speaking with. We did this and we created a state-of-the-art result and it kind of blew us away because we were just using random data on YouTube that wasn’t even that well labeled and we created these kind of crazy accurate results.
Essentially you can boil down the story of Speak to a series of these hypotheses that we de-risked. Really probably the first one that we focused on was can we actually build a language learning experience that people will use at all? Can we collect enough data from that in order to feed our algorithms and create a flywheel of data to better modeling, to better product experience, to getting more data and getting that going?
And that was really what we raised our seed round off of. It was mostly just like a technical proof of concept. We knew nothing about language learning when we started. We didn’t know anything about how people wanted to learn languages. Essentially we just started trying to build concepts and learn as much as possible, get them in front of as many users as possible, test them, and inevitably they wouldn’t work well enough.
We would learn a bunch and we would go back to the drawing board and we would do that over and over and over again. So the first few years of Speak were definitely a struggle to find product market fit. We were trying a lot of different product experiences, launching new things. It kind of felt like nothing was working.
We launched worldwide in every market. It would have short conversations that you could speak with. When you first opened it up, there was kind of like a category selector where you could choose what you wanted to speak about. You could choose anything and then you could have a short conversation. There were a bunch of times where we released something and people said they liked it but no one loved it.
Everyone would churn within the first 30 days and they wouldn’t use it long enough. And this was a very, very exhausting process and it was very hard to stay motivated. And this is kind of like the period, I think, where you need to be the most resilient: before you actually have something people love. But we stayed super obsessed on that.
There were probably two or three times where we felt like we might have had something. People kind of were using it. We talked to investors and they were like, you just scale this and we had to kind of push against that mentality and just say, no, this isn’t good enough. This is not an experience that we can build on top of to build a category defining product.
We were trying to launch globally and we realized very quickly that if we wanted to build something people love, we would need to pick a single market and start there. We actually just flew to a bunch of different countries. We flew to Korea, we flew to Japan, we looked at Europe. We just talked to a bunch of users in those markets.
I was born and raised in Korea until I was in fifth grade. Even after Connor left, we kept in touch. During my second year working in New York, he called me out of the blue. He asked me, SJ, I have a crazy idea. I have to go to all these countries to do user testing for Korea. I would like you to come with me for a week.
I would like you to be the translator slash planner for this trip. At the time, I had a lot of vacation days left and I was like, wow, this seems like a free trip to Korea. So I said yes and then I immediately kind of got involved in helping him recruit user testers. So everybody was in a small room.
We were sitting just all around the central table. So the user would come in and then we would give them a test phone that they would use. And then we had this sort of awkward setup with like a phone on the side on a tripod where we would try to record the whole session so that we could see the screen and how they were using it.
And the thing is that a lot of our users in Korea are able to understand me speaking English to them at least, you know, 50, 60, 70 percent. They just have a lot more difficulty speaking back. In Korea, people had so many opinions and they had tried so many options. Even just driving around Seoul, you would see these giant skyscrapers that were dedicated to English classrooms.
One statistic that is pretty crazy is that at one point in time, South Korea was spending 1 percent of their GDP on learning English. The amount of money that the average Korean person spends is probably two to three times more than most other comparable markets, and it was a super dynamic market. And if you could make something work in South Korea, then you could make it work anywhere.
In the early days of a product, the answers are very clear. What users always told us was, I want to speak more. There isn’t enough speaking. We developed this product. We got to really optimize it for the speaking experience. We got it out to the app store. And I think three people paid. I think we made $18 the first day and we celebrated.
January of 2018, yeah. I think there were many reasons why people don’t like a product. When you’re building a consumer product, way more than a B2B product, consumers are super finicky. It’s not so much that you’re building a bad experience, but it’s more the fact that people have a very limited attention span and they have so many options of how to spend their free time.
And so you’re competing against Instagram and YouTube or taking a walk and going to the gym. There’s all these other things that people could be doing. Building an experience isn’t so much about building a good enough experience as it is about building an experience that is sufficiently good to outcompete all the other choices that someone has in their life at that time.
And that’s really hard to do. And I think that’s why you see most consumer companies not really go anywhere. We realized that a big reason why people weren’t using our product was because our product required people to speak into their phone. And the time of day that people wanted to use our product was on a subway or on a bus commuting.
People in Korea spend a lot of time on buses and trains every single day. And that’s actually one of the key times of day where you can build a habit. That’s when people use their phones the most. A counterintuitive realization we had was that building a way to use our app in that context would actually allow them to build a habit so that they would continue to use the product in other circumstances where they could then actually use it the way we intended, which was speaking.
And that was something that we didn’t really have any insight into until we went to Korea and we kept asking people why. Why aren’t you using the product? What happened? And literally just observing people in Korea. As soon as we did that, we saw usage spike incredibly. Conversion rate went up, retention rate went up, every metric went up significantly.
I think about product market fit as kind of a thing that is not just like a single point, but it’s more of a spectrum. The more you improve the product market fit, the faster you typically grow. Our first moment of product market fit is where we first started to see people really use Speak and retain and that was when we started to really start growing.
That was a few years ago and I think since then, we’ve essentially continued to ship tons of content, tons of new product features as quickly as possible. When Speak first achieved some level of product market fit, I wouldn’t say it was complete product market fit. I don’t think we even have that today and honestly, that’s kind of something that you’re always trying to improve.
But I think we did feel like, hey, we have something real now that people are paying for, that we can actually charge more for. It felt awesome. It felt really great that we finally landed on some sort of formula that was working in the market. And I think at the same time, it was also very motivational that, hey, now that we have the very beginning of something that is working, now let’s work really hard and make it even better.
I think there are three main components to making a really valuable service. First of all, there’s the machine learning capabilities that are super hard to do, but they power the entire experience and we’re constantly training and building new models to build new features. For the first several years of Speak’s lifetime, we were not able to put a lot of bandwidth and energy into doing machine learning.
We were more focused on finding product market fit, building the app and trying new things to get that product market fit. When we first started Speak, we had no data. There’s this classic chicken and egg problem. To make a model, you need data, but to get data, either it costs a ton of money and you do it all manually, or you are able to create a product where you can collect that sort of data, but that only works if your model is good enough.
So the thing that actually allowed us to solve this was the fact that off-the-shelf speech recognition in 2015, in some certain cases, was just good enough to produce an acceptable product experience. So that allowed us to actually launch a first version of Speak without training a custom model. That worked well enough that more and more people started using it.
And then as they spoke into the app, we could use that training data to fine-tune the machine learning model, the speech recognition model, and improve its performance and get that whole cycle started. It’s really only been in the past year. We now have a machine learning team internally that is working on all sorts of really exciting things that will start to power features in the product over the next year or so, including this conversational feature.
We’re thinking about fundamentally how do cutting edge ML models unlock capabilities on the product side? What are magical new experiences for language learning? The second is continuing to ship as many new product features as possible, as quickly as possible to build the most appealing product, the most useful product. And the third part that is super, super important is building content that people love.
Obviously, I think we’re probably more of a technology company than a lot of the English players out there. So it’s obvious that we probably have a better product. But I think the content is the other part where it’s very common for English companies to invest one time in building a single content library and then from there, just spend all their time and money on marketing.
But we believe that we can constantly improve the quality of the content and make it better. And we take a very, very product oriented mindset to our content. So we A/B test it and we’re constantly iterating on it and fixing it. I think the second major component, though, is the marketing. Obviously, you need to have a great product in order to be able to market effectively.
It’s really hard to escape the noise of all the marketing that’s happening in Korea and get people to become aware of Speak and actually try it. Because we have a local marketing team in Korea that’s really, I think, truly world-class, that have been super creative and original in trying lots of different options, they’ve been able to create a unique brand voice for us that resonates with a lot of people in Korea.
Early marketing-wise, we tried a few different copies and a few different forms of medium to attract users to download the app. The thing that we really liked initially before knowing much about what would happen was the AI angle. We were like, oh, the AI tutor. It’s like, learn, speak using AI. But we quickly realized that the audience had a very different concept of what an AI is.
They were expecting something different. So basically, for us, we were using AI technology for speech recognition purposes. But people were thinking of kind of like robots and free-form conversations and other things around it. So because what we were providing didn’t meet people’s expectations, I don’t think that worked very well. What worked really well the first time around was the idea that, oh, with Speak, you get to speak.
Really focusing on the idea of speaking, it’s like, learn English through speaking. The fact that, oh, we’ll make you speak more in the first 20 minutes than you’ve probably spoken English your entire life. So if you go through our first few exercises, it’s like within 20 minutes, we would have our users speak anywhere between 80 and 120 sentences.
We were like, OK, let’s put some metrics around this. OK, so the copy being speak 100 sentences within your first 20 minutes of experience. And then I think that really caught on with people. One metric that I really like to look at and we’re very proud of is the fact that over 50% of our subscribers are active on the 30th day after they start a subscription.
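As a rough illustration of the day-30 metric mentioned above, here is a minimal Python sketch. It is not Speak's actual definition or code: the function name `day_n_active_rate`, the input shapes, and the sample data are all hypothetical, and it assumes "the 30th day" means exactly 30 calendar days after the subscription start date.

```python
from datetime import date, timedelta


def day_n_active_rate(subscriptions, activity, n=30):
    """Share of subscribers with any activity on the n-th day after subscribing.

    subscriptions: dict of user id -> subscription start date
    activity: dict of user id -> set of dates on which the user was active
    Assumption: day n is start date + n calendar days (start date is day 0).
    """
    if not subscriptions:
        return 0.0
    active = sum(
        1
        for user, start in subscriptions.items()
        if start + timedelta(days=n) in activity.get(user, set())
    )
    return active / len(subscriptions)


# Made-up sample data, for illustration only.
subs = {"a": date(2024, 1, 1), "b": date(2024, 1, 1), "c": date(2024, 1, 5)}
acts = {
    "a": {date(2024, 1, 31)},  # active 30 days after Jan 1
    "b": {date(2024, 1, 15)},  # active earlier, but not on day 30
    "c": {date(2024, 2, 4)},   # active 30 days after Jan 5
}
rate = day_n_active_rate(subs, acts)  # 2 of 3 subscribers active on day 30
```

A real pipeline would likely also have to decide how to treat cancelled subscriptions and time zones, which this sketch ignores.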
It’s actually a little crazy that we’ve been so focused on Korea up until now. But I think it was necessary to stay super focused on one market, and that has been key to our success. But now at this point, we feel like we’re finally ready to truly expand to other places. We feel like we’ve proven the model and now we need to expand.
So we’re going to be launching in Japan in a few months and we’re also going to be launching in the US in a few months. We are taking the exact same approach that we took with Korea. We’re prioritizing building a local team immediately that can figure out how to customize the product for Japan. I’m personally going to be going there and talking with users.
We’re already testing with a bunch of users there. We can apply a lot of the lessons of how to launch in a new market that we learned originally in Korea to Japan and every market from there. Our ultimate mission is to solve the problem of language learning. We view language learning right now as an unsolved problem. If you want to learn how to speak English, it’s kind of impossible unless you move to the US for like 20 years.
Language studying, especially when it comes to speaking, has been a privilege for people who can afford to pay for another person’s time. You need to, like, you know, book time with another person, like a native speaker, to have that speaking experience, to have that conversation. So it was like very limiting. The ability to speak English just opens so many doors for so many people.
For example, if you’re like a kick-ass developer in Brazil, there’s a big difference as to whether you can speak English to a professional capacity or not. You could be working locally or if you can speak English, you can virtually work at any company in the world and it really opens the doors for you. For us here at Speak, we really want to equalize the playing field.
Language learning, especially when it came to speaking, has always been a privilege for the people who have the money to afford it, but we want to make it much more accessible for people around the world, whether it’s English or not, and just open doors of opportunities for many, many people. We think that there is a future that we are building where anybody that wants to learn English will be able to use software that is powered by various types of speech recognition and voice models, language models, natural language models that will allow you to speak out loud and get feedback on your speaking in a way that is actually better than if you hired a human tutor.
At Speak, we ask the question, what happens when you can get not 10 or 20 million people actively studying the language, but 100 million, 200 million, 300 million of those people? And so we think the market is actually much, much larger than even what we currently see today. Our long-term mission at Speak is, and has always been, trying to help the maximum number of people achieve their language learning goals because we believe that the more people that speak common languages, the better the world is.
So our goal is to become the default way that people learn languages, learning any language anywhere in the world. From total beginner, even to, you know, you’ve moved to a country, you know, maybe you have a small foreign accent that you want to get rid of, and we’re listening to it and diagnosing that and coaching you. I think the ultimate vision is beyond just language learning, the technology that we’re building here can be applied to almost anything.
The machine learning challenges that we’re solving here to build a virtual tutor can be applied not only to different subjects, but also to any other use case that you can imagine, where people are trying to communicate with machines. Fundamentally, that is the problem that we’re building and solving.