July 2, 2024 Talking to machines

Deeper Learning

Hey there!

Another week, another new AI model from a company you’ve never heard of, here to command your attention. This one is called Moshi, and we’ll get into it along with a highly funded music studio app and, of course, more AI news and launches.

It’s good to be back! - Sarah Wright


Making machines as fluent as humans

Image: Kyutai

Kyutai, a French AI company backed by billionaire Xavier Niel, just launched Moshi, a new chatbot that it says can understand and relate to humans better than anything on the market. The launch comes only two weeks after OpenAI had to delay the launch of its own voice mode for optimization.

Moshi is designed for hyper-human-like conversations. It’s not an AI assistant, says the Kyutai team, but “rather a prototype for advancing real-time interaction with machines.” Moshi was built from scratch in just six months by a team of eight people.

So what’s the big deal? Well, Moshi is extremely low latency. The model “thinks” and speaks at the same time, and can listen and talk at the same time too, making a conversation feel more natural than anything I’ve seen demoed so far. You can chit-chat without weird pauses. “Moshi’s response time of 200 milliseconds surpasses GPT-4s reported 232-320 milliseconds,” writes Amy Sarah John for Wire19.
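For a rough sense of that gap, here is a quick back-of-the-envelope sketch using only the figures quoted above. The variable and function names are mine, and the numbers are the reported ones, not measurements I ran.

```python
# Latency figures as quoted above (reported values, not my own measurements).
moshi_ms = 200
gpt_range_ms = (232, 320)


def percent_faster(baseline_ms, candidate_ms):
    """Share of the baseline latency that the candidate saves, in percent."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100


for baseline in gpt_range_ms:
    saved = baseline - moshi_ms
    print(f"vs {baseline} ms: {saved} ms sooner, "
          f"{percent_faster(baseline, moshi_ms):.1f}% lower latency")
```

On these reported numbers, that works out to a head start of somewhere between 32 and 120 milliseconds per response.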

Then there are the voices and “role play.” Moshi can adjust to different emotions and speaking styles at the drop of a hat, whether you want it to try on a French accent, use pirate lingo, or lower its voice to a whisper. Kyutai explains that Moshi is also “incredibly easy” to adapt and tweak by fine-tuning on specific data. For example, make it listen to phone calls from the late ’90s, and Moshi will respond to you with time-appropriate references and context.

According to TechRadar, Moshi was built with a pretty extensive fine-tuning process, including training on over one hundred thousand synthetic dialogues generated with TTS (text-to-speech). Kyutai also worked with a professional voice artist to ensure the bot's dialogue sounded natural and engaging.

If you want to try Moshi, you can do so here. Personally, I couldn’t create an accent quite as smooth as the demo, but let me know how your experience goes.


This AI tool raised $125M to put a music studio in your pocket

When I was a teenager, which feels like a lifetime ago considering I’m turning 30 this month, I was into making my own music. I had FL Studio, a MIDI keyboard, an electric drum kit, and every plugin imaginable. As you can see, I didn’t make it to the big leagues. 

Music production has come a long way since then. For example, Suno.

Suno is an AI-powered music app that secured $125 million in funding for its text-to-song method of generating entire tracks. Now, the new iOS app puts an AI-powered music studio in the pockets of millions.

If you want to generate a song, type in a description like “indie pop tune with rock vibes describing a summer road trip.” From there, Suno will immediately get to work crafting your next chart-topper. You can also use your voice to generate music. Hit the mic button and start singing or talking to inspire the AI. 

Alongside that, the Suno app ships with new social features almost akin to Spotify. You can quickly share clips of your song and other people’s songs across social media, browse songs generated by others, and curate different playlists.

But like most AI + art stories, it’s not without controversy. Suno is being sued by the world’s biggest record companies, which allege that Suno and Udio (another song generator) engaged in copyright infringement. 

Chime in with your thoughts, or just try the app, right here. 


For productivity

  • ElevenLabs launched Voice Isolator to remove serious background noise.

  • Ariglad auto-creates and updates your knowledge base articles by analyzing support tickets and product release notes.

  • Rapport gives you the tools to animate ChatGPT and deploy an interactive AI personality.

  • Summer lets you add an AI summary button to your blog content.

  • GlobalSEO translates your site into 93 languages.

For makers

  • BuilderKit has pre-built AI tools, pre-built landing pages, and a Next.js boilerplate to help you ship your AI tools fast.

  • L402 implements internet-native paywalls by building on the HTTP 402 status code, and its AI assistant can handle payments automatically.
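If the HTTP 402 idea is new to you: 402 is the standard "Payment Required" status code, and a paywall built on it boils down to "ask, get told to pay, pay, ask again." Here is a minimal sketch of that loop. It is a toy and not L402's actual API: `fake_server`, `pay_invoice`, the `X-Demo-Invoice` header, and the token format are all hypothetical stand-ins I made up for illustration, and nothing here touches a network or a real wallet.

```python
from dataclasses import dataclass, field

PAYMENT_REQUIRED = 402  # standard HTTP status code: "Payment Required"
OK = 200


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


# Hypothetical in-memory "server" state: tokens that count as proof of payment.
VALID_TOKENS = set()


def fake_server(path, headers):
    """Stand-in for a paywalled endpoint (an assumption, not the real L402 API)."""
    if headers.get("Authorization") in VALID_TOKENS:
        return Response(OK, body=f"premium content for {path}")
    # No valid proof of payment: reply 402 and attach a made-up invoice header.
    return Response(PAYMENT_REQUIRED, headers={"X-Demo-Invoice": "invoice-123"})


def pay_invoice(invoice):
    """Hypothetical payment step; a real client would call a wallet here."""
    token = f"paid:{invoice}"
    VALID_TOKENS.add(token)
    return token


def fetch_with_payment(path):
    """Request a resource, pay if the server answers 402, then retry once."""
    response = fake_server(path, {})
    if response.status == PAYMENT_REQUIRED:
        token = pay_invoice(response.headers["X-Demo-Invoice"])
        response = fake_server(path, {"Authorization": token})
    return response


result = fetch_with_payment("/article")
print(result.status, result.body)
```

The appeal for AI agents is that the whole exchange is machine-readable, so an assistant can run the pay-and-retry step without a human clicking through a checkout page.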

Thanks for going deeper with us!

Here via forward? Subscribe here.
