Shiny Inc

AI Voice Assistants

Coming sooner than you think.

Tom Osman
July 31, 2024 • Reading time: 2 minutes

G’day folks,

Been a minute since I’ve sent an email out so felt necessary to jump back on the horse.

Here are the three main things I’ve been thinking about this week.

The Big AI Wake-up Call

We're at the beginning of the big wake-up call. People are realising that 85% of the tasks that make up their job don’t require them.

Let's take recruitment as an example. 15% of the work is the "human touch" element, but I’d argue the rest can be handled with AI.

A recruiter's job boils down to a few key parts:

Finding Candidates
Sourcing Jobs
Formatting CVs
Updating an ATS
Building Relationships
Negotiating Terms

AI can handle the first 4 parts of that list better, faster and more accurately than humans already, and if Character AI is anything to go by, the fifth is about to topple too.

Your job as a human is now to master being the conductor of the tools that do the work, not to bury your head in the sand thinking that the old way of doing things still applies.

On a long enough timeline the outcome is inevitable already but you get to choose the timeline.

Apply the same thought to your industry, job, career, path. Map out the core tasks and start thinking about how to go from worker to conductor!
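
If it helps to make the mapping concrete, here is a rough sketch using the recruiter list from above. The `Task` class and the owner labels ("ai", "ai_soon", "human") are just my own shorthand for the argument in this post, not any kind of standard, so swap in your own tasks and labels.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    owner: str  # hypothetical labels: "ai", "ai_soon", or "human"


# The recruiter tasks from the list above, labelled the way this post argues.
RECRUITER_TASKS = [
    Task("Finding Candidates", "ai"),
    Task("Sourcing Jobs", "ai"),
    Task("Formatting CVs", "ai"),
    Task("Updating an ATS", "ai"),
    Task("Building Relationships", "ai_soon"),
    Task("Negotiating Terms", "human"),
]


def split_by_owner(tasks):
    """Group task names by who should own them, keeping the original order."""
    groups = {}
    for task in tasks:
        groups.setdefault(task.owner, []).append(task.name)
    return groups


plan = split_by_owner(RECRUITER_TASKS)
print("Delegate now:", plan.get("ai", []))
print("Watch closely:", plan.get("ai_soon", []))
print("Keep for yourself:", plan.get("human", []))
```

The "Keep for yourself" bucket is where the conductor spends their time; everything else is a candidate for a tool.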

Voice will now ramp this up to a whole new level…

Advanced Voice Assistants

A few months ago OpenAI teased its latest voice model, which handles everything from human-like expression and emotion to low-latency responses and human-sounding voices. It will no doubt enable the next generation of voice assistants.

We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions.
— OpenAI (@OpenAI)
6:30 PM • Jul 30, 2024

Now it seems like they're slowly rolling this out to an alpha group of users (who are ChatGPT Plus subscribers), and they say that everybody should have it later this year.

At the moment we're not sure when API access is going to be available, but now is the time to aggressively build voice AI assistants for your business, company, project or whatever it might be. If you build these things now, every single assistant should level up when this latest model gets released: capabilities and quality go up, and latency comes down.

The simplest way to build one is:

Go to www.synthflow.ai
Sign up for the free tier, which lets you create inbound and outbound voice assistants.
Write the prompt telling it what to do and how to conduct the conversation.
Connect your own ElevenLabs account and pick a custom voice that you like.
Create a Twilio account, buy a phone number, and create your first assistant.
Start making or taking calls.
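
The prompt step is the one that makes or breaks the assistant, so here is a minimal sketch of how you might draft it before pasting it into the platform. To be clear, this is plain Python for assembling text: `AssistantBrief`, `build_prompt` and the example business are all made up for illustration, and none of it is Synthflow's, ElevenLabs' or Twilio's API.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantBrief:
    """Hypothetical container for what the assistant needs to know."""
    business: str
    goal: str
    tone: str = "friendly and concise"
    rules: list = field(default_factory=list)


def build_prompt(brief):
    """Turn a brief into a plain-text prompt for a voice assistant."""
    lines = [
        f"You are a voice assistant for {brief.business}.",
        f"Your goal on every call: {brief.goal}.",
        f"Speak in a {brief.tone} tone and keep answers short, since this is a phone call.",
    ]
    if brief.rules:
        lines.append("Rules:")
        lines.extend(f"- {rule}" for rule in brief.rules)
    return "\n".join(lines)


# Example brief for an imaginary recruitment agency.
brief = AssistantBrief(
    business="Example Recruitment Co",
    goal="book a 15-minute screening call with the candidate",
    rules=["Never quote a salary", "Offer a human callback if the caller asks"],
)
prompt = build_prompt(brief)
print(prompt)
```

Keeping the brief as structured data means you can reuse the same rules across inbound and outbound assistants and only change the goal.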

Meta Segment Anything

Meta has just released the latest version of its Segment Anything AI model. While this may seem a bit advanced for the AI newbies among you, it's important because it will enable the next generation of AR apps for Meta Ray-Bans and other devices!

Here’s a vid showing what it is and how it works.

Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos.
SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences
Details ➡️ go.fb.me/p749s5
— AI at Meta (@AIatMeta)
10:48 PM • Jul 29, 2024

One epic application of this tech will be creating more immersive experiences for watching sports. Imagine a live sports event feed where you can select each player and pop open their stats for the game. This new model makes it possible to do this quickly, easily, and cheaply.

Prior to this technology, such capabilities were very expensive and complicated to implement. However, Meta has open sourced this under the Apache 2.0 license, allowing everyone to use it for their own applications.
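
To show how simple the app logic gets once segmentation is handled for you, here is a toy sketch of the sports idea: a click on the video frame is matched to a player's mask, and the stats pop open. Everything here is an assumption for illustration. The masks are hand-made rectangles standing in for what a model like SAM 2 would produce, the player labels and stats are placeholders, and no real SAM 2 code is called.

```python
def box(x0, y0, x1, y1):
    """Build a toy rectangular mask as a set of (x, y) pixels, end-exclusive."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def player_at(click, masks):
    """Return the label whose mask contains the clicked pixel, or None."""
    for label, pixels in masks.items():
        if click in pixels:
            return label
    return None


# Placeholder masks; in a real app these would come from the segmentation model.
masks = {
    "player_7": box(0, 0, 4, 6),
    "player_10": box(10, 2, 14, 8),
}

# Placeholder stats, purely made up for the example.
stats = {
    "player_7": {"passes": 31},
    "player_10": {"passes": 18},
}

selected = player_at((11, 3), masks)
if selected is not None:
    print(selected, stats[selected])
```

The hard part, producing accurate masks for every frame in real time, is exactly what the model takes off your plate.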

This technology will also be particularly useful for AI+AR enabled glasses. Imagine you're wearing a pair of Meta Ray-Bans and looking at ingredients on a table. The AI could automatically select an onion, create a circle around it, and overlay instructions on how to prepare it.

Segment Anything by Meta is incredibly powerful, and we're looking forward to seeing all the implementations, ideas, and products that people come up with using this technology.

Bonus

Watch “Inside Mark Zuckerberg's AI Era” on YouTube (link)
Listen to “The Risk is Totalitarian World Government” - Peter Thiel (link)
Admire this main character, “Kim Ye-ji”, from the Olympics (link)
Runway released their new Image-to-video model (link)

That’s it for today.

Catch you on the next one.
