Amazon is ditching Nvidia GPUs in favor of their own silicon

What just happened? Amazon has announced that they're migrating their artificial intelligence processing to custom AWS Inferentia chips. This means that Amazon's biggest inferencing services, like virtual assistant Alexa, will run on faster, specialized silicon instead of general-purpose GPUs.

Amazon has already shifted about 80% of Alexa processing onto Elastic Compute Cloud (EC2) Inf1 instances, which use the new AWS Inferentia chips. Compared to the GPU-based G4 instances, the Inf1 instances push throughput up by 30% and costs down by 45%. Amazon reckons that they're the best instances on the market for inferencing natural language and voice processing workloads.

Alexa works like this: the actual speaker box (or cylinder, as it may be) does basically nothing, while AWS processors in the cloud do everything. Or to put it more technically… the system kicks in once the wake word has been detected by the Echo's on-device chip. It starts streaming the audio to the cloud in real time. Off in a data center somewhere, the audio is turned into text (this is an example of inferencing). Then, meaning is extracted from the text (another example of inferencing). Finally, any required actions are completed, like pulling up the day's weather information.
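The pipeline above can be sketched in a few lines of code. This is a minimal illustration of the flow described in the article, not Amazon's actual implementation: every function name, the canned transcript, and the intent format are placeholders, and the two "inference" steps are stubbed out where real ASR and NLU models would run on Inferentia hardware.

```python
def detect_wake_word(audio_frame):
    # On-device chip: its only job is spotting the wake word.
    return "alexa" in audio_frame.lower()

def speech_to_text(audio):
    # Cloud inference step 1: turn the streamed audio into text.
    # (Stubbed here; a real system runs a speech-recognition model.)
    return audio.replace("[audio]", "").strip()

def extract_intent(text):
    # Cloud inference step 2: extract meaning (an intent) from the text.
    # (Stubbed here; a real system runs a language-understanding model.)
    if "weather" in text:
        return {"intent": "GetWeather", "slots": {"day": "today"}}
    return {"intent": "Unknown", "slots": {}}

def handle_request(audio_stream):
    # Nothing happens until the wake word is heard on-device.
    if not audio_stream or not detect_wake_word(audio_stream[0]):
        return None
    # After that, audio streams to the cloud, where both
    # inference steps run in a data center.
    text = speech_to_text(" ".join(audio_stream[1:]))
    return extract_intent(text)

print(handle_request(["alexa", "[audio] what's the weather"]))
# → {'intent': 'GetWeather', 'slots': {'day': 'today'}}
```

The point of the split is visible in the code: `detect_wake_word` is the only step that runs on the device, while everything downstream of it is cloud-side inference, which is exactly the work Amazon is moving from GPUs to Inferentia.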