AIHS 23E6 Podcast: Vast.ai, Intel CPU Max, Perceive, Groq, Lightelligence, NeuReality

TechTechPotato
13 Nov 2023 · 34:37

TLDR: In this episode of the AI Hardware Show, Sally and the host discuss a variety of AI hardware accelerators. They cover Vast.ai, a Chinese company founded by former AMD fellows focusing on data center inference. Intel's CPU Max with HBM for high-memory-bandwidth AI workloads is highlighted. Perceive's Ergo chip, designed for low power consumption, is examined. Groq, led by a former TPU architect, offers a unique deterministic inference accelerator. Lightelligence and NeuReality round out the episode with optical computing and infrastructure acceleration solutions, respectively.

Takeaways

  • 😀 The podcast discusses various AI hardware companies, focusing on their unique selling points and market strategies.
  • 🌐 Vast.ai, a Chinese company founded by former AMD fellows, is developing data center inference accelerators with video decoding on chip.
  • 🔤 There's disagreement over the pronunciation of 'Vast.ai', reflecting the challenges with naming conventions in the tech industry.
  • 💡 The discussion highlights the importance of enterprise markets in China for AI hardware, indicating a growing demand beyond hyperscalers.
  • 🔍 The potential for consolidation in the AI hardware industry is noted, with a comparison to the reduction in CPU architectures over time.
  • 📈 Intel's Sapphire Rapids with HBM is highlighted as a significant development, being the first x86 CPU with on-package HBM.
  • 🏎️ The CPU Max series from Intel is positioned as a high-end product with up to 56 cores and 64 GB of on-package HBM2e, targeting memory-bandwidth-bound AI workloads.
  • 🚀 Perceive's Ergo chip is described, which focuses on low power consumption and reinventing neural network maths for efficiency.
  • 🛠️ Groq, founded by a lead designer of Google's TPU, offers a unique architecture with deterministic batch one latency, which is a differentiator in the market.
  • 🌌 Lightelligence, which spun out of MIT, works on optical AI acceleration, similar to Lightmatter but with a different approach to phase modulation.
  • 🌐 NeuReality partners with IBM to offer an acceleration card that enhances AI infrastructure, including job scheduling and database management.

Q & A

  • What is the pronunciation of 'Vast.ai' as discussed in the podcast?

    -The pronunciation of 'Vast.ai' is a point of disagreement between the podcast hosts, with one suggesting it's 'vast ey' and the other 'vast I'.

  • What is the background of Vast.ai's founders?

    -Vast.ai was founded by two former AMD fellows who were previously part of AMD's GPU division and AI division.

  • What are the known features of Vast.ai's data center inference accelerator?

    -Vast.ai's data center inference accelerator is known to include video decoding on chip.

  • What is the significance of Intel Sapphire Rapids with HBM?

    -Intel Sapphire Rapids with HBM is significant as the first x86 CPU to ship with on-package HBM (High Bandwidth Memory).

  • What is the difference between Intel's Sapphire Rapids with and without HBM?

    -The Sapphire Rapids with HBM is designed for specific niches such as machine learning workloads that are memory bandwidth-dependent, while the version without HBM targets more mainstream uses.

  • What is the relevance of Project Larrabee to Intel's Sapphire Rapids with HBM?

    -Project Larrabee, which aimed to use x86 for GPU-like tasks, influenced the design of Intel's Sapphire Rapids with HBM: the firmware and software developed during that project were reused to manage the high-speed memory.

  • What is the name of the chip developed by Perceive and what is unique about it?

    -The chip developed by Perceive is called 'Ergo'. It is unique due to its approach to neural network maths, aiming to perform operations more efficiently.

  • What is the core concept behind Groq's architecture as discussed in the podcast?

    -Groq's architecture is based on a data flow model where the duration of every operation is known in advance, providing deterministic latency for every operation.

  • What is the difference between Lightelligence and Lightmatter as described in the podcast?

    -Both Lightelligence and Lightmatter use the Mach-Zehnder interferometer as the compute element, but Lightelligence injects electrons to change the refractive index of the waveguide, whereas Lightmatter physically moves the waveguide (an electromechanical approach) to modulate phase.

  • What is NeuReality's approach to AI acceleration?

    -NeuReality offers an acceleration card that interfaces with various AI solutions to help manage and accelerate the infrastructure around AI, such as job scheduling and database management.

  • What is the potential impact of NeuReality's technology on AI infrastructure?

    -NeuReality's technology could potentially increase the efficiency and throughput of AI infrastructure by optimizing job scheduling, data handling, and other system-level tasks.

Outlines

00:00

🌐 Discussing Vast.ai and Its Founding

The paragraph introduces Vast.ai, a company that specializes in AI chips. There's a debate over the pronunciation of the company's name, with suggestions that it might be 'Vast AI' or 'Vasti'. The company was founded by two former AMD fellows and has been operating in stealth mode. While little is known about the architecture, the company is clearly focused on data center inference acceleration and has video decoding on-chip. The discussion also touches on the broader trend of companies incorporating 'AI' into their names, often capitalizing the 'A' and 'I' or using a dot, and on the challenge smaller accelerator companies face, especially those in China with limited visibility, in finding their niche in the market.

05:02

💾 Intel Sapphire Rapids and its Evolution

This section discusses Intel's Sapphire Rapids with HBM, now branded 'CPU Max'. It represents a significant step forward as the first x86 CPU with on-package HBM. The conversation delves into the history of Intel's Project Larrabee, which aimed to create a GPU-like x86 processor with wide vector units but never shipped as a gaming GPU. It did, however, feed into compute-focused parts such as the Knights Landing (KNL) line of Xeon Phi processors. Sapphire Rapids with HBM is expected to be particularly beneficial for AI workloads in data centers, offering high memory bandwidth and supporting multiple memory tiers, including DRAM and Optane. The potential for these CPUs in the HPC and AI spaces is highlighted, including their use in multi-socket systems.
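
To make the bandwidth argument concrete, here is a minimal roofline-style sketch in Python. All figures (peak compute, DDR and HBM bandwidths) are illustrative assumptions, not Intel specifications:

```python
# Minimal roofline sketch: attainable throughput is capped either by
# compute or by memory bandwidth, whichever bound is hit first.
# All numbers below are illustrative assumptions, not Intel specs.

def attainable_gflops(arith_intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * arith_intensity)

PEAK = 3000.0    # assumed peak GFLOP/s for a hypothetical CPU
DDR_BW = 300.0   # assumed DDR5 bandwidth, GB/s
HBM_BW = 1000.0  # assumed on-package HBM bandwidth, GB/s

# A bandwidth-bound inference kernel, e.g. ~1 FLOP per byte moved:
ai = 1.0
print(attainable_gflops(ai, PEAK, DDR_BW))  # 300  -> memory-bound on DDR
print(attainable_gflops(ai, PEAK, HBM_BW))  # 1000 -> ~3.3x from HBM alone
```

A compute-bound kernel (high arithmetic intensity) would hit the same 3000 GFLOP/s roof either way, which is why the hosts frame HBM as a win specifically for bandwidth-hungry ML workloads.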

10:02

🚀 Ergo Chip by Perceive and its Innovative Math

The conversation shifts to Perceive, a startup that has developed a chip named Ergo. Perceive is known for its impressive performance figures at low power consumption, achieved by reinventing neural network mathematics. The approach aims to perform operations more efficiently by reducing the number of calculations needed. While the specifics of this method are not publicly known, it's suggested that it could involve rearranging calculations to optimize memory usage and power efficiency. The potential applications of this technology in consumer electronics are also discussed.
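
Perceive has not published its method, so the following sketch only illustrates the generic idea of rearranging neural network maths to do less work: replacing a dense weight matrix with a low-rank factorization cuts the multiply-accumulate count substantially. The shapes and rank are arbitrary choices for demonstration:

```python
import numpy as np

# Perceive's actual technique is not public; this only illustrates the
# general principle of restructuring a computation to need fewer multiplies.
# A dense weight matrix W (m x n) applied as U @ V (rank r) reduces
# multiply-accumulates from m*n to r*(m + n).

m, n, r = 512, 512, 32
rng = np.random.default_rng(0)
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, n))
x = rng.standard_normal(n)

dense_macs = m * n            # 262,144 MACs for W @ x
factored_macs = r * (m + n)   # 32,768 MACs for U @ (V @ x), ~8x fewer
y = U @ (V @ x)               # same output shape, far less arithmetic

print(dense_macs, factored_macs)
```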

15:03

🏎️ Groq's Unique Approach to AI Hardware

The discussion in this paragraph centers on Groq, a company led by Jonathan Ross, one of the original designers of Google's TPU. Groq's architecture is based on a dataflow model in which the duration of each operation is deterministic, giving a fixed batch-one latency that is known at compile time. This is in contrast to other AI hardware, where latency can vary based on the input data. The technology is particularly suited to real-time applications that require consistent latency, such as recommendation engines. The conversation also includes a personal account of a visit to Groq's offices to better understand their technology.
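
A toy sketch of the compile-time determinism idea as described in the episode: if every operation has a fixed cycle count and there is no dynamic arbitration, batch-one latency is a simple sum known before any data arrives. The op names, cycle counts, and clock speed below are invented for illustration and are not Groq's actual numbers:

```python
# Sketch of compile-time deterministic latency. If each operation's cycle
# count is fixed and nothing (caches, arbitration, data-dependent branches)
# perturbs it, end-to-end latency is known before any input is seen.
# All op names and cycle counts here are made up for illustration.

CLOCK_HZ = 1.0e9  # assumed 1 GHz clock, illustrative only

# A toy "compiled" program: (op, fixed cycles) pairs.
program = [
    ("load_weights", 1200),
    ("matmul_0",     4096),
    ("activation_0",  256),
    ("matmul_1",     4096),
    ("softmax",       512),
]

total_cycles = sum(cycles for _, cycles in program)
latency_us = total_cycles * 1e6 / CLOCK_HZ

# Identical for every input -- the property the hosts call batch-one determinism.
print(f"deterministic batch-1 latency: {total_cycles} cycles = {latency_us:.2f} us")
```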

20:06

🌌 Lightelligence and Optical Computing

Lightelligence, another company spun out of MIT, is the focus of this section. They are working on optical computing, similar to Lightmatter, but using a different approach to modulate light. Instead of using MEMS for physical waveguide manipulation, Lightelligence injects electrons to change the refractive index, which alters the phase of light. This method is compared to Lightmatter's electromechanical approach. The potential challenges and applications of this technology are discussed, including its use in solving complex computational problems like the Ising problem.
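
For readers unfamiliar with Mach-Zehnder interferometers, this short sketch shows why modulating refractive index modulates light: the phase accumulated in one arm depends on the index, and the recombined output intensity depends on the phase difference between the arms. The wavelength, arm length, and index shift below are illustrative values:

```python
import math

# Ideal Mach-Zehnder interferometer (MZI): light splits into two arms and
# recombines; output power depends on the phase difference between arms.
# Changing the refractive index of one arm (e.g. by carrier injection)
# changes that phase. Numbers below are illustrative, not device specs.

WAVELENGTH = 1.55e-6  # metres, typical telecom wavelength
ARM_LENGTH = 100e-6   # metres, assumed phase-shifter length

def phase_shift(delta_n, length=ARM_LENGTH, wavelength=WAVELENGTH):
    """Phase change from a refractive-index change delta_n over `length`."""
    return 2 * math.pi * delta_n * length / wavelength

def mzi_transmission(delta_phi):
    """Ideal MZI output power fraction for arm phase difference delta_phi."""
    return math.cos(delta_phi / 2) ** 2

# An index change of ~7.75e-3 over 100 um gives a pi phase shift, enough to
# swing the output from fully on to fully off.
dphi = phase_shift(7.75e-3)
print(f"phase shift: {dphi:.3f} rad, transmission: {mzi_transmission(dphi):.3f}")
```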

25:07

🔄 NeuReality and Accelerating AI Infrastructure

The final paragraph discusses NeuReality, a company that has partnered with IBM to develop AI acceleration IP. Unlike traditional AI accelerators that focus on neural network computation, NeuReality's approach is to accelerate the infrastructure around AI solutions. Their card plugs into a PCIe slot and helps manage various aspects of AI infrastructure, such as job scheduling and database management. This approach aims to improve the efficiency and throughput of AI systems by optimizing the non-computational parts of AI operations.
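
A back-of-the-envelope sketch of why accelerating the surrounding infrastructure can matter more than speeding up the model itself: if host-side scheduling and data handling dominate the per-request time, offloading that work lifts throughput far more than a faster accelerator would. All timings below are assumed for illustration:

```python
# Toy Amdahl-style model of an inference pipeline: each request pays a
# host-side infrastructure cost (scheduling, batching, I/O) plus the
# accelerator compute time. All timings are illustrative assumptions.

def requests_per_second(infra_ms, compute_ms):
    """Throughput of a serialized pipeline: infra work + inference."""
    return 1000.0 / (infra_ms + compute_ms)

compute_ms = 2.0  # assumed accelerator time per request
infra_ms = 6.0    # assumed host-side overhead per request

baseline = requests_per_second(infra_ms, compute_ms)       # 125 req/s
offloaded = requests_per_second(infra_ms / 4, compute_ms)  # infra cut 4x
print(f"{baseline:.0f} -> {offloaded:.0f} req/s")          # 125 -> 286 req/s

# Making the accelerator itself 4x faster helps far less here:
print(f"{requests_per_second(infra_ms, compute_ms / 4):.0f} req/s")  # 154 req/s
```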


Keywords

💡Vast.ai

Vast.ai is mentioned as a company that specializes in AI technology. The discussion hints at the company's ambiguous pronunciation and its Chinese origins, founded by former AMD fellows. It's noted for being secretive about its operations, which is common for startups in stealth mode. The reference to 'Vast.ai' in the script indicates a focus on data center inference accelerators and video decoding on chip, suggesting applications in high-performance computing and multimedia processing.

💡AI Hardware Show

The 'AI Hardware Show' is the title of the podcast where the script is transcribed from. It discusses various AI hardware technologies and companies, serving as a platform for exploring and analyzing the latest trends and advancements in the field of artificial intelligence hardware.

💡Data Center Inference Accelerator

A 'Data Center Inference Accelerator' is a type of hardware designed to speed up the process of inference in machine learning models within data centers. In the script, it's associated with Vast.ai's product offerings, emphasizing the company's role in enhancing data center operations through specialized AI hardware.

💡Video Decoding on Chip

Referring to the capability of a chip to decode video data directly, 'Video Decoding on Chip' is highlighted as a feature of Vast.ai's technology. This suggests that their AI hardware is tailored for efficient video processing, which is crucial for applications like streaming services and video analytics.

💡Sapphire Rapids

Sapphire Rapids is the code name for an Intel Xeon processor generation; the variant with High Bandwidth Memory (HBM) is sold as the Xeon CPU Max Series. In the context of the script, it's part of a discussion on advancements in CPU technology, emphasizing Intel's integration of HBM to boost memory bandwidth for AI workloads.

💡HBM (High Bandwidth Memory)

High Bandwidth Memory, or HBM, is a type of memory technology that offers higher data transfer rates than traditional DDR memory. The script mentions HBM in the context of Intel's Sapphire Rapids, indicating a trend towards integrating this memory technology into CPUs to improve performance for AI applications.

💡Project Larrabee

Project Larrabee is an Intel initiative mentioned in the script, which aimed to develop a GPU using the x86 architecture. Although the project never shipped as a consumer product, it influenced the development of Intel's subsequent products, particularly compute-focused parts built around wide vector units.

💡Perceive

Perceive is a startup discussed in the script, known for its Ergo chip designed for low-power AI processing. The company's approach to 'reinventing neural network maths' suggests an innovative method to perform AI computations more efficiently, which is a key focus of the discussion around their technology.

💡Groq

Groq is a company led by one of the original TPU designers, focusing on creating AI accelerators. The script discusses Groq's unique approach to determinism in AI processing, where the latency of batch one inferences is known at compile time, offering a consistent performance level that is highly valued in certain applications like automotive systems where reliability is critical.

💡NeuReality

NeuReality is described as a company partnering with IBM to develop AI acceleration IP. Unlike traditional AI accelerators that focus on model computation, NeuReality's approach is to accelerate the infrastructure around AI solutions, such as job scheduling and database management. This is framed as a system-level optimization, aiming to enhance the overall efficiency of AI operations.

💡AI System on a Chip

The term 'AI System on a Chip' is used to describe NeuReality's product, suggesting an integrated circuit designed to manage and accelerate various aspects of AI infrastructure. This includes elements like vector-matrix operations, which are critical for handling the complex computations in AI systems, thereby improving overall performance.

Highlights

Vast.ai is a Chinese company founded by two former AMD fellows.

Vast.ai specializes in data center inference accelerators and video decoding on chip.

Enterprise Market for AI hardware is growing in China.

Risks for smaller AI accelerator companies include overfunding and lack of compatibility with other systems.

Intel Sapphire Rapids with HBM is now sold as the Intel Xeon CPU Max Series.

Intel CPU Max is the first x86 CPU with on-package HBM.

CPU Max scales up to 56 cores with 64 GB of on-package HBM2e.

With 64 GB of HBM, CPU Max offers more than 1 GB of memory per core, and considerably more in lower-core-count SKUs.

CPU Max is expected to be used in AI workloads due to its high memory bandwidth.

Perceive's chip 'Ergo' promises high performance at low power.

Perceive reinvents neural network maths for efficiency.

Groq, founded by a former TPU designer, focuses on batch one latency with deterministic performance.

Groq's architecture is fully deterministic, with fixed batch one latency known at compile time.

Lightelligence, like Lightmatter, uses a Mach-Zehnder interferometer as the compute element, but takes an electronic rather than mechanical approach to phase modulation.

NeuReality partners with IBM for AI acceleration IP, focusing on accelerating infrastructure around AI solutions.

NeuReality aims to optimize job scheduling, database management, and other aspects of AI infrastructure.