Naveen Verma is the CEO and co-founder of EnCharge, a startup building the world’s most powerful in-memory analog chips for AI. 

What are you building at EnCharge? 

If you look at what has happened in the field of computing, really driven by AI, it has been an explosion like nothing we’ve seen before. In the last 10 years, we have seen an increase in compute intensity and model size by 10,000-fold. These metrics are unprecedented. The truth is, the computing systems we have today, which we’ve relied on for many decades, are not able to keep up with the kind of requirements being pushed by these AI computations.

The reality is we need very differentiated, transformative solutions that contemplate where the real limitations are for these AI workloads, which are pushing the boundaries of today’s capabilities. These are critically important technologies for society, so we need to think about: what are the fundamental attributes needed for computing systems, and how do we address those needs? Being able to work in a fundamental research environment at Princeton since 2009 gave us the time and space to ask that question and think deeply about it. Our research focused on envisioning where the critical, defining compute trajectories were headed and how to address them.

What problem is EnCharge solving with in-memory analog AI chips?

If you’re just doing the kinds of things being done with today’s technologies, the big players like Nvidia have extremely smart, well-resourced teams that can push those technologies about as far as anyone can. That’s not where innovation is needed. The problem is that the past paradigms we relied on, like Moore’s Law, have gone away. We need completely new roadmaps and solutions. The scale of the compute requirements here is unprecedented, and we need technology that puts us on a fundamentally different trajectory.

That’s what we developed: a technology called analog in-memory computing, combining the transformative concepts of analog computing for efficiency and in-memory computing to overcome the bottleneck of separating memory and compute. We spent six years reinventing and de-risking this fundamental technology using DARPA and DOD funding before spinning out EnCharge in 2022 when we felt we fully understood and validated where this technology was going. Those six years were important because the day you take venture capital, your agenda changes – you start solving commercialization problems instead of fundamental technology problems. With transformative tech like this, you need that time to de-risk it and understand it before becoming customer and product-focused. This time and space allowed us to develop this differentiated solution for sustainable AI compute.

What is the status quo today with these digital memory systems? How is it done and what are the limitations?

We have to acknowledge this 10,000x explosion, but at the same time, companies innovating in this space like Nvidia have been making remarkable progress, even if that progress isn’t keeping pace with the 10,000x we need. The problems we see today – the exorbitant cost of GPUs, and not being able to get them even if you have the money – are the consequences of that compute not being sustainable. What is being done today relies on digital technologies. Essentially, you’ve got these semiconductor devices with rich physics, but the signals are limited to ones and zeros. This is done because, with billions of devices on a chip, you need noise margin and separation between the signals. But this leaves a lot of efficiency on the table by not representing and processing all the signals possible in between.

The other issue is how data storage is separated from compute. Different technologies are used for building memory and digital logic. Since they are physically separate on the chip, moving data around so you can both store it and compute on it has a big overhead, but that is the way things are done today. The progress has been about working within the constraints of these fundamental inefficiencies. One big approach has been quantization, which is realizing you don’t need full 32-bit floating point precision for AI models. Rather, you can use lower precision data like 16-bit, 8-bit, or even 4-bit. That drives efficiency gains.
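To make the quantization idea concrete, here is a minimal NumPy sketch of symmetric int8 quantization. The scheme, shapes, and scale factor are illustrative assumptions, not any particular vendor’s implementation.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # fp32 weights

q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale  # dequantized approximation

# int8 storage is 4x smaller than fp32 and integer multiplies are cheaper;
# the price is a small, bounded rounding error.
print(f"max rounding error: {np.abs(w_hat - w).max():.4f}")
```

The efficiency gain comes from moving and multiplying 8-bit integers instead of 32-bit floats, while the model tolerates the small rounding error.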

The other approach is looking at what computation AI needs. It turns out 95-99% of AI compute is matrix multiplies. Companies have built dedicated tensor cores optimized for that operation, with the right programmability around them for different types of models, like convolutional or transformer models. These are all ways of optimizing within the limitations of digital’s inherent inefficiencies in signal representation and the memory/compute separation. It’s admirable work, but it doesn’t cover a 10,000x gap. That’s where fundamentally different approaches are needed.

What were the core technical challenges you had to solve to develop a competitive in-memory analog AI chip?

A lot of AI compute, like 95-99%, turns out to be very large, high-dimensional matrix multiplies. That’s why you have so much data, and that’s why moving that data around becomes a big problem. But there are some very interesting attributes that matrix multiplies give us.

It turns out they involve a very parallel operation. You have multiplications between matrix and vector elements, but then there is a big reduction operation, essentially adding up all the multiplications along the inner dimension of the matrix. That adding process reduces all this parallel data down to a single result per output. The idea of in-memory computing is to do these multiplications and reductions inside the memory so that you are communicating much less data.
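As a concrete illustration of that structure (a NumPy sketch with made-up shapes): the multiplies are fully parallel, and the reduction along the inner dimension is what collapses them. An in-memory architecture that performs that reduction where the weights are stored only has to move the final sums off the array.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))  # weight matrix (4 outputs x 8 inputs)
x = rng.standard_normal(8)       # input activation vector

# Step 1: fully parallel elementwise multiplications (one per weight).
products = W * x                 # shape (4, 8): 32 independent multiplies

# Step 2: the reduction along the inner dimension collapses each row to
# a single number, so only 4 values leave the array instead of 32
# partial results.
y = products.sum(axis=1)         # shape (4,)

assert np.allclose(y, W @ x)     # identical to a standard matmul
```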

The challenge is doing this computation inside very tiny memory circuits and doing it with very high energy efficiency. This is where analog comes in. Analog gives you much higher area and energy efficiency because you’re not leaving all of these intermediate signal levels on the table just pretending they are ones and zeros. But when you do analog, the problem is now you’re susceptible to noise. While analog can be 100x more energy efficient and dense, the tradeoff is less noise margin. The big problem that EnCharge solved is the analog noise problem.
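A toy model of that tradeoff, with invented noise levels rather than measured device parameters: each analog product picks up noise before the reduction, and the error accumulated across the inner dimension determines whether the result is still usable.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.integers(-8, 8, size=512)   # 4-bit weights stored in the array
x = rng.integers(-8, 8, size=512)   # 4-bit input activations

exact = int(w @ x)

# Toy analog model: every elementwise product is an analog quantity that
# picks up Gaussian noise before the reduction sums it. sigma is an
# illustrative assumption; noise variance grows with the inner dimension.
for sigma in (0.01, 0.1, 1.0):
    noisy = ((w * x) + rng.normal(0, sigma, size=512)).sum()
    print(f"sigma={sigma}: exact={exact}, analog={noisy:.1f}, "
          f"error={noisy - exact:+.1f}")
```

At low noise the analog sum resolves to the correct integer; as noise grows, the 512 accumulated error terms swamp the result, which is the noise-margin problem analog designs have to solve.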

Is your technology manufactured at an existing semiconductor fabrication plant or is a new approach required to manufacture these chips?

That’s one of the biggest advantages of our particular approach. It requires no additional process technology changes. Where other analog approaches might require flash transistors, our technology just requires metal wires. These come for free in any semiconductor technology, because you need them to connect your devices to build circuits. So this works with standard low-cost, high-volume CMOS. It works in every technology node, so we can scale to the most advanced nodes. That’s important: we can maintain our advantage over digital because we also benefit from scaling. And we can efficiently integrate all of the digital infrastructure you need around this analog in-memory compute to make usable architectures.

This brings us to the second point: the critical thing now is making sure that no user has to bear the burden of this new design. The requirement is to build an architecture around this efficient analog technology that allows people to run whatever AI model they want, in the same way they would on any other chip, like an Nvidia chip.

This business of software design and architecture design is really important: making sure that this extraordinarily efficient, high-performance technology doesn’t impose any new burdens. It needs to look completely transparent and simply give you a boost in your AI capabilities. Having this digital infrastructure around the analog core is critical to building that architecture. AI has captured our attention because of the pace of innovation in the field. If you do something that gives you higher efficiency but puts a barrier in front of that innovation, it won’t be useful.

What challenges did you face spinning EnCharge out of your lab at Princeton?

Universities have a powerful advantage in that we have the time and space to think deeply about fundamental problems and come up with really differentiated, transformative solutions. That’s what happens at universities; they are wonderful places for that reason. The big challenge that needs to be overcome is keeping the problem you are working on well-aligned with problems and markets in the real world, so that solutions can benefit real people and companies. In the six years we spent at the university, that happened through close collaboration and interaction with people in industry who understood and could decipher the core technology problems and solutions.

In my case, that involved working closely with people I had connected with through my research program’s deep industry ties. These folks became co-founders in the company, defining and making very clear to us the requirements and needs at the software level, the architecture level, the fundamental silicon level, manufacturability, and more. That is what enabled EnCharge and this technology to successfully move out of the lab and into the real world as a commercial reality.

How would the world look different if everyone was using EnCharge’s analog and in-memory compute?

This amazing AI capability we’ve started to see with things like ChatGPT is just the tip of the iceberg. Imagine all the capabilities people are actively envisioning: how do we improve productivity, health, safety, and experiences on every level? There are entire ecosystems thinking about this. What happens with EnCharge’s technology is that AI-rooted innovations become accessible on a broader scale, where they’re truly needed. In some cases, we need AI immersed in our environments – on our phones and laptops – because those are where we generate the most intimate, critical data about our lives. We can provide a cost-effective, environmentally sustainable, and accessible solution for AI, so that all the incredible innovations people are imagining can actually reach the places they are needed.