Marvell’s CTO: peering into the future is getting harder
Thursday, April 13, 2023 at 10:38AM
Roy Rubenstein in 3nm CMOS, CTO interview, CXL, Gordon Moore, Marvell, Noam Mizrahi, Universal Chiplet Interconnect Express (UCIe), artificial intelligence, books, co-packaged optics, disaggregation

CTO interviews part 4: Noam Mizrahi

In a wide-ranging interview, Noam Mizrahi (pictured), executive vice president and corporate chief technology officer (CTO) at Marvell, discusses the many technologies needed to succeed in the data centre. He also discusses a CTO’s role and the importance of his focussed thinking ritual.


Noam Mizrahi has found his calling.

“I’m inspired by technology,” he says. “Every time I see an elegant technical solution - and it can be very simple - it makes me smile.”

Marvell hosts an innovation contest, and at one event, Mizrahi mentioned this to participants. “So they issued stickers saying, ‘I made Noam smile’,” he says.

Marvell’s broad portfolio of products spans high-end processors, automotive Ethernet, storage, and optical modules.

“This technology richness means that every day I come to work, I feel I learn something new,” he says.

 

Chip design

The interview with Mizrahi took place before the death on March 24th of Intel co-founder Gordon Moore, aged 94.

In an article published in Electronics in 1965, Moore observed that the number of transistors on a chip was doubling roughly every year, a trend that became known as Moore’s law.

The law has driven the semiconductor industry for decades and, like all exponential trends, is reaching its limit.

Since Marvell’s business is infrastructure ICs, it is experiencing the law’s demise first hand.

While the core definition of Moore’s law is ending, technology and process advancement are still enabling the cramming of more transistors on a die, says Mizrahi. However, greater processing performance and lower power consumption are occurring at a different pace and cost structure.

It is now very costly to make chips using the latest 5nm and 3nm CMOS process nodes.

The cost is not just the chip mask (reticle) but also such aspects as intellectual property (IP), architecture, design verification, electronic design automation (EDA) tools, and design validation.

Getting to the first product using 5nm CMOS can cost as much as $450 million, while for 3nm, the estimate is $600 million.

The development flow also takes longer due to the complexity involved, which will cause a redefinition of what is meant by a ‘current generation’ of a chip, says Mizrahi.

Design reuse is also increasingly required: not just of IP but also of the validation process, to speed up a chip’s introduction.

In turn, designers must be innovative since greater processing performance and lower power consumption are harder to achieve.

Areas for such innovation include package design optimisation, chip input-output (I/O), and software, all used to claw back the processing performance that previously came from moving to the latest CMOS process.

IC designers will also be forced to choose which chips to make using the latest CMOS process node.

Overall, fewer chip companies will be able to afford chips made in leading CMOS processes, and fewer companies will buy such ICs, says Mizrahi.

 

Rise of chiplets

Chiplets will also play a role in a post-Moore’s law world.

“Chiplets are currently a very hot topic,” says Mizrahi.

A chiplet is a die implementing a functional block. The chiplet is added alongside a central die for a system-on-chip (SoC) design. Using chiplets, designs can exceed the theoretical limit of the mask size used to make a chip.

Marvell has long been a chiplet pioneer, says Mizrahi. “Today, it all seems reasonable, but when we did all that, it was not so obvious.” Marvell makes one chip that has 17 dies in a package.

Chiplets are particularly suited for artificial intelligence (AI) ASICs, what Mizrahi describes as ‘monsters of chips’.

Chiplets enable designers to control yield, which is essential when each 3nm CMOS chip lost to a defect is so costly.
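The yield argument can be illustrated with the simple Poisson defect model, in which the fraction of defect-free dies falls exponentially with die area. The sketch below is illustrative only: the defect density, the die area, and the four-way split are assumed values, not figures from Marvell or any foundry, and it assumes chiplets are tested before assembly so that only good dies are packaged.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Assumed, illustrative numbers (not taken from the article).
DEFECT_DENSITY = 0.1    # defects per cm^2
MONOLITHIC_AREA = 8.0   # cm^2, one large die
NUM_CHIPLETS = 4        # the same silicon split into four equal chiplets

monolithic_yield = poisson_yield(MONOLITHIC_AREA, DEFECT_DENSITY)
chiplet_yield = poisson_yield(MONOLITHIC_AREA / NUM_CHIPLETS, DEFECT_DENSITY)

# Silicon area that must be fabricated, on average, per good product.
# With known-good-die testing, a defect scraps one chiplet, not the whole design.
silicon_per_good_monolithic = MONOLITHIC_AREA / monolithic_yield
silicon_per_good_chiplet_design = MONOLITHIC_AREA / chiplet_yield

print(f"Monolithic die yield: {monolithic_yield:.1%}")
print(f"Single chiplet yield: {chiplet_yield:.1%}")
print(f"Silicon per good monolithic chip: {silicon_per_good_monolithic:.1f} cm^2")
print(f"Silicon per good chiplet design:  {silicon_per_good_chiplet_design:.1f} cm^2")
```

Under these assumptions, a defect costs a quarter of the silicon rather than the whole die, which is the sense in which chiplets let designers control yield. The model ignores packaging and assembly losses.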

Using chiplets, a design can be made using a mix of CMOS process nodes, saving power and speeding up a chip’s release.

Mizrahi applauds the work of the Universal Chiplet Interconnect Express (UCIe) organisation, which is creating chiplet standards.

But chiplets will first be used as internally designed dies for a company’s own products, he says. Chip designers buying best-in-class chiplets from third parties remains some way off.

 

A CTO’s role

Mizrahi’s role is to peer into the future to identify the direction technologies will take and their impact on Marvell’s markets and customers.

He says a company-level longer-term technological strategy that combines the strengths of Marvell’s product lines is needed to secure the company’s technical lead.

“That is my job, and I love it,” he says.

It’s also challenging; predicting the future is hard, especially when the marketplace is dynamic and constantly changing. Technology is also very costly and time-consuming to develop.

“So, making the right decision as to what technology we need to invest in for the future, that is tough,” says Mizrahi.

Rapidly changing market dynamics are also challenging Marvell’s customers, who don’t always know what they need to do.

“Creating this clarity with them is challenging but also a great opportunity if done correctly,” says Mizrahi. “That is what keeps me motivated.”

 

Job impact

How does Mizrahi, Marvell’s CTO since 2020, assess his impact?

The question stems from a comment by Coherent’s Dr Julie Eng that assessing a CTO’s impact is more complicated than, say, a product line manager’s. On becoming CTO, Eng discussed with Coherent’s CEO how best to use her time to benefit the company. She also called other CTOs about the role and what works for them.

“I would say that my goals are tangible and clear, but the environment and the topics that I deal with are far less tangible and clear,” says Mizrahi.

He is required to identify technology trends and determine which ones need to be ‘intercepted’. “What do we need to do to get there and ensure that we have the right technologies in place,” he says.

But how technologies play out is hard to determine and becoming harder given the longer development cycles.

“It’s critical to identify these technologies and their impact ahead of time to give yourself enough time to prepare for what must be done, so you can start the development in time for when the wave hits.”

 

Marvell’s strategy

Marvell’s focus as a company is infrastructure ICs.

“We deal with the network, connectivity, storage, security, all the infrastructure around the processor,” says Mizrahi.

Marvell has been acquiring companies to bolster its technology portfolio and system expertise. The acquisitions include Cavium, Inphi, and Innovium. Last year, Marvell also bought CXL specialist Tanzanite Silicon Solutions.

“It's going to be very important that you possess all the components in the infrastructure because, otherwise, it is tough to design a solution that brings value,” says Mizrahi.

Being able to combine all the pieces helps differentiate a company.

“I'm not sure there are many other companies that possess all the components needed to make effective infrastructure,” he says.

 

Disaggregation

Mizrahi gave a talk at Marvell’s Industry Analyst Day last December entitled Disaggregation using Optics.

During the talk, he described how data centres have been flexible enough to absorb new use cases and applications in the past, but now this is changing.

“AI training clusters are going to require a different type of data centre,” says Mizrahi. “It is more like a supercomputer, not the same traditional server architecture we see today.”

His analyst day talk also highlighted the need to disaggregate systems: removing the dependencies between components so that each can be scaled independently, at the pace required.

Memory disaggregation using Compute Express Link (CXL) is one such example.

The CXL protocol optimises several memory parameters in computing systems, namely latency, bandwidth, and memory semantics. Memory semantics is about overseeing correct access by several devices using a shared memory.

CXL enables the disaggregation of memory currently bound to a host processor, thereby not only optimising the performance metrics but reducing overall cost.

Mizrahi cites the issue of poor memory usage in data centres. Microsoft Azure published research showing that half of its virtual machines never touch half of their memory.

“This means that memory is stranded when virtual machines are rented and are unavailable to other users,” says Mizrahi. “And memory is one of the largest spends in data centres.”

CXL enables memory pooling. From this pool, memory is assigned to an application in real time and released when workload execution is completed.
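The assign-and-release pattern described above can be sketched as a toy bookkeeping model. This is not a CXL API: the `MemoryPool` class, its method names, and the capacities are hypothetical, chosen only to show how memory released by one workload becomes available to another.

```python
class MemoryPool:
    """Toy model of a shared memory pool; all sizes are in GB (hypothetical)."""

    def __init__(self, capacity_gb: int) -> None:
        self.capacity_gb = capacity_gb
        self.allocations: dict[str, int] = {}

    @property
    def free_gb(self) -> int:
        """Memory not currently assigned to any workload."""
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, workload: str, size_gb: int) -> bool:
        """Assign memory to a workload if the pool has room for it."""
        if workload in self.allocations or size_gb > self.free_gb:
            return False
        self.allocations[workload] = size_gb
        return True

    def release(self, workload: str) -> None:
        """Return a finished workload's memory to the pool."""
        self.allocations.pop(workload, None)


pool = MemoryPool(capacity_gb=512)
pool.allocate("vm-a", 300)
fits_while_a_runs = pool.allocate("vm-b", 300)   # not enough free memory yet
pool.release("vm-a")                             # vm-a's workload completes
fits_after_release = pool.allocate("vm-b", 300)  # the freed memory is reused

print(fits_while_a_runs, fits_after_release, pool.free_gb)
```

With memory bound to each host, the capacity that vm-a leaves unused would stay stranded. In the pooled model it returns to a shared budget as soon as the workload completes.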

Pooled memory promises to save hyperscalers hundreds of millions of dollars.

“Of course, it's not easy to do, and it will take time, but that's just one motivation for doing things [using CXL].”

His analyst day talk also stated that optics is the one medium that addresses all the disaggregation issues: bandwidth, power, density, and the need for larger clusters.

“We're going to see an all-optical type of connectivity if you look far enough into the future,” he says. “Of course, not today and not tomorrow.”

Mizrahi’s talk also suggested that AI will need even larger scale computing than supercomputers.

He cites Tesla’s supercomputer used to train its autonomous vehicle neural network.

“If you look at what it is composed of, it is a supercomputer,” says Mizrahi. “Some say it's one of the top five or top 10 supercomputers, and its only purpose is to train autonomous vehicle neural networks.”

Last year, Meta also announced a supercomputer for training purposes.

Such AI training systems are the tip of the iceberg, he says.

“Ask yourself, what is a unit for a training cluster,” says Mizrahi. “Is it eight GPUs (graphics processing units), 256 GPUs, 4k TPUs (tensor processing units), or maybe it is an entire data centre in one cluster?”

That is where it is all going, he says.

 

Pluggable modules and co-packaged optics

Co-packaged optics continues to evolve, but so do standard pluggable modules.

There is a good reason why pluggable optics remain in favour, and that will continue, says Mizrahi. But at some point, designers won’t have a choice, and co-packaged optics will be needed. That, however, is some way off.

In time, both these technologies will be used in the data centre.

Co-packaged optics is focussed on high-capacity networking switches. “And we are right in the middle of this and developing into it,” says Mizrahi.

Another place where co-packaged optics will be used, potentially even sooner, is for AI clusters.

Such co-packaged optics will connect switches to compose AI clusters, and, longer term, the GPUs will use optical I/O as their primary interface.

Such optical I/O helps meet bandwidth, power reduction, and power density requirements.

“Let's say you want to build a cluster of GPUs, the larger the cluster, the better, but these are so power-hungry. If you do it with electrical connectivity, you must maintain proximity to achieve high speeds,” says Mizrahi. “But that, of course, limits your ability to put more GPUs into a cluster because of power density limitations.”

Using optical I/O, however, somewhat eases the density requirement, enabling more GPUs in a cluster.

But there are issues. What happens if something fails?

Today, with pluggables, one link is affected, but with co-packaged optics, it is less simple. “Also how do you scale production of these things to the scale of a data centre?” says Mizrahi.

These questions will ensure the coexistence of these different solutions, he says.

But AI is driving the need for the newer technology. Mizrahi cites how, in data centres, high-end switches have a capacity of 25.6 terabits per second while servers use a 50-gigabit-per-second interface. “That means, if for simplicity we ignore topologies and redundancies, you can connect 500 servers to that switch,” he says.

GPUs today have 3.6 terabit-per-second full-duplex I/O connectivity to talk to their peer GPUs.

“It only takes seven GPUs to saturate that very same [25.6-terabit capacity] switch,” he says. “The bandwidth requirement, it just explodes, and it's going to be very hard to keep doing that electrically.”
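The arithmetic behind these figures is easy to check. The sketch below assumes a switch capacity of 25.6 terabits per second, as given in the bracketed note, and, like Mizrahi, ignores topologies and redundancy. It also treats each GPU as a single 3.6-terabit-per-second demand on the switch.

```python
# All rates in gigabits per second.
SWITCH_CAPACITY_GBPS = 25_600   # 25.6 Tb/s high-end switch
SERVER_PORT_GBPS = 50           # per-server interface
GPU_IO_GBPS = 3_600             # 3.6 Tb/s per-GPU I/O

# How many endpoints of each kind fill the switch, ignoring topology and redundancy.
servers_per_switch = SWITCH_CAPACITY_GBPS // SERVER_PORT_GBPS
gpus_to_saturate = SWITCH_CAPACITY_GBPS / GPU_IO_GBPS

print(f"Servers per switch: {servers_per_switch}")
print(f"GPUs to saturate the switch: {gpus_to_saturate:.1f}")
```

The result is 512 servers, which Mizrahi rounds to 500, against roughly seven GPUs: a GPU demands about 70 times the bandwidth of a server port.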

This is why co-packaged optics will be needed. 

 

Typical workday

Mizrahi is based in Israel, whereas Marvell’s headquarters is in Santa Clara, California.

“It [Israel] is the centre of my life and where my family is,” says Mizrahi. “I travel a lot, to the point where I think my biological clock is somewhere over the ocean.”

His day spreads across many time zones. Early morning calls are to the Far East before he turns to local issues. Then, his afternoon coincides with morning US Eastern time, while his evening aligns with morning US Western time.

That said, Marvell’s CEO repeatedly emphasises his desire for all employees to balance work and family.

“He encourages and insists to see that happen, which helps me keep a balance,” says Mizrahi.

 

Prime focus time 

Mizrahi loves sports and is a keen runner.

He makes sure not to miss his daily seven- or eight-mile run, even on days when he has a long flight.

“Every morning, it is my alone time,” he says. “It’s when I let my brain work, and it is my prime focus time.”

He is also a family man and has three children. He is keen to spend as much time as possible with his wife and kids.

“It’s not going to be long before they [the children] start their journey away from home, so I try to cherish every minute I have with them,” he says.

He reads a lot, including technical material. “I told you, I’m inspired by technology.”

[Image: The sticker]

He cites two recently read books.

One, in Hebrew, is called Red Skies by Daniel Shinar.

“It talks about a friendship between two young guys from two sides of the fence,” he says. A friendship that proves impossible due to the reality of the situation.

The second book, one he found fascinating and meaningful, was part of a training course given at Marvell, called The Leadership Challenge by James Kouzes and Barry Posner.

“It gives you practices that the authors see as key for exemplary leadership, and it gave me so many things to think about,” he says. “To recognise things in my behaviour or in other people I view as leaders.”

Article originally appeared on Gazettabyte (https://www.gazettabyte.com/).
See website for complete article licensing information.