Trident 4 boosts enterprise switch capacity to 12.8 terabit
Thursday, July 11, 2019 at 9:49AM
Roy Rubenstein in Bob Wheeler, Broadcom, LightCounting, NPL, P4, Peter Del Vecchio, The Dell'Oro Group, The Linley Group, Tomahawk 3, Trident 4, semiconductors

Broadcom has unveiled the Trident 4, its latest family of switch chips for the enterprise. 

The largest-capacity Trident 4 family member, the X11 chip, has a switching capacity of 12.8 terabits. This is a fourfold increase in capacity compared to Broadcom’s current high-end enterprise chip, the Trident 3, announced in June 2017.  

The Trident 4 will also reduce the cost of a 128x100-gigabit switch by a factor of four. The current cost of a 12.8-terabit switch, a multi-chassis solution, is $245,000, not including the pluggable optics, says Broadcom, citing market research firm The Dell’Oro Group.

“The announcement is significant both in updating the Trident line for enterprise and in adding compiler programmability thereby limiting the openings for competitors such as Barefoot - soon Intel - Innovium, and Marvell,” says Bob Wheeler, vice president of The Linley Group and principal analyst for networking.

The X11 matches the capacity of Broadcom’s leading Tomahawk 3 switch chip used by the hyperscalers in their data centres. 


“With the Tomahawk line, the goal is always to lead in terms of bandwidth as far as gigabits-per-second/Watt,” says Peter Del Vecchio, product line manager for the Trident 4 at Broadcom.

In contrast, enterprises need to support a variety of networking protocols. So while the Tomahawk is a streamlined design, the Trident family requires a broader feature set.

“The Tomahawk 3 was Broadcom’s first 12.8-terabit chip and we were able to achieve that in 16nm CMOS,” says Del Vecchio. “To achieve 12.8 terabits in a monolithic implementation with Trident class functionality, we moved to 7nm.”


Trident 4 programmability

The first 12.8-terabit Trident 4 chip has been in the hands of customers since May. 

Broadcom will offer a family of Trident 4 devices. The company is starting at the high end with its 12.8-terabit X11 and 8-terabit X9, and will subsequently offer the 4-terabit X4 and 2-terabit X3 devices. 

“Trident 4 matches the bandwidth of the Barefoot Tofino 2, Innovium Teralynx, and Marvell Falcon,” says Wheeler. “That is overkill for many enterprise applications, which is why Broadcom will offer lower bandwidth variants, as Marvell does.”

The company has also enhanced the switch-chip programmability with the Trident 4.  

Many Trident 3 customers see it as a fixed-function chip and use the features provided by Broadcom. 

Trident 3, while programmable, does not come with a compiler. It requires Broadcom’s engineers to handcraft the images that go into the device’s programmable elements. Broadcom thus provides the software-image upgrades that are implemented in the field.

“We take input from customers that have an evolving roadmap and add features to the Trident 3 line,” says Del Vecchio.

For the Trident 4, Broadcom has developed a compiler suite and is making available the source code for the base images used for the device. Switch vendors and enterprises can use the compiler to tweak the source code to add proprietary features and differentiate their applications.

“Programmability also provides future-proofing for both OEMs and end customers, extending the life of Trident 4-based systems,” says Wheeler.

Broadcom uses its high-level Network Programming Language (NPL) to program the Trident 4 and has made available the language’s specification. It is also open-sourcing the front-end compiler and a behavioural simulator.


P4 and NPL

There is already an open-source programming language for switch silicon, dubbed P4, overseen by the Open Networking Foundation (ONF).

Broadcom says that its NPL language is not competing with P4. “We have participated in P4 and have given a lot of detail as to what we have in NPL,” says Del Vecchio. “Our hope is that the industry eventually converges.”

However, the limitations of the P4 language mean that the company cannot wait for the open community to adopt everything it needs to get Trident 4 to market, says Del Vecchio, who stresses that switch makers choose their chips based on bandwidth, efficiency and protocol support, not the language used for the chip’s coding.

“This is not something where you have millions of lines of code,” says Del Vecchio. “If someone wants a priority feature, like a tweak to a header format, they tackle the existing stack and code a few tens of lines.”




Images and code migration 

The main image that customers will use with the Trident 4 is the Datacenter NPL Application (DNA). This application supports a comprehensive set of protocols similar to Trident 3, with added features available with the Trident 4.

Broadcom also includes additional Trident 4 applications including support for large-scale network address translation (NAT), load-balancing, and distributed denial-of-service (DDoS) protection. 

But Del Vecchio says that the DNA image will meet most customers’ needs, whether for data centre use cases or for general enterprise networking.

Broadcom has also developed a legacy application programming interface (API) mapping layer to its compiler code to aid migration of Trident 3 software to the Trident 4 (see diagram below).

The API mapping layer translates legacy APIs used for the Trident 3 into logical table-based calls, what Broadcom calls its Software Development Kit Logical Table (SDKLT) APIs. 

Logical tables are used to expose the switch’s information such as MAC addresses, layer 3 route tables and the device’s ternary content-addressable memory (TCAM). The switch chip’s behaviour is managed by accessing these logical tables via the APIs. 
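The logical-table idea can be illustrated with a minimal Python sketch. It is not Broadcom's SDKLT API: the `LogicalTable` class, the `mac_table` name and the `legacy_l2_addr_add` function are hypothetical names invented here, assuming only what the article states, namely that switch state is exposed as tables and that legacy calls are translated into table operations.

```python
class LogicalTable:
    """Hypothetical model of a logical table: entries keyed by named fields."""

    def __init__(self, name, key_fields):
        self.name = name
        self.key_fields = key_fields
        self._entries = {}

    def _key(self, fields):
        # Build the lookup key from the table's declared key fields.
        return tuple(fields[k] for k in self.key_fields)

    def insert(self, **fields):
        self._entries[self._key(fields)] = fields

    def lookup(self, **key):
        return self._entries.get(self._key(key))

    def delete(self, **key):
        return self._entries.pop(self._key(key), None) is not None


# Hypothetical MAC address table keyed on VLAN and MAC address.
mac_table = LogicalTable("mac_table", key_fields=("vlan", "mac"))


def legacy_l2_addr_add(mac, vlan, port):
    """Hypothetical legacy-style call mapped onto a logical-table insert."""
    mac_table.insert(vlan=vlan, mac=mac, port=port)


legacy_l2_addr_add("00:11:22:33:44:55", vlan=10, port=7)
entry = mac_table.lookup(vlan=10, mac="00:11:22:33:44:55")
print(entry["port"])
```

In this sketch the mapping layer is a thin function per legacy call, so existing control software keeps its call sites while the state lives in one uniform table abstraction.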

The network operating system (network OS) shown in the diagram is used to control the switch in real-time, using the functionality defined in the NPL code. The API mapping layer generates the APIs needed for the network OS to control the Trident 4.  

Source: Broadcom



The Trident 4 includes enhanced instrumentation that is used to collect data to understand how the chip and the network are behaving. 

“There is no de facto standard [regarding instrumentation],” says Del Vecchio. Broadcom, therefore, supports two approaches: in-band telemetry and streaming telemetry. 

With in-band telemetry, information - metadata - is added to a packet as it passes through the network. Once the packet arrives at the endpoint - which can be a switch, a network interface card or a hypervisor - the metadata is stripped off to understand what happened to the packet on each of its hops in the network.
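A minimal Python sketch of the in-band approach follows. The record fields (`queue_depth`, `latency_ns`) and the function names are assumptions made for illustration, not a Broadcom or standards-defined format: each hop appends a metadata record to the packet, and the endpoint strips the records to reconstruct the path.

```python
from dataclasses import dataclass, field


@dataclass
class HopRecord:
    # Hypothetical per-hop metadata; real formats differ.
    switch_id: str
    queue_depth: int
    latency_ns: int


@dataclass
class Packet:
    payload: bytes
    telemetry: list = field(default_factory=list)


def forward_through(packet, switch_id, queue_depth, latency_ns):
    """Each hop appends its own metadata record to the packet."""
    packet.telemetry.append(HopRecord(switch_id, queue_depth, latency_ns))
    return packet


def strip_at_endpoint(packet):
    """The endpoint removes the metadata and returns it for analysis."""
    records, packet.telemetry = packet.telemetry, []
    return records


pkt = Packet(b"hello")
for sw, depth, lat in [("leaf1", 3, 800), ("spine1", 12, 1500), ("leaf2", 1, 700)]:
    forward_through(pkt, sw, depth, lat)

path = strip_at_endpoint(pkt)
total_latency_ns = sum(r.latency_ns for r in path)
print([r.switch_id for r in path], total_latency_ns)
```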

With streaming telemetry, in contrast, the switch chip collects statistics regarding latency, queue levels and the state of congestion, which are sent to a ‘collector’.
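The streaming approach can be sketched in Python in the same spirit. The class names, the report fields and the congestion threshold are hypothetical choices for illustration: the switch keeps a window of recent samples and periodically exports a summary to a collector, leaving the packets themselves untouched.

```python
from collections import deque
from statistics import mean


class Collector:
    """Hypothetical collector that simply stores the reports it receives."""

    def __init__(self):
        self.reports = []

    def receive(self, report):
        self.reports.append(report)


class SwitchTelemetry:
    """Keeps a sliding window of samples and exports summaries."""

    def __init__(self, switch_id, collector, window=4):
        self.switch_id = switch_id
        self.collector = collector
        self.latencies_ns = deque(maxlen=window)
        self.queue_levels = deque(maxlen=window)

    def sample(self, latency_ns, queue_level):
        self.latencies_ns.append(latency_ns)
        self.queue_levels.append(queue_level)

    def export(self, congestion_threshold=10):
        # The threshold is an arbitrary illustrative value.
        report = {
            "switch": self.switch_id,
            "mean_latency_ns": mean(self.latencies_ns),
            "max_queue_level": max(self.queue_levels),
            "congested": max(self.queue_levels) >= congestion_threshold,
        }
        self.collector.receive(report)
        return report


collector = Collector()
leaf = SwitchTelemetry("leaf1", collector)
for latency_ns, queue_level in [(800, 2), (1200, 11), (1000, 5)]:
    leaf.sample(latency_ns, queue_level)
report = leaf.export()
print(report)
```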

“Some companies want one [approach] more than the other,” says Del Vecchio. “With the Trident 4, we do both.”

Broadcom has also added real-time programmability to the Trident 4 that allows device changes on-the-fly.

“Now you can tweak things in the network, be it instrumentation or packet flow, and you don’t have to wait for the switch to come down for the upgrade,” says Del Vecchio.      

The capability can be used to modify the switch’s behaviour without packet loss, for example when a DDoS attack is suspected. Real-time programmability can also be used to tune applications and the network.

Broadcom stresses that the Trident 4 is a monolithic die whereas other 12.8-terabit chips announced by competitors, such as the Barefoot Tofino 2 and the Marvell Falcon, are multi-chip designs comprising a main die and chiplets that support the input-output.


Monolithic versus a multi-chip design

A monolithic chip has advantages in terms of time-to-market, as there is no interposer to debug, and in total silicon area, as there is no need for ASIC-chiplet interfaces, claims Del Vecchio. That said, Broadcom agrees that chiplets are where switch design is heading, and it is investigating the approach.

“All else being equal, such as memories, etc., we believe a monolithic die will offer the lowest power and the lowest cost,” says Wheeler. 


Source: Broadcom


Switch pipeline

The Trident 4’s packet-processing pipeline comprises input (ingress) and output (egress) paths (see chip diagram above).

“With the Trident 4, you do have the flexibility where you can take resources in the pipeline and tie them to a separate set of features,” says Del Vecchio. “With other architectures, you might not be able to do that and also support a full protocol stack.”

The Trident 4 uses what Broadcom calls a tile-based architecture.

Three tile types are used: Hash tiles, Index tiles and TCAM tiles, the names referring to each tile’s memory and how it is accessed. The table state of the Hash and Index tiles is held in SRAM, while that of the TCAM tiles is held in TCAM. 

“For Hash tiles, the SRAM access is performed using a hash computation,” says Del Vecchio. “For the Index and TCAM tiles, the access is a direct lookup into the table state.”

The tile is where a look-up table resides and where the programmable key generation and programmable policy reside. The key is used to access the SRAM or TCAM, says Del Vecchio. It is generated using the tile's key-generation logic with the function performed by the logic being specified by a user's NPL program. “The inputs to that function are packet fields or metadata from previous pipeline stages,” says Del Vecchio.

Meanwhile, a tile’s policy can dictate when a packet is dropped, traced, mirrored, modified or has its forwarding changed.
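The three access styles can be modelled in a short Python sketch. This is a software illustration only and not Broadcom's design: the class names, the CRC-32 hash, the string-valued policies and the `make_key` helper are all assumptions, and a real TCAM compares every entry in parallel rather than looping.

```python
from zlib import crc32


def make_key(packet, fields):
    """Key generation: select packet fields and pack them into bytes."""
    return "|".join(str(packet[f]) for f in fields).encode()


class HashTile:
    """SRAM-style table addressed by a hash of the key."""

    def __init__(self, buckets=1024):
        self.sram = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.sram[crc32(key) % len(self.sram)]

    def insert(self, key, policy):
        self._bucket(key).append((key, policy))

    def lookup(self, key):
        for stored, policy in self._bucket(key):
            if stored == key:
                return policy
        return None


class IndexTile:
    """SRAM-style table addressed directly by an integer index."""

    def __init__(self, size=256):
        self.sram = [None] * size

    def insert(self, index, policy):
        self.sram[index] = policy

    def lookup(self, index):
        return self.sram[index]


class TcamTile:
    """Ternary match on (value, mask) entries; the first match wins."""

    def __init__(self):
        self.entries = []

    def insert(self, value, mask, policy):
        self.entries.append((value, mask, policy))

    def lookup(self, key):
        for value, mask, policy in self.entries:
            if (key & mask) == (value & mask):
                return policy
        return None


packet = {"vlan": 10, "dst_mac": "00:11:22:33:44:55",
          "dst_ip": 0x0A000105, "port": 3}

mac_tile = HashTile()
mac_tile.insert(make_key(packet, ("vlan", "dst_mac")), "forward:port7")

port_tile = IndexTile()
port_tile.insert(3, "mirror")

acl_tile = TcamTile()
acl_tile.insert(0x0A000100, 0xFFFFFF00, "drop")  # matches 10.0.1.0/24
acl_tile.insert(0, 0, "forward")                 # catch-all entry

print(mac_tile.lookup(make_key(packet, ("vlan", "dst_mac"))))
print(port_tile.lookup(packet["port"]))
print(acl_tile.lookup(packet["dst_ip"]))
```

Changing the tuple of field names passed to `make_key` stands in for reprogramming a tile's key-generation logic, and the stored policy string stands in for the action applied to the packet.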

The Trident 4’s pipeline also includes special-function hardware engines that operate at line rate. These are used for common networking applications such as quality of service, equal-cost multi-path (ECMP) load balancing, and hash computations.
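As an illustration of hash-based ECMP load balancing, here is a minimal Python sketch. The CRC-32 hash, the 5-tuple layout and the next-hop names are assumptions chosen for the example and say nothing about how the Trident 4's hardware engines compute their hashes. The point is that hashing a flow's identifying fields keeps every packet of a flow on the same path while spreading different flows across the equal-cost paths.

```python
from zlib import crc32


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple."""
    key = "|".join(str(x) for x in flow).encode()
    return next_hops[crc32(key) % len(next_hops)]


next_hops = ["spine1", "spine2", "spine3", "spine4"]

# (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.1.5", "10.0.2.9", 6, 49152, 443)
chosen = ecmp_next_hop(flow, next_hops)

# Flows that differ only in source port spread across the next hops.
used = {
    ecmp_next_hop(("10.0.1.5", "10.0.2.9", 6, port, 443), next_hops)
    for port in range(49152, 50152)
}
print(chosen, sorted(used))
```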

“We also have a special function which we call flex switch logic that allows you to do optimised arithmetic such as Boolean applications and weight-based decisions,” says Del Vecchio.  “The special functions and the flex switch logic is what makes the Trident 4 differentiated from other programmable architectures.”

The Trident 4 also has 256 serialiser/deserialisers (serdes), each running at 50 gigabits-per-second (Gbps), similar to the Tomahawk 3. The Trident 4 is also pin-compatible with the Tomahawk 3, minimising switch makers’ development effort. 
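The serdes figures can be checked against the headline capacity with a few lines of Python. The lane count and lane rate come from the article; the `port_count` helper is a name introduced here, and the 200GbE and 400GbE port counts are simple arithmetic rather than configurations confirmed by Broadcom.

```python
SERDES_LANES = 256      # from the article
LANE_RATE_GBPS = 50     # from the article

capacity_gbps = SERDES_LANES * LANE_RATE_GBPS


def port_count(port_speed_gbps):
    """Number of ports of a given speed that the total capacity supports."""
    return capacity_gbps // port_speed_gbps


lanes_per_100g_port = 100 // LANE_RATE_GBPS

print(capacity_gbps)                       # total capacity in Gbps
print(port_count(100), port_count(200), port_count(400))
print(port_count(100) * lanes_per_100g_port)
```

The 256 lanes at 50Gbps give 12,800Gbps, or 12.8 terabits, which corresponds to 128 ports of 100GbE at two lanes per port.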

LightCounting Market Research expects to see 128-port 100GbE switches with a single Trident 4 ASIC. Indeed, Broadcom expects the first Trident 4-based systems to be available as early as the year-end or the first quarter of 2020.

The largest enterprises will be able to move to 200 Gigabit Ethernet (GbE) or 400GbE whenever they wish. All this bodes well for sales of 100-gigabit and faster Ethernet optical modules, says LightCounting. 

Article originally appeared on Gazettabyte (