
The Match Game

Netlogic Speeds DPI by Accelerating Text Pattern Matching

Chester was a real stickler for grammar. It started innocently enough: he would review his own memos a couple of extra times to make sure they were right. Then he started cracking down on his staff: he wanted them all to be as careful about their prose as he was about his. And he was reviewing their stuff. And that worked, more or less.

But the problem was, he was kind of OCD about everything he read. A memo might come in describing an incredible new bonus program that was going to net him thousands of dollars, but misplaced commas would distract him, and he would completely miss the main message. He seemed to be able to comprehend only grammatically pure materials. He just couldn’t let it go.

So one day he decided he’d had enough. It was one thing for him to clean up the work of his staff, but now he decided he was done reading other people’s shoddy stuff. He directed that all memos and all emails be reviewed by his admin before being sent to him. He compiled, off the top of his head, a list of rules. And he would accompany some of them with a rant to illustrate why the rule mattered.

A typical example might be, “The word ‘only’ should be placed only in front of the thing that it modifies. I just read an email where it says, ‘It will only take a minute.’ That is incorrect usage of the word ‘only.’ If I read that right, it’s saying it will only take – not give or borrow or donate or fricassee – a minute. That makes no sense! What it should say is, ‘It will take only a minute.’ Not an hour, not a picosecond; a minute. THIS STUFF MATTERS, PEOPLE!!”*

Of course, there were only so many rules that he could come up with at a time. So, as emails got to him with problems not covered in the rules, he would make a note and add them to the rule set. He figured that, at some point, it would be water-tight, and he’d never again have to stumble over anything that broke his rules. If an email or memo came in and failed the test, his admin sent it back for correction: it would never get to his desk until clean.

This had an immediate effect on his workload: because he was no longer burdened with reviewing documents, he had far more time on his hands. In fact, things got easier and easier, and he was feeling rather pleased with his newfound liberation. Until he noticed his admin’s desk and computer desktop: piles of memos and hundreds of emails were stacked up awaiting review. It wasn’t that he no longer had as much work to do; it was that everything was stuck in grammar review, and his admin couldn’t keep up.

It was only when he missed a mandatory corporate strategic offsite meeting (the invitation had asked attendees “… to please be prompt”, and Chester had decided that, grammarians’ disagreements on the point notwithstanding, split infinitives were an evil up with which he would not put) that he decided that he needed to accelerate the grammar rule-checking process.

 

Deep packet inspection (DPI) is the unglamorous process of peering into packets public and private to make sure that there’s nothing problematic lurking in there. “Problematic” typically refers to evil things like viruses and malware and Trojan horses (although there’s nothing to say that it couldn’t be extended to include pejorative comments about a government or company).

We recently looked at one aspect of DPI: Netronome’s notion of flow processing. That piece, however, was really about accelerating DPI by managing the rest of the process – flow processing, in this case. It didn’t deal with the actual deep inspection of packets.

We also took a brief look at Snort, an open-source rule-processing engine. But that’s only one particular engine, and its complexity is limited by constraining the kinds of rules that can be expressed. One can formulate more complex search patterns than Snort can handle, but then the pattern-matching engine must also be more sophisticated.

You may recall that rules tend to consist of two parts: a pattern to match and then an action to take based on a match. You search for text having a particular characteristic and, if you find it, then you do something – and that something will depend on what’s being searched.
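To make the two-part structure concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Rule` class, the `inspect` function, and both sample rules are hypothetical, and none of it reflects Snort's or Netlogic's actual rule syntax.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """A rule is a pattern to look for plus an action to take on a match."""
    name: str
    pattern: re.Pattern
    action: Callable[[str], str]


def inspect(payload: str, rules: List[Rule]) -> List[str]:
    """Run every rule's pattern over the payload; fire actions only on matches."""
    results = []
    for rule in rules:
        if rule.pattern.search(payload):
            results.append(rule.action(rule.name))
    return results


# Hypothetical rules: one security-flavored, one that Chester would like.
RULES = [
    Rule("shell-download", re.compile(r"wget\s+http://\S+\.sh"),
         lambda name: f"drop:{name}"),
    Rule("split-infinitive", re.compile(r"\bto please be\b"),
         lambda name: f"return-to-sender:{name}"),
]

print(inspect("Attendees are asked to please be prompt", RULES))
```

Note that the loop runs every pattern against every payload, while the actions run only on the rare match.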

You take the action only if a match happens, which, hopefully, isn’t too often. But there are thousands, even tens of thousands, of possible things to look for to decide if a packet is good. Having to run all those rules takes time, but the performance issue isn’t with the action – a host processor can probably handle that; the problem is with all the string-matching patterns from all the rules.

String matching is probably one of the exercises you used for your first software state machines in your undergrad programming course. There’s actually an intimidating name for the kind of state machine that parses “regular languages” or “regular expressions” – including, in particular, the apparently somewhat misnamed “Perl-compatible regular expressions (PCREs)”: a deterministic finite automaton (DFA). Every home should have one.
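For anyone who skipped that class, a sketch of the undergrad exercise might look like the following. It builds a DFA for one literal string and walks the text through it one character at a time. The function names `build_dfa` and `contains` are mine, and the table is built the slow, obvious way rather than the clever way.

```python
def build_dfa(pattern: str):
    """Return a table where dfa[state][char] is the next state.

    State k means "the last k characters seen match the first k of pattern".
    """
    alphabet = set(pattern)
    dfa = [dict() for _ in range(len(pattern) + 1)]
    for state in range(len(pattern) + 1):
        for ch in alphabet:
            # Find the longest prefix of the pattern that is a suffix of
            # what we have matched so far plus the new character.
            candidate = pattern[:state] + ch
            k = min(len(pattern), len(candidate))
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            dfa[state][ch] = k
    return dfa


def contains(text: str, pattern: str) -> bool:
    """Feed text through the DFA; reaching the final state means a match."""
    dfa = build_dfa(pattern)
    state = 0
    for ch in text:
        # Characters that never occur in the pattern send us back to state 0.
        state = dfa[state].get(ch, 0)
        if state == len(pattern):
            return True
    return False


print(contains("xxabababcxx", "ababc"))
```

The point of the exercise is that the scan does one table lookup per input character and never backs up in the text.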

There’s a whole body of mathematics behind this regular expression thing that I won’t even attempt to plumb here. (Because I’d have to understand it first.) Put simply, regular expressions are a way of expressing strings in a search – they’re the “re” in “grep,” one of Unix’s typically opaque commands, this one meaning, more or less, “find.” (Why use a simple common word when a made-up one will do?) The bottom line is that you can compile a set of string search patterns into a tree that can be processed by a DFA.
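One well-known way to compile many literal strings into a single tree-shaped automaton is the Aho-Corasick algorithm. The sketch below uses it purely as an illustration of the idea; it is not a description of what any particular commercial engine does, it handles plain strings rather than full regular expressions, and the names `compile_patterns` and `scan` are my own.

```python
from collections import deque


def compile_patterns(patterns):
    """Build a trie of all patterns, then add failure links (Aho-Corasick)."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(p)
    # Breadth-first pass: a node's failure link points at the longest proper
    # suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def scan(text, automaton):
    """Single pass over text; returns (start_index, pattern) for every hit."""
    goto, fail, out = automaton
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for p in out[node]:
            hits.append((i - len(p) + 1, p))
    return hits


automaton = compile_patterns(["he", "she", "his", "hers"])
print(scan("ushers", automaton))
```

However many patterns go into the compile step, the scan still touches each input character once, which is what makes the approach attractive when the rule set runs into the thousands.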

Netlogic has taken this approach one step further with their NETL7 family of what they call “knowledge-based processors (KBPs).” (They also have a Sahasra family of KBPs, but they’re very different.) They’ve integrated their own enhanced DFA, which they call their Intelligent Fabric for Automata (IFA), into a dedicated chip. Actually, they’ve integrated around 10 per chip (their Mike Ichiriu, VP of Systems and Applications Engineering, kept the exact number somewhat vague).

The “fabric for automata” name makes sense: it’s not like there’s one set of hard and fast rules that can be cast into hardware via a dedicated state machine. The rules are forever changing, so any attempt to deal with this must allow for any state machines – or automata – within the defined scope to be implemented in the KBP fabric.

The KBP consists of logic and memory, including some packet buffering memory. The engine itself is very tightly coupled with internal memory for the stored patterns. The KBPs store only the pattern-matching part of the rule, not the action portion.

You can stream a packet through the engine either by passing a pointer to the packet or by actually encapsulating the packet in an “instruction” that gets sent to the KBP. You can also check content across packet boundaries, since most long messages end up being fractured into multiple packets. This avoids the chance that something untoward might sneak in with its head in one packet and its tail in the next.
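How a KBP does this internally isn't something I can show, but the general trick is easy to sketch in software: keep the matcher's state alive between packets instead of resetting it. The `StreamMatcher` class below is a hypothetical single-pattern example using a standard Knuth-Morris-Pratt failure table; it assumes a non-empty pattern and packets that arrive in order.

```python
class StreamMatcher:
    """Matches one byte pattern across a sequence of packets."""

    def __init__(self, pattern: bytes):
        self.pattern = pattern
        self.fail = self._failure_table(pattern)
        self.state = 0  # survives from one packet to the next

    @staticmethod
    def _failure_table(p: bytes):
        """fail[i] = length of the longest proper prefix of p[:i+1]
        that is also a suffix of it."""
        fail = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k and p[i] != p[k]:
                k = fail[k - 1]
            if p[i] == p[k]:
                k += 1
            fail[i] = k
        return fail

    def feed(self, packet: bytes) -> bool:
        """Consume one packet; True if the pattern was completed in it."""
        p = self.pattern
        found = False
        for byte in packet:
            while self.state and byte != p[self.state]:
                self.state = self.fail[self.state - 1]
            if byte == p[self.state]:
                self.state += 1
            if self.state == len(p):
                found = True
                self.state = self.fail[self.state - 1]
        return found


matcher = StreamMatcher(b"EVIL")
print(matcher.feed(b"xxEV"))  # head of the pattern only
print(matcher.feed(b"ILyy"))  # tail arrives in the next packet
```

In a real system there would need to be one such saved state per flow, so that interleaved packets from different conversations don't complete each other's patterns.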

Companies can add their own rules – and, in this case, it’s not typically going to be the system builder that adds rules, but the service provider using the system. So the mechanism has to be particularly straightforward, because it’s several degrees removed from anyone familiar with the dirty details of DPI.

The idea is that these chips can accompany the packet-processing chips that manage the actual network traffic. DPI would normally be done by host processors in the “slow path,” since the dedicated packet or flow processors in the “fast path” can’t do it, given their laser-like focus on packet routing. But if practically every packet must be scanned, one could almost argue that DPI needs to be added to the fast path.

Typically, however, packets are sent out of the traditional fast path for checking, since there’s no host-style processor in the fast path. Offloading is intended to make that portion of the slow path faster. The dedicated engine can process the rules at rates ranging from 250 Mbps to 20 Gbps (depending on the device), much more quickly than a straight software implementation would be able to.
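To get a feel for those rates, a little arithmetic helps. The sketch below converts a line rate into the time available per byte, on the simplifying assumption (mine, not Netlogic's) that the engine consumes one byte per step; the function name `ns_per_byte` is hypothetical.

```python
def ns_per_byte(bits_per_second: float) -> float:
    """Time budget for each byte, in nanoseconds, at a given line rate."""
    return 8 / bits_per_second * 1e9


for label, rate in [("250 Mbps", 250e6), ("20 Gbps", 20e9)]:
    print(f"{label}: {ns_per_byte(rate):.1f} ns per byte")
```

That works out to 32 ns per byte at the low end and 0.4 ns per byte at the high end, and in either case every rule has to be checked within that window.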

Given a set of rules for English grammar, that might even be fast enough to process all of Chester’s incoming emails and memos. Except for one problem: the rules for the English language are anything but regular…

 

*Full disclosure: I violate this rule in my drafts all the time. Our editor apprised me of the rule, and I kept failing it so often that I started doing my own “only” scans before submitting…
