Intel in the Cloud Post-Moore’s Law
Here at its headquarters, Intel projected an upbeat image at a data center event, and it showed how profoundly the company is changing with the times.
The x86 giant is increasingly relying on a basketful of technologies to deliver performance increases that it used to get with a turn of the crank at its fabs. And it is offering its customers a fat cookbook of systems and silicon recipes in place of its old formula of the next big CPU.
These days, what’s most interesting at Intel is its work in memories, machine learning — and some of its rock star engineers like Jim Keller. All three were on vivid display at the event.
Details of the next big processors — Cascade Lake, Cooper Lake, and Ice Lake — were part of the event. But part of that news was their added AI features, and they stood alongside Optane DIMMs now shipping to Google Cloud and plans for smart networking cards.
Wall Street analysts were hungry for information on Intel’s much-delayed 10-nm process. The event’s host, Navin Shenoy, general manager of Intel’s data center group, deflected the questions that a decade ago would have been a cue for an Intel exec to brag.
“I don’t talk to customers about nanometers; customers care about delivered system performance on their workloads,” said Shenoy, noting that Intel delivers that through many vehicles.
Intel’s Select Solutions program, launched last year, can configure and test systems tailored for an expanding set of workloads such as blockchain hashing and deep learning on Apache Spark.
“You just say you want chicken marsala and we send you all the ingredients,” said one exec, describing the program as the tech equivalent of the HelloFresh service.
The menu already includes FPGAs and a handful of machine-learning accelerators as well as x86 CPUs. Next year, it will also include a new GPU designed by a team led by Raja Koduri, a graphics veteran from Apple and AMD.
Intel’s new GPU won’t be a clone of Nvidia’s popular Volta, described by one Intel exec as “a Frankenstein of two chips,” because it has separate pipelines for graphics and deep-learning jobs that can’t run simultaneously. Instead, the Intel chip will focus on graphics and visual computing with some support for how machine learning is used in those apps.
Intel drove home to Wall Street analysts its vision of an expanding market. (Chart: Intel)
Like most chipmakers these days, Intel is sprinkling a bit of AI support everywhere. It’s a major spice for the CPU roadmap.
Cascade Lake will ship late this year with a new vector instruction that packs three multiply-accumulate tasks into one. It helps deliver what the company claims is an eleven-fold boost over its current Skylake Xeon on inference jobs.
Late next year, the 14-nm Cooper Lake version of Xeon will support bfloat16, an emerging floating-point format that Google and others are backing. Intel’s Lake Crest deep-learning accelerator “was going down another format road, but bfloat has emerged as an industry standard that customers want,” said Naveen Rao, general manager of Intel’s AI group.
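For readers unfamiliar with the format, bfloat16 keeps the sign bit and 8-bit exponent of a standard 32-bit float but only 7 mantissa bits, so a value can be converted by keeping the top 16 bits of its float32 encoding. The Python sketch below illustrates that bit layout only. The function names are hypothetical, the round-to-nearest-even step is one common choice rather than anything Intel described at the event, and nothing here reflects how Cooper Lake implements the format.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (hypothetical helper)."""
    # Reinterpret the value as its 32-bit IEEE 754 single-precision encoding.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF and mantissa != 0:
        # NaN: keep the top bits and force a mantissa bit so it stays a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, then drop the low 16 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    # The bfloat16 pattern is simply the upper half of a float32 encoding.
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


if __name__ == "__main__":
    for value in (1.0, -2.0, 3.14159):
        b = float32_to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bfloat16_bits_to_float32(b)!r}")
```

The trade-off is visible in the round trip: the dynamic range matches float32 because the exponent is untouched, while precision drops to roughly two or three decimal digits.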
Intel’s first 10-nm server chip, Ice Lake, will ship sometime in 2020, using system elements in common with Cooper Lake to ease the job of upgrading to it. All three chips show that Intel is now making stepwise process and microarchitecture advances annually.
When Moore’s Law was running at a faster pace, Intel used to introduce a new node one year and a new chip design the next. Now the Cascade, Cooper, and Ice Lake chips will all benefit from some design and process changes, and Ice Lake will follow the first 10-nm PC CPUs due in late 2019 by “much less” than Intel’s traditional 12 to 18 months, said Shenoy.
Among other interesting bits about Cascade Lake, it will sport higher frequencies and hardware security to fend off Meltdown/Spectre attacks, and it will come with smarter network cards that combine Ethernet and FPGA silicon. But the most interesting bit about Cascade is a new memory controller that enables DIMMs with Intel’s non-volatile Optane memory chips.
Optane will enable up to 3 TBytes of main memory per socket and help boost performance on a range of apps that Intel and partners are still exploring. Shenoy declined to estimate what percentage of Cascade chips will ship with Optane DIMMs.
“We don’t fully know yet how customers will innovate on Optane,” he said. “I don’t expect that it will be a niche — it will be broadly used.”
Google Cloud will be among the first to put Optane through its paces, with the DIMMs shipping to its data centers now for qualification. To date, the chips have proven hard to make and slow when used for storage in solid-state drives, in part due to the limits of the PCI Express bus on which the drives ride.
In tandem with its Select Solutions program, Intel is cracking the door a bit wider on its plans to customize chips. Today, half of the Xeons that Intel sells to cloud service providers are “off-the-roadmap” versions that adjust some Xeon parameter to suit a workload.
With the Cooper Lake generation, Intel aims to be more open about letting customers specify a chip that might add a new block as part of what it calls a semi-custom design. It is also stating that it is willing to make full ASICs but not providing details of any specific plans or engagements.
Google got the first “off-the-roadmap” Xeon, a version of Sandy Bridge, in 2008. Facebook used a proprietary version of the Xeon D geared for networking jobs. Amazon Web Services uses a Xeon modified to sustain a 4-GHz turbo mode on all cores for its z1d instance, said Raejeanne Skillern, general manager of Intel’s group that caters to cloud service providers.
Intel optimized a CPU “to reconfigure on the fly to fit our apps,” said a tech exec at Oath, a data center subsidiary of Verizon, speaking in a video here. Skillern implied that China’s Tencent also uses a unique Xeon to run inference jobs on its WeChat app. Her team includes 200 engineers working on 150 projects in person at customer data centers.
Two highlights from the event were a brief stage appearance by Jim Keller, a veteran microprocessor designer recently hired away from Tesla, and a lunch conversation with Naveen Rao, Intel’s AI guru-in-chief.
Shenoy introduced Keller as an engineering rock star, a term that the company — and EE Times — once used in marketing campaigns. Keller played along, saying that he loved using the term, especially with his kids, but offered few specifics about what he is doing in his new role.
“I started working on 14-nm and 10-nm products and getting up to speed with the fab guys on where we are in 14 and 10,” he said, seeming, as in the past, a bit uncomfortable in the spotlight. “The technology is so good … [we’re] focused on performance and yield and what it takes to get the products out.”
“Long term … the AI revolution is really big, and its impact is showing up everywhere … how can we have a coherent view of CPU, GPU, and AI acceleration and their memory and system demands from half a watt to megawatt problems … that’s pretty exciting,” he said, tipping his hat to the company’s engineering excellence.
Over a small group lunch, Naveen Rao, whom I consider a rising engineering rock star, schooled me in more detail on AI.
He believes that Intel can drive a standard for a high-level abstraction layer that rationalizes the differences in neural networks as expressed by today’s many frameworks. It would ride below the ONNX inference API that Microsoft helped launch and above the hardware abstraction layer that folks at the Khronos Group aim to set.
Google and Baidu have similar ideas with their XLA and Anakin, respectively. But Rao believes that Intel is a bit further down the road with its nGraph. “We see an opportunity to lead here,” he said.
Asked for his picks among the dozens of AI accelerator startups, he noted that he acts as an advisor to Mythic, one of several companies exploring the concept of a processor-in-memory. “PIM is still a bit of a science project, but it could be a game changer,” he said.
Overall, “the 10x opportunity is gone” now that Nvidia and others, including Intel’s Nervana group, are starting to deliver hardware acceleration for matrix linear algebra. However, there are still opportunities, especially in areas such as edge networks, where reinforcement learning and other techniques are blurring the line between training and inference jobs, he said.
Training at the edge will spawn problems, too. “There’s a management nightmare if these devices evolve to become different,” he said. “How do you determine if they are within spec?”
At a practical level, the AI surge has bloated salaries for anyone with neural-network expertise, making hiring a challenge. The pressure can pit chip vendors against their data center customers in bidding wars for talent, said Rao.
He also spoke a bit about his baby, Spring Crest, aka the Intel Nervana NNP L-1000, the 2019 follow-on to the abandoned Lake Crest accelerator for training and inference. It will still use HBM memory on a 2.5D chip stack, though a follow-on may shift to Intel’s EMIB packaging. It will support more data formats, primitives, and batch sizes than Google’s TPU, the first custom deep-learning accelerator.
Some Spring Crest systems may use Optane DIMMs as an option. It or a follow-on will also sport innovations dealing with sparse data sets, baking into hardware some of the features of the open-source program Distiller, letting users move zeros around in matrices in ways that bolster performance.
In his talk to the group, Rao exuded the enthusiasm that one would expect from a neuroscientist-turned-engineer at this stage of tech history.
“Computing is in a new architecture phase, which doesn’t happen very often,” he said. “We’re in the top of the second inning in AI … the whole stack is continually evolving. I haven’t seen anything move this fast in my career.”
It was a good reminder that, even as Moore’s Law fades into history, new things emerge.
I suspect that at least some Wall Street analysts left the meeting keeping Intel at a hold, maybe even a sell. Its stock price has been on a slow slide since early June (about the time that CEO Brian Krzanich resigned), after it hit $57, its second-highest peak since the dot-com boom.
Veteran analyst Linley Gwennap noted that the event gave no deep insights into Intel’s struggles getting 10-nm chips out the door or exactly how it will recover from missing the boat with its Lake Crest AI chip.
Xeon unit sales to enterprise customers were down 4% on average from 2014 to 2017, but they are back up 6% so far this year, said one Intel exec. It’s a much-watched franchise that has generated $130 billion in revenues to date from a family of chips that sell for prices ranging from $213 to $10,000, a high-water mark, especially in these days of the internet of things.