| The page can be accessed at two URLs: |
http://www.microprocessor.sscc.ru/Merced/ - direct satellite channel to Hamburg, Germany
http://www2.ssd.sscc.ru/microprocessor/Merced/ - fiber line to St. Petersburg, Russia, then to Finland |
|
Merced Facts and Speculations
Supercomputer Software Department RAS
| "This truly uncompromising article in-depth analyzes main features
and real novelties of forthcoming processor. It gives a chance to you,
learning a processor, to have your own opinion on it."
Oleg Yu. Repin, maintainer of the VLSI Microprocessors
|
|
| Introduction | |
| Merced is the code name of Intel's general-purpose 64-bit microprocessor, which is currently under development. It is scheduled for production in mid-2000. The processor is to be fabricated in a 0.18-micron process technology. Intel Corporation expects to begin sample production in 1999.
The processor is named after the city of Merced, located near San Jose, Calif., USA. Merced should be the first member of the new IA-64 family. IA-64 stands for Intel 64-bit Architecture. IA-64 implements the EPIC (Explicitly Parallel Instruction Computing) concept. EPIC was jointly defined by HP and Intel, who claim it to be a fundamental architecture technology, analogous to CISC and RISC. IA-64 includes a new 64-bit instruction set, which was also jointly developed by HP and Intel. In official HP-Intel announcements the new instruction set is also called the 64-bit Instruction Set Architecture (64-bit ISA). In addition, Merced should provide full compatibility with Intel's x86 family. Intel officials often use IA-32 (Intel 32-bit Architecture) instead of x86. Today two IA-64 processors are known to be under development
|
|
| Chronology of events | |
| Hewlett-Packard and Intel announced their joint research-and-development project in June of 1994. Aimed at providing advanced technologies for end-of-the-decade workstation, server and enterprise-computing products, the two companies' efforts include development of the 64-bit instruction set and compiler optimization.
Two years later, in 1996, HP produced its first 64-bit general-purpose processor, the PA-8000. It was the first member of the new PA-RISC 2.0 family. It is natural to assume that PA-RISC 2.0 is a result of the joint R&D project on the 64-bit instruction set, all the more so because the PA-8000 implements two key IA-64 features: predication and speculation. However, no official information confirms this assumption.
On October 9, 1997 Intel Corporation announced ([1]) that
Also on the same day, at the Microprocessor Forum, there was a presentation by Joel Birnbaum, Director of Hewlett-Packard Laboratories and Senior Vice President of Research and Development. He presented a short retrospective of the architecture work at HP from the early '80s until the decision to form the HP-Intel IA-64 alliance in 1994. According to Joel Birnbaum, the result of HP Labs research, known internally as Wide-Word and then as SP-PA (Super-Parallel Processor Architecture), served as the starting point for the Intel alliance; Bill Worley of HP Labs headed stages of both the Precision Architecture and Wide-Word efforts. Joel Birnbaum said Wide-Word included such features as static parallelism, speculation, predication and mechanisms that allow the number and speed of functional units to scale. He also explained the decision to form an alliance with Intel; the explanation is too long to reproduce here. Rajiv Gupta was mentioned as HP Labs' technical lead in HP's collaboration with Intel.
On May 29, 1998 Intel Corporation announced ([2]) a change in the production schedule of the Merced processor. According to the announcement, planned production volumes are moved from 1999 to mid-2000. Intel Corporation expects to begin sample production in 1999. The announcement does not contain information on Merced's architecture or process technology.
On October 12-15, 1998 the Microprocessor Forum was held. In the presentation "IA-64 Processors: Features and Futures" Intel's Stephen Smith provided some insights into the IA-64 products and their features. |
|
| EPIC, IA-64, Merced | |
According to HP and Intel, the EPIC concept includes all the advantages of VLIW and none of its disadvantages. John Crawford ([3]) revealed the following EPIC features:
HP and Intel officials claim EPIC to be the next-generation concept. They contrast EPIC with the CISC and RISC architectures. In their opinion ([4]), traditional microprocessor architectures have fundamental attributes that limit performance. But some RISC processor makers do not share such a pessimistic opinion ([6]). Incidentally, in the 1980s, when the RISC concept appeared, there were many assertions that "CISC ran out of gas" and that CISC has fundamental attributes that limit performance. Yet processors recognized as CISC are still widely used (e.g. the Intel x86 family), and their performance is still increasing.
In fact, all these abbreviations - CISC, RISC, VLIW - denote idealized concepts only. It is difficult to classify real microprocessors. Present-day microprocessors counted as RISC differ greatly from the first processors of the RISC architecture. The same is true of CISC. The most advanced processors implement many successful ideas regardless of the concepts they came from.
|
|
| IA-64 features | |
IA-64 registers:
IA-64 instruction format:
Let us list the combinations of three instructions in a bundle:
i1 & i2 || i3 - first i1, then i2 and i3 executed in parallel
i1 || i2 & i3 - i1 and i2 executed in parallel, then i3
i1 & i2 & i3 - i1, i2, i3 executed serially
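The notation above can be read mechanically: "||" keeps the next instruction in the same execution step, while "&" starts a new step. The following is only an illustrative sketch of that reading, not a model of the real IA-64 template encoding. The function names `issue_groups` and `enumerate_bundles` are hypothetical. Enumerating both separators in both positions also yields a fourth, fully parallel form, "i1 || i2 || i3", which follows from the notation but is an assumption here rather than something the list above states.

```python
from itertools import product


def issue_groups(instructions, separators):
    """Group instructions into sequential execution steps.

    "||" keeps the next instruction in the current step (parallel);
    "&" starts a new step (serial).
    """
    groups = [[instructions[0]]]
    for sep, instr in zip(separators, instructions[1:]):
        if sep == "||":
            groups[-1].append(instr)
        elif sep == "&":
            groups.append([instr])
        else:
            raise ValueError(f"unknown separator: {sep!r}")
    return groups


def enumerate_bundles(instructions=("i1", "i2", "i3")):
    """Map every notation string to its list of execution steps."""
    table = {}
    for seps in product(("&", "||"), repeat=len(instructions) - 1):
        parts = [instructions[0]]
        for sep, instr in zip(seps, instructions[1:]):
            parts += [sep, instr]
        table[" ".join(parts)] = issue_groups(instructions, seps)
    return table


if __name__ == "__main__":
    for notation, steps in enumerate_bundles().items():
        print(f"{notation:16} -> {steps}")
```

Each inner list is one step; instructions inside the same inner list are the ones the notation marks as parallel.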
Incidentally, EPIC bears a striking resemblance to the VelociTI architecture of Texas Instruments' TMS320C6x. A good example is the TMS320C6201 DSP. The processor contains 32 general-purpose registers, which is not a small number. It has 8 functional units, which is a large number even compared to up-to-date superscalar processors. TMS320C6201 instructions are packed by the compiler into instruction words containing 8 instructions along with the template. The template indicates dependencies between instructions (explicit parallelism). Each instruction has a conditional field (predication).
The IA-64 family is not the only upcoming VLIW-like design of a general-purpose CPU. For example, the E2k (Elbrus-2000) processor has been under development since 1992 at Elbrus, Russia. Elbrus's Chief Technology Officer, Associate Member of the Russian Academy of Sciences, Professor Boris Babaian, says the processor will be two times faster than Merced's successor, McKinley. It is estimated that an E2k running at 1.2 GHz will deliver 135 SPECint95 and 350 SPECfp95. There are more examples:
|
|
| Predication | |
| Predication is a method of handling conditional branches. The main idea of the method is that the compiler schedules both possible paths of the branch to be executed on the processor simultaneously. Indeed, EPIC processors are expected to have many functional units.
When an IA-64 compiler finds a branch statement in the source code, it marks all the instructions that represent each path of the branch with a unique identifier called a predicate. Each instruction has a predicate field for that purpose. When the CPU encounters a predicated branch at run time, it begins executing the code along both destinations of the branch, but it does not store the results while the values of the predicate registers are undefined. After the condition is evaluated, the processor stores a 1 in the predicate register that corresponds to the "true" destination and a 0 in the other. Before storing the results, the CPU checks each instruction's predicate register. If the register contains a 1, the instruction is valid, so the CPU retires the instruction and stores the result. If the register contains a 0, the instruction is invalid, so the CPU discards the result.
The ARM architecture from Advanced RISC Machines Ltd. (Cambridge, UK) has included a form of predication since its inception in the 1980s. Incidentally, Intel Corporation has a license from Advanced RISC Machines to produce, sell and enhance the StrongARM microprocessor family (developed by Digital Equipment Corporation, which had licensed the ARM architecture). All instructions of the already mentioned TMS320 DSPs include conditional fields. Some instructions of HP PA-RISC are predicated.
Describing predication, HP and Intel representatives often mention the conference paper "A Comparison of Full and Partial Predicated Execution Support for ILP Processors" by Scott A. Mahlke, Richard E. Hank, James E. McCormick, David I. August and Wen-mei W. Hwu of the IMPACT Research Group at the University of Illinois at Urbana-Champaign. The paper was published in the Proceedings of the 22nd International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995. Some of its authors are currently employed by HP.
In that study they measured how effective predication is at increasing performance, using a hypothetical eight-wide machine. They found that, on average, about half of the branches allowed predication. Unfortunately, HP and Intel have not revealed how IA-64 processors will handle the other half of the conditional branches.
Existing RISC processors use prediction and speculative execution along with predication. They predict correctly rather often - in 95% of cases.
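The predicate mechanism described in this section can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the function `run_if_converted`, the register names `p1`, `p2` and `r`, and the example computation (an absolute difference) are all hypothetical and are not taken from any IA-64 documentation. The point it shows is that both paths are computed, and the predicate values alone decide which result is retired and which is discarded.

```python
def run_if_converted(a, b):
    """Simulate predicated code for: if a > b: r = a - b  else: r = b - a.

    Returns the retired value of r and the number of discarded results.
    """
    # The compare writes a complementary pair of predicate registers.
    predicates = {"p1": a > b}
    predicates["p2"] = not predicates["p1"]

    # Each instruction carries a predicate field naming the register
    # that guards it: (predicate, destination, operation).
    program = [
        ("p1", "r", lambda: a - b),  # "true" path of the branch
        ("p2", "r", lambda: b - a),  # "false" path of the branch
    ]

    registers = {}
    discarded = 0
    for predicate, destination, operation in program:
        value = operation()            # both paths are executed
        if predicates[predicate]:      # predicate is 1: retire the result
            registers[destination] = value
        else:                          # predicate is 0: discard the result
            discarded += 1
    return registers["r"], discarded


if __name__ == "__main__":
    print(run_if_converted(7, 3))
    print(run_if_converted(3, 7))
```

The Python `if` inside the loop stands in for the hardware check of the predicate register at retirement; the simulated program itself contains no branch between the two paths.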
|
|
| Speculative loading | |
| Sometimes a processor is idle while waiting for a load from relatively slow memory to complete. The speculative loading mechanism is aimed at reducing this idle time.
With this mechanism, the compiler can place a load as early as possible in the code, so that when an instruction needs the data from memory, the processor does not have to wait. Such relocated loads are called speculative loads, and they are marked in a special way. The compiler inserts a speculative load check instruction right before the instruction that uses the speculatively loaded data. If the load causes an exception, the exception is recognized in the load's original "home block", when the processor encounters the speculative load check instruction. If, for example, the compiler hoists a load instruction out of a branch that is never executed, the exception is ignored.
Processor designs usually implement L1, L2 and L3 caches to reduce the dependence on memory latency. The HP PA-8500, for example, has 1.5 Mbytes of single-cycle L1 cache on-chip. The instruction sets of both Sun UltraSPARC (SPARC version 9) and IBM POWER3 include prefetch instructions that explicitly tell the CPU to preload certain data and instructions into their L1 caches. HP PA-8xxx processors also implement speculative fetching of data. These prefetch instructions resemble the described speculative loads, don't they? |
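The deferred-exception behaviour of speculative loads can be sketched in software. The following is a minimal illustration, assuming a dictionary as "memory" and a missing key as the faulting case; the names `DeferredFault`, `speculative_load`, `check` and `use_if` are hypothetical and do not correspond to actual IA-64 instructions. It shows that the hoisted load never raises by itself, and that the fault surfaces only if the check in the home block is actually reached.

```python
class DeferredFault:
    """Token recorded in place of a value when a speculative load faults."""

    def __init__(self, address):
        self.address = address


def speculative_load(memory, address):
    """Hoisted load: never raises; a bad address yields a deferred fault."""
    if address in memory:
        return memory[address]
    return DeferredFault(address)


def check(value):
    """Speculative load check, placed in the load's original home block."""
    if isinstance(value, DeferredFault):
        raise MemoryError(f"fault at address {value.address:#x}")
    return value


def use_if(condition, memory, address):
    """The load is hoisted above the branch; the check stays inside it."""
    value = speculative_load(memory, address)  # issued early
    if condition:
        return check(value) + 1                # home block: fault surfaces here
    return 0                                   # never checked: fault ignored


if __name__ == "__main__":
    memory = {0x10: 41}
    print(use_if(True, memory, 0x10))   # valid load, value is used
    print(use_if(False, memory, 0x99))  # bad address, but branch not taken
```

Calling `use_if(True, memory, 0x99)` raises `MemoryError`, because in that case the check in the home block is executed.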
|
| Other Merced-related facts |
|
| According to Intel Corporation's press release, Merced would provide industry-leading performance. More precise official estimates have not been announced yet. But Intel has also announced the 32-bit Foster (x86 architecture), which will equal Merced in floating-point performance. And even Merced's successor, McKinley, will be slower than Foster in 32-bit integer calculations. So Intel itself has said that Merced will not be a performance champion.
MicroDesign Resources' analyst team expects Merced to operate at around 800 MHz and to deliver 45 SPECint95 and 70 SPECfp95. In x86 mode, Merced could match the performance of a 500 MHz Pentium. Performance results for the 450 MHz Pentium II are 17.2 SPECint95 and 12.9 SPECfp95. So Merced would run x86 code 3-5 times slower than native code.
The Alpha 21264 at 500 MHz already shows 27.7 SPECint95 and 58.7 SPECfp95. It is possible to run x86 code on Alpha using the FX!32 binary translator; performance drops by a factor of 3 on average. Incidentally, in 1997 Intel Corporation bought several Digital Equipment Corporation licenses for the Digital Alpha processor. Intel had to buy them to escape court penalties for illegally using Digital Alpha technology in its products. Digital Alpha know-how has probably greatly influenced the forthcoming Merced design.
D.H. Brown analyst Tony Iams says that the performance estimates he has seen show that UltraSPARC will still have the advantage over Merced in floating-point performance. Iams says that UltraSPARC and Merced are expected to be equal in terms of integer performance.
In general, the Digital Alpha 21264, Sun UltraSPARC-III and IBM POWER3 are expected to be Merced's competitors.
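The "3-5 times slower" estimate for x86 mode quoted earlier in this section can be checked with simple arithmetic. The sketch below rests on two assumptions that the article does not state: that the 450 MHz Pentium II results scale linearly with clock frequency to 500 MHz, and that the scaled figures are a fair stand-in for the "500 MHz Pentium" that Merced is expected to match in x86 mode. The helper name `scale_by_clock` and the variable names are hypothetical.

```python
def scale_by_clock(score, from_mhz, to_mhz):
    """Scale a benchmark score linearly with clock frequency (an assumption)."""
    return score * to_mhz / from_mhz


# Figures quoted in the text.
merced_native_int = 45.0   # SPECint95, expected for Merced in native mode
merced_native_fp = 70.0    # SPECfp95, expected for Merced in native mode
pentium2_int = 17.2        # SPECint95 at 450 MHz
pentium2_fp = 12.9         # SPECfp95 at 450 MHz

# Estimated x86-mode performance: a 500 MHz part scaled from 450 MHz results.
x86_mode_int = scale_by_clock(pentium2_int, 450, 500)
x86_mode_fp = scale_by_clock(pentium2_fp, 450, 500)

# Slowdown of x86 mode relative to native mode.
int_slowdown = merced_native_int / x86_mode_int
fp_slowdown = merced_native_fp / x86_mode_fp

if __name__ == "__main__":
    print(f"integer slowdown:        {int_slowdown:.2f}x")
    print(f"floating-point slowdown: {fp_slowdown:.2f}x")
```

Under these assumptions the slowdown comes out at roughly 2.4 times for integer code and 4.9 times for floating-point code, which is in the same range as the estimate in the text.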
|
|
| Price | |
| It is estimated that Merced chips will sell for about $5,000 each. | |
| 64-bit | |
| In 2000 Merced would be the first 64-bit microprocessor developed by Intel. The very first 64-bit general-purpose microprocessor was the MIPS R4000, produced in 1992. MIPS processors are now widely used in supercomputers, servers, workstations and even game consoles (e.g. the Nintendo 64). Other 64-bit general-purpose microprocessors have also been in wide use for several years: Digital Alpha (1992), PowerPC 620 (1994), Sun UltraSPARC (1995) and HP PA-RISC 2.0 (1996).
Moreover, UltraSPARC also contains a number of 128-bit registers. |
|
| Operating frequency | |
| Linley Gwennap in [16] assumes the first Merced chip will operate at a frequency of about 800 MHz. The Digital Alpha 21164 operates at frequencies of up to 600 MHz; the 600 MHz Alpha 21164 has been in serial production since 1997. In October 1996 Exponential Technologies' 750 MHz PowerPC was demonstrated. In February 1998 IBM Corporation demonstrated an experimental PowerPC operating at 1 GHz. | |
| Technology | |
| The processor will be produced in a 0.18-micron process technology, which is itself currently under development at Intel Corporation. Shrinking the feature size makes it possible to reduce power dissipation, raise the operating frequency and increase the scale of integration. A larger scale of integration makes it possible to place more functional units, registers and cache on a processor. Currently all of the above 64-bit microprocessors are produced in 0.35- and 0.25-micron technology. Intel Corporation uses 0.25-micron technology to produce its 32-bit x86 processors.
The first Merced will be a cartridge-style module including a CPU, L2 cache and bus interface, said Merced director of marketing Ronald Curry. The cartridge will employ a newly defined system bus, using concepts from the Pentium II bus.
|
|
| Compatibility | |
| Before the official Intel announcement in 1997 it was expected that the jointly developed HP-Intel architecture would provide source compatibility with the x86 and PA-RISC families. But it has now been disclosed that Merced, which implements this architecture, will run only software that currently operates on the x86 family.
The EPIC and CISC concepts are opposed. While EPIC implies explicit parallelism (the compiler parallelizes and schedules the code), CISC implies implicit parallelism (on-chip parallelization and scheduling). Yet the two concepts are to be combined in the Merced design, which is strange.
A Microprocessor Report article analyzes an Intel patent application titled "Method and Apparatus for Transitioning Between Instruction Sets in a Processor". The application describes a processor, assumed to be Merced, that executes both x86 instructions and a second "64-bit instruction set", assumed to be IA-64. The Intel document describes a processor that can support operating systems and applications that use either or both instruction sets. The patent application includes a description of several instructions used to switch modes and share data between the two instruction sets. Linley Gwennap, the author of the article, writes: "In some places, the document gives the impression that Intel will treat IA-64 as simply a 64-bit extension to x86, much as when the 386 pioneered new 32-bit modes."
In brief, the details of x86 compatibility remain unclear. Only one thing is certain: Intel officials say Merced will be able to run x86 code. |
|
| Operating systems supporting Merced | |
| Sun Microsystems and Intel announced on 16 Dec 1997 that Sun will develop a version of its Solaris (UNIX dialect) operating system for Intel's Merced. Sun and Intel also announced a licensing agreement whereby the Merced-optimized Solaris will be licensed to other hardware companies. Vendors including Fujitsu, NCR, Siemens Nixdorf and Toshiba said they would use Solaris on their Intel-based products. The current version of Solaris is 7 (2.7), which is a fully 64-bit system. Merced should be supported beginning with version 8 (2.8).
Digital Equipment Corporation and Sequent are porting Digital UNIX to Merced. Tandem, Compaq and Sequent announced they would ship their Merced-based systems with Digital UNIX. Digital UNIX was the first 64-bit member of the UNIX family on the market; it became 64-bit in 1993.
Hewlett-Packard is preparing its HP-UX (UNIX dialect) for Merced. The current version of HP-UX is 11.0, the first version of HP-UX with full 64-bit support. As noted above, HP is a co-developer of EPIC. HP licenses HP-UX to Hitachi, NEC and Stratus.
Microsoft Corporation announced that the forthcoming Windows NT 5.0 would have a 64-bit variant for Merced. Unfortunately, Microsoft has no experience in 64-bit software development. Incidentally, Microsoft developed its first 32-bit system only 8 years after Intel's first 32-bit processor, the i386, had appeared.
Silicon Graphics Inc. has reached an agreement with Intel to port IRIX (UNIX dialect) to Merced.
Novell, Inc. unveiled its plans to develop a new network operating system code-named Modesto. Novell says Modesto will leverage Intel's IA-64 processors while preserving backward compatibility with NetWare 5.
|
|
| Merced compilers | |
| Compilers for Merced are being developed by Intel, Hewlett-Packard, Microsoft, Metaware Inc. (Santa Cruz, Calif.) and Edinburgh Portable Compilers Ltd. (Edinburgh, UK). People from the Pentium Compiler Group plan to provide EPIC/IA-64 support for GCC.
On 9 Oct. 1997 it was announced that Intel had a complete IA-64 compatible software development environment running, and that key independent software vendors (ISVs) were using it to develop operating systems and enterprise-level applications. At the Intel Developer Forum (Sept. 15-17, 1998) Merced emulator software was demonstrated.
HP has released the Trimaran System, an integrated compilation and performance monitoring infrastructure for research in instruction-level parallelism. Trimaran is a collaboration of three research groups:
|
|
| Summary | |
EPIC has the same principal feature as VLIW: the compiler, not the processor, parallelizes the instruction stream. This approach has the following advantages:
According to HP and Intel's announcements, architectural simplicity is one of the advantages of EPIC. But IA-64 will support the complex instruction set of the x86 family. As expected, in x86 mode an 800 MHz Merced could match the performance of a hypothetical 500 MHz Pentium. Old software for x86 processors will therefore not use Merced at all efficiently, and it is too expensive to run DOS or Windows on Merced. Intel Corporation aims Merced at enterprise servers and high-end workstations. Processors of the x86 family have never been used for these purposes, so it is not clear why Merced is to support x86.
Perhaps increasing the number of functional units is not as difficult for RISC processors, and not as easy for EPIC, as the EPIC/IA-64 developers assume, especially since processors recognized as RISC already use many of the features to be implemented in the upcoming Merced. As already mentioned, classifying processors as RISC, CISC or VLIW is something of a fiction. Up-to-date processors implement successful ideas coming from all of these concepts. A Microprocessor Report article dated January 26, 1998 suggests that many EPIC features can be added to existing RISC instruction sets using extension words; a retrofitted processor could execute current RISC binaries, but on programs compiled to take advantage of the new EPIC features, the processor could be as fast as or faster than IA-64 chips.
HP and Intel have repeatedly claimed that Merced will be an implementation of the revolutionary EPIC concept. But some already built processors show major EPIC symptoms, e.g. TI's TMS320C6201 DSP (1997).
Nevertheless, Merced is a very interesting experiment in VLIW design. It will certainly have a hard but interesting destiny. That is why HP and Intel are playing it safe even as they appeal to the whole computer industry to make the transition to Merced. Just a few facts: Intel plans to continue its 32-bit x86 processor family. In addition, Intel bought several Digital Equipment Corporation licenses for the famous Digital Alpha RISC processor. Hewlett-Packard, the co-developer of EPIC, is continuing development of new members of the PA-RISC family. The PA-8500 is expected to appear in systems in the second half of 1998. It will be followed by the PA-8600, 8700, 8800 and 8900!
|
|
| Sources and Links | |
[2] Intel Notifies Customers Of Change In Merced Processor Schedule http://www.intel.com/pressroom/archive/releases/sp052998.htm
[3] John Crawford, Intel, and Jerry Huck, HP: Motivations and Design Approach for the IA-64 64-Bit Instruction Set Architecture http://www.intel.com/pressroom/archive/speeches/mpf1097c.htm
[3a] Slides of HP/Intel IA-64 presentation at Microprocessor Forum http://www.hp.com/esy/technology/ia_64/products/slides/index.htm
[3b] 1997 Microprocessor Forum http://www.chipanalyst.com/q/@3720331wxyxzk/events/mpf/highlights.html
[4] The Next Generation of Microprocessor Architecture: A 64-bit Instruction Set Architecture (ISA) Based on EPIC Technology http://www.intel.com/pressroom/archive/backgrnd/sp101497.HTM
[5] HP and Intel Unveil Breakthrough EPIC Technology at Microprocessor Forum http://www.intel.com/pressroom/archive/releases/sp101497.HTM
[6] Solaris on Merced: What's in it for Sun? by Robert McMillan, SunWorld, January 1998 http://www.sun.com/sunworldonline/swol-01-1998/swol-01-ia64.html
[7] Beyond Pentium-II by Tom R. Halfhill, BYTE, December 1997 http://www.byte.com/art/9712/sec5/art1.htm
[9] First Merced Patent Surfaces by Linley Gwennap, MPR 3/31/97 http://www.chipanalyst.com/q/mpr/merced/merced.html
[10] Intel, HP Make EPIC Disclosure by Linley Gwennap, MicroDesign Resources http://www.chipanalyst.com/q/mpr/merced/v11_14.html
[11] Intel's Merced and IA-64: Technology and Market Forecast by Linley Gwennap, MicroDesign Resources http://www.mdronline.com/q/tech_lib/IA64/index.html
[13] IA-64 News from HP http://www.hp.com/esy/technology/ia_64/news/
[14] VLSI Microprocessors by Oleg Yu. Repin http://www.microprocessor.sscc.ru
[15] The Russians Are Coming by Keith Diefendorff, Microprocessor Report 02/15/1999. A short version of the article is available at http://www.elbrus.ru/press/mprep-p1.html.
[16] EPIC historical precedents by Mark Smotherman http://www.cs.clemson.edu/~mark/epic.html
[17] Texas Instruments' Digital Signal Processing Solutions http://www.ti.com/sc/docs/dsps/products.htm
[18] The VLIW project at IBM Research http://www.research.ibm.com/vliw/proj.html
[19] The Word on VLIW by Dick Pountain, BYTE, April 1996 http://www.byte.com/art/9604/sec8/art3.htm
[20] VLIW Questions by Peter Wayner, BYTE, November 1994 http://www.byte.com/art/9411/sec12/art1.htm
[21] What is VLIW? BYTE, November 1994 http://www.byte.com/art/9411/sec12/art2.htm
[22] Free On-Line Dictionary of Computing http://wombat.doc.ic.ac.uk/foldoc/index.html
|
|
| Glossary |
| CISC - acronym for Complex Instruction Set Computer | |
|
|
Examples of CISC processors are the Motorola 680x0 family and the Intel x86 family (IA-32). Both 680x0 and x86 are still popular. The term CISC was coined in contrast to RISC.
|
|
| RISC - acronym for Reduced Instruction Set Computer | |
The RISC concept gives the compiler more opportunities to perform optimization. Today RISC microprocessors prevail. Their field of use is very wide, from microcontrollers to supercomputers. It is RISC microprocessors that achieve the highest levels of performance; today's industry performance leader is Digital Alpha. There are several standards for RISC architectures, often called Open Architectures. Among them are MIPS (current version is IV, R10000), SPARC (current version is 9, UltraSPARC) and PowerPC. |
|
| VLIW - acronym for Very Long Instruction Word | |
VLIW processors are not widely used. The most famous VLIW machine was built by Multiflow Computer, Inc., which is now defunct. Hewlett-Packard has many former Multiflow engineers on board. In Russia the Elbrus-3 VLIW-based supercomputer is well known. Perhaps a contemporary example of a VLIW processor is TI's TMS320C6x DSP family. The VLIW effort at the IBM T.J. Watson Research Center started in 1986.
|
|
| About author | |
|---|---|
Alexei Pylkin obtained a Bachelor of Science degree in Mathematics from the State Technical University of Novosibirsk, Russia in 1997. Despite his young age, he has a perfect record of successful projects, mainly in the parallel computing and supercomputing fields. In 1997 he developed a new programming language targeted at the representation of parallel algorithms, and he has also implemented a compiler for this language.
|