CMOS Optoelectronic Components for Clock Distribution

Robert Pownall a, Charles Thangaraj a, Guangwei Yuan b, Phil Nikkel b, Tom W. Chen a, Kevin L. Lear a

a Colorado State University, Fort Collins, CO 80523-1373 USA
b Avago Technologies, Fort Collins, CO 80525 USA

Abstract

Optoelectronic components for clock distribution that are fully compatible with all standard CMOS processes are described. Waveguide cores are silicon nitride, while the waveguide cladding is silicon dioxide. Polysilicon photodetectors offer responsivities up to 1.3 A/W, a 10% - 90% rise time of 0.58 ns, and a full-width half-max duration of 0.85 ns. Power budget calculations indicate that, with reduced waveguide and splitter losses, 1 µA of photocurrent from the end-node detectors could be achieved with only 48 µW of optical power input into a 16-node H-tree.

Keywords: Complementary metal-oxide-semiconductor (CMOS)-compatible optoelectronic integrated circuits, leaky-mode metal-semiconductor-metal photodetectors, optical interconnects, photoconductivity, waveguides

1. Introduction

Integrated circuit maximum die sizes are growing as component sizes shrink, causing global signals (e.g., clock) to travel longer distances on chip even as parasitic effects (e.g., metal line resistance and interline capacitance) increase. This combination poses a significant problem for on-chip signal transmission. Clock signals, the most obvious and possibly most critical example of global signal challenges, face increased clock skew and distortion [1,2]. Other signals suffer from increased latency, and all must contend with the higher power consumption associated with interconnect [2,3].

For these reasons, optical interconnect technologies, as a replacement for conventional metal-line interconnect, are a topic of great interest [4-6]. However, many of the complementary metal-oxide-semiconductor (CMOS) optical interconnect approaches proposed in the literature use materials or processes that are either incompatible with conventional CMOS processing or not part of a standard CMOS technology [7,8]. Examples include waveguides made from SU-8 photoresist [9] or polyimides [8], and photodetectors made from III-V materials [10,11]. The use of SU-8 or polyimides limits the temperature of subsequent processing and raises the possibility of carbon or other contamination in later process steps. Using III-V materials in a CMOS process requires wafer bonding [11], quantum well creation [12,13], mismatched materials growth [14] or other exotic hybrid integration techniques [15]. Other proposals consume valuable silicon substrate area or constrain the circuit routing above the photodetectors that convert optical signals into electrical signals [16]. While germanium photodetectors [17] are compatible with CMOS processing, germanium is typically available only in facilities that use high-speed SiGe heterojunction bipolar transistors (HBTs) or strained silicon.
Even facilities with germanium capabilities would welcome an interconnect technology that does not require germanium photodetectors, because it would reduce capital equipment expenditures. CMOS-compatible optical interconnects based on silicon-on-insulator (SOI) technology have also been reported [18]. In addition to sharing the disadvantage of consuming valuable silicon substrate area, the added expense and processing complexity of SOI, and the design challenges of dealing with its parasitic characteristics, would be a disincentive for facilities not already using SOI.

In this work, we discuss investigations into an optical interconnect technology that is fully compatible with all standard CMOS processes. In this technology, light is guided through optical waveguides with a silicon nitride core and silicon dioxide cladding to photodetectors made from polysilicon. All of these materials are in common use in standard CMOS technologies. The results reported here are from wafers fabricated at a commercial CMOS facility¹ in a 0.35 µm minimum feature size technology.

This paper covers the processing and optoelectronic testing of the waveguides and photodetectors discussed above. Section 2 details the process for integrating the optical and electrical components. Section 3 describes the optical and optoelectronic test components along with measurement results. Section 4 reports a link budget analysis that assesses overall system feasibility. Finally, overall system results are presented, along with conclusions and suggestions for improvements and other follow-on work.

2. Integration of Optical and Electrical Components

One key to a successful optical interconnect system is careful integration of the optical components which transmit the optical clock signal around the chip with the optoelectronic components which convert the optical signal back into a usable electrical signal. Examples of the former are optical waveguides, while photodetectors and transimpedance amplifiers (TIA) are examples of the latter.

Because efficient, high-speed, high-power optical emitters compatible with a standard CMOS process are not currently available, the monolithic optical interconnect system described in this paper uses an off-chip optical source. Even if CMOS-compatible optical sources were available, an off-chip optical clock source could offer benefits in thermal management and in the low jitter of mode-locked diode lasers. The modulated optical signal is injected into the test chip by end-fire coupling into an optical waveguide configured as a standard clock-distribution H-tree in a balanced 16-node configuration. Photodetectors and transimpedance amplifiers are located at the end nodes of the H-tree. Thus, global clock distribution is optical, while local clock distribution remains electrical. In the future, similar optical and detection components could be used for global signal interconnects.
Fig. 1. Detector structure, with waveguide width W, detector length L and contact spacing S indicated.

Fig. 1 shows the optical waveguide, which consists of a silicon nitride (SiN$_x$) core with a phosphosilicate glass (PSG) / tetraethyl orthosilicate (TEOS) silicon dioxide (SiO$_2$) cladding, on top of a polysilicon detector. The SiN$_x$ is deposited by plasma-enhanced chemical vapor deposition (PECVD), then patterned and etched; this deposition is a minor thickness modification of a standard deposition used in the wafer fabrication facility. The waveguide core is deposited on a PSG / TEOS stack, which serves as the bottom cladding outside the detector region. The TEOS is chemical-mechanically polished (CMPed) down to the top of the polysilicon layer before the SiN$_x$ is deposited, so that the waveguide core lies directly on top of the polysilicon detector. The top cladding is a second TEOS layer, which is also CMPed to provide a smooth, topography-free surface for depositing and patterning the subsequent metal-1 film. This leaves a nominal 200 nm of TEOS on top of the waveguide core. The PSG layer and the top TEOS layer are standard film depositions in the facility, while the bottom TEOS is a minor thickness modification of a standard deposition.

¹ Avago Technologies, formerly the semiconductor division of Agilent Technologies.
Fig. 2. Photomicrographs of (a) first generation and (b) second generation test chips. Die size for both generations is approximately 8.8 mm by 9.2 mm. The arrows indicate the launch points for the 16-node H-trees on each test chip.

Two generations of test chips were designed and fabricated. The first generation had only photodetectors and waveguide structures, the minimum necessary to establish the basic feasibility of the concept. In contrast, the second generation received full processing, including transistors and two levels of metal. As a result, the second-generation test chip included transimpedance amplifiers, phase comparators, ring oscillators and other electronic circuitry in addition to the photodetector and waveguide structures. The second metal layer required a TEOS dielectric between metal-1 and metal-2, as well as CMP of that TEOS; this inter-metal TEOS also provided additional top cladding for the waveguides. The moderately high index difference between the waveguide core (n = 1.8) and cladding (n = 1.45) provides excellent light confinement, allowing the bends needed for H-tree optical clock distribution to have radii of curvature as small as 100 μm without significant loss [19].

Using polysilicon as the detector layer provides a material with a refractive index (n = 3.85) higher than that of silicon nitride (n = 1.8), permitting efficient leaky-mode coupling from the waveguide core into the detector. The detector polysilicon and the gate polysilicon are deposited at the same time. However, while gate polysilicon is heavily doped n-type or p-type and frequently silicided to further lower its electrical resistance, the detector polysilicon is masked from implantation everywhere, and from silicidation everywhere except in the via regions, to provide the desired electrical and optical properties. Silicidation would reduce coupling from the waveguide core to the polysilicon by increasing the vertical separation between the two. Both silicidation and implantation would greatly increase the photodetector dark current by lowering the polysilicon sheet resistance from its gigaohm-per-square range [20], which would severely degrade the contrast ratio of the photocurrent signal from the detector. Although the detector polysilicon in this work is at the same level as the gate polysilicon, it could instead be placed in the electrical interconnect layers; placing it after metal-1 would, however, require a low-temperature deposition technique. Besides offering flexibility in positioning detectors in the back end of the CMOS process, dielectrically isolated polysilicon removes the detectors from the potentially noisy electrical environment of the crystalline CMOS substrate.

The via from metal-1 to the polysilicon detector has a titanium nitride (TiN) seed layer at the bottom, making a Schottky contact to the undoped polysilicon. A tungsten plug fills the remainder of the via hole. After plugging the via hole, metal-1 is deposited and patterned. As a result, the detector structure in this work falls in the general category of metal-semiconductor-metal (MSM) photodetectors.

The waveguide crosses the polysilicon photodetector, permitting direct vertical coupling from the waveguide into the photodetector. This enhances the effective optical absorption coefficient of the polysilicon compared with normal-incidence illumination. Absorption coefficients as high as 0.67 dB/μm at 10 V bias have been reported [21,22], much higher than the values reported for other leaky-mode coupled detectors [9,23,24], which allows a smaller detector area.

3. Optical Test Devices – Description and Results

Both first and second generation test chips contained many optical test devices and structures. The most important ones, which were common to both test chip generations, are various types of waveguide structures and waveguide-coupled photodetectors placed around the edge of the die.

(a) Waveguide structures

The waveguide structures consisted of clock distribution H-trees and long waveguides with a 90-degree bend in the middle.
The 90-degree bend structures, which were used to measure waveguide loss as discussed below, run from the left edge to the top edge of each die in Fig. 2. They ranged in width from 0.5 μm to 8 μm (0.5, 1, 2, 4 and 8 μm) and in bend radius from 10 μm to 100 μm (10, 20, 25, 50 and 100 μm), and were essentially the same on the first- and second-generation test chips. The H-tree structures allowed testing of the basic feasibility of the optical clock distribution concept. Although the main 16-node H-trees, which span the upper left two-thirds of each die, are too small to be easily seen in Fig. 2, their launch points are indicated by arrows. The first-generation test chip had only one main H-tree and used the same photodetector geometry at each end node. It included a tapered multimode input waveguide intended to relax the horizontal alignment tolerance of the optical fiber, but the taper proved unnecessary and was abandoned in the next design. The second-generation test chip had five small 4-node H-trees in addition to the main H-tree, allowing investigation of various H-tree waveguide widths and bend radii as well as different end-node photodetector geometries and configurations. Fig. 3 shows half of a large first-generation 16-node H-tree. Waveguides, waveguide splitters and leaky-mode waveguide-coupled photodetectors are visible due to light scattered from the waveguides. Blooming near the input is caused by light scattered at the edge of the die that does not couple into the waveguide.

**Fig. 3.** Photomicrograph negative of one half of a lit-up H-tree on a first-generation test chip. The dots at the end of each branch are scatter from leaky-mode detectors.

(b) Waveguide loss

Waveguide loss was measured using a custom-built fiber scanning system that detects the change in scattered power along the waveguide. Optical power from a 654 nm laser diode was end-fire coupled from an Oz Optics visible-wavelength single-mode optical fiber (4 μm core, 125 μm cladding, 0.11 numerical aperture) into the waveguide input. A cleaved, 62.5 μm core diameter, graded-index multimode optical fiber was typically positioned 25 μm above the waveguide under test. This fiber collected optical power scattered up out of the waveguide and guided it to a Hamamatsu S8745-01 photodiode / preamplifier combination with a nominal transimpedance gain of 1 GΩ. The maximum scattered optical power was recorded for a series of transverse scans at different positions along the waveguide, with the scans at each position averaged to reduce noise. Scattered power values as low as 1.5 pW were easily observed with this system, whose noise floor was below 0.2 pW. Assuming that the power scattered up out of the waveguide is a constant fraction of the power propagating along it, a line can be fit to the average maximum power (in dB) at each position; the slope of this line gives the waveguide loss per unit length. Typical waveguide loss results are shown in Fig. 4.
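The line-fit step above can be sketched as an ordinary least-squares fit of peak scattered power (in dB re 1 pW, as on the Fig. 4 y-axis) against position. This is a minimal illustration, assuming the transverse-scan maxima have already been extracted and averaged; the function names and the sample readings are hypothetical, chosen only to reproduce an 8.5 dB/cm slope.

```python
import math


def to_db_re_1pw(power_pw):
    """Convert a scattered-power reading in pW to dB referenced to 1 pW."""
    return 10.0 * math.log10(power_pw)


def waveguide_loss_db_per_cm(positions_um, power_db):
    """Propagation loss in dB/cm from a least-squares line through (position, dB) points.

    Assumes the scattered fraction is constant along the guide, so the scattered
    power in dB falls linearly with position.
    """
    n = len(positions_um)
    if n < 2 or n != len(power_db):
        raise ValueError("need at least two (position, power) pairs of equal length")
    mean_x = sum(positions_um) / n
    mean_y = sum(power_db) / n
    sxx = sum((x - mean_x) ** 2 for x in positions_um)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions_um, power_db))
    slope_db_per_um = sxy / sxx      # negative for a lossy guide
    return -slope_db_per_um * 1e4    # 1 cm = 1e4 um; report loss as a positive number


# Hypothetical readings: 1000 pW at the first scan, decaying at 8.5 dB/cm (8.5e-4 dB/um).
positions = [0.0, 500.0, 1000.0, 1500.0, 2000.0]   # um along the waveguide
readings_pw = [1000.0 * 10 ** (-8.5e-4 * x / 10) for x in positions]
loss = waveguide_loss_db_per_cm(positions, [to_db_re_1pw(p) for p in readings_pw])
print(f"{loss:.2f} dB/cm")   # prints 8.50 dB/cm
```

Real scans would scatter about the line, so in practice the residuals of the fit are a useful check on the constant-scattering-fraction assumption.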

The measured waveguide loss, between 8 dB/cm and 9 dB/cm, was much higher than expected and much higher than other reported values for SiN$_x$/SiO$_2$ waveguides, which are typically below 1 dB/cm [23,24]. Further investigation revealed an unexpected loss mechanism: absorption in the PSG layer that forms part of the lower cladding. The non-zero imaginary part of the PSG refractive index, in the range of 0.001 to 0.002 [25], produced waveguide losses that were both much higher than expected and essentially independent of waveguide width. Simulations [26] using an imaginary index of 0.0015 predict a waveguide loss of approximately 11 dB/cm, in reasonable agreement with the measured loss.
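A back-of-envelope check shows why an imaginary index of only about 0.0015 matters. For a bulk material the intensity absorption coefficient is α = 4πκ/λ, and to first order a guided mode sees that loss only in proportion to the fraction Γ of its power overlapping the absorbing layer. The sketch below assumes λ = 654 nm (the measurement wavelength) and back-calculates the overlap Γ that would reproduce the ~11 dB/cm simulated loss; that Γ is an illustrative inference, not a value reported in this work, and the function names are hypothetical.

```python
import math

INTENSITY_PER_CM_TO_DB = 10.0 / math.log(10.0)   # converts 1/cm (intensity) to dB/cm, about 4.343


def bulk_loss_db_per_cm(kappa, wavelength_nm):
    """Bulk intensity absorption, in dB/cm, of a material with imaginary index kappa."""
    wavelength_cm = wavelength_nm * 1e-7
    alpha_per_cm = 4.0 * math.pi * kappa / wavelength_cm
    return INTENSITY_PER_CM_TO_DB * alpha_per_cm


def implied_overlap(modal_loss_db_per_cm, kappa, wavelength_nm):
    """Fraction of modal power in the absorbing layer that would give the stated modal loss.

    First-order estimate only: modal loss ~ overlap * bulk loss.
    """
    return modal_loss_db_per_cm / bulk_loss_db_per_cm(kappa, wavelength_nm)


bulk = bulk_loss_db_per_cm(0.0015, 654.0)      # roughly 1250 dB/cm in bulk PSG
gamma = implied_overlap(11.0, 0.0015, 654.0)   # under 1% of the mode in the PSG
print(f"bulk PSG loss ~ {bulk:.0f} dB/cm, implied overlap ~ {100 * gamma:.2f} %")
```

Under these assumptions, even a sub-percent overlap of the mode with the PSG is enough to produce a loss of order 10 dB/cm, which is consistent with the loss being dominated by the cladding rather than by sidewall scattering.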

**Fig. 4.** Average peak optical power from each transverse scan versus the position along a waveguide. The slope of the best-fit line is the waveguide loss. Y-axis is dB referenced to 1 pW.

(c) Waveguide splitter performance

Construction of the optical clock distribution H-tree required waveguide splitters in addition to straight waveguides. To this end, passive waveguide Y-type splitters based on 90° bends were implemented in the design. Fig. 3 shows the combination of straight waveguides and waveguide splitters in a lit-up H-tree, while Fig. 5 below shows a waveguide split in the second-generation large H-tree.

One second-generation wafer was pulled from the fabrication process immediately after the optical waveguide etch step, so its waveguides had air as the top and side cladding. This sample was used for near-field scanning optical microscope (NSOM) studies of the splitters. Fully processed second-generation wafers could not be used for NSOM studies of the splitters because, with the nominally 950 nm thick SiO$_2$ films above the waveguide core, the evanescent field from the waveguide would decay below detectable limits before reaching the NSOM tip. A Witec Alpha-NSOM operated in contact mode measured the optical intensity at the upper surface of the air-clad waveguide [19,27], yielding simultaneous light intensity and topography maps.

**Fig. 5.** Waveguide split in large H-tree. Incoming vertical waveguide is 2 μm, outgoing bends and horizontal waveguides are 1 μm. Bend radius is 100 μm. The structure below the split is a photodetector used to evaluate scatter at the split.

Fig. 6 shows the lateral optical intensity profiles of a waveguide splitter that was aggressively designed with two 25 μm radius waveguide bends. The entrance waveguide is 2 μm wide, while the output branches start at 2 μm wide and gradually taper to 0.5 μm. At these widths the air-clad waveguides are slightly multimode at the 654 nm wavelength used. The inset shows the NSOM image of the splitter and marks the locations of the three profiles: position I in the entrance waveguide, position II on the right-hand branch and position III on the left-hand branch. Numerical solutions of the Helmholtz equation showed that the entrance waveguide supports the fundamental and first-order TE modes. At position I, the NSOM intensity profile showed an interference pattern, as plotted in Fig. 6. Fitting the simulated modes to the measured lateral intensity profile gave a fundamental-to-first-order mode power ratio of 3.8 and a phase difference of 0.39 radians between the two modes [28]. NSOM was also used to measure an excess loss of 0.84 dB for the splitter without top cladding [28].
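The asymmetric profile at position I is what two co-propagating modes with a fixed phase offset produce. The sketch below illustrates this with Hermite-Gaussian stand-ins for the fundamental and first-order TE modes (an assumption; the fit in [28] used numerically solved modes), normalized to equal power and then mixed with the fitted power ratio of 3.8 and phase difference of 0.39 rad. The mode half-width and function name are hypothetical. Because the first-order mode is odd, the cross term shifts the intensity peak to one side whenever cos φ ≠ 0.

```python
import math


def two_mode_intensity(x_um, width_um, power_ratio, phase_rad):
    """Lateral intensity of a fundamental + first-order mode superposition.

    Hypothetical mode shapes: Hermite-Gaussian approximations with 1/e field
    half-width `width_um`, scaled so both carry equal power before mixing.
    power_ratio is P_fundamental / P_first_order.
    """
    u = x_um / width_um
    e0 = math.exp(-u * u)              # even (fundamental) mode
    e1 = 2.0 * u * math.exp(-u * u)    # odd (first-order) mode; factor 2 equalizes power
    a0 = math.sqrt(power_ratio / (1.0 + power_ratio))
    a1 = math.sqrt(1.0 / (1.0 + power_ratio))
    # |a0*e0 + a1*e1*exp(i*phase)|^2 expanded into real terms
    return (a0 * e0) ** 2 + (a1 * e1) ** 2 + 2.0 * a0 * a1 * e0 * e1 * math.cos(phase_rad)


w = 0.8  # um, hypothetical mode half-width for a ~2 um wide air-clad guide
profile = [(x / 10.0, two_mode_intensity(x / 10.0, w, 3.8, 0.39)) for x in range(-20, 21)]
peak_x = max(profile, key=lambda p: p[1])[0]
print(f"intensity peak offset from the guide center: {peak_x:+.1f} um")
```

Fitting measured data would invert this: adjust the power ratio and phase until the modeled profile matches the NSOM trace.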

For comparison, the light intensity distribution in a first-generation splitter was also measured. This sample had 250 nm of TEOS as top and side cladding, which was standard for first-generation wafers. The entrance waveguide was 1 μm wide, while the branch waveguides were 0.5 μm wide. The 250 nm top cladding was thin enough for the evanescent field from the waveguide to reach the NSOM tip with measurable strength. Significant scattering loss was found at the interface between the input waveguide and the splitter, with worst-case excess loss as high as 4 dB per splitter [29].
Fig. 6. Measured optical intensity distribution as a function of transverse position on 25 μm radius splitters on a sample with air as the top and side cladding at the positions indicated in the inset. The inset shows the NSOM image of the measured light intensity. The fiber injecting light into the waveguide was adjusted to steer more light down one bend than the other, for measurement purposes.

(d) Photodetectors

For full compatibility with standard CMOS processing, the size of the photodetectors is as important as their materials and processing. With a typical length of 10 μm and a typical contact spacing less than 3 μm, the photodetectors discussed in this work are relatively compact in the 0.35 μm process.

As noted above, the optical waveguide crosses the polysilicon photodetector region. Leaky-mode coupling of optical power from the waveguide into the detector is the reason for the very high absorption coefficients measured for these photodetectors, more than an order of magnitude higher than other values reported in earlier literature [21,22]. This high absorption coefficient in turn means that relatively short detectors capture almost all of the light in the waveguide. Fig. 7 shows two photomicrographs of a triple detector structure (W = 0.50 μm, L = 50 μm, S = 1.14 μm) at the end of a second-generation H-tree. The disappearance of all light after the first photodetector indicates that essentially all of the light is absorbed by the first detector.
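The observation that a single 50 μm detector removes essentially all of the guided light follows from the reported absorption. Treating 0.67 dB/μm (the 10 V value) as a constant coefficient along the detector, which is a simplifying assumption since the real decay need not be a single exponential, the absorbed fraction after a length L is 1 - 10^(-αL/10). The function name below is hypothetical.

```python
def absorbed_fraction(alpha_db_per_um, length_um):
    """Fraction of guided power removed over length_um, assuming pure exponential decay."""
    return 1.0 - 10.0 ** (-alpha_db_per_um * length_um / 10.0)


for length in (10.0, 40.0, 50.0):   # detector lengths that appear in this work
    print(f"L = {length:4.0f} um -> {100 * absorbed_fraction(0.67, length):.2f} % absorbed")
```

Under this assumption a 10 μm detector absorbs roughly four-fifths of the light, while a 50 μm detector leaves well under 0.1% for the detectors behind it, consistent with the dark second and third detectors in Fig. 7.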

The test photodetectors outside of the H-tree consisted of an optical waveguide brought out to the edge of the die and a photodetector structure, with the waveguide crossing the photodetector.

The photodetector had three design variables, as defined in Fig. 1: (1) the waveguide width W, (2) the detector length L (that is, the distance that the waveguide crossed the polysilicon photodetector layer), and (3) the spacing S between the contacts to the
polysilicon. A ground-signal-ground pad structure allowed electrical probing of the test photodetectors.

Fig. 7. Negative image photomicrographs of a 0.50 μm waveguide crossing three sequential 50 μm long detector structures (a) with and (b) without top illumination.

On the first-generation test chip, geometries were chosen to explore the entire design space. A 2:1 ratio of the detector lengths allowed analytic extraction of the absorption coefficient of the detector [22]. The second-generation test chip focused on the design region of greatest interest.
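One way a 2:1 length ratio permits an analytic extraction: if two detectors differ only in length (L and 2L) and photocurrent is proportional to absorbed power, then I(L) ∝ 1 - e^(-αL) and I(2L) ∝ 1 - e^(-2αL) = (1 - e^(-αL))(1 + e^(-αL)). The unknown coupling and responsivity cancel in the ratio r = I(2L)/I(L) = 1 + e^(-αL), giving α = -ln(r - 1)/L. This is a sketch of the idea under those assumptions, not necessarily the exact procedure of [22]; the function names and example currents are hypothetical.

```python
import math


def alpha_from_length_pair(i_short, i_long, length_short_um):
    """Absorption coefficient (1/um) from photocurrents of detectors of length L and 2L.

    Assumes identical coupling and photoconductive response, so only the absorbed
    fraction 1 - exp(-alpha * L) differs between the two devices.
    """
    r = i_long / i_short
    if not 1.0 < r < 2.0:
        raise ValueError("current ratio must lie strictly between 1 and 2 for this model")
    return -math.log(r - 1.0) / length_short_um


def per_um_to_db_per_um(alpha_per_um):
    """Convert an intensity attenuation coefficient from 1/um to dB/um."""
    return 10.0 * math.log10(math.e) * alpha_per_um


# Hypothetical photocurrents for 10 um and 20 um detectors on otherwise identical structures.
alpha = alpha_from_length_pair(i_short=1.00e-6, i_long=1.21e-6, length_short_um=10.0)
print(f"alpha = {alpha:.3f} /um = {per_um_to_db_per_um(alpha):.2f} dB/um")
```

The ratio must fall between 1 (very strong absorption, the short detector already captures everything) and 2 (negligible absorption), so the 2:1 design is most informative when αL is of order unity.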

(e) Detector responsivity

Detector responsivity was found by determining the optical power reaching the photodetector and then measuring the photocurrent produced by the detector over a range of bias conditions. Input optical power was estimated by end-fire coupling light into a waveguide that crossed the die without being coupled to an on-chip detector. A commercial photodetector (Photonic Detectors PDB-C605-2) measured the optical power exiting this waveguide, and the input optical power was calculated from the measured output power and the measured waveguide loss. Photocurrent was maximized by active alignment of the input optical fiber relative to the waveguide. First-generation responsivity reached up to 0.35 A/W at 10 V for λ=1.14 μm [22]. Second-generation responsivity has been measured at 1.3 A/W at 10 V bias for a device with a 0.5 μm wide waveguide, a 40 μm long polysilicon detector and a 2.6 μm contact spacing. Fig. 8 is an I-V plot of the dark current and photocurrent for this device.
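The responsivity measurement reduces to a short calculation chain: infer the input power from a reference waveguide's output and its measured loss, apply the waveguide loss up to the detector, then divide the illuminated-minus-dark current by that power. The sketch below assumes the test and reference waveguides receive the same coupled power from the fiber, and that photocurrent is taken as light current minus dark current; all names and numbers are hypothetical.

```python
def db_loss_to_factor(loss_db):
    """Linear power ratio (output/input) for a loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)


def input_power_w(p_out_ref_w, loss_db_per_cm, ref_length_cm):
    """Power coupled into a reference waveguide, inferred from its output power and loss."""
    return p_out_ref_w / db_loss_to_factor(loss_db_per_cm * ref_length_cm)


def responsivity_a_per_w(i_light_a, i_dark_a, p_in_w, loss_db_per_cm, path_to_detector_cm):
    """Photocurrent (light minus dark) divided by the optical power reaching the detector."""
    p_detector_w = p_in_w * db_loss_to_factor(loss_db_per_cm * path_to_detector_cm)
    return (i_light_a - i_dark_a) / p_detector_w


# Hypothetical example: 1.0 uW exits a 0.88 cm reference guide with 8.5 dB/cm loss.
p_in = input_power_w(1.0e-6, 8.5, 0.88)
r = responsivity_a_per_w(i_light_a=3.2e-6, i_dark_a=0.2e-6, p_in_w=p_in,
                         loss_db_per_cm=8.5, path_to_detector_cm=0.2)
print(f"P_in = {p_in * 1e6:.2f} uW, R = {r:.2f} A/W")
```

Because the waveguide loss enters exponentially, an error of 1 dB/cm in the loss estimate shifts the inferred input power by about 0.9 dB over a 0.88 cm reference guide, so the loss measurement directly limits the accuracy of the responsivity.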

**Fig. 8.** I-V plot for a 2nd-gen detector with W = 0.5 μm, L = 40 μm, S = 2.6 μm; R = 1.3 A/W at $V_{bias}$ = 10 V.

Because the ratio of responsivities should approximately match the ratio of the $\mu \tau$ (mobility-lifetime product) values, the measured responsivities suggest that the second-generation $\mu \tau$ value is approximately 6.5 times the first-generation value of 1.79 $\mu m^2/V$ at $V = 10 V$ [22], or approximately 12 $\mu m^2/V$. This is consistent with the 6:1 ratio of second- to first-generation $\mu \tau$ values measured on normal-incidence interdigitated-finger MSM detectors on the same wafers [30]. The improvement is attributed to the additional thermal processing of the second-generation material, which increases the polysilicon grain size and thus the mobility. This suggests that even higher responsivities could be achieved with further polysilicon annealing.
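The μτ estimate connects to responsivity through the standard photoconductive-gain expression G = μτV/S², i.e. the carrier lifetime divided by the transit time across the contact gap. The sketch below evaluates it for the second-generation estimate μτ ≈ 12 μm²/V at 10 V. It assumes a uniform field between the contacts and ignores contact and grain-boundary effects, so the numbers are indicative only; the function name and the halved spacing are hypothetical.

```python
def photoconductive_gain(mu_tau_um2_per_v, bias_v, spacing_um):
    """G = tau / t_transit = mu * tau * V / S**2 (uniform-field approximation)."""
    return mu_tau_um2_per_v * bias_v / spacing_um ** 2


for s in (2.6, 1.3):  # 2.6 um as in the Fig. 8 device; 1.3 um is a hypothetical halved gap
    print(f"S = {s} um -> G ~ {photoconductive_gain(12.0, 10.0, s):.1f}")
```

Because G is linear in μτ, devices of identical geometry and bias should show responsivities in the ratio of their μτ values, which is the assumption behind the 6.5× estimate above. The 1/S² dependence also means that halving the contact spacing quadruples the gain at fixed bias.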

(f) Detector speed

The response speed of the detector was measured by using an optical pulse to generate a current pulse from the detector. The optical pulse was produced by an HP 8116A pulse/function generator, set to 8 nsec pulses at a 20 MHz repetition rate, driving a Hitachi HL6738MG 680 nm laser diode coupled to an Oz Optics visible-wavelength single-mode fiber. The low level of the electrical pulse was fixed at 0.0 V, while the high level was adjusted to minimize the optical pulse width through gain switching of the laser diode and was typically around 3.8 V. A 47 $\Omega$ resistor in series with the laser diode both protected the diode by limiting its current and improved the match to the 50 $\Omega$ output impedance of the function generator. The photodetector was biased at 10 V using a Keithley 2400 Source/Meter and a Mini-Circuits ZFBT-4R2GW-FT+ bias tee, and the device under test was probed with a Cascade Microtech ACP40-GSG-125 ground-signal-ground high-speed probe card. The photodetector current pulse was fed into a Miteq AM-1646 amplifier with 1 GHz bandwidth, and the resulting voltage signal was captured with an Agilent 86100A oscilloscope. SMA cables were used throughout. Typical detector temporal response is shown in Fig. 9 for a second-generation device with a 2 μm waveguide, a 10 μm detector length and a contact spacing of 2.64 μm. The full-width half-max (FWHM) is 0.85 nsec, and the 10% - 90% rise time is 0.58 nsec.
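The two timing figures quoted above can be extracted from a sampled oscilloscope trace by linear interpolation between samples. The sketch below assumes a baseline-subtracted, single-peaked waveform and does not handle noise or ringing; the function names are hypothetical. It is checked against a Gaussian pulse, for which FWHM = 2√(2 ln 2)σ ≈ 2.355σ and the 10% - 90% rise time is (√(2 ln 10) - √(2 ln(10/9)))σ ≈ 1.687σ.

```python
import math


def _crossing_time(t, y, level, start, stop, step):
    """Walk from index `start` toward `stop`; linearly interpolate the first crossing of `level`."""
    for i in range(start, stop, step):
        j = i + step
        y0, y1 = y[i], y[j]
        if y0 != y1 and (y0 - level) * (y1 - level) <= 0:
            return t[i] + (level - y0) * (t[j] - t[i]) / (y1 - y0)
    raise ValueError("level not crossed")


def pulse_metrics(t, y):
    """Return (10%-90% rise time, FWHM) for a baseline-subtracted single pulse."""
    k = max(range(len(y)), key=y.__getitem__)   # index of the peak sample
    peak = y[k]
    t10 = _crossing_time(t, y, 0.1 * peak, k, 0, -1)        # walk back along the rising edge
    t90 = _crossing_time(t, y, 0.9 * peak, k, 0, -1)
    t50_rise = _crossing_time(t, y, 0.5 * peak, k, 0, -1)
    t50_fall = _crossing_time(t, y, 0.5 * peak, k, len(y) - 1, 1)  # walk forward on the falling edge
    return t90 - t10, t50_fall - t50_rise


# Synthetic Gaussian pulse (ns) sampled every 1 ps; sigma chosen to give ~0.85 ns FWHM.
sigma = 0.361
t = [-3.0 + 0.001 * n for n in range(6001)]
y = [math.exp(-x * x / (2 * sigma * sigma)) for x in t]
rise, fwhm = pulse_metrics(t, y)
print(f"rise = {rise:.3f} ns, FWHM = {fwhm:.3f} ns")
```

A measured pulse like the one in Fig. 9 is typically asymmetric, with a slower tail set by trapped-carrier lifetime, so its rise time and FWHM are not tied together the way they are for this symmetric test pulse.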

**Fig. 9.** Typical 2nd-gen detector response. 10% to 90% rise time is 0.58 nsec. FWHM is 0.85 nsec. Input and output peaks are aligned for comparison purposes. W = 2 μm, L = 10 μm, S = 2.64 μm

Detector speed could be enhanced by reducing the trapped-carrier lifetime, either by implanting the polysilicon with electrically inactive species to damage it [31,32] or by using very fine-grained polysilicon [33]. Because long carrier lifetimes enhance the photoconductive gain mechanism, a reduced lifetime would also lower the photocurrent response. In particular, assuming a gain \( G \gg 1 \), the gain-bandwidth product of a photoconductive detector is

\[ G \times f_{3\mathrm{dB}} = \frac{\mu_{n,p} V}{2\pi S^2}, \]

where \( \mu_{n,p} \) is the carrier mobility, \( V \) is the bias voltage and \( S \) is the contact spacing [34]. Since the photoconductive gain is also inversely proportional to the square of the contact spacing, the effect of shorter lifetimes could be offset by smaller contact spacing, by delivering more optical power to the detector, or possibly by electrical circuitry with lower drive-current requirements. The gain-bandwidth product of the second-generation detector shown in Fig. 9 is estimated to be just over 2 GHz. Extrapolating from the 0.35 μm process used in this work to a 65 nm process, assuming that the detectors shrink by the same ratio as the minimum feature size while the electric field is held constant, gives a gain-bandwidth product over 10 GHz. One potential problem is that as the contact spacing \( S \) becomes smaller than the polysilicon grain size, the photoconductive gain mechanism becomes less effective because of a reduced contribution from trapping. More significantly, the dark current would increase dramatically, since the material between the contacts would effectively be single-crystal rather than polycrystalline silicon. The very fine-grained polysilicon mentioned above would alleviate both problems.
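The gain-bandwidth figures above can be reproduced from the expression for \( G \times f_{3\mathrm{dB}} \). Because the mobility is not quoted here, the sketch back-calculates the mobility implied by a 2 GHz product at V = 10 V and S = 2.64 μm (an illustrative inference, not a measured value), then applies constant-field scaling: with V ∝ S, μV/(2πS²) ∝ 1/S, so the product grows by the shrink ratio 0.35/0.065 ≈ 5.4. Function names are hypothetical.

```python
import math


def gain_bandwidth_hz(mobility_cm2_per_vs, bias_v, spacing_um):
    """G * f_3dB = mu * V / (2 * pi * S**2), valid for photoconductive gain >> 1."""
    spacing_cm = spacing_um * 1e-4
    return mobility_cm2_per_vs * bias_v / (2.0 * math.pi * spacing_cm ** 2)


def implied_mobility(gbw_hz, bias_v, spacing_um):
    """Invert the expression above for the carrier mobility in cm^2/(V s)."""
    spacing_cm = spacing_um * 1e-4
    return gbw_hz * 2.0 * math.pi * spacing_cm ** 2 / bias_v


def constant_field_scale(gbw_hz, old_node_um, new_node_um):
    """Scale S and V by the same factor (field held constant): the product grows as 1/S."""
    return gbw_hz * old_node_um / new_node_um


mu = implied_mobility(2.0e9, 10.0, 2.64)             # about 88 cm^2/V/s (illustrative)
gbw_65nm = constant_field_scale(2.0e9, 0.35, 0.065)  # about 10.8 GHz
print(f"implied mobility ~ {mu:.0f} cm^2/V/s, scaled GBW ~ {gbw_65nm / 1e9:.1f} GHz")
```

Holding the bias fixed instead of the field would give a 1/S² improvement, but at the cost of much higher fields across sub-100 nm gaps, which is why the constant-field assumption is the more conservative extrapolation.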

Another possibility to enhance the device speed would be to change from a photoconductive detector to a photodiode detector.
This might be done by implanting the contact regions n-type on one side and p-type on the other, thus making a polysilicon p-i-n photodetector, with the waveguide crossing the i region of the detector.

4. System Test and Analysis

A link budget analysis is important when designing an optoelectronic system, to determine whether the desired output can feasibly be obtained from the available inputs and components. For this system, the design specification for the TIAs at the end of the H-tree is 1 μA of input current, so the desired photocurrent from each detector is 1 μA. Assuming a detector responsivity of 1.3 A/W, the best measured value for second-generation edge-coupled detectors noted earlier, each detector must receive 0.77 μW, or -31 dBm, of optical power. The power budget uses a waveguide loss of 8.5 dB/cm and an excess splitter loss of 2 dB per split (half the worst-case measured value). The large optical clock distribution H-tree on the second-generation test chip has a total drawn length of 0.8335 cm from the input to each detector and contains four splits between the input and each detector. Including the intrinsic 3 dB power division at each split, this results in a total loss of about 27 dB (7.1 dB of waveguide loss plus 4 × (3 + 2) dB at the splits). The optical source must therefore be capable of injecting -4.1 dBm, or approximately 390 μW, into the 16-leaf H-tree waveguide.

The detectors operate in a purely photoconductive mode, so an external bias is required to generate a photocurrent. The 10 V bias used in this work is quite likely too high to be practical for modern integrated circuit applications. However, it was necessary to obtain sufficient photocurrent from these experimental detectors, since the relatively large, micron-scale contact spacing limits photoconductive gain at low bias voltages. Because a voltage-dependent photoconductive gain process [34] enhanced the photocurrent response, the 10 V bias provided enough photocurrent to meet the design specifications of the electrical clock detection and recovery circuitry at the ends of the H-tree. If smaller contact spacings, such as those available in recent sub-100 nm technologies, were employed, or if optical losses were reduced so that more optical power reached the detectors at the end of the H-tree, the bias voltage could be significantly reduced.

The low optical power reaching the detector is primarily due to three factors: coupling, waveguide and splitter losses. The first factor is poor end-fire coupling between the input optical fiber and the on-chip optical waveguide, which we estimated at approximately 5% best-case and typically 1%. One possibility for improving fiber-to-waveguide coupling would be to change from end-fire coupling to grating coupling [35]. In addition to improving coupling efficiency, grating coupling would eliminate the need for cleaved and polished facets on the integrated circuit die. The second factor, unexpectedly high waveguide loss, was discussed earlier; it could be reduced by changing the lower cladding to either replace the PSG layer or move it farther from the waveguide. Assuming the goal responsivity of 1.0 A/W listed in Table 1, reducing waveguide loss to the more typical 1 dB/cm would lower the required input power from approximately 500 μW to approximately 120 μW (-9.2 dBm). Changing from conventional Y-type splitters to multimode interference (MMI) devices [36] holds promise as a way to reduce the excess splitter loss, which is the third factor. If, in addition to reducing waveguide loss, the excess splitter loss can be reduced to 1 dB, somewhat higher than the best-case 0.84 dB reported above, the required input power drops even further, to 48 µW (-13.2 dBm).

**Table 1: Link budget table - design vs. actual measured vs. goal, assuming a spec of 1 µA of photocurrent.**

<table>
<thead>
<tr>
<th></th>
<th>Design</th>
<th>Actual</th>
<th>Goal</th>
</tr>
</thead>
<tbody>
<tr>
<td>Required photocurrent (µA)</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>Responsivity (A/W)</td>
<td>0.66</td>
<td>1.3</td>
<td>1.0</td>
</tr>
<tr>
<td>Optical power to detector (dBm)</td>
<td>-28.2</td>
<td>-31.1</td>
<td>-30</td>
</tr>
<tr>
<td>Waveguide loss (dB/cm)</td>
<td>0.5</td>
<td>8.5</td>
<td>1.0</td>
</tr>
<tr>
<td>Splitter excess loss (dB)</td>
<td>0</td>
<td>2.0</td>
<td>1.0</td>
</tr>
<tr>
<td>Total loss through 0.83 cm 16-node H-tree, input to each detector (dB)</td>
<td>12.4</td>
<td>27.1</td>
<td>16.8</td>
</tr>
<tr>
<td>Required H-tree input optical power (dBm)</td>
<td>-15.8</td>
<td>-4.1</td>
<td>-13.2</td>
</tr>
<tr>
<td>Required H-tree input optical power (µW)</td>
<td>26</td>
<td>390</td>
<td>48</td>
</tr>
</tbody>
</table>
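The entries of Table I follow a simple decibel budget: the power needed at each detector is the required photocurrent divided by the responsivity, and the required H-tree input power is that value plus the splitting, propagation and excess losses along one root-to-leaf path. The Python sketch below is a minimal model of that budget under assumptions of our own: each 1×2 split ideally halves the power (about 3.01 dB), a 16-node tree has four cascaded splits, and the 0.83 cm in Table I is taken to be the root-to-leaf waveguide path length. The function names are hypothetical and are not part of the original work.

```python
import math


def mw_to_dbm(p_mw: float) -> float:
    """Convert an optical power in mW to dBm."""
    return 10.0 * math.log10(p_mw)


def dbm_to_uw(p_dbm: float) -> float:
    """Convert a power in dBm to microwatts (1 mW = 1000 uW)."""
    return 1000.0 * 10.0 ** (p_dbm / 10.0)


def h_tree_budget(photocurrent_ua: float, responsivity_a_w: float,
                  wg_loss_db_cm: float, excess_db_split: float,
                  levels: int = 4, path_cm: float = 0.83) -> tuple[float, float, float]:
    """Return (detector power dBm, total loss dB, required input power dBm).

    Assumptions (ours, for illustration): ideal 1x2 splits of ~3.01 dB each,
    `levels` cascaded splits from root to leaf, and a root-to-leaf
    waveguide path of `path_cm` centimetres.
    """
    detector_mw = photocurrent_ua / responsivity_a_w / 1000.0  # uA / (A/W) = uW -> mW
    detector_dbm = mw_to_dbm(detector_mw)
    split_db = 10.0 * math.log10(2 ** levels)                 # 16 leaves -> ~12.04 dB
    total_loss_db = (split_db
                     + wg_loss_db_cm * path_cm                # propagation loss
                     + excess_db_split * levels)              # one excess loss per split
    return detector_dbm, total_loss_db, detector_dbm + total_loss_db


# (photocurrent uA, responsivity A/W, WG loss dB/cm, excess loss dB/split) from Table I
COLUMNS = {
    "Design": (1.0, 0.66, 0.5, 0.0),
    "Actual": (1.0, 1.3, 8.5, 2.0),
    "Goal": (1.0, 1.0, 1.0, 1.0),
}

for name, params in COLUMNS.items():
    det_dbm, loss_db, p_in_dbm = h_tree_budget(*params)
    print(f"{name:<7} detector {det_dbm:6.1f} dBm | loss {loss_db:5.1f} dB | "
          f"input {p_in_dbm:6.1f} dBm = {dbm_to_uw(p_in_dbm):4.0f} uW")
```

Under these assumptions the sketch reproduces every column of Table I to within about 0.1 dB; the small residual differences presumably come from rounding or a slightly different path-length assumption in the original calculation.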

Each end node of the clock distribution H-tree has a photodetector and TIA circuitry, similar to that discussed earlier, to convert the optical clock signal into an electrical, voltage-based clock signal. The quality of the recovered clock must then be verified, specifically with respect to skew, jitter and phase unison. To assess clock quality, a number of phase comparators are used to compare the recovered clock signals. TIA response up to 155 MHz and phase comparator response at 100 kHz, both at \( V_{DD} = 3.3 \) V, have been observed. Performance is limited by undersized circuit elements, which resulted from underestimating the circuit parasitics.

While the performance of the photodetectors and waveguides is of interest, the ultimate goal is to integrate them into a clock distribution network whose performance (power, skew and jitter) is superior to that of an electrical network. While some authors have concluded that the sources of power consumption, skew and jitter are not well addressed by an optical clock distribution network [1], more recent work has reached the opposite conclusion [2]. Replacing the entire electrical clock distribution network with an optical network leads to a power-hungry solution. A hybrid optical-electrical approach was suggested in [2], where the first few stages of the clock tree (the L0 and L1 levels, analogous to global and continental distribution) are optical while the other stages (L2, L3, L4, etc., analogous to national, state or provincial, and city distribution) remain electrical.

Simulations of the hybrid clock recovery circuitry were performed during the design process, with the goal of comparing the proposed hybrid optical/electrical network with a purely electrical network. The complete system-level simulation and measurement results are reported in a related paper [37]; highlights of the power, jitter and skew results are summarized here. Under extreme process variation conditions (supply voltage ±3%, photocurrent ±40%, FET width ±10%, FET length ±10%, temperature ±10%), the total clock skew and jitter variation introduced by the TIA in the recovered clock signal was found to be less than ±5% of the clock period, i.e., ±50 ps for the 1 GHz clock rate used in the simulations. Under a more realistic set of process variations, the total clock variation would be less than 2.5% of the clock period, or less than 25 ps, without requiring any additional clock deskew techniques. This is smaller than the worst-case optical clock skew of 10% of the clock period estimated for advanced nanometer CMOS technologies [2].

Existing purely electrical clock networks use a variety of clock deskewing techniques to reduce clock variation, as shown in Table II. Without active clock deskewing or clock grids, the worst-case clock variation of the proposed optical clock network was shown to be ±2.5% of the clock period [37] (i.e., ±25 ps at the 1 GHz clock rate used in our simulations), which is comparable to the values in Table II.

**Table II: Clock skew reported for modern microprocessors and the deskew technique used [38-42]**

<table>
<thead>
<tr>
<th>Author</th>
<th>Source</th>
<th>Deskew</th>
<th>Skew before (ps)</th>
<th>Skew after (ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Geannopoulos, et al</td>
<td>ISSCC 1998</td>
<td>Active</td>
<td>60</td>
<td>15</td>
</tr>
<tr>
<td>Rusu</td>
<td>ISSCC 2000</td>
<td>Active</td>
<td>110</td>
<td>28</td>
</tr>
<tr>
<td>Kurd, et al</td>
<td>ISSCC 2001</td>
<td>Active</td>
<td>64</td>
<td>16</td>
</tr>
<tr>
<td>Stinson, et al</td>
<td>ISSCC 2003</td>
<td>Active</td>
<td>60</td>
<td>7</td>
</tr>
<tr>
<td>Pham, et al</td>
<td>ISSCC 2005</td>
<td>Grid</td>
<td>n/a</td>
<td>12</td>
</tr>
<tr>
<td>Hart, et al</td>
<td>ISSCC 2005</td>
<td>Grid</td>
<td>n/a</td>
<td>20</td>
</tr>
<tr>
<td>Mahoney, et al</td>
<td>ISSCC 2005</td>
<td>Active</td>
<td>n/a</td>
<td>10</td>
</tr>
<tr>
<td>Golden, et al</td>
<td>ISSCC 2006</td>
<td>Retiming</td>
<td>n/a</td>
<td>21</td>
</tr>
</tbody>
</table>
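The percentages quoted above convert directly into picoseconds once a clock rate is fixed. The short Python sketch below assumes the 1 GHz clock rate used in the simulations, computes the ±5% and ±2.5% bounds, and places the realistic bound against the "Skew after" column of Table II. Because Table II lists absolute skews for processors running at different clock rates, this is only a rough comparison.

```python
import statistics

CLOCK_GHZ = 1.0                      # clock rate used in the simulations
period_ps = 1000.0 / CLOCK_GHZ       # 1 GHz -> 1000 ps period

extreme_ps = 0.05 * period_ps        # +/-5% of the period, extreme variation
realistic_ps = 0.025 * period_ps     # +/-2.5% of the period, realistic variation

# "Skew after (ps)" column of Table II, in table order
skew_after_ps = [15, 28, 16, 7, 12, 20, 10, 21]

lo, hi = min(skew_after_ps), max(skew_after_ps)
median = statistics.median(skew_after_ps)
within_range = lo <= realistic_ps <= hi

print(f"extreme bound: +/-{extreme_ps:.0f} ps, realistic bound: +/-{realistic_ps:.0f} ps")
print(f"Table II deskewed skew: {lo}-{hi} ps (median {median} ps)")
print(f"realistic bound inside Table II range: {within_range}")
```

The realistic 25 ps bound falls inside the 7-28 ps range spanned by Table II, although above its median, and it is reached without any active deskew or clock grid.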

To better understand the power consumption of future on-chip optical clock distribution systems in advanced nanometer CMOS, we compared the purely electrical clock network of a 90 nm CMOS design [42] with a hypothetical hybrid optical-electrical clock network derived from it, in which the global clock routes L0 and L1 are replaced with their optical equivalent. The equivalent optical H-tree for the L0 and L1 routes has seven levels of splits and a root-to-leaf length of 25 mm; an optical H-tree of this size roughly corresponds to a die size of 20 mm × 20 mm. Although the test chip exhibits high loss (8 dB/cm waveguide loss and 2 dB/split excess loss, as noted earlier), lower losses are achievable based on other published results: 1 dB/cm waveguide loss and 1 dB/split excess loss have been demonstrated [43]. For these losses, a calculation similar to that of Table I for the seven-level H-tree yields a required input optical power of 1.1 mW. Assuming a combined laser wall-plug and waveguide coupling efficiency of 10%, the required electrical power is 11 mW. With seven levels of splits there are 128 H-tree leaf nodes; however, the L0 and L1 routes have only 85 terminations (14 + 71), so placing TIAs at 85 leaf nodes is sufficient for clock recovery. To calculate the power consumed by the TIAs, the measured power efficiency of 8.1 mW/(Gb/s) and the receiver power contribution of 45% reported in [44] are used; the design in [44] is in a 90 nm CMOS technology similar to [42]. The rated clock in [42] is 2.5 GHz, so the power consumed by one TIA at 2.5 GHz is 8.1 × 2.5 × 0.45 = 9.1125 mW, and the total power for 85 TIAs is approximately 775 mW. The total power of the optical equivalent clock network is therefore 775 + 11 = 786 mW.

Taking the purely electrical clock system of the 90 nm design in [42] as an example, the total power consumption of the L0 and L1 routes is 1300 mW. The equivalent optical clock network, with waveguide loss of 1 dB/cm and excess loss of 1 dB/split, consumes approximately 40% less power than its electrical (L0 and L1) counterpart. In other words, the proposed hybrid optical/electrical clock distribution network provides skew performance similar to a purely electrical network, but with a 40% reduction in power. Even with higher waveguide and split losses of 2.5 dB/cm and 2 dB/split, the optical clock network would still have a power advantage of approximately 30% over the purely electrical counterpart. Compared to the electrical L0 and L1 clock network, the optical equivalent has fewer differential amplifiers and repeaters, which reduces the number of potential sources of clock skew, jitter and delay. Therefore, replacing the first two stages of the clock distribution network in [42] with their optical equivalent lowers global clock power and reduces design complexity while still delivering the same or better downstream performance. More generally, an optical clock distribution system with an optimal optical clock tree depth will have fewer electronic components (which are sources of jitter, skew and delay) in the clock signal path than a purely electrical system, thus minimizing absolute clock jitter, skew and delay and potentially eliminating the need for expensive and complex active clock deskew circuitry, while retaining similar performance.
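The power comparison in the two preceding paragraphs reduces to a few lines of arithmetic. The Python sketch below is a minimal model using the figures given in the text: a seven-level H-tree with a 2.5 cm (25 mm) root-to-leaf path, 10% combined wall-plug and coupling efficiency, 85 TIAs at 8.1 mW/(Gb/s) × 2.5 × 0.45 each, and a 1300 mW electrical L0/L1 baseline. For the "calculation similar to Table I" we assume 1 µA of photocurrent at 1.0 A/W (the goal column of Table I) and ideal 3.01 dB splits; these, and the function names, are our own illustrative assumptions.

```python
import math

ELECTRICAL_L0_L1_MW = 1300.0   # L0 + L1 clock power of the 90 nm design in [42]


def optical_input_mw(levels: int, path_cm: float, wg_db_cm: float,
                     excess_db_split: float, photocurrent_ua: float = 1.0,
                     responsivity_a_w: float = 1.0) -> float:
    """Optical power (mW) needed at the H-tree root so every leaf gets the photocurrent.

    Assumed (not from the paper): ideal 3.01 dB splits, 1 uA at 1.0 A/W.
    """
    detector_dbm = 10.0 * math.log10(photocurrent_ua / responsivity_a_w / 1000.0)
    loss_db = (10.0 * math.log10(2 ** levels)      # ideal splitting loss
               + wg_db_cm * path_cm                # propagation loss
               + excess_db_split * levels)         # excess loss at each split level
    return 10.0 ** ((detector_dbm + loss_db) / 10.0)


def hybrid_power_mw(wg_db_cm: float, excess_db_split: float, levels: int = 7,
                    path_cm: float = 2.5, n_tia: int = 85, efficiency: float = 0.10,
                    mw_per_gbps: float = 8.1, rate_gbps: float = 2.5,
                    rx_fraction: float = 0.45) -> float:
    """Laser electrical power plus TIA power (mW) for the optical L0/L1 network."""
    laser_mw = optical_input_mw(levels, path_cm, wg_db_cm, excess_db_split) / efficiency
    tia_mw = n_tia * mw_per_gbps * rate_gbps * rx_fraction   # 85 x 9.1125 mW
    return laser_mw + tia_mw


for wg_db_cm, excess_db in [(1.0, 1.0), (2.5, 2.0)]:
    total_mw = hybrid_power_mw(wg_db_cm, excess_db)
    saving = 1.0 - total_mw / ELECTRICAL_L0_L1_MW
    print(f"{wg_db_cm} dB/cm, {excess_db} dB/split: {total_mw:.0f} mW total, "
          f"{saving:.0%} below the electrical L0/L1 network")
```

With 1 dB/cm and 1 dB/split this gives about 1.14 mW of optical input, about 786 mW in total and a saving of about 40%; with 2.5 dB/cm and 2 dB/split it gives about 910 mW and a saving of about 30%, consistent with the figures quoted in the text. Because the 85 TIAs account for roughly 775 mW in both cases, the result is much more sensitive to receiver power efficiency than to waveguide or splitter loss.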

As with the TIA circuitry, additional details on the phase comparator circuitry will be published elsewhere [37].

5. Conclusions

In this work, which focuses on the optoelectronic side of a CMOS optical clock distribution system, and in a related paper [37], which focuses on the circuit side of the system, we have demonstrated a distribution network that is compatible with all standard CMOS processes. The demonstrated photodetector responsivity of up to 1.3 A/W is sufficient for this purpose. The speed of the photodetectors needs to be increased by roughly a factor of 10 to be compatible with current clock rates, which appear to be plateauing around 3 to 4 GHz, and this should be readily achievable in 65 nm technology. Previously reported results on normal-incidence polysilicon photodetector structures from other groups indicate that this increase in speed can be obtained while still maintaining full CMOS compatibility. In contrast to other proposals in the literature, the technology reported in this work requires no additional equipment or process development beyond standard CMOS processing, other than, perhaps, a high-dose silicon implant to reduce the carrier lifetime in the polysilicon.
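One rough way to relate the measured detector speed to the tenfold improvement called for above is the single-pole estimate of bandwidth from rise time, \( f_{3dB} \approx 0.35 / t_r \). The Python sketch below applies it to the 0.58 ns 10%-90% rise time reported in the abstract. Treating the detector as a single-pole (RC-like) response is our assumption for illustration, not the authors' stated derivation.

```python
import math

RISE_TIME_NS = 0.58            # measured 10%-90% detector rise time
SPEEDUP = 10.0                 # speed-up factor quoted in the conclusions
CLOCK_RANGE_GHZ = (3.0, 4.0)   # plateauing clock-rate range quoted in the conclusions


def single_pole_bandwidth_ghz(rise_time_ns: float) -> float:
    """3 dB bandwidth (GHz) of an assumed single-pole response.

    For one pole, t_r(10%-90%) = ln(9) * RC ~= 2.2 RC and f_3dB = 1 / (2 pi RC),
    which gives the familiar f_3dB ~= 0.35 / t_r.
    """
    rc_ns = rise_time_ns / math.log(9.0)
    return 1.0 / (2.0 * math.pi * rc_ns)     # 1/ns -> GHz


bw_now = single_pole_bandwidth_ghz(RISE_TIME_NS)
bw_target = SPEEDUP * bw_now
print(f"current bandwidth ~{bw_now:.2f} GHz; after {SPEEDUP:.0f}x ~{bw_target:.1f} GHz")
print(f"above a {max(CLOCK_RANGE_GHZ):.0f} GHz clock: {bw_target > max(CLOCK_RANGE_GHZ)}")
```

This gives roughly 0.6 GHz today and about 6 GHz after a tenfold improvement, above a 3-4 GHz clock and so consistent with the factor quoted above; a detector whose response is not single-pole would call for a more careful estimate.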

6. Acknowledgements

The authors would like to thank Ron Kee and Brad Lewis of Avago Technologies for assistance with sample preparation, and Duane Fasen, also of Avago Technologies, for assistance with process design. This work was supported in part by the National Science Foundation (NSF) GOALI Program under Contract ECS-0323493.

REFERENCES

[37] C. Thangaraj, et al., IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Accepted for publication).