## 10-35 nano second magneto-resistive memories

by

K T Manel Ranmuthu

A Thesis Submitted to the

Graduate Faculty in Partial Fulfillment of the

Requirements for the Degree of

MASTER OF SCIENCE

Department: Electrical Engineering and Computer Engineering

Major: Computer Engineering

Signatures have been redacted for privacy

Iowa State University Ames, Iowa 1990

Copyright © K T Manel Ranmuthu, 1990. All rights reserved.


### **ACKNOWLEDGEMENTS**

The author wishes to express her sincere thanks to Dr. Arthur V. Pohm for his guidance and sponsorship throughout the course of this project. His wealth of knowledge, patience and understanding is truly remarkable.

A very special word of thanks is also due to Dr. C. S. Comstock and Dr. J. H. Lutz for their support and encouragement.

Finally, the author wishes to acknowledge a boundless and welcome debt of gratitude to her parents who were very much responsible for her achievements, and to her husband for his loving support as well as technical assistance.

### **CHAPTER 1. INTRODUCTION**

The Engineering Research Institute of Iowa State University, with grant support from Honeywell Corp., is leading a major research effort to develop a new memory technology using magneto-resistive (MR) memory elements.

MR memory elements possess a number of very attractive properties. They are non-volatile, radiation hard and infinitely reprogrammable. Their inherent geometry makes it possible to easily incorporate them in random access read-write memories. IC compatibility requiring only one mask beyond semiconductor processing, sharing of electronics by many elements, very high cell density and amenability to wafer scale integration are some key economic factors favorable towards this technology.

Looking at the existing spectrum of non-volatile, programmable semiconductor memories, a gap can be noticed in the middle where neither erasable programmable read only memories (EPROMs) nor electrically erasable programmable read only memories ( $E^2$ PROMs) are particularly cost-effective. (EPROMs offer bit densities as high as 1 M, but must be taken offline to be erased with ultraviolet light prior to reprogramming. Thus reprogramming an EPROM can easily take 20 or more minutes.  $E^2$ PROMs are easily reprogrammable, but only a byte at a time, and the bit density level is at  $256$  K.)

Applications where more memory capacity is needed than $E^2$PROMs provide, and where reprogramming must be done faster and more often than can be accomplished with EPROMs, are the most affected by the above deficiency. Flash $E^2$PROMs have gained wide popularity as the best suited to fill that gap in the non-volatile, programmable memory spectrum. A typical flash $E^2$PROM has a bit density of about 512 K and an access time of 200 ns. The erase time can vary from 1 to 7.5 s, and a flash $E^2$PROM would typically support a minimum of 100 and a maximum of 10,000 write cycles [10]. However, the long erase time and the limited number of write cycles pose problems in many applications.

Results obtained in MR memory research have indicated the possibility of developing MR memories that could be strong contenders for the above applications.

Test memory ICs with a density of 16 Kbits and an access time of 1 $\mu$s have already been fabricated by Honeywell Corp. A study has shown that very high cell densities, with a limit of $4 \times 10^8$ bits/cm<sup>2</sup>, are theoretically possible with advanced process technologies [6]. Also, studies on the switching characteristics of these elements have reported very fast switching and, therefore, the possibility of reprogramming within a few nanoseconds [12]. Thus the access time of these memories is determined by the read time, which is affected by the element size as explained in Chapter 3.

If the access times are improved to about 10 ns while maintaining a moderate cell density of about $2.5 \times 10^5$ bits/cm<sup>2</sup>, these memories can serve most of today's high speed, non-volatile and reprogrammable memory needs, such as those in aerospace environments.

This thesis describes the design techniques used to meet the above specifications together with experimental and simulated results as well as layouts.

### CHAPTER 2. MAGNETO-RESISTIVE MEMORIES

### Magneto-resistance and Uniaxial Anisotropy

In magnetic films, the preference for the magnetization to lie along a single axis in one of two anti-parallel states is known as uniaxial anisotropy. This preferred axis  $-$  called the easy axis  $-$  is determined by the direction of a strong magnetic field applied during film deposition and/or annealing (see Figure 2.1).

In magneto-resistive materials, the electrical resistance changes when subjected to a magnetic field. The resistance of a magnetic film varies depending on the angle the current makes with magnetization according to the formula,

$$
R = R_0 + \Delta R \cos^2 \theta \tag{2.1}
$$

where

 $R_0$ = the resistance when the current and the magnetization are perpendicular to each other

 $\Delta R$ = magneto-resistance term

 $\theta$ = the angle between the current and the magnetization

The ratio $\Delta R/R$, called the MR coefficient, is ordinarily between 0.01 and 0.06 for most ferromagnetic alloys. It should be noted that this relationship is independent of the easy axis of the magnetic film.
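As a minimal sketch of equation (2.1), the following Python fragment evaluates the resistance at a few angles. The numerical values of `R0` and `DELTA_R` are illustrative assumptions (a 500 $\Omega$ element with $\Delta R$ taken as 2% of $R_0$), not measured data.

```python
import math

def mr_resistance(r0_ohm, delta_r_ohm, theta_rad):
    """Equation (2.1): R = R0 + dR * cos^2(theta)."""
    return r0_ohm + delta_r_ohm * math.cos(theta_rad) ** 2

# Illustrative (assumed) values: 500 ohm element, dR taken as 2% of R0.
R0 = 500.0
DELTA_R = 0.02 * R0

for angle_deg in (0, 45, 90):
    r = mr_resistance(R0, DELTA_R, math.radians(angle_deg))
    print(f"theta = {angle_deg:2d} deg -> R = {r:.2f} ohm")
```

The resistance is largest when the current and magnetization are parallel and falls back to $R_0$ when they are perpendicular.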

Storage of a bit in a MR memory cell uses the anisotropic nature of the film.



Figure 2.1: Use of Uniaxial Anisotropy for Data Storage

Whether it is a '1' or a '0' depends on which anti-parallel state along the easy axis the magnetization vector is in (see Figure 2.1). The sensing of a stored bit makes use of magneto-resistance, i.e., the change of resistance with current.

### **The Sandwich Structure**

The basic MR memory cell is a sandwich structure with 2 layers of magnetic film (65% Ni, 15% Fe and 20% Co) separated by an exchange barrier or a non-magnetic layer (Ta). The magnetic material has a sheet resistance of 10  $\Omega$ /sq. and an MR coefficient of  $2.0 \pm 0.3\%$  on average.

The design is such that the uniaxial anisotropy axis of the magnetic film is either parallel or perpendicular to the long dimension of the element. Those elements with the easy axis perpendicular to the long dimension are called transverse elements and are preferred for memory applications (see Figure 2.2). In the remanent '1' or '0' states, the magnetization points across the strip, with the magnetizations in the two films oppositely directed. This greatly reduces the demagnetizing fields and permits high cell density.

### Read-Write Operations

In a typical MR memory, the cells are organized in a 2-D array of sense lines and word lines (see Figure 2.3). A memory element is accessed by activating both the sense line which provides the sense current through the element, and the overlaid orthogonal word line which is electrically isolated from the element.



Figure 2.3: MR Element Array Geometry





[Figure panels: Stored '0', Stored '1']

When writing a bit, a positive word current is used in coincidence with the sense current. The state of the bit is determined by the polarity of the sense current. A positive sense current would store a '0' and a negative sense current would store a '1'.

To read a bit, a positive sense current together with a negative word current is applied. If the bit being read is a '1', the sense current opposes the stored flux and the magnetization rotation is large. If the bit is '0', the sense current aids the direction of the stored flux and the magnetization rotation is small. The resistance of the cell is thus larger in the former case and smaller in the latter because of magneto-resistance, and a '1' can be distinguished from a '0' (see Figure 2.4). The reversed read with a negative word current is a recent development which quadruples the output level over the previously used technique which uses a positive word current with a lower magnitude  $[7]$ .

### Access Times

For normal writing conditions, cell switching time is measured to be less than a few nanoseconds. Thus the cell write time is determined by the selection electronics.

When reading, care must be taken to ensure an adequate signal to noise ratio (SNR), since the signal output from the cells is low. One approach is to read the non-destructive readout cell repeatedly in order to improve the SNR. It has been shown that an adequate SNR could be achieved using multiple read pulses and a correlating amplifier with a read cycle time of 3 $\mu$s. This is possible because, with each read pulse, the signal increases linearly while the noise increases as the square root of the number of samples [5].
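The scaling argument above can be sketched numerically: if the signal grows as $N$ and uncorrelated noise as $\sqrt{N}$, the SNR improves as $\sqrt{N}$. The single-read SNR of 2 used below is a purely illustrative assumption, and the function names are hypothetical.

```python
import math

def snr_after_reads(single_read_snr, n_reads):
    """Signal adds linearly (n); uncorrelated noise adds as sqrt(n)."""
    return single_read_snr * n_reads / math.sqrt(n_reads)

def reads_needed(single_read_snr, target_snr):
    """Smallest number of reads whose accumulated SNR reaches the target."""
    return math.ceil((target_snr / single_read_snr) ** 2)

# Assumed example: a single read with SNR 2, target SNR 10.
n = reads_needed(2.0, 10.0)
print(n, snr_after_reads(2.0, n))
```

The quadratic growth in the number of reads is why this approach costs read cycle time.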

The other approach is to increase the element size thereby increasing the output signal level of the cells. It results in highly reduced read cycle times and was adopted for this design.


### CHAPTER 3. SENSE TECHNIQUES

#### Development

As mentioned earlier, the memory elements are accessed by switching on the sense current and the word current in pre-determined directions and magnitudes. Typically, the sense current is about 3 mA and the word current is about 30 mA.

When reading a bit, a cell containing a '1' will offer a larger resistance to the sense current than a cell containing a '0'. Therefore, the sense voltage across an element containing a '1' is higher than that across an element containing a '0'. This voltage difference, $\Delta V$, is the output signal which has to be amplified to full logic levels in order to accomplish a read (see Figure 3.1). $\Delta V$ can be approximated by

$$
\Delta V = 0.006 I_s R_s \tag{3.1}
$$

where

 $I_s$ = sense current

 $R_s$ = resistance of the element (along the sense line)

 0.006 = a process-dependent constant

Since the optimum sense current is about 3 mA, $\Delta V$ depends directly on the element resistance and tends to be very small. This modest output signal requires longer access times in order to maintain an adequate SNR, and therefore the attainable speed is limited.

Figure 3.1: Element Output vs Word Current

It is possible to considerably increase the output signal by making longer elements with higher resistances. Even with the increased output levels, which would still be in the range of a few millivolts, proper elimination of thermal and noise effects becomes very important. This is achieved by using a 'bridge formation' and a 'reference sensing scheme' (see Figure 3.2).
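To illustrate how equation (3.1) ties the output signal to element resistance, the sketch below evaluates $\Delta V$ at the 3 mA sense current for three resistances: the 60 $\Omega$ and 112 $\Omega$ test-chip elements quoted later in this chapter and the 500 $\Omega$ cell targeted by this design. The function name is hypothetical.

```python
PROCESS_CONSTANT = 0.006  # process-dependent constant from equation (3.1)

def output_signal_v(sense_current_a, element_resistance_ohm):
    """Equation (3.1): dV = 0.006 * I_s * R_s."""
    return PROCESS_CONSTANT * sense_current_a * element_resistance_ohm

SENSE_CURRENT_A = 3e-3  # typical sense current given in the text

for r_ohm in (60.0, 112.0, 500.0):
    dv = output_signal_v(SENSE_CURRENT_A, r_ohm)
    print(f"R_s = {r_ohm:5.0f} ohm -> dV = {dv * 1e3:.2f} mV")
```

Under this approximation only the 500 $\Omega$ cell delivers a signal of several millivolts.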

### Noise Considerations

A simple noise calculation with the formula for Johnson noise was used to estimate the noise levels that have to be dealt with when sensing a stored bit.

$$
v_s^2 = 4kTR\Delta f \tag{3.2}
$$

Worst case parameters for a 10 ns memory are:

 $T$ = operating temperature = 85 °C = 358 K

 $\Delta f$ = amplifier bandwidth = $1.66 \times 10^8$ Hz

 $R$ = input resistance to amplifier = 500 $\Omega$

 $k$ = Boltzmann constant = $1.38 \times 10^{-23}$ J K<sup>-1</sup>

Substituting the above values in equation (3.2), the thermally generated noise within the amplifier bandwidth becomes only 45  $\mu$ V. This would result in a 90  $\mu$ V noise level if the amplifier noise figure is about 6 dB.

In a memory chip, a noise-induced read error rate in the range of 1 in $10^{15}$ to 1 in $10^{20}$ is considered acceptable. An SNR of 10 is sufficient to keep the failure rate within the above range [4]. Therefore, to maintain an SNR of 10 with the above noise level of 90 $\mu$V, a signal level of 0.9 mV is needed. Doubling that, a signal level of 1.8 mV would give an ample noise margin.
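The noise estimate can be reproduced with a short sketch that substitutes the worst case parameters into equation (3.2). The function names are hypothetical. Direct substitution gives roughly 40 $\mu$V, close to but slightly below the 45 $\mu$V quoted above; the sketch does not attempt to explain that difference, and the signal requirement is computed from the 90 $\mu$V figure stated in the text.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, value used in the text

def johnson_noise_vrms(temp_k, resistance_ohm, bandwidth_hz):
    """RMS thermal noise voltage from equation (3.2)."""
    return math.sqrt(4.0 * K_BOLTZMANN * temp_k * resistance_ohm * bandwidth_hz)

def required_signal_v(noise_vrms, snr):
    """Signal level needed to reach a given voltage SNR."""
    return snr * noise_vrms

v_thermal = johnson_noise_vrms(358.0, 500.0, 1.66e8)
# A 6 dB noise figure corresponds to about a factor of 2 in voltage.
v_with_amplifier = v_thermal * 10 ** (6.0 / 20.0)

print(f"thermal noise       : {v_thermal * 1e6:.1f} uV")
print(f"with 6 dB amplifier : {v_with_amplifier * 1e6:.1f} uV")
print(f"signal for SNR 10 at 90 uV noise: {required_signal_v(90e-6, 10) * 1e3:.2f} mV")
```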

### Bridge Formation

The basic bridge (see Figure 3.2) consists of 4 matched memory cells with high resistance (about 500 $\Omega$). Two of these are used as storage cells (S1 & S2) and the other two as reference cells (R1 & R2).

There are several possible bridge formations which can achieve the necessary signal level of 1.8 mV. These have different levels of bit densities and complexities.

### Full bridge per data bit

This arrangement uses a full bridge of 4 memory cells to store a single data bit. The bit pattern in the storage cells is inverted and stored in the reference cells for maximum output, i.e., $R1 = \overline{S1}$ and $R2 = \overline{S2}$.

Figure 3.2: Basic Bridge Formation

Figure 3.3: Full Bridge to Store a Single Data Bit

Figure 3.4: Half a Bridge/Data Bit with Shared Reference

A data bit of '0' is written by setting  $S1 = R2 = 1$  and  $S2 = R1 = 0$ . A data bit of '1' has the above bit pattern inverted.

When reading, if the bridge holds a data bit of '0', the resistance in the cells S1 and R2 will increase to 503  $\Omega$  each (500  $\Omega \times 0.006 = 3 \Omega$ ). Then, with the sense current set to 3 mA,  $V_{out}$  with respect to the reference line becomes  $-9$  mV. Similarly, a data bit of '1' will set  $V_{out}$  to  $+9$  mV (see Figure 3.3).

### **Half bridge per data bit**

This uses only half a bridge (S1 and S2) to store the data bit, with a fixed reference pattern of '0,0' stored in the reference cells, i.e., $R1 = R2 = 0$. Also, it is possible to share the reference line with several other half bridges, thus reducing the area occupied by a data bit by half.

Figure 3.5: One Leg of Bridge/Data Bit with Shared Reference

In this scheme, a data bit of '0' is written by setting  $S1 = 1$  and  $S2 = 0$ . If a '1' is written, the pattern would be inverted to  $S1 = 0$  and  $S2 = 1$ .

When reading a stored '0', only the resistance of cell S1 increases to 503 $\Omega$, thereby setting $V_{out}$ to $-4.5$ mV. Similarly, a stored '1' would give a $V_{out}$ of $+4.5$ mV (see Figure 3.4).
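The full-bridge and half-bridge outputs can be checked with a simple model. The sketch below assumes a conventional four-arm bridge in which each leg is a voltage divider (S1 over S2 in one leg, R1 over R2 in the other) and the supply is chosen so that the nominal leg current is 3 mA. This topology is an assumption made for illustration, since the bridge figures are not reproduced here; the function name is hypothetical.

```python
def bridge_output_v(s1_ohm, s2_ohm, r1_ohm, r2_ohm, supply_v):
    """Differential output of an assumed two-divider bridge.

    Leg A: S1 (top) in series with S2 (bottom).
    Leg B: R1 (top) in series with R2 (bottom).
    Output is the leg A midpoint minus the leg B midpoint.
    """
    mid_a = supply_v * s2_ohm / (s1_ohm + s2_ohm)
    mid_b = supply_v * r2_ohm / (r1_ohm + r2_ohm)
    return mid_a - mid_b

R_LOW = 500.0    # cell reading as '0'
R_HIGH = 503.0   # cell reading as '1' (500 ohm + 0.6 %)
SUPPLY_V = 3e-3 * 2 * R_LOW  # gives a nominal 3 mA per leg

# Data bit '0': S1 = R2 = 1 (full bridge) or S1 = 1, R1 = R2 = 0 (half bridge).
full_bridge_zero = bridge_output_v(R_HIGH, R_LOW, R_LOW, R_HIGH, SUPPLY_V)
half_bridge_zero = bridge_output_v(R_HIGH, R_LOW, R_LOW, R_LOW, SUPPLY_V)

print(f"full bridge, data '0': {full_bridge_zero * 1e3:.2f} mV")
print(f"half bridge, data '0': {half_bridge_zero * 1e3:.2f} mV")
```

Under these assumptions the model gives close to $-9$ mV and $-4.5$ mV, matching the values in the text; inverting the stored pattern flips the sign.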

#### Other possible arrangements

There are several other possible approaches which can achieve still higher densities [9]. One of them uses one leg of a bridge per data bit, with a shared fixed reference line containing zeros (see Figure 3.5).



Figure 3.6: Test Chip Configuration

Another one is to have 2 memory cells per leg of bridge, and use an averaged input from two shared reference lines. Here one reference line contains '0's while the other contains '1's. The IC design discussed uses the above approach, which is described in detail in Chapters 5 and 7.

#### Experimental Details

Before starting on the IC design, some of the above techniques were tested on a discrete circuit using synthesized MR memory cells with high resistances.

Test chips fabricated by Honeywell were used for this purpose. The ICs contained simple arrays of MR memory elements (see Figure 3.6), with sizes of $1.8 \times 18$ $\mu$m<sup>2</sup> and $1.8 \times 12$ $\mu$m<sup>2</sup>. As the sheet resistance of these elements was about 10 $\Omega$/sq., the average element resistances were 112 $\Omega$ and 60 $\Omega$ respectively. Since the aim was to synthesize cells of about 500 $\Omega$, several of these elements (i.e., four of the $1.8 \times 18$ $\mu$m<sup>2</sup> elements or eight of the $1.8 \times 12$ $\mu$m<sup>2</sup> elements) were strung together and accessed as a single memory cell (see Figure 3.7).

Figure 3.7: Synthesized Cell of Eight Elements

The read-write circuit was wired with discrete components, including a fast amplifier with good common mode noise rejection capabilities, and synthesized large cells with multiple bits in a bridge formation. Then, the output signals and offsets for various configurations were measured.

Table 3.2: Observed outputs with full bridge per data bit

| S1 | S2 | Data bit stored | Output (mV) |
|----|----|-----------------|-------------|
| 1  | 0  | 0               | $-11.0$     |
| 0  | 1  | 1               | 7.0         |

First, each of the large cells was synthesized using four $1.8 \times 18$ $\mu$m<sup>2</sup> MR elements. Therefore, each arm of the bridge had an average sense resistance of 450 $\Omega$. Only half the bridge was made to hold the data bit while the two reference cells were loaded with '0's. The read sense current was set to 3.5 mA/cell and a reverse read word current of 40 mA/element was applied. Ideally, with a perfectly balanced bridge, output voltages of $\pm 4.5$ mV could be expected. The observed outputs (see Table 3.1) agreed with the expected values except for a slight imbalance in the bridge due to deviations in element resistances resulting from process inconsistencies.

Next, each of the four large cells in the sense arrangement was synthesized using eight of the $1.8 \times 12$ $\mu$m<sup>2</sup> elements. Therefore, the average sense resistance on each arm was 470 $\Omega$. The full bridge was used to store a single data bit by making the reference cells hold the inverse of the pattern in the storage cells. Theoretically, the output should (roughly) double to $\pm 9$ mV. Results close to the predicted values (see Table 3.2) were obtained for a read sense current of 3.5 mA/cell and a reverse read word current of 34 mA/element.

A study of the relation between the output voltage level ($V_{out}$) and the magnitude of the reverse read word current produced results that agree with the theoretical model (see Figure 3.8). As expected, the magnitude of $V_{out}$ becomes a minimum at approximately 1/3 of the nominal reverse word current. This presents the possibility of auto-zeroing at 1/3 of the nominal read word current to obtain a maximum output signal level [9]. An auto-zero circuit is incorporated in the IC design to eliminate the offsets due to mismatched resistances, etc. In this design, auto-zeroing takes place before the word current is switched on.

Figure 3.8: Bridge Output vs Word Current

### **CHAPTER 4. IC DESIGN CONSIDERATIONS**

### **Folded Element**

Since the MR memory elements with higher resistance tend to be very much elongated, a folded element is used to achieve a more compact and economical design. The element used in this design is folded into 3 parts (see Figure 4.1). Each part has the dimensions 1.65×13.75  $\mu m^2$  and with a sheet resistance of 10  $\Omega$ /sq., makes up a total element resistance of 250  $\Omega$ .
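The 250 $\Omega$ figure follows from counting squares of sheet resistance. A minimal sketch, ignoring any extra resistance at the fold corners (an assumption) and using a hypothetical helper name:

```python
def strip_resistance_ohm(length_um, width_um, sheet_ohm_per_sq):
    """Resistance of a rectangular strip: sheet resistance times length/width."""
    return sheet_ohm_per_sq * length_um / width_um

PARTS = 3  # the element is folded into 3 parts
part_ohm = strip_resistance_ohm(13.75, 1.65, 10.0)
element_ohm = PARTS * part_ohm

print(f"per part: {part_ohm:.1f} ohm, folded element: {element_ohm:.1f} ohm")
```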

Even in the case of failure in one part of the folded element, the other 2 parts can provide enough signal strength to avoid an error. Thus, it is possible to achieve a higher reliability by folding the element.

In order to achieve proper switching and sufficient signal levels, 30 mA of word current needs to flow over the element when accessed. Therefore, a single word line laid over the element would require a current of 30 mA. By using the two-turn word line as shown, it is possible to reduce this word current requirement by half. The output end of the first word line is connected to the input of the second word line through a return path. When switched on, each line carries a current of  $15 \text{ mA}$  in the same direction, thereby providing a total word current of  $30 \text{ mA}$  over the element.



Figure 4.1: Folded Element with Two-Turn Word Line

### Process Parameter Modifications

As previously mentioned, the MR elements require an additional mask level for the sandwich structure of magnetic material. These are connected to the sense supply with shorting bars of 'metal 1'. The word lines are of 'metal 2' and both metal layers have high current capacities.

In this design, the layouts and simulations were done using VLSI design tools by VTI and the HPSPICE simulation program. The Honeywell 1.2 $\mu$m process with a magnetic multi-layer is not supported by these tools. Therefore, it was not possible to lay out and simulate the MR elements. Instead, space was allocated for each element and the circuit was completed with the element nodes kept open. Theoretically calculated element resistances and output levels were specified between the element nodes for simulations. This is based on the assumption that the MR element behavior can be accurately modelled, which is strongly supported by the experimental results described in the previous chapter.

There are a number of considerable discrepancies between the 2  $\mu$ m CMOS process supported by VTI used for this design, and the Honeywell process used for MR memory fabrication. On the whole, the Honeywell process is very much superior with reduced strays, etc. There are two major drawbacks in the  $2 \mu m$  CMOS process and they had to be rectified in order to proceed with the design.

First was the large discrepancy between the MOSFET model parameters of the two processes. The 2 micron CMOS process supported by VTI has very high 'source to drain' sheet resistivities, with $R_{sh}(N) = 32\ \Omega/sq$ and $R_{sh}(P) = 105\ \Omega/sq$. These are very much higher than the Honeywell parameters ($R_{sh}(N) = 4.0\ \Omega/sq$ and $R_{sh}(P) = 4.2\ \Omega/sq$), and thus limit the transistor performance. In order to get a realistic estimation of the transistor drive capabilities, the VTI MOSFET model parameters were replaced by those of Honeywell in the simulation files.

The other is the very large discrepancy between the current capabilities of the metal layers of the two processes. The current capacities of 'metal 1' (0.5 mA/$\mu$m) and 'metal 2' (1 mA/$\mu$m) in the VTI process are too low to support the MR element geometries used in the Honeywell fabrications. Therefore, the current capacities of the metal layers in the Honeywell process (50 mA/$\mu$m for 'metal 1' and 10 mA/$\mu$m for 'metal 2') had to be adopted for this design.

Apart from the above two modifications, the design was done according to the 2 micron CMOS process supported by VTI, and the stray capacitances were included in the net-list extractions used for simulations. Thus the results obtained in this design are highly pessimistic.

### **Circuit Block Diagram**

The following design is for a 64 bit MR memory. As shown in the block diagram, it follows a typical RAM structure with additional features to manipulate the MR elements (see Figure 4.2).

The MR memory elements are organized in an array of $18 \times 4$, i.e., there are 4 elements on each sense line which can be accessed by switching on the corresponding 'two-turn' word line. Out of the 18 sense lines, 16 store data and the remaining two are used as reference lines. The 'REF 0' line contains all zeros and the 'REF 1' line contains all ones. When reading a bit, the combined outputs from the two reference lines are compared with the accessed line's output as explained in Chapter 7.

Out of the 6-bit address, 2 bits are used to select the word line. Depending on the pattern in the address bits 'A0' and 'A1', the word decoder turns on the two word gates corresponding to the selected 'two-turn' word line, as described in Chapter 6. The remaining 4 address bits 'A2', 'A3', 'A4' and 'A5' are used by the sense decoder to select the sense line by turning on the midline sense gate and the pass transistor. If a read operation is being carried out, the midline sense gates and pass transistors of the two reference lines are also turned on.
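The address split can be sketched as follows. The bit ordering (A0 as the least significant bit) and the numbering of word and sense lines from zero are assumptions made for illustration; `decode_address` is a hypothetical helper, not part of the design.

```python
def decode_address(address):
    """Split a 6-bit address into (word_line, sense_line).

    Assumes A0 is the least significant bit:
    A0..A1 select one of 4 two-turn word lines,
    A2..A5 select one of 16 data sense lines.
    """
    if not 0 <= address < 64:
        raise ValueError("address must fit in 6 bits")
    word_line = address & 0b11
    sense_line = (address >> 2) & 0b1111
    return word_line, sense_line

print(decode_address(0b101101))  # word line 1, sense line 11
```

Every one of the 64 addresses maps to a distinct (word line, sense line) pair, which matches the 16 data lines of 4 elements each.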

The 'operation decoder' turns on the driver circuits (both sense and word), so that the sense and word currents will flow in the right directions once the sense and word gates are turned on.



Figure 4.2: Block Diagram of the 64 Bit Memory

### **Write Operation**

A 'write' takes place as follows. First, the operation decoder decodes the data bit and the  $R/\bar{W}$  to turn on the sense and word drivers. The word drivers that provide a positive word current are turned on. For a data bit of '1', the sense drivers that provide a negative sense current are turned on and vice versa for a data bit of '0'.

Simultaneously, sense and word decoding is done and the corresponding midline sense gate and the two word gates of the two-turn word line to be activated are turned on to complete the write. The switching on of the sense pass transistors together with the sense gates doesn't have any adverse effects since the data output is not enabled.

### **Read Operation**

Reading takes a longer time and thus determines the memory access time. First, the operation decoder turns on the sense drivers that provide a positive sense current and the word drivers that provide a negative word current. Simultaneously, sense and word decoding is completed to turn on the corresponding mid-line sense gates and pass transistors together with the gates of the two-turn word line.

The element responses are transmitted through the three active pass transistors (those of the element accessed, the 'REF 1' line and the 'REF 0' line) to the sense amplifier. The sense amplifier consists of 3 main stages: the preamplifier stage, the auto-zero and differential amplifier stage, and the final stage which completes the amplification of the signal to full logic levels.

Then, the identifying logic determines the bit value depending on the position of the MR element along the sense line, as described in Chapter 7.

### **Critical Path Timing**

As previously mentioned, the read takes a longer time and the critical path delay consists of the decoder delays (operation, sense and word), driver delays (sense and word), delay through the sense amplifier and the identifying logic delay.

Out of these, the decoding is done in parallel and the decoders are very simple combinational circuits with only a few gate delays. The identifying logic too is a very simple combinational circuit with hardly any delay. (Simulations have shown that the gate delays are very small, on the order of 0.1 ns.) Thus, the analog circuitry contributes almost all the delay in accessing. Therefore, all the effort was concentrated on designing the drivers and the sense amplifier with the auto-zero circuit to have a minimum delay.

After an extensive process of design, simulation and re-design, the total delay through the above circuitry was found to be less than 9.8 ns. The next few chapters describe these circuits in detail with layouts and simulation results.

### CHAPTER 5. SENSE CIRCUIT

The sense circuit (see Figure 5.1) consists of 2 pairs of 'drive' transistors ($T_{SDP1}$ & $T_{SDN1}$, $T_{SDP2}$ & $T_{SDN2}$), 18 'sense gate' transistors ($T_{G0}, \ldots, T_{G17}$) and 18 'sense pass' transistors ($T_{P0}, \ldots, T_{P17}$). The signals SDC1 and SDC2 generated by the operation decoder control the direction of the sense current. If a positive sense current is needed, SDC2 is set to logic 1 (5 V) and SDC1 is set to logic 0 (0 V), thereby activating transistors $T_{SDP1}$, $T_{SDN2}$ and cutting off transistors $T_{SDN1}$, $T_{SDP2}$. Similarly, a negative sense current can be obtained by setting SDC2 to 0 and SDC1 to 1.

The sense decoder generates the control signals to turn on the 'sense gate' and 'sense pass' transistor pairs in order to select a sense line.

The width to length ratios  $(W/L)$  of the transistors used are as follows:

- Driver (N): 432/2
- Driver (P): 432/2
- Gate (N): 32/2
- Pass Tr. (N): 80/2

The pass transistor's $W/L$ ratio was set to 80/2 in order to keep the input resistance to the sense amplifier at about 500 $\Omega$.

For a write operation, only one sense gate and pass transistor pair is turned on, requiring a current of about 3 mA from the drivers. But for a read operation, 3 such pairs (those of the line accessed and the 2 reference lines REF 0 and REF 1) are turned on, requiring a total of 9 mA from the drivers. This adjustment occurs automatically in the designed circuit, with more voltage utilized across the drive transistors for a read operation.

Figure 5.1: Sense Circuit

The voltage budget for a write operation is as follows:

- V(P driver) = 0.1635 V
- V(N driver) = 0.0690 V
- V(on gate) = 1.4349 V
- V(each element) = 0.8250 V ($I_s$ = 3.3 mA)

For a read operation with 3 sense lines 'on', the voltage budget is modified as follows:

- V(P driver) = 0.4723 V
- V(N driver) = 0.1935 V
- V(each on gate) = 1.2521 V
- V(each element) = 0.77 V ($I_s$/element = 3.08 mA)

All the transistors are operated in the ohmic region in order to obtain the high current needed.
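A quick consistency check on the two voltage budgets is sketched below. It assumes that the P driver, N driver, one sense gate and the four elements of a sense line (from the $18 \times 4$ array of Chapter 4) lie in series across the 5 V supply; that series arrangement is an assumption for illustration, and `budget_total_v` is a hypothetical helper.

```python
ELEMENTS_PER_LINE = 4  # four elements per sense line (Chapter 4)

def budget_total_v(p_driver_v, n_driver_v, gate_v, per_element_v):
    """Sum of the series voltage drops along one sense line (assumed path)."""
    return p_driver_v + n_driver_v + gate_v + ELEMENTS_PER_LINE * per_element_v

write_total_v = budget_total_v(0.1635, 0.0690, 1.4349, 0.8250)
read_total_v = budget_total_v(0.4723, 0.1935, 1.2521, 0.77)

# Element resistance implied by each budget entry (V / I).
write_element_ohm = 0.8250 / 3.3e-3
read_element_ohm = 0.77 / 3.08e-3

print(f"write: {write_total_v:.4f} V, element {write_element_ohm:.0f} ohm")
print(f"read : {read_total_v:.4f} V, element {read_element_ohm:.0f} ohm")
```

Both totals come out just under 5 V, and both element voltages correspond to the 250 $\Omega$ folded element.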

It should be noted that the currents through these transistors, calculated using the simplified MOS transistor equations  $[8]$ ,

$$
I_{ds}(\text{Ohmic}) = K \frac{W}{L} \left[ (V_{gs} - V_t) V_{ds} - \frac{V_{ds}^2}{2} \right] \tag{5.1}
$$

and

$$
I_{ds}(\text{Saturated}) = K \frac{W}{L} \frac{(V_{gs} - V_t)^2}{2} \tag{5.2}
$$

where

$$
K = \frac{\mu \epsilon_{ox} \epsilon_o}{t_{ox}} \tag{5.3}
$$

are much higher than those obtained by the simulation program. This is because the above equations assume that carrier mobility is constant, do not account for channel length modulation and also neglect the leakage currents [11]. These effects cannot be neglected, especially in the case of short gate lengths as used in this design, and they are accounted for by the 'LEVEL 2' MOSFET models used by the HPSPICE simulation program [1].

Simulation results show that the nominal voltage transmitted by any pass gate during a read, before the word line is activated, is 1.734 V. This is the nominal input voltage to the sense amplifier. Simulations with the sense gate switched on at 5 ns show this voltage to settle within 1.3 ns (see Figure B.1), and so do the sense currents in both read (see Figure B.2) and write (see Figure B.3) operations. Therefore, the total delay in the sense circuitry is 1.3 ns.

### **CHAPTER 6. WORD CIRCUIT**

The word circuit (see Figure 6.1) consists of 2 'drive transistor' pairs ($T_{WDP1}$ & $T_{WDN1}$, $T_{WDP2}$ & $T_{WDN2}$), and 4 pairs of 'word gate' transistors ($T_{G01}$ & $T_{G02}$, $\ldots$, $T_{G31}$ & $T_{G32}$), with a pair of gates controlling each two-turn word line.

The operation decoder generates the control signals WDC1 and WDC2 to set the direction of the word current. WDC2 = 1 and WDC1 = 0 generates the positive word current for write operations, and WDC2 = 0 and WDC1 = 1 generates the negative word current for read operations. The word decoder generates the control signals ($W_{G0}, \ldots, W_{G3}$) to turn on the pair of gates corresponding to the selected word line. Only one word line is activated at a given time.
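The polarity rules stated in Chapters 4 to 6 can be collected into a small truth-table sketch of the operation decoder outputs. This is a behavioral illustration only; the function name and the string arguments are hypothetical and say nothing about the actual gate-level implementation.

```python
def driver_controls(operation, data_bit=None):
    """Return (SDC1, SDC2, WDC1, WDC2) for a read or a write.

    Polarity rules taken from the text:
      write: positive word current; sense current positive for '0',
             negative for '1'.
      read : positive sense current, negative word current.
    Positive current <-> (xDC1, xDC2) = (0, 1); negative <-> (1, 0).
    """
    if operation == "read":
        sense_positive, word_positive = True, False
    elif operation == "write":
        if data_bit not in (0, 1):
            raise ValueError("write needs data_bit 0 or 1")
        sense_positive, word_positive = (data_bit == 0), True
    else:
        raise ValueError("operation must be 'read' or 'write'")

    sdc1, sdc2 = (0, 1) if sense_positive else (1, 0)
    wdc1, wdc2 = (0, 1) if word_positive else (1, 0)
    return sdc1, sdc2, wdc1, wdc2

for op, bit in (("write", 0), ("write", 1), ("read", None)):
    print(op, bit, driver_controls(op, bit))
```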

Transistors much larger than those of the sense circuit are needed for the word circuit in order to provide the required word current of 15 mA. The $W/L$ ratios of the transistors are:

- Word Driver (N): 900/2
- Word Driver (P): 900/2
- Word Gate (N): 115/2

Each 'turn' of the two-turn word line is 5.5 $\mu$m wide with a 4 $\mu$m separation (see Figure 4.1), and the return path is 7 $\mu$m wide. Since each word line is laid over a stack of 18 sense lines, the total length of each turn is approximately 1100 $\mu$m. Since 'metal 2' has a sheet resistance of 0.05 $\Omega$/sq., the total word resistance is about 30 $\Omega$.

Figure 6.1: Word Circuit

The voltage budget of the word circuit is as follows:

- V(P driver) = 0.4394 V
- V(N driver) = 0.2127 V
- V(Gate with higher Source potential) = 2.9179 V
- V(Gate with lower Source potential) = 0.9811 V
- Voltage drop along the word line = 0.45 V
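The word line figures above can be cross-checked with a short sketch. The return path length is not stated in the text and is assumed here to equal the 1100 $\mu$m turn length; the helper name is hypothetical.

```python
def strip_resistance_ohm(length_um, width_um, sheet_ohm_per_sq):
    """Resistance of a rectangular strip: sheet resistance times length/width."""
    return sheet_ohm_per_sq * length_um / width_um

SHEET_METAL2 = 0.05    # ohm/sq, from the text
TURN_LENGTH_UM = 1100  # approximate length of each turn, from the text

turn_ohm = strip_resistance_ohm(TURN_LENGTH_UM, 5.5, SHEET_METAL2)
# Assumption: the 7 um wide return path is about as long as one turn.
return_ohm = strip_resistance_ohm(TURN_LENGTH_UM, 7.0, SHEET_METAL2)
line_ohm = 2 * turn_ohm + return_ohm

drop_v = 15e-3 * 30.0  # 15 mA through the quoted 30 ohm
budget_v = 0.4394 + 0.2127 + 2.9179 + 0.9811 + 0.45

print(f"estimated word line resistance: {line_ohm:.1f} ohm")
print(f"drop at 15 mA and 30 ohm      : {drop_v:.2f} V")
print(f"voltage budget total          : {budget_v:.4f} V")
```

Under these assumptions the estimate comes to a little under 30 $\Omega$, and the budget entries sum to approximately the 5 V supply.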

The thick and long word lines tend to have a considerable stray capacitance. Using

$$
C = \frac{\epsilon_{Si}\epsilon_o A}{t_{Si}}\tag{6.1}
$$

the total stray capacitance of a word line was calculated to be 0.96 pF per line.

Since the actual word line was not laid out (see Figure A.2), this calculated stray capacitance of 0.96 pF was lumped near the two open ends of each word line for simulation purposes.

A timing simulation with the word gates switched on at 5 ns showed the word current to stabilize within 1 ns, thus limiting the total delay of the word circuit (with strays) to 1 ns (see Figure B.4).

### **CHAPTER 7. SENSE AMPLIFIER**

### Design Issues

There are several factors that need to be accounted for in order to achieve reliable sensing of a stored bit.

When sense currents are turned on, thermal effects tend to change the input voltages to the sense amplifier from the nominal value of 1.734 V. By using a differential sensing scheme, where the output of the sense line with the accessed element is compared with those of the reference lines (REF 1 and REF 0), the thermal effects are eliminated.

By averaging the output voltages of REF 0 and REF 1 lines in the pre amplifier stage, a perfectly averaged 'midline voltage reference' is obtained (see Figure  $7.1$ ). Then the output sense voltage from the accessed element with respect to the reference  $(\Delta s)$  becomes a positive or a negative value depending on the bit stored and therefore, a '1' or a '0' can be identified.

The position of the memory element in the sense line introduces a slight complication that is taken care of by the 'identifying logic'. The sense circuit design is such that a read sense current always flows in the positive direction (see Figure 7.2), from node X to node Y. (The sense gates and sense pass transistors are not included in Figure 7.2 for simplicity.) The inputs to the sense amplifier (SEN0, SEN1 and SEND) are taken from the points along the midline.

Figure 7.1: Midline Voltage Reference

Figure 7.2: Output dependence on Element Position

From 'REF 0' line, irrespective of the activated word line, a voltage of  $1.734$  V appears at the input to the sense amplifier (i.e.,  $V_{SEN0} = 1.734 \text{ V}$ ).

In 'REF 1' line, the voltage at midline ($V_{SEN1}$) depends on the activated word line. Since there are all '1's stored in the REF 1 line, $V_{SEN1}$ will differ from $V_{SEN0}$ by 2.25 mV. If a word line to the left of the midline (i.e., wdln0 or wdln1) is activated,

$$
V_{SEN1} = V_{SEN0} + 2.25\text{ mV} \tag{7.1}
$$

and if a word line to the right of the midline (i.e., wdln2 or wdln3) is activated,

$$
V_{SEN1} = V_{SEN0} - 2.25\text{ mV}. \tag{7.2}
$$

Thus, the averaged reference voltage for an element to the left of the midline is $V_{SEN0} + 1.125$ mV and that for an element to the right of the midline is $V_{SEN0} - 1.125$ mV.
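Written out, the averaging of the two reference inputs gives the following reference level. The symbol $V_{REF,in}$ is introduced here only for illustration and denotes the averaged reference referred to the sense amplifier input:

$$
V_{REF,in} = \frac{V_{SEN0} + V_{SEN1}}{2} = \frac{V_{SEN0} + (V_{SEN0} \pm 2.25\text{ mV})}{2} = V_{SEN0} \pm 1.125\text{ mV}
$$

where the upper sign applies to a word line to the left of the midline and the lower sign to one to the right.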

The same effect occurs when accessing a data storage element. For an element to the left of the midline, if a '1' is stored $V_{SEND} = V_{SEN0} - 2.25$ mV and if a '0' is stored $V_{SEND} = V_{SEN0} + 2.25$ mV. Therefore, $\Delta s = -1.125$ mV for a stored '1' and $\Delta s = +1.125$ mV for a stored '0' if the element is to the left of the midline. Similarly, if the element is to the right of the midline, $\Delta s = +1.125$ mV for a stored '1' and $\Delta s = -1.125$ mV for a stored '0'. The 'bit identifying logic' takes care of this effect by decoding the output according to the word line activated. The shifting of the reference voltage does not have any adverse effects, since the differential amplifier stage has an almost infinite common mode rejection ratio in that range.
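The decoding rule described above can be sketched in a few lines. This is only an illustration of the sign convention given in the text, with hypothetical function and set names; the actual circuit presumably operates on the amplified logic-level output rather than on $\Delta s$ directly.

```python
# Sketch of the 'bit identifying logic' described above (names are hypothetical).
# Sign convention taken from the text: left of the midline a stored '1' gives
# a negative delta_s; right of the midline it gives a positive delta_s.

LEFT_WORD_LINES = {"wdln0", "wdln1"}
RIGHT_WORD_LINES = {"wdln2", "wdln3"}


def identify_bit(delta_s_mv, word_line):
    """Return the stored bit from the sign of delta_s and the active word line."""
    if word_line in LEFT_WORD_LINES:
        return 1 if delta_s_mv < 0 else 0
    if word_line in RIGHT_WORD_LINES:
        return 1 if delta_s_mv > 0 else 0
    raise ValueError(f"unknown word line: {word_line}")


for line, delta in [("wdln0", -1.125), ("wdln1", +1.125),
                    ("wdln2", +1.125), ("wdln3", -1.125)]:
    print(line, delta, "->", identify_bit(delta, line))
```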

The above reasoning is based on the assumption that all the element resistances are perfectly matched. This is not quite true, however, and voltage offsets due to mismatched resistances do exist in the inputs to the sense amplifier. In this design, any such offsets are eliminated using an auto-zero circuit and therefore the above reasoning holds true. The auto-zero circuit uses a pair of capacitors which are charged up to zero out any offset prior to activating the word line. In ICs, the resistances are generally matched within 1%. Therefore, the auto-zero circuit is designed to eliminate any offsets resulting from a resistance mismatch of 1% or less.

The sense amplifier consists of 3 stages: pre amplification, auto-zeroing with differential amplification, and amplification of the signal to full logic levels.

### Pre Amplifier

The pre amplifier (see Figure 7.3) is a CMOS inverting stage with a small signal gain of 34 (see Figure B.5). The sense signal from the accessed element (SEND) is fed to the data branch with the following  $W/L$  ratios in order to ensure proper biasing (see Figure B.6).

P transistor: 14/2

N transistor: 22.3/2

The reference signal is generated by 2 inverting branches with their outputs connected together. Each of these branches has the following $W/L$ ratios:

P transistor:  $7/2$ 

N transistor:  $11.15/2$ 

They are fed with the sense signals SEN1 and SEN0 from the REF 1 line and REF 0 line respectively. This arrangement averages the outputs corresponding to a stored '0' and a stored '1', thus achieving the perfect midline resultant (see Figure 7.1) while maintaining a small signal gain of 34.

Figure 7.3: Pre Amplifier

This pre amplifier amplifies the  $\pm 1.125$  mV of  $\Delta s$  signal to  $\mp 38.25$  mV with the reference output (VREF) set to 2.21 V.

Simulations have shown a maximum delay of 0.1 ns in this stage (see Figure B.7).

#### Auto-zero Circuit and Differential Amplifier

The amplified outputs from the pre amplifier (VBIT and VREF) are routed through an auto-zero stage to the differential amplifier (see Figure 7.4).

The auto-zeroing is done prior to switching on the word lines, in order to eliminate signal offsets due to resistance mismatches. A simple circuit of 2 identical capacitors ($C_B$ and $C_R$), through which VBIT and VREF are fed to the gates of the source coupled pair ($T_{SC1}$ and $T_{SC2}$), is used for this purpose. The 2 gate-capacitor nodes are also connected to a reference voltage of 2 V (Varef) through 2 pass transistors ($T_{A1}$ and $T_{A2}$) controlled by the signal 'AUTO'.

Figure 7.4: Auto-zero Circuit and Differential Amplifier

At the beginning of a read, 'AUTO' is pulled high and the capacitors are therefore connected to 'Varef'. Thus, when the amplified sense outputs VBIT and VREF appear, the capacitors $C_B$ and $C_R$ charge to the voltage differences between VBIT & Varef and VREF & Varef respectively. The offset between VBIT and VREF prior to switching of the word line is thereby eliminated by the charge on these capacitors, which sets the gate voltages of $T_{SC1}$ and $T_{SC2}$ to Varef. Then the signal 'AUTO' is pulled low and, simultaneously, the selected word line is switched on. There is a slight discharge of the capacitors at this point, but since both capacitors behave alike, the discharging has no adverse effect.
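The effect of this sequence can be illustrated with an idealised model in which each capacitor simply holds the voltage it acquired while 'AUTO' was high. The sketch ignores the slight discharge mentioned above, and the 20 mV offset is a hypothetical value chosen only for illustration; the class and variable names are not from the design.

```python
# Idealised model of the auto-zero step (no discharge or charge sharing).
# The 20 mV offset below is a hypothetical value chosen for illustration.

VAREF = 2.0  # auto-zero reference voltage, in volts


class AutoZeroCap:
    """Series capacitor between a pre amplifier output and a gate node."""

    def __init__(self):
        self.stored_v = 0.0  # voltage held across the capacitor

    def auto_zero(self, v_in):
        # AUTO high: the gate node is tied to Varef, so the capacitor
        # charges to the difference between the input and Varef.
        self.stored_v = v_in - VAREF

    def gate_voltage(self, v_in):
        # AUTO low: the gate node floats and follows the input,
        # shifted by the voltage stored on the capacitor.
        return v_in - self.stored_v


VREF_OUT = 2.21      # reference output of the pre amplifier
OFFSET = 0.020       # hypothetical mismatch offset on VBIT
SIGNAL = -0.03825    # amplified sense signal once the word line is on

c_b, c_r = AutoZeroCap(), AutoZeroCap()
c_b.auto_zero(VREF_OUT + OFFSET)   # VBIT before the word line is switched on
c_r.auto_zero(VREF_OUT)            # VREF

diff_before = c_b.gate_voltage(VREF_OUT + OFFSET) - c_r.gate_voltage(VREF_OUT)
diff_after = (c_b.gate_voltage(VREF_OUT + OFFSET + SIGNAL)
              - c_r.gate_voltage(VREF_OUT))

print(f"differential input before word line: {diff_before * 1e3:.2f} mV")
print(f"differential input after word line:  {diff_after * 1e3:.2f} mV")
```

In this model the differential input is essentially zero before the word line is switched on, whatever the offset, and equals the amplified sense signal afterwards.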

Gate capacitances of 2 transistors with source and drain kept open, are used as  $C_B$  and  $C_R$  (see Figure A.4). Each capacitor occupies an area of  $20\times22$   $\mu m^2$  and has a gate capacitance of 0.410 pF. (This value is less than the calculated value due to fringing effects, etc.) Simulations have shown that these capacitors charge up very fast (see Figure B.8).

The differential amplifier uses an N-channel source coupled pair, $T_{SC1}$ and $T_{SC2}$ (see Figure 7.4). $T_{SS}$ acts as a constant current source biased by the gate signal (Vbias) of 1.18 V. The loads for $T_{SC1}$ and $T_{SC2}$ are simple P-channel current mirrors which are perfectly matched. The bias voltages, Vbias and Varef, are generated using CMOS voltage dividers [2], [3]. This circuit converts the differential input signal to a single-ended signal with a differential amplification of 28.
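Combining the quoted gains gives a feel for the signal level reaching the final stage. The sketch below tracks magnitudes only and assumes, as a simplification, that the small-signal gains hold over the whole swing.

```python
# Small-signal gain chain of the first two sense-amplifier stages.
# Only magnitudes are tracked; the small-signal gains are assumed to
# hold over the whole swing, which is a simplification.

DELTA_S_MV = 1.125     # sense signal magnitude relative to the reference
PREAMP_GAIN = 34       # small-signal gain of the pre amplifier
DIFF_AMP_GAIN = 28     # differential gain of the differential amplifier

preamp_out_mv = DELTA_S_MV * PREAMP_GAIN
diff_out_mv = preamp_out_mv * DIFF_AMP_GAIN

print(f"pre amplifier output swing: {preamp_out_mv:.2f} mV")
print(f"differential amplifier output swing: {diff_out_mv / 1000:.3f} V")
```

Under this linear approximation the differential amplifier output moves by about 1.07 V, which the final stage then brings to full logic levels.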

The $W/L$ ratios of the transistors used in this circuit are:

$T_{SS}$: 140/2

$T_{SC1}$ & $T_{SC2}$: 24/2

$T_{L1}$ & $T_{L2}$: 20/2

$T_{A1}$ & $T_{A2}$: 10/2

$T_{RN}$: 5.2/2

$T_{RP}$: 5.8/2

$T_{BN}$: 29/2

$T_{BP}$: 3/2

Simulations have shown the total delay through both auto-zero and differential amplifier stages to be less than 7 ns (see Figures B.9 and B.10).



Figure 7.5: Final Stage of the Sense Amplifier

### Final Stage

This is another inverting stage (see Figure 7.5), which amplifies the output VDIF from the differential amplifier to full logic levels. The $W/L$ ratios are as follows:

P transistor: 8/2

N transistor: 3/2

Simulations have shown a maximum delay of 0.4 ns through this stage.

### CHAPTER 8. CONCLUSIONS

The folded element used in this design takes up an area of approximately  $22\times12$  $\mu m^2$ . Considering the fact that the supporting electronics can be buried under the memory elements, this implies a bit density of about  $3.5 \times 10^5$  bits/cm<sup>2</sup>.
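The density figure can be related to the cell area with a short calculation. The sketch below gives only an upper bound and assumes that the cells tile the chip with no additional overhead.

```python
# Upper bound on bit density from the folded-element area alone.
# Assumption: cells tile the chip with no additional overhead.

CELL_WIDTH_UM = 22.0
CELL_HEIGHT_UM = 12.0
UM2_PER_CM2 = 1e8  # (1e4 um per cm) squared

cell_area_um2 = CELL_WIDTH_UM * CELL_HEIGHT_UM
bits_per_cm2 = UM2_PER_CM2 / cell_area_um2

print(f"cell area: {cell_area_um2:.0f} um^2")
print(f"density upper bound: {bits_per_cm2:.2e} bits/cm^2")
```

This ideal bound of roughly $3.8 \times 10^5$ bits/cm<sup>2</sup> is of the same order as the figure of about $3.5 \times 10^5$ bits/cm<sup>2</sup> quoted above.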

Simulations of each analog stage of the critical path have shown their maximum delays to be as follows:

Sense driver circuit: 1.3 ns

Word driver circuit: 1 ns

Pre amplifier: 0.1 ns

Differential amplifier and auto-zero circuit: 7 ns

Final stage of sense amplifier: 0.4 ns

Thus, the total delay in the above stages adds up to 9.8 ns. Simulations have also shown that the delay of a logic gate is very low, on the order of a fraction of a nanosecond. Therefore, it is possible to design the necessary decoders to have very low delays.
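The total can be reproduced from the per-stage figures listed above; the dictionary keys below are just labels for those stages.

```python
# Sum of the simulated worst-case delays of the analog stages listed above.

stage_delays_ns = {
    "sense driver circuit": 1.3,
    "word driver circuit": 1.0,
    "pre amplifier": 0.1,
    "differential amplifier and auto-zero circuit": 7.0,
    "final stage of sense amplifier": 0.4,
}

total_ns = sum(stage_delays_ns.values())
slowest = max(stage_delays_ns, key=stage_delays_ns.get)

print(f"total analog delay: {total_ns:.1f} ns")
print(f"dominant stage: {slowest}")
```

The differential amplifier and auto-zero stage accounts for most of the total, so it is the natural place to look for further speed improvements.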

Since the extraction of stray capacitances, etc. was done according to the 2 micron CMOS process supported by VTI, they are far in excess of those that would occur in the Honeywell process. Therefore, the total delay shown above is higher than the total delay that can be obtained by using Honeywell process parameters in the simulations.

Also, the Honeywell process uses a finer lithography (1.2 micron) than that of 2 micron CMOS supported by VTI tools. Thus, the above delay scales down further for the Honeywell process used for MR memory fabrication. Therefore, it is possible to fabricate MR memory chips with an access time of 10 ns.

It is important to note that, with finer lithography, the cell size can be scaled in one dimension without any loss of signal level. Therefore, with improved lithography and materials, substantially higher densities as well as very high speeds can be expected from MR memories.

### **BIBLIOGRAPHY**

- [1] P. Antognetti and G. Massobrio. *Semiconductor Device Modeling with SPICE.*  New York: McGraw-Hill Book Company, Inc., 1988.
- [2] Joseph Di Giacomo. *VLSI Handbook.* New York: McGraw-Hill, Inc., 1989.
- [3] Roubik Gregorian and Gabor C. Temes. *Analog MOS Integrated Circuits for Signal Processing.* New York: John Wiley & Sons, Inc., 1986.
- [4] Vivek Mehra. *Implementation of a Sensing Technique for Non-volatile MR Memories.* MS Thesis, Iowa State University, 1988.
- [5] A. V. Pohm, J. S. T. Huang, J. M. Daughton, D. R. Krahn and V. Mehra. *The Design of a One Megabit Non-volatile MR Memory Chip using*  $1.5 \times 5 \mu m^2$  *Cells.* IEEE Transactions on Magnetics, 24, No. 6 (Nov. 1988): 3117-3119.
- [6] A. V. Pohm, C. S. Comstock, J. M. Daughton and D. R. Krahn. *Ultra High Density Non-destructive Readout MR Memory Cells.* Comp Euro '89, Hamburg, Germany, May 1989.
- [7] A. V. Pohm, C. S. Comstock and A. T. Hurst. *Quadrupled Non-destructive Outputs from MR Memory Cells using Reversed Word Fields.* Paper CA-13 in 34th Conference on Magnetism and Magnetic Materials, Boston, Nov. 1989.
- [8] Douglas A. Pucknell and Kamran Eshraghian. *Basic VLSI Design.* Sydney: Prentice-Hall of Australia Pty Ltd., 1988.
- [9] K. T. M. Ranmuthu, I. W. Ranmuthu, A. V. Pohm, C. S. Comstock and M. Hassoun. *10-35 Nanosecond Magneto-Resistive Memories.* International Magnetics Conference, Brighton, UK, 1990.
- [10] Brian Santo. *Solid State.* IEEE Spectrum, 26, No. 1 (Jan. 1989): 47-49.
- [11] Neil Weste and Kamran Eshraghian. *Principles of CMOS VLSI Design.* Reading, Massachusetts: Addison-Wesley Publishers, 1989.
- [12] H. Y. Yoo, A. V. Pohm, J. H. Hur, S. W. Kenkare and C. S. Comstock. *Dynamic Switching Process of Sandwich-structured MR Elements.* IEEE Transactions on Magnetics, 25, No. 5 (Sept. 1989): 4269-4271.


**APPENDIX A. LAYOUTS** 




Figure A.1: Sense Circuit






Figure A.5: Final Stage of the Sense Amplifier

**APPENDIX B. WAVEFORMS** 


Figure B.1: Sense voltage through the Sense Pass Transistor


Figure B.4: Word Driver Output






Figure B.5: Pre-Amplifier Transfer Characteristics in the Operating Region


Figure B.6: Pre-Amplifier Transfer Characteristics

Figure B.7: Pre-Amplifier Output Timing

Figure B.8: Auto-zero Circuit Outputs




Figure B.11: Transfer Characteristics of the Final Amplifier Stage


Figure B.13: Final Amplifier Stage Timing for a Negative going Pulse
