Kevin Thomas
Cray Research
655 Lone Oak Road, Eagan, MN 55121
phone 651-683-3624, fax 651-683-3099
Email: kjt@cray.com
Gary Haussmann and Melinda Piket-May
Department of Electrical and Computer Engineering
Campus Box 425, University of Colorado, Boulder,CO 80309
phone 303-492-8719, fax 303-492-5323
mjp@boulder.colorado.edu
Roger J. Gravrok
Sequent Computer Systems
Bldg. D2, Ste. 219 Mailbox 82
800 Wisconsin Street
Eau Claire, WI 54703-3611
phone 715-830-7321, fax 715-833-2027
rjg@sequent.com
This paper presents information about the development of a versatile FD-TD solver ``LC.'' The paper will discuss the program implementation and design at Cray Research as well as application to engineering problems at CU-Boulder and Sequent. Many issues in the implementation have surfaced, concerning the problem of producing a graphical model editor/simulator that runs efficiently on various hardware systems. Finally the paper will explore how the solver's capabilities aid design engineers when investigating and solving packaging and interconnect design issues. Users can easily extract various electromagnetic quantities at arbitrary points in their model, allowing them to model and examine numerous structures without prototyping.
Current modeling tools provide information accurate and sufficient enough to design devices at the IC level. These modeling tools typically make use of circuit or equivalent circuit representations to compute the various parameters of a device, as is done with SPICE. Often times, however, these devices constitute only small, individual parts of a whole module. The entire module is composed not only of IC devices, but of the interconnects and packaging. It is at the level of the whole module where current modeling capabilities are limited or simply do not exist. To develop an equivalent circuit representation at this level, which can then be used in a model such as SPICE, electromagnetic effects must be examined and understood. To accomplish this requires the application of Maxwell's equations to the given problem.
All circuits, theoretically, may be examined from the standpoint of Maxwell's equations, and are thus a subset of problems to which these equations apply. In engineering practice, circuit problems may be accurately solved by much simpler relations between voltages, currents, and impedance rather than by the more complex relations between electric and magnetic fields described by Maxwell's equations. Historically, designer have used these simpler voltage/current relationships to model almost all circuit problems. In fact, at the low frequencies typical of past circuits, a certain intuition could be developed about how to design a board layout, about what effects a given dielectric would have on performance, and about how significant cross-talk, propagation delay, and coupling would be for a certain design. However, at higher frequencies, these issues are far less obvious. This is because the electrical path lengths at lower frequencies are small and thus the electrical characteristics are relatively constant; at higher frequencies, path lengths become electrically significant and begin to act as transmission lines and sometimes even as radiators. The simple circuit models are no longer accurate for many cases.
At the level of a chip, or a device comprised of several chips with interconnects, the electromagnetic environment is very complex and not necessarily well understood theoretically, numerically, or experimentally. For instance, coupling between vias can distort signals, mismatches between vias and signal lines can lead to ground-bounce, holes in ground planes may result in increased coupling effects between board layers, and metal traces with or without bends are likely to have reactive impedance components, components that can significantly degrade system performance at higher clock speeds. The simulation tool discussed in this paper was designed to to understand these effects, as well as to discover means to mitigate and/or overcome these effects.
It is becoming clear that supercomputer-based solutions of the fundamental Maxwell's equations will become increasingly important to the study of electromagnetics. By studying results generated by supercomputer runs, one may increase intuitive understanding of the subject, generate specific design rules, and determine where other simpler tools are accurate and sufficient. By developing techniques and tools that presently run on supercomputers, we will have tools which are fully developed and refined available in a few years when the power of supercomputer class machines of today becomes the standard for desktop workstations. This paper will discuss the development of general methods to extract relevant information from the simulation.
We have chosen to use the FD-TD method as our Maxwell's Equations solver. The FD-TD method is a very powerful computational tool [1]: it gives time-domain data, useful for transient analysis, and can yield frequency-domain data via Fourier Transforms. The FD-TD simulation automatically and self-consistently takes into account the full-wave effects of distributed electromagnetic wave coupling, radiation, ground loops and ground bounce. Given the variety of materials that FD-TD can model, very complex and detailed structures may be analyzed with FD-TD. The FD-TD technique can be applied to problems that scale up to the size of whole antenna structures or aircraft, and scale down to optical devices [2,3,4,5,6,7,8,9]. Taken together, all of these capabilities have made FD-TD an attractive and robust method for solving a number of electromagnetic interaction problems. The FD-TD method permits time-stepping Maxwell's equations to obtain the operational dynamics of ultra-high-speed nonlinear analog and digital circuits mounted on modern multilayer boards and multichip modules.
The FD-TD method is inherently robust and based on fundamental science, enabling it to be applied to a wide class of real-world commercial engineering applications. Previously a major limitation to the full scale industrial deployment of FD-TD tools has been the absence of a versitile GUI to generate models, and postprocess the vast amount of data that the FD-TD method can generate.
This paper will discuss an EM analysis development called ``LC''. It consists of a GUI interface with the ability to auto-mesh a graphically defined 3D user geometry. A number of different excitations are possible, and the outputs can yield not only field data, but also voltages, currents, impedances, inductances, capacitances and fluxes. Further, both time and frequency-domain data are available on output. LC also allows for 2-D visual plots of the full 3-D problem being simulated during time-stepping. The ability to visualize the 3D electrodynamics of a structure lends a great deal of intuitive insight and understanding to the user. This is also very useful in identifying problem areas with a particular design. Another very important and relevant feature of the LC tool is an interface to SPICE [10], which will analyze a circuit based on FD-TD field values linked to SPICE at each time step. The SPICE interface also generates output voltages and/or currents that modify the FD-TD field values which are then used during the next FD-TD time-step. The efficient implementation of LC on scalable parallel computers (under development) will allow for the examination of problems that are currently too computationally large or complex, conceivably making inroads into some of the grand challenges facing the electromagnetic engineering community.
Essentially LC is an integrated EM model editor, simulator, and analysis tool. It's composed of 100,000 lines of C and Fortran, uses OSF/Motif, and is portable to all commercial Unix platforms. Its companion batch simulator, FDTD, and plotting program, LCPlot, are also included in the executable distribution. The simulators can use multiple CPUs in parallel when running under Cray UNICOS, SGI IRIX, and Sun Solaris.
A full description of LC may be found on the LC web page [http://www.cray.com/lc]. It is beyond the scope of this paper to fully describe the features and capabilities built into LC. We have knowingly included very little information about the GUI because to present it well would make for a very long paper. A reader interested in more information about the graphical user interface is encouraged to look at the web page. Additional information about many points made in this paper are available there also.
LC is being developed to grow with the future. LC is currently providing useful engineering data to those associated with its development. It is at a stage where someone unfamiliar with the FD-TD method is able to tap into the power of the method and produce results with only a modest investment of time in learning the basic tool features. However, a fundamental understanding of the underlying EM principles is still necessary, and an understanding of the FD-TD method is beneficial. It is a goal of this development to shed new light on problems for the EM expert, and to teach by learning to qualitatively assess the EM physics of a given problem. This will lead to useful intuition for a non-EM expert as it applies to their technical needs, in turn enhancing their understanding of EM fundamentals. While it is always best to understand the methods of scientific computing, in reality, results are often paramount. For this reason, further refinement of the tool to make the understanding of the FD-TD method less essential is being made.
The LC tool is so named because it's initial development was driven by the need for an accurate three-dimensional characterization of the inductance (L) and capacitance (C) of complex electronic systems. Specifically, the creation of LC was motivated by the need to develop techniques and a corresponding tool that were both accurate and suitable for the analysis/design of circuits with clock speeds above 100 MHz. Beyond 100 MHz, circuit interconnect performance dictates system performance capabilities. Accurate electrical characterization of complex electronic structures in three dimensions is mandatory for good design. EM modeling challenges such as those associated with printed circuit boards, multichip modules, integrated circuit packages, connectors, and electromagnetic interference are currently being performed. LC may be used to look at signal- transmission characterization and power distribution modeling. The characterization requirements for these problems include an intuitive and visual understanding of the electrodynamics along with a quantitative description of L,C,R,Z,and Td parameters.
While the meshing procedure occurs only once per simulation, the time step procedure is repeated for hundreds or thousands of iterations, until the simulated time has progressed to a point where all of the interesting activity has ceased. This iteration is usually the largest computational burden of the simulation, lasting anywhere between a few seconds for a small model to several hours for a large one.
The large memory bandwidth requirement of FD-TD limits its performance in some hardware configurations. Seven operands are fetched, and one stored, with only six operations performed per field update. Since RISC multiply-add fusing and vector chaining reduces the effective number of operations to four, two operands must move for every operation performed.
The number of operands can be reduced when updates are performed in a homogeneous region. In this case, two operand fetches from main memory can be eliminated, reducing the total to five per update. LC automatically detects global conditions of this type and employs an optimized update when possible.
Since the field updates are independent, parallel FD-TD is straightforward. The grid can be divided into as many regions as necessary, with each region assigned to a processor. In a shared memory computer, the only limiting factor is the size of the regions. If the regions become too small, the processors have insufficient work available between synchronization points. In a distributed memory computer, an additional constraint of communication overhead is present; the field values in the adjacent region are required when updating fields on the edge of a region. Again, when the regions are large, this communication is a small percentage of the overall run time. A hybrid computer architecture, such as the Cray Origin 2000, can be programmed efficiently as a shared memory computer as long as the grid regions are much larger than the hardware page size; most of the memory will reside on the local compute node, with inter-node communication occurring at the region and page boundaries.
The Berenger perfectly matched layers (PML) boundary condition [12,13] is also available in LC. It is a much more general absorbing boundary condition, and is useful for models with complex materials and geometries. Although it introduces a large amount of computation to the simulation, the algorithm has two useful features for managing the load. First, it is structured in a very analogous way to the FD-TD algorithm. Thus, most of the same techniques for efficiently updating cells in the main FD-TD also apply to the PML region. Second, the number of layers in the absorbing region is user-defined. The number of layers directly influences the amount of memory and computation required for the boundary condition. It also determines the accuracy (its ability to absorb outgoing energy).
A SPICE circuit can be used to drive an interconnect simulated with FD-TD, or it can be used as a load. Any number of nodes of the circuit can connected via ports, and any number of independent circuits can be run simultaneously as well.
Rather than incorporating a special version of SPICE as a subprogram of an FD-TD simulation, LC uses a communication stream to interface with standard SPICE 3 from the University of California, Berkeley. One advantage of this approach is that the user is free to substitute modified versions of SPICE 3, as long as the SPICE 3 user command set is not lost. Thus, new circuit element types and models can be used. Unfortunately, the communication overhead is large, so simulations which include SPICE circuits run much slower than SPICE-free models. A new approach is being investigated, with a tightly-linked SPICE being used for low-overhead standard SPICE simulations, but preserving the capability of interfacing with new versions of SPICE as well.
To obtain some insight into the performance characteristics of the FD-TD solver in LC, a benchmark problem was created. It was designed to be small enough to be practical to run on a desktop computer. The benchmark was run on as many computers as available, and the results are shown in figure 1.
The objective of this effort is to fully develop an interactive electromagnetic (EM) design and analysis tool (LC) that will take complete advantage of the coming class of scalable parallel computers. The basic FD-TD solver that drives this tool is superbly suited to scale to parallel architectures. The efficient implementation of this tool on scalable parallel computers will allow for the examination of problems that are currently too computationally large or complex, conceivably making inroads into some of the grand challenges facing the electromagnetic engineering community. This will allow larger problems to be solved with more accuracy, in turn yielding information never available before. From an entry-level workstation to a networked workstation environment to the latest massively parallel processor (MPP) type machine, this tool will provide the user with state-of-the-art data applicable to their technological interest.
As part of a contract with NASA through the EMCC (Electromagnetics Code Consortium), Cray developed a parallel FD-TD application program for computing RCS (radar cross section). Implemented using PVM (Parallel Virtual Machine) message passing, the resulting software is portable to all HPC hardware. Code validation and testing proved that the software achieved over 50% of the theoretical peak performance, while scaling linearly up to hundreds of processors [14]. We have leveraged this experience in developing our scalable parallel implementation of LC.
Several of the implementation details of LC effect parallelizing efforts. The FD-TD algorithm itself adapts to a parallel implementation easily, because the data dependence and update rule of the field quantities is regular, straightforward and static. However, LC design requires the ability to model elements - including material slabs, field sources, and data probes - with a large variety of shapes and sizes. These elements can span two or more processors in arbitrary ways, known only at run time. The program has to determine, at run time, how to distribute any probes and sources that the user has placed into the simulation.
The simulation implementation must analyze data structures generated by the user interface section of the program. When the FD-TD grid is divided amongst the processors, all of the LC elements are also examined and divided. These LC elements are divided and placed spatially, in the same manner as the FD-TD grid itself. If an element such as a data probe spans multiple processors, the element is divided into pieces: one piece of the probe goes onto each processor. During the simulation the pieces combine their data so that they appear and generate output as if they were one single large probe (figure 2.
![]() |
Early tests using the batch form of LC on a small (8 processor) Origin 2000 show a linear to superlinear speedup on a small microstrip simulation. Larger applications do not achieve superlinear performance, but show better scalability than the smaller applications (figure 3). The break point at 20-25 processors is the point where each processor can fit all of its data in local cache, explaining the sudden slope increase in the curve at 25+ processors.
![]() |
The simulation is performed in the time domain, allowing for a simple analysis of the transient model response. Basic values such as voltage, current, electric charge, and magnetic flux at any point within the model can be displayed as the simulation progresses. These values can be algebraically combined and those results displayed as well.
In addition to these localized measurements, the overall electrodynamic performance can be visualized. Characteristics such as ground plane current paths, power flow, and charge distribution can be displayed as animated color-shaded images.
The per-unit-length capacitance and inductance can be calculated for continuous segments of a signal path. Both self and mutual capacitances and inductances can be obtained, as well as impedance and propagation delay.
An arbitrary signal path can be modeled as a multiport network ``black box'' via scattering parameters. These power ratios can characterize the frequency domain response of a complex discontinuity. The basic calculation provides S-parameters for a 2-port network, but several calculations can be performed to obtain results for any number of ports.
The total radiated power loss of a circuit can be calculated in the frequency domain. The behavior of a structure can be optimized, either identifying and reducing unwanted radiation or enhancing desired radiation.
The performance of antennas may be studied in the near or far field.
Indicative of the inherently difficult challenge of accurately modeling the true inductive effects of three-dimensional structures, a number of different modeling methodologies have been previously employed. One such technique attempts to model the 3D inductance by equating the volumetric electromagnetic field energy to the energy storage potential of an inductor. Commonly techniques that place computational simplicity as a high priority, make simplifying assumptions to the solution of Maxwell's basic equations, and furthermore, require the code, or worst yet, the user, to specify the direction(s) and distribution of the return path currents. Each technique, including the two summarized below, features inherent advantages and tradeoffs. In the current work, a priority was given to 3D modeling accuracy. The current techniques employ a simple method that uses the characteristic definitions of inductance to quantitatively determine the true inductive response directly from the field data. Unapproximated solutions to Maxwell's equations are used within the FD-TD engine to accurately determine the true return-path distributions. The user controls the tradeoff between accuracy and computational-resource usage.
Inductance is defined as the ratio of the magnetic flux,
, through a surface S, that links
a current I flowing through the surface.
| (1) |
In the FD-TD grid, these integrals translate into sums as given by:
| (2) |
Where
.
Thus, an inductance per unit length can be calculated by taking the ratio of the sum of the H field values over the surface, S, and the sum of the H fields in the direction of dL divided by the length along the transmission line that the surface extends over.
The capacitance is defined as the ratio of the charge in an enclosed surface to the electromagnetic potential of the surface region.
| (3) |
In the FD-TD grid, these integrals translate into sums as given by:
| (4) |
For example, given a two conductor transmission line as shown in Figure 4, the flux
integral would be performed on the surface S with the direction of the surface
vector DS as shown. The current integral would be performed along the contour
C.
This is analogous to resistance, which is the quantity describing the ratio of induced voltage drop divided by the current that created that drop: R = V/I.
For a given electromagnetic pathway (such as the signal conductor shown in figure 4) the actual value of inductance is determined by the amount of flux that builds up around the trace. The total amount of flux generated is entirely determined by geometries and spacings of the trace carrying the original signal current (I) and its proximity to the conductor that carries the resulting return-path current (i). Larger separations between the signal and return paths permit more flux generation per unit of current with results in larger inductance. (Since H-fields are conservative note that all of the flux lines are trapped between the two conductors, which complete the circuit, carrying the signal and return-path currents.)
With this method, the challenge of modeling 3D inductance simply reduces to the user properly defining the closed surface over which the flux integration is performed. Call this closed surface the flux net (S). The incremental inductance for the segment of trace 1 from A to B is calculated using the flux net defined by the plane ABCD. The return current flows from C to D. The upper and lower edges of this flux net are defined by the edges of the conductors that carry the signal and return-path currents.
Mutual inductance between trace 1 and 2 is modeled by calculating the total amount of flux caught in ABCD when the signal current (I2) is sent down trace 2. The above illustrative example was simple and 2D in nature. Consider how this simple method is used to quantify the dramatic increase in effective trace inductance when a slot is cut in the ground plane. Now the return path current is diverted around the slot. This new return path converts the simple planer flux net in figure 4 to a hockey net configuration. An example simulation of the above structures uncovers a tripling of effective trace inductance due to the slot in the ground plane.
Gauss's Law guarantees that the total flux caught in the flux net is independent of the shape of the net, permitting the user to define the geometrically simplest flux net between the signal and return-path conductors. Rectilinear flux nets are easily and naturally defined in the FD-TD grid.
The intuitive understanding of this fairly simple 3D structure is enhanced by examining a return-path current diverting around the slot in the ground plane. Visualizations of the full-wave solutions to Maxwell's Equations enable the user to understand where the real return-path currents will flow in complex 3D structures. Understanding and modeling return-path current flow is paramount to accurate 3D inductance modeling. LC enables the packaging engineer to harness the power of Maxwell's Equations.
Determining the equivalent lumped inductance of a 3D structure is of critical importance to electrical engineers involved in package design. Physical and electrical design restrictions can be met and optimal performance can be achieved if this inductance is known before the prototyping stage. As mentioned above, signal-path inductance can be modeled with:
| |
(5) |
In this configuration, an input signal is excited and the voltage (V) across the input is measured. The signal path is shorted out at the output, where the current (I) is measured. With these two sets of data available, the inductance of the system can now be calculated using equation 6
![]() |
(6) |
This approach does not depend on the interior composition of the structure itself, but rather on the ratio of the voltage across the input to the output current time derivative. The voltage-to-current relationship shown in (figure 6)develops as a direct result of the structure's inductive element.
It is clear that this approach can be useful in determining the optimal physical parameters of a system while staying within electrically-dictated inductance thresholds that often accompany high speed digital applications. This method can be integral in the development of certain design guidelines for structures similar to the one in figure 7.
We will continue our work as described on the parallel algorithm. Certain LC elements, such as the SPICE loads and RCS data, will require additional work over that of splitting elements over processors and collecting the data from all relevant processors. The graphical interface must be re-tooled to display and plot data from a parallel simulation in real time, as it does in the serial version.
We are planning to investigate the use of grid partitioning to employ specialized field updates to reduce the amount of memory and memory bandwidth required during the simulation. Using this technique, the amount of memory required can be reduced by up to two thirds, and the processing speed can be increased by a factor of two.
We plan to add new algorithm and postprocessing features that are requested by users. There are many available algorithms in the literature for specific cases. It is not trivial to generalize them; however, if we succeed we will be making the capabilities available to a more general audience. One specific algorithm of interest is taking into account losses due to skin effects. Another one is the modeling of dispersive materials [16,17,18].
Typical LC models have not been large enough to be dramatically effected by numerical dispersion. However, higher frequency analysis and large parallel simulations will motivate the use of higher order methods. LC already has the ability to use traditional 4th order FD-TD updates. Once interface problems between 2nd and 4th order schemes are better solved, the use of higher order schemes within LC will be more routine than they are today. The 3DM24 method described below will be an option in LC to be used when high phase accuracy is needed [19].
The modified fourth order method, also referred to as M24 or the Modified (2,4) Scheme [20] is designed to reduce the numerical dispersion error inherent in the FD-TD method. The M24 scheme augments the basic fourth order difference operator with additional terms, and allows for some varying of coefficients in the operator. By varying the coefficient values, we can produce operators which have a very low dispersion error at grid resolutions well below the minimum resolutions required for the second order FD-TD scheme.
The 3DM24 scheme, when optimized for a specific frequency, provides at least an order of magnitude improvement in dispersion error over the original fourth order scheme. Since this optimization was done for a specific frequency, the improvement in dispersion errors at other frequencies will not be as dramatic. However, the optimization criteria can easily be changed to minimize dispersion error over a wider frequency band, at the expense of increasing the error for any specific frequency. Such an optimization was done for the two dimensional M24 scheme [21] which produced a scheme with low dispersion error over all frequencies of interest.
This work has direct application for analysis and design of advanced electronic components such as multichip modules (MCM) and mixed signal modules. The capabilities of LC address requirements that are currently being pursued in both the defense and commercial sectors. It is directly applicable to existing efforts in design and analysis of advanced packaged modules, low cost electronic packaging, phased array antennas, MCMs, mixed signal modules, and the study of electromigration phenomenology. Furthermore, the capabilities of LC will provide unique and innovative approaches to solutions for developing first pass reliable design and performance of advanced electronic systems. We eventually envision a tool that will not only be able to address the complex electromagnetic interactions within individual devices and components but will also address coupling of energy into electronic devices from external sources (for example noise or jamming signals).
LC has been the culmination of approximately a decade of electromagnetics research and development at Cray. This experience includes CRADAs with National Labs, government funding of Radar Cross Section (RCS) work, and code development at Northwestern University and the University of Colorado at Boulder.
Having a design tool such as LC available gives the developer a step function improvement in reliability. It allows early and accurate evaluation of key technologies which reduces the number of prototypes and therefore allows a shorter time to market. Ideally the product design will be correct by design the first time. Overall the designer needs to see and understand to succeed. LC is an innovative and intuitive engineering tool that can educate and empower even the EM novice.