Traditionally, FPGAs have not been considered suitable for low-power application design because of their higher quiescent power, significant energy dissipation during start-up, and higher dynamic power due to longer interconnects and the overhead of reconfigurability. However, with advances in FPGA manufacturing technologies and growing demand for feature-rich mobile applications in both civilian and military communities, FPGAs are being considered an attractive device for applications deployed in power-constrained environments. Some of the reasons for considering FPGAs for energy efficient application design are:
— In general, current FPGAs are denser, use lower supply voltage, and provide more computation per Watt than the previous generation of devices. Contrary to the concern about power, the latest 90 nm devices from both Xilinx and Altera are more power efficient than their predecessors. According to a previous article (http://www.fpgajournal.com/articles/20040914_virtex4.htm), the Xilinx Virtex 4 is more than 50% more power efficient than the previous generation
— Non-volatile Flash based configuration memory available in some FPGAs (e.g. Actel ProASIC PLUS) allows rapid and power efficient startup
— Partial reconfiguration enables use of a smaller number of FPGAs and dynamic reconfiguration for low-power design (e.g. Xilinx Virtex-II series)
— Several FPGAs such as Xilinx Virtex-II Pro and ATMEL FPSLIC integrate microprocessor(s) within the reconfigurable fabric to provide the designers a choice between high performance due to FPGAs and low power due to the processors. Some such devices also support sleep-modes for the integrated processor for low-power design
— Changing communication standards, growing consumer demand for processing voice, photo, audio, graphics, and video, and shortening design cycles for handheld devices make FPGAs an attractive alternative to both ASICs and ISA based processors
— Military and Aerospace applications such as remote target detection, Unattended Ground Sensors, and Unmanned Aerial Vehicles have a constant need of higher performance and are increasingly being deployed in power-constrained environments
— Several academic research initiatives have shown that it is possible to design energy efficient signal processing kernels using algorithmic techniques [1,2,3]. These designs have been demonstrated to have fared better than low-power microprocessors in terms of both time and energy performance
— There is a growing focus on next generation FPGA devices to support low-power features such as dual VDD/Vt gates and dual supply voltage
However, when we consider the design tools provided by FPGA and EDA vendors such as Altera, Cadence, Mentor, Synopsys, Synplicity, and Xilinx, the support for energy efficient application design using FPGAs is evolving at best. Most FPGA vendors provide tools to estimate the energy dissipated by a design; Xilinx XPower and Actel SmartPower are two examples. However, as the following discussion shows, estimating energy dissipation is necessary but not sufficient for energy efficient application design.
In the following, we will discuss some of the issues that need to be addressed for energy efficient application design using FPGAs. We focus on signal processing applications with streaming input arriving at a pre-specified rate. We do not assume a target device but consider the design problem to be a generic device selection problem where a number of devices are evaluated to identify suitable ones based on the specified performance requirements.
Figure 1: Performance Tradeoff based on Block size for a Blocked Matrix Multiply Core
We view the application as a set of tasks. We assume that multiple energy efficient designs (or IP cores) are available for each application task. This is a valid assumption, as several academic research groups have demonstrated such IP cores developed using the available tools [1,2]. Multiple designs allow a tradeoff among the three performance metrics: area, energy, and latency (Figure 1). Application design can therefore be described as the selection and scheduling of an appropriate implementation for each task such that the design meets the performance constraints. Some of the issues that need to be addressed during low-power application design using FPGAs are the following:
— Based on the input rate, it is possible to switch off the devices between the processing of two inputs and save energy. However, this option needs to be evaluated against the time and energy dissipated to restart the devices
— Reconfiguration is attractive as it allows FPGAs to provide the most efficient configuration for each task being processed and due to time sharing, it also reduces the overall energy dissipation. However, this option needs to be evaluated for energy and time cost for reconfiguration
— Availability of multiple devices, IP cores, and algorithms results in a large design space that must be traversed efficiently
Addressing the above issues is referred to as high-level design space exploration (DSE) for energy efficient application design using FPGAs. This is in contrast to the low-level design space exploration performed by the available FPGA design tools, which explore various binding options during the design process. To the best of our knowledge, none of the commercial design tools for FPGAs supports high-level DSE. However, high-level DSE is not an alternative to the available design tools; the two are complementary. The solution produced by high-level DSE can be used as a starting point (golden reference) by the available FPGA design tools.
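The first issue listed above, whether to switch a device off between two inputs, reduces to a break-even comparison. The following is a minimal sketch of that comparison, not part of any tool discussed in this article. The function names and parameters are hypothetical, and the sketch assumes that a device dissipates no energy while it is off.

```python
from typing import Optional


def idle_energy_j(idle_power_w: float, gap_s: float) -> float:
    """Energy spent if the device stays configured and idle for the whole gap."""
    return idle_power_w * gap_s


def shutdown_energy_j(restart_energy_j: float, restart_latency_s: float,
                      gap_s: float) -> Optional[float]:
    """Energy spent if the device is switched off and restarted within the gap.

    Returns None when the restart cannot finish before the next input arrives.
    Assumption: the device dissipates nothing while it is off.
    """
    if restart_latency_s > gap_s:
        return None
    return restart_energy_j


def should_shut_down(idle_power_w: float, gap_s: float,
                     restart_energy_j: float, restart_latency_s: float) -> bool:
    """Shut down only if the restart fits in the gap and costs less than idling."""
    off = shutdown_energy_j(restart_energy_j, restart_latency_s, gap_s)
    return off is not None and off < idle_energy_j(idle_power_w, gap_s)
```

With a long gap between inputs the restart energy is amortized and shutting down wins; with a short gap, or a restart that takes longer than the gap, the device has to stay on.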
In this article, we discuss a design environment suitable for high-level design space exploration. We exploit the model-integrated computing (MIC) technique to develop a user-friendly graphical modeling and design space exploration environment using Generic Modeling Environment (GME). Use of MIC allows us to develop an open and extensible design environment.
Energy Efficient Application Design using the MIC Approach
The key idea of the MIC approach is the extension of the scope and usage of models such that they form the “backbone” of a model-integrated system development process (http://www.isis.vanderbilt.edu/research/research.html). Using the MIC technology, a designer captures the information relevant to the system being designed in the form of high-level models. The high-level models can explicitly represent the target application, the target hardware, and the dependencies and constraints among the different components of the models. Such models act as a repository of the information needed for analyzing the system.
The Generic Modeling Environment (GME 4) is a graphical tool suite that enables development of a modeling language for a domain, provides a graphical interface to model specific problem instances for the domain, and facilitates integration of tools that can be driven through the models. A metamodel (modeling paradigm) is a formal description of model construction semantics. Once the user specifies the metamodel, it can be used to configure GME itself to present a modeling environment specific to the problem domain. Model interpreters are software components that translate the information captured in the models to drive integrated tools that estimate the performance (latency, energy, throughput, etc.) of a system. Feedback interpreters are software components that analyze the output generated by the integrated tools and update the models. These interpreters are based on the model construction semantics and are thus suitable for any model based on a given modeling paradigm. The interpreters are therefore essentially automation tools that, once developed, can be reused across several system design problems.
In the following, we discuss two simple yet powerful models that enable specification of the high-level DSE problem and allow integration (through GME) of various techniques and tools to the design environment. We discuss two such integrations and extensions.
Application Modeling
A popular model for applications is the data flow graph (DFG). A DFG can be easily extended to meet the requirements of application design using FPGAs. The hierarchical DFG with alternatives is one such extension [4]. The hierarchical DFG is represented as a directed acyclic graph (Figure 2). The nodes of the graph are classified into three types: leaf, compound, and alternative. A leaf node represents an application task (or kernel) mapped onto a device operating in a particular configuration and is associated with the performance estimate of the task. An alternative node captures the alternatives associated with a task. An alternative can only be a leaf node or a compound node. The alternatives implement the same task but model different algorithms or implementations on different FPGAs. A compound node contains a hierarchical data flow graph with alternatives. The compound nodes allow easy management of large application models and are responsible for the hierarchical nature of the model. The directed edges connecting the nodes represent the temporal dependencies among the nodes.
Figure 2: Hierarchical Data Flow Model with Alternatives
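The three node types can be sketched as plain data classes. This is an illustrative sketch only and not the GME metamodel used by the authors. The class names, field names, and numbers are hypothetical. The helper `count_designs` shows how alternatives multiply into a design space: a leaf contributes one design, an alternative node contributes the sum over its options, and a compound node contributes the product over its children.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """A task mapped onto one device in one configuration, with its estimates."""
    name: str
    device: str
    configuration: str
    latency_ms: int
    energy_mj: int


@dataclass
class Alternative:
    """Several interchangeable implementations of the same task."""
    name: str
    options: List[Union["Leaf", "Compound"]]


@dataclass
class Compound:
    """A nested data flow graph; edges are (source, destination) node names."""
    name: str
    children: List[Union[Leaf, Alternative, "Compound"]]
    edges: List[Tuple[str, str]] = field(default_factory=list)


def count_designs(node) -> int:
    """Number of distinct designs encoded below this node."""
    if isinstance(node, Leaf):
        return 1
    if isinstance(node, Alternative):
        return sum(count_designs(option) for option in node.options)
    return prod(count_designs(child) for child in node.children)


# Hypothetical two-task application: 2 FFT options and 3 filter options.
fft = Alternative("fft", [
    Leaf("fft_a", "FPGA1", "S0", 4, 30),
    Leaf("fft_b", "FPGA2", "S0", 6, 20),
])
fir = Alternative("fir", [
    Leaf("fir_a", "FPGA1", "S1", 5, 25),
    Leaf("fir_b", "FPGA1", "S2", 3, 40),
    Leaf("fir_c", "FPGA2", "S1", 7, 15),
])
app = Compound("app", [fft, fir], edges=[("fft", "fir")])
```

Here `count_designs(app)` evaluates to 6, which illustrates how quickly the space grows as tasks and alternatives are added.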
Modeling Target Devices
Candidate devices are modeled based on the configurations supported by the device. In addition, given two configurations A and B, we assume that the transitions from A to B and from B to A are each associated with a transition cost (latency and energy). The model is based on an augmented finite state machine (FSM). Figure 3 shows a sample model for a device with three configurations, represented by the three nodes S0, S1, and S2. Each pair of nodes is connected by a pair of directed edges. Each edge corresponds to a reconfiguration from the configuration represented by the source node to the configuration represented by the destination node, and is associated with a 2-tuple (latency cost, energy cost). Each configuration is also associated with an estimate of the average power consumed while idling (P1, P2 in Figure 3). The model also indicates a default configuration and a shut-down state.
Figure 3: Modeling Device States and Transition Costs
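The augmented FSM maps naturally onto a lookup table keyed by (source, destination) pairs. The sketch below is one possible encoding under that assumption. All numbers are made up for illustration and do not describe any real device, and the shut-down state is omitted for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical device with three configurations, S0 to S2.
# Each directed edge carries (latency_ms, energy_mj) for one reconfiguration.
TRANSITIONS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("S0", "S1"): (20, 50), ("S1", "S0"): (18, 45),
    ("S0", "S2"): (35, 90), ("S2", "S0"): (30, 80),
    ("S1", "S2"): (25, 60), ("S2", "S1"): (22, 55),
}

# Average idle power per configuration, in mW (also hypothetical).
IDLE_POWER_MW: Dict[str, int] = {"S0": 120, "S1": 150, "S2": 200}

DEFAULT_STATE = "S0"


def reconfiguration_cost(path: List[str]) -> Tuple[int, int]:
    """Total (latency_ms, energy_mj) of visiting the configurations in order."""
    latency_ms = 0
    energy_mj = 0
    for src, dst in zip(path, path[1:]):
        if src == dst:
            continue  # staying in the same configuration costs nothing
        edge_latency, edge_energy = TRANSITIONS[(src, dst)]
        latency_ms += edge_latency
        energy_mj += edge_energy
    return latency_ms, energy_mj
```

Because the two edges between a pair of nodes are stored separately, the model can express asymmetric costs, for example a transition from S0 to S1 that is more expensive than the reverse.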
Using the Models for High-level Design Space Exploration
Rapid High-level Design Space Exploration
We explored the option of integrating an ordered binary decision diagram (OBDD) based design space exploration tool (DESERT) and a high-level performance estimator (HiPerE) for heterogeneous embedded systems. DESERT, a design space exploration tool developed at Vanderbilt University, allows us to evaluate a large design space based on constraints. The constraints are of two types: performance constraints and design constraints. The performance constraints specify the latency and energy requirements and are of the form “metric < constant”, where the metric can be latency or energy. The design constraints specify valid combinations of task mappings. For example, if multiple FPGAs are available, a design constraint of the form “task A1 implemented on FPGA B1 is not compatible with task A2 implemented on FPGA B2” will result in the selection of only those designs for which the constraint holds true.
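The effect of the two constraint types can be illustrated with a brute-force filter. Note that this sketch enumerates every design explicitly, whereas DESERT encodes the space symbolically with OBDDs, so it shows what the constraints mean rather than how the tool works. The task names, FPGA names, costs, and bounds are hypothetical, and the sketch assumes that the tasks run one after another so that latencies and energies simply add.

```python
from itertools import product

# Hypothetical design choices: task -> list of (fpga, latency_ms, energy_mj).
OPTIONS = {
    "A1": [("B1", 4, 30), ("B2", 6, 20)],
    "A2": [("B1", 5, 25), ("B2", 3, 40)],
}

# Design constraint: these two mappings may not appear in the same design.
INCOMPATIBLE = [(("A1", "B1"), ("A2", "B2"))]


def enumerate_designs(options):
    """Yield every combination of one choice per task."""
    tasks = list(options)
    for combo in product(*(options[task] for task in tasks)):
        yield dict(zip(tasks, combo))


def meets_design_constraints(design, incompatible):
    mapping = {(task, choice[0]) for task, choice in design.items()}
    return not any(a in mapping and b in mapping for a, b in incompatible)


def meets_performance_constraints(design, max_latency_ms, max_energy_mj):
    """Constraints of the form "metric < constant" (sequential tasks assumed)."""
    latency = sum(choice[1] for choice in design.values())
    energy = sum(choice[2] for choice in design.values())
    return latency < max_latency_ms and energy < max_energy_mj


def feasible_designs(options, incompatible, max_latency_ms, max_energy_mj):
    """Return the task-to-FPGA mappings that satisfy both constraint types."""
    return [
        {task: choice[0] for task, choice in design.items()}
        for design in enumerate_designs(options)
        if meets_design_constraints(design, incompatible)
        and meets_performance_constraints(design, max_latency_ms, max_energy_mj)
    ]
```

With a latency bound of 10 ms and an energy bound of 65 mJ, two of the four candidate designs survive: one is rejected by the design constraint and one by the latency constraint.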
Using the above constraints, the design space can be quickly evaluated to identify a set of designs that satisfy the constraints. However, DESERT is not an optimization tool. Therefore, we use HiPerE to further evaluate the designs selected by DESERT and identify the final design that meets the overall performance requirement. HiPerE evaluates system-level energy dissipation and latency. In order to provide a rapid estimate, HiPerE operates at the task-level abstraction of the application. In addition to the task execution costs, HiPerE considers data access cost, parallelism in the system, energy dissipation when a component is idle, state transition cost, and duty cycle specification in order to produce an accurate performance estimate. Our experiments using both DESERT and HiPerE have demonstrated that a design space of approximately 10^5 to 10^6 designs can be evaluated within 5 to 10 minutes.
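A task-level estimate of this kind can be approximated in a few lines. The sketch below is not HiPerE's model. It is a deliberately simplified illustration that assumes a single device, sequential tasks, and a fixed input period, and it ignores data access cost and parallelism. The function name, parameters, and units are hypothetical.

```python
from typing import List, Optional, Tuple


def estimate_period(tasks: List[Tuple[int, int]],
                    transitions: List[Tuple[int, int]],
                    idle_power_mw: int,
                    period_ms: int) -> Optional[Tuple[int, int]]:
    """Estimate (latency_ms, energy_uj) for processing one input.

    tasks and transitions are lists of (latency_ms, energy_uj) pairs.
    Returns None if the work does not fit in one input period, that is,
    if the design cannot keep up with the input rate.
    """
    busy_ms = sum(lat for lat, _ in tasks) + sum(lat for lat, _ in transitions)
    if busy_ms > period_ms:
        return None
    active_uj = sum(en for _, en in tasks) + sum(en for _, en in transitions)
    # The device idles for the rest of the period: mW * ms = microjoules.
    idle_uj = idle_power_mw * (period_ms - busy_ms)
    return busy_ms, active_uj + idle_uj
```

For example, two tasks of (4 ms, 30000 uJ) and (5 ms, 25000 uJ) with one transition of (1 ms, 2000 uJ), an idle power of 100 mW, and a 20 ms period give a latency of 10 ms and an energy of 58000 uJ per input, of which 1000 uJ is idle energy.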
Efficient Design with Run-time Reconfiguration
Figure 4: Modeling Reconfiguration
Pseudo tasks and design constraints are used to extend the hierarchical data flow graph to enable modeling of reconfiguration. This technique introduces a pseudo task (as an alternative node) between some pairs of tasks, where a pair consists of a source task and a destination task. Two tasks form a source-destination pair if both are mapped onto the same FPGA and the execution of the destination task follows the execution of the source task on that FPGA. The pseudo task models reconfiguration (Figure 4). The alternatives for this pseudo task are the set of possible reconfigurations, determined by the unique pairs of design choices available for the source and destination tasks. Each reconfiguration is associated with a performance cost. A set of design constraints is specified to ensure that the appropriate reconfiguration is chosen based on the mappings chosen for the source and destination tasks. Thus, when design space exploration is performed using DESERT, the correct reconfigurations are chosen automatically. In addition, the reconfiguration costs are added to the overall performance of each design when the designs are evaluated against the performance constraints.
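The construction of a pseudo task can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used with DESERT. Design choices are reduced to hypothetical (FPGA, configuration) pairs, the cost table is made up, and keeping the same configuration is assumed to cost nothing. Pairs mapped onto different FPGAs are skipped because they need no reconfiguration.

```python
from itertools import product
from typing import Dict, List, Tuple

Choice = Tuple[str, str]   # (fpga, configuration)
Cost = Tuple[int, int]     # (latency_ms, energy_mj)


def pseudo_task_alternatives(
    src_choices: List[Choice],
    dst_choices: List[Choice],
    transition_cost: Dict[Tuple[str, str, str], Cost],
) -> Dict[Tuple[Choice, Choice], Cost]:
    """Map each (source choice, destination choice) pair to its reconfiguration cost.

    The key doubles as the design constraint: the alternative stored under
    (src, dst) is valid only when exactly those two choices are selected.
    """
    alternatives: Dict[Tuple[Choice, Choice], Cost] = {}
    for src, dst in product(src_choices, dst_choices):
        if src[0] != dst[0]:
            continue  # different FPGAs: not a source-destination pair
        if src[1] == dst[1]:
            alternatives[(src, dst)] = (0, 0)  # assumption: no change, no cost
        else:
            alternatives[(src, dst)] = transition_cost[(src[0], src[1], dst[1])]
    return alternatives


# Hypothetical example: costs are keyed by (fpga, from_config, to_config).
COSTS = {("F1", "S0", "S1"): (2, 10)}
ALTS = pseudo_task_alternatives(
    [("F1", "S0"), ("F1", "S1")],
    [("F1", "S1"), ("F2", "S0")],
    COSTS,
)
```

In this example the pseudo task has two alternatives: a reconfiguration from S0 to S1 on F1, and a zero-cost alternative for the case where both tasks use S1 on F1.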
Conclusion
Presently, our research group at the University of Southern California, along with Vanderbilt University, is working on an academic version of a design space exploration tool based on the models discussed here. The tool, MILAN (Model-based Integrated Simulation), is a freely available, open source design environment that can be obtained at http://milan.usc.edu/.
References
[1] Energy-efficient and parameterized designs for fast Fourier transform on FPGAs. S. Choi, G. Govindu, J. Jang, V. K. Prasanna. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003.
[2] Low-power high-level synthesis for FPGA architectures. D. Chen, J. Cong, Y. Fan. International Symposium on Low Power Electronics and Design, 2003.
[3] Low power digital design in FPGAs: a study of pipeline architectures implemented in a FPGA using a low supply voltage to reduce power consumption. A. Garcia, W. Burleson, J. L. Danger. IEEE International Symposium on Circuits and Systems, 2000.
[4] Modeling methodology for integrated simulation of embedded systems. A. Ledeczi, J. Davis, S. Neema, A. Agrawal. ACM Transactions on Modeling and Computer Simulation, Vol. 13, No. 1, 2003.