Previous Volume 23, 2004, No. 5-6 Next
Performance Evaluation and Implementation of two Adaptive Routing Algorithms for XGFT Networks
Heikki KARINIEMI, Jari NURMI
 
Abstract
EXtended Generalized Fat Trees (XGFT) are Bidirectional Multistage Interconnection Networks (BMIN). They are more scalable for different system sizes and different performance requirements than fat trees from which they have evolved. The improved scalability has been achieved by allowing switches with different number of ports to be used in different switch stages of these hierarchical networks. XGFTs can be constructed from two separate networks for routing packets upwards and downwards in the XGFT. These up-routing and down-routing networks can be implemented separately with small switches which are connected to each other within the switch nodes of the XGFT. This kind of XGFT achieves higher performance if its topmost root switches are connected to each other with additional links, and if adaptive Turn-Back-When-Possible (TBWP) routing algorithm is used instead of shortest-path routing algorithms. This paper shows that the TBWP has always simple and feasible hardware implementations independently of the structure of the XGFT. This is achieved by address space encoding which eliminates complex computations from the routing decision functions. This paper presents also a new shortest-path routing algorithm named Turn-Back (TB). The~TB~algorithm was designed for such XGFT implementations where the up-routing and down-routing of the packets is performed with one larger switch block within the switch nodes, and where shortest-path routing produces good performance. It is shown in this paper that the TBWP and TB route packets correctly to their destinations. In addition, the performances of the routing algorithms are evaluated with simulations and compared. Simulation results show that the TB is able to produce higher performance than the TBWP with different traffic patterns. They also show that the performance of the XGFTs could be improved by suitable mapping of the communicating.
 
A Flip-Flop Matching Engine to Verify Sequential Optimizations
Solaiman RAHIM, Bruno ROUZEYRE, Lionel TORRES
 
Abstract
Equivalence checking tools often use a flip-flop matching step to avoid the state space traversal. Due to sequential optimizations performed during synthesis (merge, replication, redundancy removal, ...) and don't care conditions, the matching step can be very complex as well as incomplete. If the matching is incomplete, even the use of a fast and efficient SAT solver during the combinational equivalence-checking step may not prevent the failure of this approach. In this paper, we present a flip-flop matching engine, which is able to verify optimized circuits and handle don't care conditions.
 
An Evolvable Combinational Unit for FPGAs
Lukas SEKANINA, Stepan FRIEDL
 
Abstract
A complete hardware implementation of an evolvable combinational unit for FPGAs is presented. The proposed combinational unit consisting of a virtual reconfigurable circuit and evolutionary algorithm was described in VHDL independently of a target platform, i.e. as a soft IP core, and realized in the COMBO6 card. In many cases the unit is able to evolve (i.e. to design) the required function automatically and autonomously, in a few seconds, only on the basis of interactions with an environment. A number of circuits were successfully evolved directly in the FPGA, in particular, 3-bit multipliers, adders, multiplexers and parity encoders. The evolvable unit was also tested in a simulated dynamic environment and used to design various circuits specified by randomly generated truth tables.
 
An FPGA Implementation of a Montgomery Multiplier Over GF(2^m)
Nele MENTENS, Siddika Berna Ors Bart PRENEEL, Joos VANDEWALLE
 
Abstract
This paper describes an efficient FPGA implementation for modular multiplication in the finite field GF(2^m) that is suitable for implementing Elliptic Curve Cryptosystems. We have developed a systolic array implementation of a~Montgomery modular multiplication. Our solution is efficient for large finite fields (m=160-193), that offer a high security level, and it can be scaled easily to larger values of m. The clock frequency of the implementation is independent of the field size. In contrast to earlier work, the design is not restricted to field representations using irreducible trinomials, all one polynomials or equally spaced polynomials.
 
A Simple PLL-Based True Random Number Generator for Embedded Digital Systems
Milos DRUTAROVSKY, Martin SIMKA, Viktor FISCHER, Frederic CELLE
 
Abstract
The paper presents a simple True Random Number Generator (TRNG) which can be embedded in digital Application Specific Integrated Circuits (ASICs) and Field Programmable Logic Devices (FPLDs). As a source of randomness, it uses on-chip noise generated in the internal analog Phase-Locked Loop (PLL) circuitry. In contrast to traditionally used free-running oscillators, it uses a novel method of randomness extraction based on two rationally related synthesized clock signals. The generator has been developed for embedded cryptographic applications, where it significantly increases the system security, but it can be used in a wide range of other applications. The functionality of the proposed solution is demonstrated for the Altera Apex FPLD family, but the same principle can be used for all recent ASICs or FPLDs that include an on-chip reconfigurable analog PLL. The quality of the TRNG output is confirmed by applying special DIEHARD and NIST statistical tests, which pass even for high output bit-rates of several hundreds of Kbits/s.
 
A Generic Dual Core Architecture with Error Containment
Thomas KOTTKE, Andreas STEININGER
 
Abstract
The dual core strategy allows to construct a fail-silent processor from two instances (master/checker) of any arbitrary standard processor. Its main drawbacks are its vulnerability with respect to common mode failures and the existence of residual single points of failure. In this paper we propose a generic frame that systematically eliminates these drawbacks. First, we employ temporal redundancy to cope with common mode failures. Unlike similar approaches we can ensure error containment even if -- as a result of the temporal redundancy -- the comparison by the checker core is delayed. We attain this by introducing a specific delay element for outgoing data. Second, we perform a systematic analysis of potential single points of failure and eliminate these by careful layout, self-checking circuits and similar methods. We finally validate our approach by means of exhaustive fault injection experiments. The results indicate a 100% self-checking coverage for stuck-at faults and complete error containment. Since the proposed framework has been kept generic in the sense that the individual standard processor cores are treated as black boxes, these results are valid independent of the core actually used.
 
Built-In Self-Test Quality Assessment Using Hardware Fault Emulation In FPGAs
Abilio PARREIRA, J. Paulo TEIXEIRA, Marcelino B. SANTOS
 
Abstract
This paper addresses the problem of test quality assessment, namely of BIST solutions, implemented in FPGA and/or in ASIC, through Hardware Fault Emulation (HFE). A novel HFE methodology and tool is proposed, that, using partial reconfiguration, efficiently measures the quality of the BIST solution. The proposed HFE methodology uses Look-Up Tables (LUTs) fault models and is performed using local partial reconfiguration for fault injection on Xilinx(TM) Virtex and/or Spartan FPGA components, with small binary files. For ASIC cores, HFE is used to validate test vector selection to achieve high fault coverage on the physical structure. The methodology is fully automated. Results on ISCAS benchmarks and on an ARM core show that HFE can be orders of magnitude faster than software fault simulation or fully reconfigurable hardware fault emulation.
 
Knapsack Model and Algorithm for Hardware/Software Partitioning Problem
Abhijit RAY, Wu JIGANG, Thambipillai SRIKANTHAN
 
Abstract
Efficient hardware/software partitioning is crucial towards realizing optimal solutions for constraint driven embedded systems. The size of the total solution space is typically quite large for this problem. In this paper, we show that the knapsack model could be employed for the rapid identification of hardware components that provide for time efficient implementations. In particular, we propose a method to split the problem into standard 0-1 knapsack problems in order to leverage on the classical approaches. The proposed method relies on the tight lower and upper bounds for each of these knapsack problems for the rapid elimination of the sub-problems, which are guaranteed not to give optimal results. Experimental results show that, for problem sizes ranging from 30 to 3000, the optimal solution of the whole problem can be obtained by solving only 1 sub-problem except for one case where it required the solution of 3 sub-problems.
 
Previous Volume 23, 2004, No. 5-6 Next