cordic
This is the latest (and greatest!) in a long series of fully-unrolled CORDIC processors written by Larry Doolittle, with occasional help from Ming Choy and Gang Huang. It is written in portable Verilog, synthesizable for pretty much any FPGA; part of the Verilog is in turn composed by a Python program.

The license for all these files is based on 3-clause BSD. See LICENSE.md, enclosed.

Good reference material on CORDIC hardware in general is given by Ray Andraka at
    http://andraka.com/cordic.php
and of course see the general and long-winded background at
    https://en.wikipedia.org/wiki/CORDIC

The module cordicg_b22 (in cordicg_b22.v) can be instantiated in application code. One of its parameters (width) sets the bit width of the X and Y input and output ports. The phase input and output are one bit wider than the X and Y ports. See below for additional configuration.

The "_b22" above refers to the internal data path width of the CORDIC computations, necessarily larger than the port width. Note that the file cordicg_b22.v is not included in the source tarball; rather, it is generated by the Python program cordicgx.py using
    python3 cordicgx.py 22 cordicg_b22.v
This rule is embedded in the Makefile, so it's equivalent to say
    make cordicg_b22.v DPW=22
Of course, there's nothing magic about the choice of 22, it's your choice to balance DSP accuracy and resources. See perf.png, another Makefile target.

The complete parameter list for cordicg_$DPW is:
    width   [2, $DPW-1]  port width
    nstg    [2, $DPW]    number of CORDIC stages
    def_op  [0, 3]       see below

The number of logic elements scales approximately as 3*DPW*nstg. Latency is nstg cycles.

Phase ports are in "natural" binary units, such that wrapping around the finite-width digital word is a 2*pi wrap of conventional angle. As such, it can be interpreted equally well as signed or unsigned. X and Y ports are always signed, even when X is used as a radius output in R->P mode.
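
As a rough illustration of the "natural" binary phase units described above, here is a small Python sketch that converts between phase words and radians. The function names are made up for this example and are not part of cordicgx.py or the test bench. The sketch assumes only what is stated above: the phase port is width+1 bits wide, and one full wrap of the word equals 2*pi.

```python
import math

def phase_word_to_radians(word, width):
    """Convert a phase word (signed or unsigned) to radians in [-pi, pi).

    Hypothetical helper: assumes a phase port of width+1 bits.
    """
    nbits = width + 1
    word &= (1 << nbits) - 1          # keep only the bits of the port
    if word >= 1 << (nbits - 1):      # reinterpret as two's complement
        word -= 1 << nbits
    return word * 2.0 * math.pi / (1 << nbits)

def radians_to_phase_word(theta, width):
    """Convert an angle in radians to an unsigned phase word (wraps mod 2*pi)."""
    nbits = width + 1
    return round(theta / (2.0 * math.pi) * (1 << nbits)) & ((1 << nbits) - 1)
```

With width=18 the phase word has 19 bits, so a quarter turn (pi/2) is the word 2**17. The signed value -2**17 and the unsigned value 3*2**17 are the same bit pattern and describe the same angle, -pi/2.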
You are expected to know something about CORDIC when setting up the scaling of the X and Y ports: a CORDIC engine has an intrinsic gain of about 1.64676 (asymptotic value for a large number of stages). Also be aware that a full-scale input on both X and Y has a radius a factor of sqrt(2) larger than just full scale in one axis. This module does not detect or saturate overflows; it just wraps, which is useless, so don't let that happen.

The 2-bit op input selects the operation mode as follows:
    0  Polar->Rectangular "rotation"   (phaseout will be close to zero)
    1  Rectangular->Polar "vectoring"  (yout will be close to zero)
    2  not used
    3  Follow

All three data inputs are used in all modes. To get an ordinary P->R computation with op=0, set yin to zero. It's also possible to use that mode for general vector rotation of the input (x,y) vector by angle phasein. To get an ordinary R->P computation with op=1, set phasein to zero. A non-zero phasein in that mode will simply be added to the answer. See below for info on follow mode.

The op input is allowed to vary cycle-by-cycle. Feel free to interleave R->P with P->R computations. One pipelined CORDIC computation, based on the three data inputs and the op control input, starts on every (posedge clk).

The def_op parameter sets the initial value of the op port; in use cases where op is constant, setting def_op to match can help the synthesizer optimize away unused resources.

Version 27 has an important API name change: the module you instantiate now has its data path bit-width encoded in the name, as described above. The generator and testbenches were rewritten from Matlab/Octave and awk to Python, and tested to be compatible with both python2 and python3. The resulting ports and functionality of the Verilog module are unchanged. This version also gets rid of the Verilog include file and its hidden configuration state, which made previous versions needlessly hard to incorporate in larger projects.
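
Going back to the scaling and op-mode description earlier in this file: the sketch below is an idealized floating-point model in Python, meant only for sanity-checking scaling choices before simulation. It is not the generated Verilog and is not bit-accurate. All function names are invented for this example, phase is expressed in turns (1.0 = 2*pi), and the half-turn pre-rotation is a textbook device; how the real module allocates its nstg stages is not modeled. Follow mode (op=3) is not covered.

```python
import math

def cordic_gain(nstg):
    """Gain of nstg micro-rotations: product of sqrt(1 + 2**(-2*i))."""
    gain = 1.0
    for i in range(nstg):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def max_input_radius(width, nstg=20):
    """Largest input radius whose gained-up output still fits a signed port."""
    return (2 ** (width - 1) - 1) / cordic_gain(nstg)

def wrap_turns(z):
    """Wrap an angle in turns to [-0.5, 0.5), the way a phase word wraps."""
    return (z + 0.5) % 1.0 - 0.5

def cordic_model(x, y, phase, op, nstg=20):
    """Idealized float CORDIC. op=0 is P->R (rotation), op=1 is R->P (vectoring)."""
    z = wrap_turns(phase)
    # Coarse half-turn step, so the remaining angle is within convergence range.
    flip = (abs(z) > 0.25) if op == 0 else (x < 0)
    if flip:
        x, y, z = -x, -y, wrap_turns(z - 0.5)
    for i in range(nstg):
        # Rotation mode drives z toward zero; vectoring drives y toward zero.
        ccw = (z >= 0) if op == 0 else (y < 0)
        d = 1.0 if ccw else -1.0
        k = 2.0 ** (-i)
        x, y = x - d * y * k, y + d * x * k
        z -= d * math.atan(k) / (2.0 * math.pi)
    return x, y, wrap_turns(z)
```

In this model, cordic_model(1000, 0, 0.125, 0) returns x and y both near 1164 (1000 times the gain, divided by sqrt(2)) with the phase output near zero, and cordic_model(-300, 400, 0, 1) returns x near 823 (500 times the gain), y near zero, and a phase near 0.352 turns, which is atan2(400, -300). For 18-bit ports, max_input_radius(18) comes out at roughly 79.6 thousand counts; a vector with equal X and Y components has to stay below that radius, so each component must be smaller by a further factor of sqrt(2).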
The previously hidden data path width now shows up as part of the module name, and the previously hidden number of stages is now a parameter, as described above. Thus if you previously instantiated a cordicg(), that in turn included cordicg.vh generated with o=22 and s=20, you would now instantiate cordicg_b22() and set its parameter nstg to 20. The code base is now drawn from the LBNL-ATG Bedrock project, with an actual license!

Version 26 has no changes to the core synthesizable code; it improves the documentation, fixes the generated Verilog code for 33 < o < 56, and adds provision to check synthesis on three generations of Xilinx chips. It is interesting to compare and contrast the maximum speed this CORDIC engine can run (in its default configuration with 18-bit ports, 22-bit internal data path width, and 20 stages) on the various architectures. Summary:

    part                 speed   LUTs  chip price  chip LUTs  CORDIC price
    xc3s1000-ft256-5     7.2 ns  1686       46.60      15360          4.87
    xc6slx45t-fgg484-3   5.2 ns  1596       84.39      27288          4.78
    xc7a100t-fgg484-2    3.8 ns  1564      141.25      63400          3.31
    xc7k70t-fbg484-1     3.1 ns  1629      127.21      41000          4.86
    5CSXFC6D6F31C8N      4.4 ns  2320      226.89     110000          4.78

where the price was as of 2014-05-25 at Digi-Key. The Xilinx synthesizer is XST 14.7. The CORDIC price is an upper limit, since it assumes all the non-LUT resources on the chip are valueless.

Semi-incompatible change between versions 24 and 25: the op input is two bits instead of one. Just pad on the left with zero (as will be performed by default in Verilog), and there will be no change in function. The new bit enables follow mode, where the rotation phase is the negative of the previous operation. For such cycles, the input phase is ignored. The hardware required to implement this new mode is small, about one logic cell per stage, and even that should be stripped away by the synthesizer if op[1] is hard wired to zero.
The follow mode's computation can also be performed by two successive passes through the CORDIC engine; using follow mode saves a factor of two in latency, and reduces round-off error.

Incompatible change between versions 23 and 24: the phase of the rectangular to polar conversion has been changed by pi. That means that when op==1, the angle output is truly atan2(y,x), and the x (R) output in that mode is now positive. The other new feature of version 24, besides an additional test bench mode, is the parameter def_op. Use cases with constant op input can set this parameter to match, and might save some gates and/or timing when synthesizing for Xilinx.

The test platform is Icarus Verilog and Xilinx XST, using Debian GNU/Linux as the operating system. The code is generally standards-based and not version-specific. As shipped, results of the regression test fired off by "make" are:

    python3 cordicgx.py 22 cordicg_b22.v
    iverilog -Wall -DSIMULATE -Wno-timescale -DDPW=22 -pnstg=20 -o cordicg_tb cordicg_tb.v cordicg_b22.v cstageg.v addsubg.v
    vvp -N cordicg_tb +op=0 > cordic_ptor.dat
    Check of x,y,theta->x,y
    python3 cordic_check.py cordic_ptor.dat
    test covers 15958 points, maximum amplitude is 90325 counts
    peak error 1.25 bits, 0.0010 %
    rms error 0.36 bits, 0.0003 %
    PASS
    vvp -N cordicg_tb +op=1 > cordic_rtop.dat
    Check of x,y,theta->r,theta
    python3 cordic_check.py cordic_rtop.dat
    test covers 7979 points, maximum amplitude is 129001 counts
    peak error 1.06 bits, 0.0008 %
    rms error 0.36 bits, 0.0003 %
    PASS
    vvp -N cordicg_tb +rmix=1 > cordic_bias.dat
    Check of downconversion bias
    python3 cordic_check.py bias cordic_bias.dat
    test covers 6102 points
    averages 0.027 -0.007
    PASS
    vvp -N cordicg_tb +op=3 > cordic_fllw.dat
    Check of follow mode
    python3 cordic_check.py cordic_fllw.dat
    test covers 11968 points, maximum amplitude is 129001 counts
    peak error 3.27 bits, 0.0025 %
    rms error 0.41 bits, 0.0003 %
    PASS

Note that the theoretical lower limit for peak error is 0.5 bits, and for rms error is 1/sqrt(12) = 0.29 bits. More information about the accuracy behavior is given in a plot you can create with "make perf.png".

Happy computing!

Larry Doolittle <[email protected]>
March 10, 2020