From: Nico Coesel on 23 May 2010 04:26

Gabor <gabor(a)alacron.com> wrote:

>On May 21, 6:19 pm, Philip Pemberton <usene...(a)philpem.me.uk> wrote:
>> OK, this is nuts...
>>
>> With ISE Synthesizer set up like this:
>>   Optimisation Goal:   AREA
>>   Optimisation Effort: NORMAL
>>
>> The core works fine (the timing is a little out, but not bad enough to
>> pooch the whole thing). If I set it up like this:
>>   Optimisation Goal:   SPEED
>>   Optimisation Effort: NORMAL
>>
>> Then the whole thing stops working -- it outright fails to read/write the
>> SDRAM. I can access the SDRAM controller's cache (32 bytes of the current
>> page), but accessing an out-of-page address returns garbage.
>>
>> If I do the same thing on Quartus? Well, the timing looks better in SPEED
>> mode, but it still works fine on the DE1.
>>
>> What the *bleep* is going on?
>>
>> --
>As for SPEED vs. AREA, in Xilinx FPGA's you very often
>get the best overall timing results using AREA optimization
>rather than speed. This is probably because the route
>portion of your total path delay is large. This shows up
>in larger designs and larger parts especially since the
>worst case routing delays grow with the design size.

Actually this is a bit of black art. I also get good results by
adjusting the 'pack factor' (IIRC) which puts related logic closer
together. IMHO it takes some trial and error to find the optimum place
& route settings for a design which gets close to the limits of the
FPGA regarding speed and/or size.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico(a)nctdevpuntnl (punt=.)
From: Philip Pemberton on 23 May 2010 05:14

On Sat, 22 May 2010 20:11:25 -0700, Gabor wrote:

> As others have mentioned, you probably have some unconstrained paths
> causing timing violations. [...]

OK, I've just set up these constraints:

#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/21
NET "CLOCK" TNM_NET = CLOCK;
TIMESPEC TS_CLOCK = PERIOD "CLOCK" 25 MHz HIGH 50%;
#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
INST "SDRAM_A<0>" TNM = sdram_outs;
INST "SDRAM_A<1>" TNM = sdram_outs;
INST "SDRAM_A<2>" TNM = sdram_outs;
INST "SDRAM_A<3>" TNM = sdram_outs;
INST "SDRAM_A<4>" TNM = sdram_outs;
INST "SDRAM_A<5>" TNM = sdram_outs;
INST "SDRAM_A<6>" TNM = sdram_outs;
INST "SDRAM_A<7>" TNM = sdram_outs;
INST "SDRAM_A<8>" TNM = sdram_outs;
INST "SDRAM_A<9>" TNM = sdram_outs;
INST "SDRAM_A<10>" TNM = sdram_outs;
INST "SDRAM_A<11>" TNM = sdram_outs;
INST "SDRAM_BA<0>" TNM = sdram_outs;
INST "SDRAM_BA<1>" TNM = sdram_outs;
INST "SDRAM_CAS_N" TNM = sdram_outs;
INST "SDRAM_CKE" TNM = sdram_outs;
INST "SDRAM_CLK" TNM = sdram_outs;
INST "SDRAM_CS_N" TNM = sdram_outs;
INST "SDRAM_DQ<0>" TNM = sdram_outs;
INST "SDRAM_DQ<1>" TNM = sdram_outs;
INST "SDRAM_DQ<2>" TNM = sdram_outs;
INST "SDRAM_DQ<3>" TNM = sdram_outs;
INST "SDRAM_DQ<4>" TNM = sdram_outs;
INST "SDRAM_DQ<5>" TNM = sdram_outs;
INST "SDRAM_DQ<6>" TNM = sdram_outs;
INST "SDRAM_DQ<7>" TNM = sdram_outs;
INST "SDRAM_DQ<8>" TNM = sdram_outs;
INST "SDRAM_DQ<9>" TNM = sdram_outs;
INST "SDRAM_DQ<10>" TNM = sdram_outs;
INST "SDRAM_DQ<11>" TNM = sdram_outs;
INST "SDRAM_DQ<12>" TNM = sdram_outs;
INST "SDRAM_DQ<13>" TNM = sdram_outs;
INST "SDRAM_DQ<14>" TNM = sdram_outs;
INST "SDRAM_DQ<15>" TNM = sdram_outs;
INST "SDRAM_DQ<16>" TNM = sdram_outs;
INST "SDRAM_DQ<17>" TNM = sdram_outs;
INST "SDRAM_DQ<18>" TNM = sdram_outs;
INST "SDRAM_DQ<19>" TNM = sdram_outs;
INST "SDRAM_DQ<20>" TNM = sdram_outs;
INST "SDRAM_DQ<21>" TNM = sdram_outs;
INST "SDRAM_DQ<22>" TNM = sdram_outs;
INST "SDRAM_DQ<23>" TNM = sdram_outs;
INST "SDRAM_DQ<24>" TNM = sdram_outs;
INST "SDRAM_DQ<25>" TNM = sdram_outs;
INST "SDRAM_DQ<26>" TNM = sdram_outs;
INST "SDRAM_DQ<27>" TNM = sdram_outs;
INST "SDRAM_DQ<28>" TNM = sdram_outs;
INST "SDRAM_DQ<29>" TNM = sdram_outs;
INST "SDRAM_DQ<30>" TNM = sdram_outs;
INST "SDRAM_DQ<31>" TNM = sdram_outs;
INST "SDRAM_DQM<0>" TNM = sdram_outs;
INST "SDRAM_DQM<1>" TNM = sdram_outs;
INST "SDRAM_DQM<2>" TNM = sdram_outs;
INST "SDRAM_DQM<3>" TNM = sdram_outs;
INST "SDRAM_RAS_N" TNM = sdram_outs;
INST "SDRAM_WE_N" TNM = sdram_outs;
#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
TIMEGRP "sdram_outs" OFFSET = OUT 10 ns AFTER "CLOCK";
TIMEGRP "sdram_outs" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLOCK";

Now I can build the core with OPTIMIZE=area or OPTIMIZE=speed, and it
works fine.

Question: do these timing constraints look sane? I figured since I'm
using a 270-degree shifted version of a DCM'd version of the input clock,
the timing settings should be around a quarter of Tclk_period (Clk period
is 40ns for 25MHz, so that would be 10ns).

CLOCK is the 25MHz crystal input, MCLK is the output from the first DCM
(a *25, /25 "multiplier" that effectively acts as a buffer and duty cycle
corrector). SDRAM_CLK is an output from the FPGA to the SDRAM, which is
sourced from the CLK270 output of the second DCM.

Thanks,
--
Phil.
usenet10(a)philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
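
The quarter-period figure in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `phase_shift_ns` is made up for illustration, and the calculation says nothing about whether 10 ns is the right OFFSET value for a given SDRAM part, only that a 270-degree shifted 25 MHz clock rises 30 ns after the reference edge, which is 10 ns before the next one.

```python
def phase_shift_ns(freq_mhz, phase_deg):
    """Return (period, delay after reference edge, lead before next edge) in ns.

    Hypothetical helper for illustration; not part of any Xilinx tool.
    """
    period_ns = 1000.0 / freq_mhz               # 25 MHz -> 40 ns
    delay_ns = period_ns * phase_deg / 360.0    # 270 degrees -> 30 ns
    lead_ns = period_ns - delay_ns              # time left before the next edge
    return period_ns, delay_ns, lead_ns


period, delay, lead = phase_shift_ns(25.0, 270.0)
print(period, delay, lead)  # 40.0 30.0 10.0
```

Re-running it with a faster clock (say 50 MHz) shows how quickly that quarter-period margin shrinks, which is worth knowing before raising the master clock.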
From: Brian Drummond on 23 May 2010 06:07

On 23 May 2010 09:14:47 GMT, Philip Pemberton <usenet10(a)philpem.me.uk> wrote:

>On Sat, 22 May 2010 20:11:25 -0700, Gabor wrote:
>
>> As others have mentioned, you probably have some unconstrained paths
>> causing timing violations. [...]
>
>OK, I've just set up these constraints:
>#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
>TIMEGRP "sdram_outs" OFFSET = OUT 10 ns AFTER "CLOCK";
>TIMEGRP "sdram_outs" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLOCK";
>
>Now I can build the core with OPTIMIZE=area or OPTIMIZE=speed, and it
>works fine.
>
>Question: do these timing constraints look sane? I figured since I'm
>using a 270-degree shifted version of a DCM'd version of the input clock,
>the timing settings should be around a quarter of Tclk_period (Clk period
>is 40ns for 25MHz, so that would be 10ns).

Given such a slow clock they look OK.

But seeing that has prompted some memories (it's a few years since I set up
constraints for SDR SDRAM).

The key to getting good I/O timing is to ensure the tools place the I/O
registers in the right place - the IOBs rather than the core logic. Then there
is no routing involved, and the constraints really only act as a sanity check.
(At 200MHz they may alert you to the wrong output standard.)

If some of your registers were in the IOBs and others weren't, the latter are
subject to additional routes of random lengths, and here the constraints WILL
help, by forcing PAR to keep these routes down. (And 10ns should be easily
achievable.)

Look at the I/O report near the end of the Map Report (.mrp) file. For each
I/O pin you will see a lot of information including the I/O standard, and the
registers in the IOB for that pin. For an output pin (e.g. address) I want to
see OFF or OUTFF in that list. For an I/O pin (data) I want to see IFF/INFF,
OFF/OUTFF and ENBFF which tristates the pin. (Signal names seem to have
changed with tool versions.)

Getting what you want can take some fiddling. For example, you may need to
duplicate registers in your code; one to feed the pins and another to use the
signal internally. Then you need to convince the synthesis tool to leave them
alone; apply the "equivalent_register_removal = no" attribute to the
appropriate regs. And check the .mrp file. Loop until done.

A few tool versions ago, you also needed to replicate the tristate signal for
each ENBFF, and ensure it was the right polarity (active low) but this may
have been improved.

Downside to all this is that while you have REALLY GOOD external timings, you
have lengthened the internal routes by a few ns. So I keep heavy processing
hidden behind a second register where that is likely to be a problem.

At 25MHz, feel free to ignore all the above, but it may help to see some of
what's going on beneath the hood.

- Brian
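
The "check the .mrp file, loop until done" step above can be partly automated. The following Python is a rough sketch rather than a real parser: it assumes the IOB table in the Map report is laid out as pipe-separated rows with a header cell beginning "Reg", and the sample text is invented for illustration. The actual column layout differs between ISE versions, so the header match may need adjusting; the function name `pins_without_iob_regs` is likewise hypothetical.

```python
# Invented excerpt standing in for the IOB table of a Map report (.mrp).
# ASSUMPTION: real reports use a similar pipe-separated layout.
SAMPLE_MRP = """\
| IOB Name    | Type | Direction | IO Standard | Reg (s) |
| SDRAM_A<0>  | IOB  | OUTPUT    | LVCMOS33    | OFF1    |
| SDRAM_DQ<0> | IOB  | BIDIR     | LVCMOS33    | IFF1    |
| SDRAM_CKE   | IOB  | OUTPUT    | LVCMOS33    |         |
"""


def pins_without_iob_regs(report_text):
    """Return the names of pins whose register column is blank."""
    reg_col = None
    missing = []
    for line in report_text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [c.strip() for c in line.split("|")[1:-1]]
        if reg_col is None:
            # Treat the first row containing a "Reg..." cell as the header.
            for i, cell in enumerate(cells):
                if cell.startswith("Reg"):
                    reg_col = i
            continue
        # Skip continuation rows (empty name) and rows that are too short.
        if len(cells) > reg_col and cells[0] and not cells[reg_col]:
            missing.append(cells[0])
    return missing


print(pins_without_iob_regs(SAMPLE_MRP))  # ['SDRAM_CKE']
```

A blank entry is not always a problem (a pin tied to a constant or driven straight from a clock has no register to pack), so the output is a list of pins to look at, not a list of errors.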
From: Philip Pemberton on 23 May 2010 14:21

On Sun, 23 May 2010 11:07:37 +0100, Brian Drummond wrote:

> Given such a slow clock they look OK.

Always good to know :)

I'm toying with the idea of running the SDRAM controller faster than the
CPU core (the limiter is the CPU -- it manages about 60MHz on a Cyclone2
IIRC; Xst reckons about 47MHz for the entire SoC on a Spartan3A
XC3S700A-4C).

> But seeing that has prompted some memories (it's a few years since I set
> up constraints for SDR SDRAM).

Yeah, it seems a lot of folk have moved onto DDR or DDR2. SDR-SDRAM seems
to have the edge in ease-of-use, but loses out on raw speed. But that
said, neither of them can match an SRAM clock-for-clock because of the
refresh, precharge and select cycles, and the access latency.

Although the caching in the sdram_wb core makes that a bit of a moot
point, especially for sequential WISHBONE accesses.

> Look at the I/O report near the end of the Map Report (.mrp) file. For
> each I/O pin you will see a lot of information including the I/O
> standard, and the registers in the IOB for that pin. For an output pin
> (e.g. address) I want to see OFF or OUTFF in that list. For an I/O pin
> (data) I want to see IFF/INFF, OFF/OUTFF and ENBFF which tristates the
> pin. (Signal names seem to have changed with tool versions)

Oh, that explains a lot!

The "broken" version shows blanks under "Reg(s)" for all the SDRAM pins.
The "working" version shows a mix of "OFF1", "IFF1" and blank (only
SDRAM_CLK and SDRAM_CKE are blank, which is fair enough -- CLK comes from
the DCM, CKE is grounded).

Thanks, I'd looked at the Map report, but previously didn't really know
what I was looking for, which explains why I didn't pick up on the FFs
not being pushed into the IOBs...

It seems I set "Pack I/O Registers into IOBs" to "Yes" on the working
version (which causes A LOT of warnings), while it's set to "Auto" in the
"broken" version. Can I force FFs in the IOBs in the UCF constraints, or
do I need to do that with a "// synthesis IOB=FORCE" constraint in the
Verilog source?

> At 25MHz, feel free to ignore all the above, but it may help to see some
> of what's going on beneath the hood.

Well, I'm trying it out at 25MHz because I figure the lower my master
clock is, the easier it's going to be to make the thing work. Then once
it's working, I can look into making it work on a faster clock.

Ideally I'd like to get it going at 50MHz or so -- a lot of processing is
going to happen in the FPGA (using hardware implementations of the
algorithms I'm using) but the CPU (a hacked up version of the
LatticeMico32) will be doing a lot of the integer work, framebuffer
updating, and so on.

Plan #2 is to rig up an LCD controller that can act as a WISHBONE master,
then wire that up to one of the spare master ports on the CONMAX bus
arbiter. Then I can use any area of main RAM as the framebuffer, and do
away with the messy business of having a separate framebuffer RAM.

If any of you guys want to see this code, let me know and I'll stick it
online. It's pretty ropey code, but it might do as an example to show how
to make the LM32 work on non-Lattice hardware (and how to make the
toolchain behave itself).

On a final note: the ISSI datasheet for the RAM chip appears to be
outright WRONG. It specifies 4096 refresh cycles per 64ms, but if the
refresh rate is that low I get data readback errors. If I use the refresh
rate for the Industrial-graded chip (4096 per 32ms), or even 4096 cycles
per 50us, then it works fine...

Yes, I'm using a "Commercial" grade part, not the "Industrial" part.
Unless mine has been mismarked....

--
Phil.
usenet10(a)philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
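
For the refresh figures in the post above, it may help to see what they mean in controller clock cycles. The sketch below assumes the refresh commands are spread evenly across the window (distributed refresh), which the post does not state, and uses the 25 MHz clock mentioned in the thread; `refresh_interval` is a made-up name for illustration.

```python
def refresh_interval(rows, window_ms, clk_mhz):
    """Return (microseconds between refreshes, whole clock cycles between them).

    ASSUMPTION: one auto-refresh per row, evenly spaced over the window.
    """
    interval_us = window_ms * 1000.0 / rows
    # Round down so the controller never refreshes later than the spec allows.
    cycles = int(interval_us * clk_mhz)
    return interval_us, cycles


print(refresh_interval(4096, 64, 25.0))  # (15.625, 390)
print(refresh_interval(4096, 32, 25.0))  # (7.8125, 195)
```

If a controller's refresh counter were loaded with a value larger than the first figure (for example, one computed for a different clock frequency), the part would be refreshed too slowly even though the datasheet number is correct, so the counter reload value is worth comparing against these numbers.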
From: Brian Drummond on 23 May 2010 20:26

On 23 May 2010 18:21:25 GMT, Philip Pemberton <usenet10(a)philpem.me.uk> wrote:

>It seems I set "Pack I/O Registers into IOBs" to "Yes" on the working
>version (which causes A LOT of warnings), while it's set to "Auto" in the
>"broken" version. Can I force FFs in the IOBs in the UCF constraints, or
>do I need to do that with a "// synthesis IOB=FORCE" constraint in the
>Verilog source?

UCF is a bit too late for synthesis... the only tool that reads it is
NGDbuild, aka "Translate", which embeds the UCF information in other files
passed downstream.

I don't do Verilog but it makes sense that there's an equivalent to setting
attributes for such things in VHDL. And applying them directly to the correct
signals will save warnings elsewhere...

Be aware that XST is finicky though. Your "FORCE" attributes may merely result
in "constraint is being ignored" warnings unless everything else lines up
right (duplicate regs not being optimised away) so if you don't get what you
expect in the .mrp, check the synth report carefully...

- Brian