From: KJ on 2 Dec 2009 22:25

On Dec 2, 6:06 pm, "aleksa" <aleks...(a)gmail.com> wrote:
> I just found out that even the FDRSE
> version doesn't work in the long run.

I forgot to add in my previous post that another effective technique (in most cases, the most effective one) is simply to backtrack and see where it leads. In your case, in the original post you said:

"I wrote a test prog that has failed after several seconds. READY was set to '1' when it should have stayed '0'."

Forget the part about "it should have stayed '0'" and focus on what actually *did* happen: READY went to '1'. Now backtrack through your code and ask what that implies. From your posted code, one of two conditions must have been true at a rising edge of CLK:

1. ACTION='1' and ACTIONCODE="00"
2. ACTION='1' and ACTIONCODE="01" and ACTIONBIT='1'

Even if you think you know that neither condition could have occurred, you must be wrong. One of those two conditions MUST have occurred, because READY did get set to '1'.

Next, continue the backtracking by forming a hypothesis about what must have happened to satisfy the conditions for each path. Keep doing this for each path until you spot a likely source of the problem. Then verify that this is the case, and only then fix it.

You can keep this backtracking as a mental exercise until you're ready to test the most likely hypothesis. Alternatively, you can try to verify a particular hypothesis before moving on, to cut down on the number of paths you need to analyze. To do that, though, you would have to modify the design in some fashion, and even just bringing out additional signals to see what is going on is a modification that can make the problem 'go under'.
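One way to test a hypothesis about which path set READY is to latch which of the two conditions fired and bring that out to a pin or a logic analyzer. The following is a minimal VHDL sketch, not the poster's actual code (which is not reproduced in this thread): only the two READY-setting conditions come from the post above, while the entity name ready_debug, the READY_CLR and DBG_CLR inputs, and the DBG_PATH1/DBG_PATH2 outputs are hypothetical. Bear in mind the caveat above: adding probes like these is itself a design change that can hide the problem.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reconstruction: only the two READY-setting conditions come
-- from the thread; READY_CLR, DBG_CLR and the DBG_* outputs are assumptions.
entity ready_debug is
  port (
    CLK        : in  std_logic;
    ACTION     : in  std_logic;
    ACTIONCODE : in  std_logic_vector(1 downto 0);
    ACTIONBIT  : in  std_logic;
    READY_CLR  : in  std_logic;  -- hypothetical: whatever clears READY
    DBG_CLR    : in  std_logic;  -- hypothetical: re-arms the debug flags
    READY      : out std_logic;
    DBG_PATH1  : out std_logic;  -- sticky: condition 1 has fired
    DBG_PATH2  : out std_logic   -- sticky: condition 2 has fired
  );
end entity ready_debug;

architecture rtl of ready_debug is
  signal ready_i : std_logic := '0';
  signal path1   : std_logic := '0';
  signal path2   : std_logic := '0';
begin
  process (CLK)
    variable hit1, hit2 : boolean;
  begin
    if rising_edge(CLK) then
      -- The two conditions that can set READY.
      hit1 := ACTION = '1' and ACTIONCODE = "00";
      hit2 := ACTION = '1' and ACTIONCODE = "01" and ACTIONBIT = '1';

      -- Assumption: setting READY takes priority over clearing it.
      if hit1 or hit2 then
        ready_i <= '1';
      elsif READY_CLR = '1' then
        ready_i <= '0';
      end if;

      -- Sticky flags: once a path fires it stays visible until re-armed,
      -- so a single rare event can be caught on a scope or logic analyzer.
      if DBG_CLR = '1' then
        path1 <= '0';
        path2 <= '0';
      else
        if hit1 then path1 <= '1'; end if;
        if hit2 then path2 <= '1'; end if;
      end if;
    end if;
  end process;

  READY     <= ready_i;
  DBG_PATH1 <= path1;
  DBG_PATH2 <= path2;
end architecture rtl;
```

Because the flags are sticky, a failure that shows up only after seconds of testing still leaves a record of which path caused it.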
When a problem disappears but you don't know why, that is the worst of all worlds: you can end up thinking that you've somehow 'fixed' something when in fact you haven't... and deep in your heart you know the problem is still there.

The important thing is to forget about what *should* be occurring, take the facts (READY = '1'), and use the source code of the design to backtrack through what must have transpired to make this event occur. By making modifications (like the rewrite that you did) without understanding why the original failed, you're making changes but you can't answer the simple question of why the original didn't work.

Kevin Jennings
From: RCIngham on 3 Dec 2009 04:46

[snip /]
>
>By making modifications (like the rewrite that you did) without
>understanding why the original failed, you're making changes but you
>can't answer the simple question of why the original didn't work.
>
>Kevin Jennings
>

This is sometimes called the "stochastic design method", and is related to the proposed simian rewrite of the complete works of Shakespeare - the time-to-complete is utterly unpredictable... ;-)

Cheers,
Robert
---------------------------------------
This message was sent using the comp.arch.fpga web interface on http://www.FPGARelated.com
From: aleksa on 4 Dec 2009 17:25

> 1. Verify that the timing report for Fmax is greater than the actual
> clock frequency.
> 2. Verify that the setup time requirement listed in the timing report
> for each input is actually being met in the real system.

All timings are verified; however, there is one problem. This is what I had in the 1st version of my UCF:

    OFFSET = OUT 15 ns AFTER "CLK";   -- ALL pins are constrained.

After viewing the reports, I saw that this constraint affects not only SCODE0 but also the entire DBUS. (The slave CPU writes with SCODE0; the master CPU reads with DBUS.)

My thinking was (and still is): the master CPU will not read the data registers until the status register shows READY='1', so there is no need to optimize timing from CLK to DBUS. So I replaced my UCF with this 2nd version:

    INST "SCODE0" TNM = CLK_OUT;   -- constrain SCODE0 only
    TIMEGRP "CLK_OUT" OFFSET = OUT 15 ns AFTER "CLK";

At first, that worked. However, after I changed my READY code, things started to go wrong (and that is when I posted to this NG). Now, I have reverted back to 1st UCF version, and the problem is, I dare to say, gone.

Q: has that really solved my problem?

I really don't see anything wrong with my VHDL code and I had that test prog running for hours w/o errors now.

> 3. Have the timing analyzer analyze all clock domain crossings or look
> at the final implementation for clock domain crossings
> - Does every clock domain crossing meet that requirement?
> - Did you verify that the requirement is met by viewing the final
> implementation?

I used the timing analyzer (TA) yesterday for the first time, so I don't have much experience. TA shows a list of constrained and unconstrained paths. I did my best and removed almost all of the unconstrained items; only the "Maximum Data Path: CLK to FF" items are left, and they all have a delay of only 2.2ns.
This is what I now have in my 3rd UCF, for every global clock:

    NET "CLK" TNM_NET = CLK;
    TIMESPEC TS_CLK = PERIOD "CLK" 25 ns HIGH 50%;
    OFFSET = IN 10 ns VALID 15 ns BEFORE "CLK";
    OFFSET = OUT 15 ns AFTER "CLK";

next, all combinations of:

    TIMESPEC TS_CLK1_2 = FROM "CLK1" TO "CLK2" 15 ns;

and:

    TIMESPEC "TS_P2P" = FROM "PADS" TO "PADS" 15 ns;

> Have the timing analyzer analyze all clock domain crossings

How? Like this: TIMESPEC TS_CLK1_2 = FROM "CLK1" TO "CLK2" 15 ns; ?

Since I now know a little more than yesterday, I went back to 2nd UCF file, hoping to see why that failed. TA did show some errors, but nothing connected to the problem I was seeing, at least I think so. Plenty of unconstrained items, but, again, no apparent connection..

In other words, I have it now working, but am not sure if the problem is really solved, or I'm just currently lucky.

> Some of your comments are
> contradictory (only one clock, but there are multiple things being
> clocked, there are multiple clocks)

Well, there are three clocks, but only one (CLK) is important here:
- MASTERCLK just toggles WR0, and then CLK copies it into its domain.
- SCLK is connected to an ordinary pin and is sampled with CLK.
- (MASTERCLK and CLK are connected to GCLK pins.)

> - If multiple bits get moved from one domain to another (maybe the two
> bits of 'ACTIONCODE' as an example) what one *other* signal is there
> that tells you that it is OK to sample these signals and that they are
> guaranteed valid?

Only one bit is moved: SCODE0 to SHIFTIN when SCODE1='0' and rising SCLK. The signal that tells me it is OK to sample is rising SCLK with SCODE1='1' and SCODE0='0'. Read my second post; it may not be commented well, but it's all there.
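For the SCLK path described above (SCLK on an ordinary pin, sampled with CLK, SCODE0 shifted into SHIFTIN on rising SCLK when SCODE1='0'), a common structure is a two-flop synchronizer on SCLK plus an edge detector in the CLK domain, with SCODE0 and SCODE1 delayed through matching flops so they line up with the detected edge. This is a minimal VHDL sketch under stated assumptions, not the poster's code: SCLK's high and low times each span at least a few CLK periods, SCODE0/SCODE1 are stable from before rising SCLK until a couple of CLK periods after it, and the entity name sclk_sampler, the WIDTH generic and the left-shift direction are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: SCLK, SCODE0, SCODE1 and SHIFTIN follow the thread;
-- the entity name, WIDTH generic and shift direction are assumptions.
entity sclk_sampler is
  generic (WIDTH : positive := 8);
  port (
    CLK     : in  std_logic;  -- system clock
    SCLK    : in  std_logic;  -- asynchronous to CLK, on an ordinary pin
    SCODE0  : in  std_logic;  -- data bit, asynchronous to CLK
    SCODE1  : in  std_logic;  -- '0' means "shift SCODE0 in"
    SHIFTIN : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity sclk_sampler;

architecture rtl of sclk_sampler is
  -- sclk_sync(0) and (1) form a two-flop synchronizer;
  -- sclk_sync(2) holds the previous synchronized value for edge detection.
  signal sclk_sync  : std_logic_vector(2 downto 0) := (others => '0');
  -- SCODE0/SCODE1 go through two stages too, so code0_sync(1) and
  -- code1_sync(1) were sampled on the same CLK edge as sclk_sync(1).
  signal code0_sync : std_logic_vector(1 downto 0) := (others => '0');
  signal code1_sync : std_logic_vector(1 downto 0) := (others => '0');
  signal shreg      : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      sclk_sync  <= sclk_sync(1 downto 0) & SCLK;
      code0_sync <= code0_sync(0) & SCODE0;
      code1_sync <= code1_sync(0) & SCODE1;

      -- Rising SCLK seen in the CLK domain: previous value '0', current '1'.
      if sclk_sync(2) = '0' and sclk_sync(1) = '1' and code1_sync(1) = '0' then
        shreg <= shreg(WIDTH - 2 downto 0) & code0_sync(1);
      end if;
    end if;
  end process;

  SHIFTIN <= shreg;
end architecture rtl;
```

The design choice that matters is delaying SCODE0/SCODE1 by the same number of stages as SCLK: sampling the raw pins on the cycle the edge is detected would look at them two to three CLK periods after SCLK rose, by which time they may already have changed.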
From: aleksa on 5 Dec 2009 04:40

> This is what I had in my 1st version of UCF:
> OFFSET = OUT 15 ns AFTER "CLK"; -- ALL pins are constrained.

I forgot to mention that I also had PERIOD and OFFSET IN for CLK, and PERIOD, OFFSET IN and OFFSET OUT for all other clocks.
From: KJ on 5 Dec 2009 12:06

On Dec 4, 5:25 pm, "aleksa" <aleks...(a)gmail.com> wrote:
>
> Now, I have reverted back to 1st UCF version, and the problem
> is, I dare to say, gone.
>
> Q: has that really solved my problem?
>

From your original post you said...

"In real world that didn't work. I wrote a test prog that has failed after several seconds. READY was set to '1' when it should have stayed '0'"

Unless you can explain, at least to yourself, the chain of events that allowed READY to be set to 1 when it should have stayed 0, I would say that no, you haven't really solved the problem, because you don't really understand it. There are many things one can change to make a problem seem to disappear, but usually it disappears only for a while and then reappears later... and that later time is usually the most inopportune moment, when you'll be under some real heat to fix it.

> I really don't see anything wrong with my VHDL code and I had
> that test prog running for hours w/o errors now.
>

Try heating and cooling the various parts with cold spray and a heat gun and see if it all still works.

Look at it this way:
- You had a failure (described in your original post).
- You haven't explained the reason for the failure.
- You've put in changes that make the problem less frequent (code changes and constraint changes).
- You currently have something that appears to be working (it hasn't failed after several hours) but can't explain why previous versions didn't.

Now ask yourself: if you were the end user rather than the designer, would you feel confident that the issue has been put to rest and will never come back?

>
> TA shows a list of constrained and unconstrained paths.
>
> I did my best and removed almost all of the unconstrained items;
> only the "Maximum Data Path: CLK to FF" items are left, and they
> all have a delay of only 2.2ns.
>
> This is what I now have in my 3rd UCF:
>

Again, more changes without understanding why the system failed in the first place.
>
> Since I now know a little more than yesterday, I went back to 2nd UCF
> file, hoping to see why that failed. TA did show some errors,

You went to the wrong place; put a scope or logic analyzer on the failing hardware.

> but nothing connected to the problem I was seeing, at least I think so.
> Plenty of unconstrained items, but, again, no apparent connection..
>
> In other words, I have it now working, but am not sure if the
> problem is really solved, or I'm just currently lucky.
>

Since you don't understand why it failed, you're getting lucky. There are also two forms of luck. It would be 'good luck' if you happened to stumble upon the fix without understanding the failure. It will be 'bad luck' if this change has only made the problem go away on this board (or some small set of boards) and it resurfaces when the design goes into production. Design problems are like submarines: unless you target and sink them, they will resurface.

I'd strongly suggest reading and following the guidelines I outlined in my second posting on December 2 regarding how to debug. That process will lead you to understand why your original two cracks at it failed. From that knowledge you'll be able to know (not guess) whether your last attempt actually fixes the problem or you just got lucky. Remember to start with your older failed attempts, since they fail more frequently (you can't debug something that appears to be working). You need to know why something failed before you can evaluate whether you've fixed it or covered it up.

Kevin Jennings