From: Don Goodeve on 1 Apr 2010 14:28

Hi,

I am currently working on upgrading a legacy embedded application that uses CFC for real-time storage. When the system was designed (circa 2000/2001), the Hitachi controllers prevalent in the CFC market could sustain high write speeds for small numbers of sectors (i.e. 4). Since then, as published card speeds have improved, the small writes used by this system result in missed deadlines on newer cards; i.e. it does not work.

My first step has been to free up some (precious) RAM to increase my buffering. The buffering is now up to 32 sectors with (typically) 16-sector writes, but I am still not getting anywhere close to the performance required.

I am writing to the CFC card (512MB, 1GB) in True-IDE mode. There are two modes I operate in. In the first there is one write stream, writing up to 16 sectors at a time (16 x 256 x 16 bits). In the second there are two such streams. Each sector represents 5.8msec of audio, so a typical write (16 sectors) occurs roughly every 93msec. This represents a sustained write speed of 88.2KB/sec per stream. I am writing to the card using the standard WRITE_SECTORS (0x30) operation code.

I am seeing a significant delay after the completion of the data transfer before the card comes ready again; long enough that I miss deadlines and have to crash-stop the recording. One write stream works (88.2KB a second to contiguous sectors in 32-sector LBA blocks); two fail consistently (interleaved writes to independent LBAs in 32-sector LBA blocks) on all cards I have tried (except the old Hitachi-based cards, which work great but which I can no longer obtain).

Typically my writes should start aligned on LBA multiples of 16, but this is not currently guaranteed by my code. No write will cross a 32-sector LBA boundary. Contiguousness of the writes is however guaranteed; i.e. I write sectors with LBAs 32768-32783, then 32784-32799 (aligned within the 32-sector block), or I could complete the write as 32768-32778, then 32779-32785, then 32786-32799. In some circumstances I start a write part-way through a 32-sector 'cluster'. Note that my FAT clusters are aligned on 32-LBA boundaries (i.e. all cluster start LBAs are multiples of 32). The delay is such that when I have two such streams running (to different FAT clusters, using a 32-sector cluster size) I cannot meet my deadlines.

My conclusion is that the traffic hitting the card is causing a lot of internal management to occur, which results in the delays I am seeing. I know these cards ought to be able to sustain on the order of several MB/sec of writes; I am seeing them fail at 176KB/sec - an order of magnitude off my expectation.

So what I need to figure out is the following:

1) Is there a minimum write size I should be using to optimize performance? I am using 512MB and 1GB cards. I am guessing that aligning to a 32-sector boundary may improve things on average - but not knowing the underlying 'erase unit' structure of the cards and the internal management algorithms, I do not know if this will help. In my application (which is severely memory constrained) the buffer space required to perform these larger writes is at a premium...

2) If 'n' is the internal card erase unit size, am I correct in assuming that LBAs that are multiples of 'n' align on erase unit boundaries? Could it be that there is an offset?

3) Are there any suggestions you can give on optimizing write traffic to the card? If I can do so with small write units (less than 32 sectors at a time), that would be ideal.

Thanks...
Don Goodeve

---------------------------------------
Posted through http://www.EmbeddedRelated.com
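For reference, a multi-sector WRITE_SECTORS (0x30) transfer of the kind described above looks roughly like this in C against a True-IDE task file. This is a minimal sketch, not the driver from the post: the base address, the register macros, and the busy-wait helpers are illustrative assumptions for whatever the hardware actually maps. The long "comes ready again" delay described above corresponds to the final busy-wait after the last sector has been transferred.

```c
#include <stdint.h>

/* Hypothetical memory-mapped True-IDE task-file registers (standard ATA
 * register offsets); CF_BASE and the access widths are placeholders. */
#define CF_BASE        0x40000000u
#define CF_DATA        (*(volatile uint16_t *)(CF_BASE + 0x00))
#define CF_SECT_COUNT  (*(volatile uint8_t  *)(CF_BASE + 0x02))
#define CF_LBA_LOW     (*(volatile uint8_t  *)(CF_BASE + 0x03))
#define CF_LBA_MID     (*(volatile uint8_t  *)(CF_BASE + 0x04))
#define CF_LBA_HIGH    (*(volatile uint8_t  *)(CF_BASE + 0x05))
#define CF_DEVICE      (*(volatile uint8_t  *)(CF_BASE + 0x06))
#define CF_CMD_STATUS  (*(volatile uint8_t  *)(CF_BASE + 0x07))

#define STAT_BSY  0x80
#define STAT_DRQ  0x08
#define STAT_ERR  0x01

#define CMD_WRITE_SECTORS 0x30

static void cf_wait_not_busy(void) { while (CF_CMD_STATUS & STAT_BSY) {} }
static void cf_wait_drq(void)      { while (!(CF_CMD_STATUS & STAT_DRQ)) {} }

/* PIO multi-sector write: one WRITE SECTORS command, 'count' sectors of
 * 256 16-bit words each, handed over as the card asserts DRQ.
 * Caller passes count in the range 1..255 for this sketch. */
int cf_write_sectors(uint32_t lba, uint8_t count, const uint16_t *buf)
{
    cf_wait_not_busy();

    CF_SECT_COUNT = count;
    CF_LBA_LOW    = (uint8_t)( lba        & 0xFF);
    CF_LBA_MID    = (uint8_t)((lba >> 8)  & 0xFF);
    CF_LBA_HIGH   = (uint8_t)((lba >> 16) & 0xFF);
    CF_DEVICE     = (uint8_t)(0xE0 | ((lba >> 24) & 0x0F));  /* LBA mode */
    CF_CMD_STATUS = CMD_WRITE_SECTORS;

    for (uint8_t s = 0; s < count; s++) {
        cf_wait_drq();                 /* card ready for the next sector */
        for (int w = 0; w < 256; w++)
            CF_DATA = *buf++;
    }

    cf_wait_not_busy();                /* card commits to flash here; this is
                                          where the long delays are observed */
    return (CF_CMD_STATUS & STAT_ERR) ? -1 : 0;
}
```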
From: Vladimir Vassilevsky on 1 Apr 2010 15:04

Don Goodeve wrote:

> Hi
>
> I am currently working on upgrading a legacy embedded application that
> uses CFC for real-time storage.

[...]

> 1) Is there a minimum write size I should be using to optimize
> performance?

The more sectors you write at once, the better. Practically, there is
little difference once you write ~32 sectors or more.

> I am using 512MB and 1GB cards. I am guessing that aligning to a
> 32-sector boundary may improve things on average - but not knowing the
> underlying 'erase unit' structure of the cards and the internal
> management algorithms, I do not know if this will help. In my
> application (which is severely memory constrained) the buffer space
> required to perform these larger writes is at a premium...
>
> 2) If 'n' is the internal card erase unit size, am I correct in assuming
> that LBAs that are multiples of 'n' align on erase unit boundaries?
> Could it be that there is an offset?

That is not very important. A modern card sustains 20..30 MB/sec
regardless of alignment.

> 3) Are there any suggestions you can give on optimizing write traffic to
> the card? If I can do so with small write units (less than 32 sectors at
> a time), that would be ideal.

* Mind the overhead of the filesystem. It represents a significant load on
the CPU and the bus, so your transfer rate could very well be limited by
that.

* The data stream from/to the CF card can be rather nonuniform; you can
expect sudden delays of ~hundreds of milliseconds. The buffering should be
deep enough to accommodate those delays.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
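To put a number on that buffering point: at 88.2 KB/sec per stream, a 300 ms card stall is roughly 26 KB, i.e. about 52 sectors that have to be absorbed somewhere. Below is a minimal sketch of a sector ring buffer between the audio producer and the card writer; the names, the depth, and the stall figure are illustrative assumptions, not measurements from the system in the post.

```c
#include <stdint.h>

#define SECTOR_WORDS   256          /* 512-byte sectors as 16-bit words       */
#define RING_SECTORS   64           /* depth must exceed the worst-case stall:
                                       e.g. 300 ms * 88.2 KB/s ~= 52 sectors  */

static uint16_t ring[RING_SECTORS][SECTOR_WORDS];
static volatile unsigned head;      /* next sector to fill (audio ISR)        */
static volatile unsigned tail;      /* next sector to write (card task)       */

/* Producer side: called from the audio ISR when a sector's worth of samples
 * is ready. Returns 0 on success, -1 if the card has stalled long enough to
 * overrun the ring (the "crash-stop" case). */
int ring_put(const uint16_t *sector)
{
    unsigned next = (head + 1) % RING_SECTORS;
    if (next == tail)
        return -1;                  /* overrun: the writer fell too far behind */
    for (int w = 0; w < SECTOR_WORDS; w++)
        ring[head][w] = sector[w];
    head = next;
    return 0;
}

/* Consumer side: how many contiguous sectors are ready to hand to the card
 * in one multi-sector write (stops at the array wrap for simplicity). */
unsigned ring_contiguous_ready(void)
{
    unsigned used    = (head + RING_SECTORS - tail) % RING_SECTORS;
    unsigned to_wrap = RING_SECTORS - tail;
    return (used < to_wrap) ? used : to_wrap;
}
```

With the 32-sector buffer mentioned in the original post, a stall of only ~185 ms per stream would already overrun it, so sizing the buffer for the worst observed stall is the other half of the problem alongside write alignment.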
From: Jeffrey Creem on 2 Apr 2010 09:05

Vladimir Vassilevsky wrote:

> Don Goodeve wrote:
>
>> Hi
>> I am currently working on upgrading a legacy embedded application that
>> uses CFC for real-time storage.
>
> [...]
>
>> 1) Is there a minimum write size I should be using to optimize
>> performance?
>
> The more sectors you write at once, the better. Practically, there is
> little difference once you write ~32 sectors or more.
>
>> I am using 512MB and 1GB cards. I am guessing that aligning to a
>> 32-sector boundary may improve things on average - but not knowing the
>> underlying 'erase unit' structure of the cards and the internal
>> management algorithms, I do not know if this will help. In my
>> application (which is severely memory constrained) the buffer space
>> required to perform these larger writes is at a premium...
>>
>> 2) If 'n' is the internal card erase unit size, am I correct in assuming
>> that LBAs that are multiples of 'n' align on erase unit boundaries?
>> Could it be that there is an offset?
>
> That is not very important. A modern card sustains 20..30 MB/sec
> regardless of alignment.

This has not quite been my experience. If one is doing large writes that
are many multiples of the erase unit, that is true. However, if system
limitations are such that this can't be done, and especially if some
aspect of the filesystem causes one to do non-linear writes, performance
can drop by a factor of 10 quite easily.
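One concrete way to act on that, given the 32-sector FAT clusters described in the original post, is to make sure no single card command straddles a 32-sector allocation unit, splitting a request at the boundary instead. A small helper along those lines is sketched below; the unit size of 32 comes from the post, while the function names and the low-level write it calls are illustrative assumptions.

```c
#include <stdint.h>

#define ALLOC_UNIT_SECTORS 32u   /* FAT cluster / assumed erase-friendly unit */

/* Low-level multi-sector write (e.g. WRITE SECTORS, 0x30), assumed to exist
 * elsewhere in the application. */
int cf_write_sectors(uint32_t lba, uint8_t count, const uint16_t *buf);

/* Write 'count' sectors starting at 'lba', issuing one card command per
 * 32-sector unit so no single command crosses a unit boundary. */
int write_within_units(uint32_t lba, uint32_t count, const uint16_t *buf)
{
    while (count > 0) {
        /* Sectors remaining before the next 32-sector boundary. */
        uint32_t to_boundary = ALLOC_UNIT_SECTORS - (lba % ALLOC_UNIT_SECTORS);
        uint32_t chunk = (count < to_boundary) ? count : to_boundary;

        if (cf_write_sectors(lba, (uint8_t)chunk, buf) != 0)
            return -1;

        lba   += chunk;
        buf   += chunk * 256;    /* 256 16-bit words per 512-byte sector */
        count -= chunk;
    }
    return 0;
}
```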
From: Mark Borgerson on 2 Apr 2010 10:16

In article <km6h87-7ga.ln1(a)newserver.thecreems.com>, jeff(a)thecreems.com
says...

> Vladimir Vassilevsky wrote:
> >
> > Don Goodeve wrote:
> >
> >> Hi
> >> I am currently working on upgrading a legacy embedded application that
> >> uses CFC for real-time storage.
> >
> > [...]
> >
> >> 1) Is there a minimum write size I should be using to optimize
> >> performance?
> >
> > The more sectors you write at once, the better. Practically, there is
> > little difference once you write ~32 sectors or more.
> >
> >> I am using 512MB and 1GB cards. I am guessing that aligning to a
> >> 32-sector boundary may improve things on average - but not knowing the
> >> underlying 'erase unit' structure of the cards and the internal
> >> management algorithms, I do not know if this will help. In my
> >> application (which is severely memory constrained) the buffer space
> >> required to perform these larger writes is at a premium...
> >>
> >> 2) If 'n' is the internal card erase unit size, am I correct in
> >> assuming that LBAs that are multiples of 'n' align on erase unit
> >> boundaries? Could it be that there is an offset?
> >
> > That is not very important. A modern card sustains 20..30 MB/sec
> > regardless of alignment.
>
> This has not quite been my experience. If one is doing large writes that
> are many multiples of the erase unit, that is true. However, if system
> limitations are such that this can't be done, and especially if some
> aspect of the filesystem causes one to do non-linear writes, performance
> can drop by a factor of 10 quite easily.

A good example would be a DOS-like filesystem that updates a FAT with
every write. That will probably cause a block-erase delay with every FAT
write. It may cause other delays if the repeated writes to the FAT
activate a wear-leveling mechanism that translates the address to move
the FAT to a new physical block.

Mark Borgerson
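If the application controls the filesystem layer, one way to take the sting out of repeated FAT writes is to cache the FAT sector in RAM and flush it only every N data writes (or at file close) rather than after every cluster allocation. A rough sketch of that idea follows; the structure, the flush interval, the FAT16 layout, and the I/O callbacks are all assumptions about code the thread never shows, and this is a mitigation suggestion rather than anything described by the posters.

```c
#include <stdint.h>

#define SECTOR_BYTES     512
#define FAT_FLUSH_EVERY  64     /* data writes between FAT flushes (tunable)  */

/* Low-level single-sector I/O, assumed to exist elsewhere. */
int cf_read_sector(uint32_t lba, uint8_t *buf);
int cf_write_sector(uint32_t lba, const uint8_t *buf);

/* A one-sector write-back cache for the FAT region. Cluster chains are
 * updated in RAM; the physical FAT write happens rarely, so the card sees
 * far fewer small rewrites of the same LBA. */
struct fat_cache {
    uint32_t lba;                       /* which FAT sector is cached         */
    uint8_t  data[SECTOR_BYTES];
    int      dirty;
    unsigned writes_since_flush;
};

int fat_cache_load(struct fat_cache *c, uint32_t fat_lba)
{
    c->lba = fat_lba;
    c->dirty = 0;
    c->writes_since_flush = 0;
    return cf_read_sector(fat_lba, c->data);
}

/* Chain cluster 'cur' to 'next' in RAM (FAT16 assumed: 256 entries per
 * sector; 'cur' is assumed to fall within the cached FAT sector). */
void fat_cache_link(struct fat_cache *c, uint32_t cur, uint32_t next)
{
    uint16_t *entries = (uint16_t *)c->data;
    entries[cur % (SECTOR_BYTES / 2)] = (uint16_t)next;
    c->dirty = 1;
}

/* Call once per data write; flushes the FAT sector only occasionally. */
int fat_cache_tick(struct fat_cache *c)
{
    if (c->dirty && ++c->writes_since_flush >= FAT_FLUSH_EVERY) {
        c->writes_since_flush = 0;
        c->dirty = 0;
        return cf_write_sector(c->lba, c->data);
    }
    return 0;
}
```

The trade-off is that a power loss between flushes loses the chain for the unflushed clusters, so this only makes sense where the recording can tolerate that, or where the file is pre-allocated contiguously so the FAT can be repaired afterwards.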