From: Don Goodeve
Hi

I am currently working on upgrading a legacy embedded application using CFC
for real-time storage. When the system was designed (circa 2000/2001), the
Hitachi controllers then prevalent in the CFC market could sustain high
write speeds for small numbers of sectors (i.e. 4). Since then, although
published card speeds have improved, the small writes used by this system
result in missed deadlines on newer cards; i.e. it does not work. My first
step has been to free up some (precious) RAM to increase my buffering. The
buffering is now up to 32 sectors, with (typically) 16-sector writes, but I
am still not getting anywhere close to the performance required.
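
For reference, the buffering amounts to a sector FIFO sitting between audio
capture and the CF write code. A minimal sketch of the general shape (names
and sizes here are illustrative, not the real code):

/* Minimal sketch of a 32-sector ring buffer between audio capture and
 * the CF write code.  Names and sizes are illustrative only.
 */
#include <stdint.h>
#include <string.h>

#define SECTOR_WORDS 256              /* 256 x 16-bit words = 512 bytes     */
#define BUF_SECTORS  32               /* total buffering: 32 sectors        */

static uint16_t sector_buf[BUF_SECTORS][SECTOR_WORDS];
static volatile unsigned head;        /* next slot the capture side fills   */
static volatile unsigned tail;        /* next slot the write side drains    */

/* Capture side: queue one completed sector; returns -1 on overrun. */
int buf_put(const uint16_t *samples)
{
    unsigned next = (head + 1) % BUF_SECTORS;
    if (next == tail)
        return -1;                    /* the card has fallen too far behind */
    memcpy(sector_buf[head], samples, SECTOR_WORDS * sizeof(uint16_t));
    head = next;
    return 0;
}

/* Write side: number of whole sectors currently queued. */
unsigned buf_count(void)
{
    return (head + BUF_SECTORS - tail) % BUF_SECTORS;
}

(The write side simply waits until buf_count() reaches a full 16-sector
chunk before issuing a write.)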

I am writing to the CFC card (512MB, 1GB) in True-IDE mode. There are two
modes I operate in. In the first there is one write stream, writing up to
16 sectors at a time (16 x 256 x 16 bits). In the second there are two such
streams. Each sector represents 5.8msec of audio, so a typical write (16
sectors) occurs every 92.8msec. This represents a sustained write speed of
88.2KB/sec. I am writing to the card using the standard WRITE_SECTORS
(0x30) operation code. I am seeing a significant delay, after the data
transfer completes, before the card comes ready again; long enough that I
miss deadlines and have to crash-stop the recording. One write stream works
(88.2KB a second to contiguous sectors in 32-sector LBA blocks); two fail
consistently (interleaved writes to independent LBAs within 32-sector LBA
blocks) on every card I have tried, except the old Hitachi-based cards,
which work great but which I can no longer obtain.
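
For concreteness, the per-write sequence is the usual ATA PIO one. A rough
sketch (the ide_rd8/ide_wr8/ide_wr16 accessors are placeholders for the real
hardware access layer; the task-file offsets, status bits and the 0x30
opcode are the standard ATA ones):

/* Sketch of one WRITE SECTORS (0x30) transfer in True-IDE mode.
 * ide_rd8()/ide_wr8()/ide_wr16() stand in for whatever register access
 * the real hardware layer provides.
 */
#include <stdint.h>

extern uint8_t ide_rd8(unsigned reg);               /* placeholder accessors */
extern void    ide_wr8(unsigned reg, uint8_t v);
extern void    ide_wr16(unsigned reg, uint16_t v);

enum { REG_DATA = 0, REG_SECCNT = 2, REG_LBA0 = 3, REG_LBA1 = 4,
       REG_LBA2 = 5, REG_DEVHEAD = 6, REG_CMDSTAT = 7 };
enum { ST_BSY = 0x80, ST_DRDY = 0x40, ST_DRQ = 0x08, ST_ERR = 0x01 };

int cf_write_sectors(uint32_t lba, unsigned count, const uint16_t *data)
{
    unsigned s, w;

    while (ide_rd8(REG_CMDSTAT) & ST_BSY) ;             /* wait until idle   */
    ide_wr8(REG_SECCNT,  (uint8_t)count);               /* 1..255 here       */
    ide_wr8(REG_LBA0,    (uint8_t)lba);
    ide_wr8(REG_LBA1,    (uint8_t)(lba >> 8));
    ide_wr8(REG_LBA2,    (uint8_t)(lba >> 16));
    ide_wr8(REG_DEVHEAD, 0xE0 | ((lba >> 24) & 0x0F));  /* LBA mode          */
    ide_wr8(REG_CMDSTAT, 0x30);                         /* WRITE_SECTORS     */

    for (s = 0; s < count; s++) {
        while (ide_rd8(REG_CMDSTAT) & ST_BSY) ;         /* wait for DRQ      */
        if (!(ide_rd8(REG_CMDSTAT) & ST_DRQ))
            return -1;
        for (w = 0; w < 256; w++)                       /* 256 words/sector  */
            ide_wr16(REG_DATA, *data++);
    }
    /* Final busy period: this is where the long "comes ready again" delay
     * shows up on the newer cards. */
    while (ide_rd8(REG_CMDSTAT) & ST_BSY) ;
    return (ide_rd8(REG_CMDSTAT) & ST_ERR) ? -1 : 0;
}

Timing that final busy period for different write sizes and alignments is
the easiest way to see what the card's controller is doing.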

Typically my writes should start aligned on LBA multiples of 16 but this is
not currently guaranteed by my code. No write will cross a 32-sector LBA
boundary.

Contiguousness of the writes is, however, guaranteed; i.e. I write sectors
with LBAs 32768-32783, then 32784-32799 (aligned within a 32-sector block),
or I could complete the same range as 32768-32778, then 32779-32785, then
32786-32799. My writes will never cross a 32-sector boundary. In some
circumstances I start a write part-way through a 32-sector 'cluster'. Note
that my FAT clusters are aligned on 32-LBA boundaries (i.e. all cluster
start LBAs are multiples of 32).
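
To make the splitting rule concrete - never cross a 32-sector boundary,
fall back into 16-sector alignment after an odd start, and cap at the
16-sector buffer - here is a sketch (illustrative, not my actual code):

/* Sketch of the write-splitting rule: never cross a 32-sector (cluster)
 * boundary, fall back into 16-sector alignment after an odd start, and
 * cap each write at the 16-sector buffer.
 */
#include <stdint.h>

#define CLUSTER_SECTORS 32
#define CHUNK_SECTORS   16

/* How many of 'avail' queued sectors to write, starting at 'lba'. */
unsigned sectors_this_write(uint32_t lba, unsigned avail)
{
    unsigned to_cluster_end = CLUSTER_SECTORS - (unsigned)(lba % CLUSTER_SECTORS);
    unsigned to_next_chunk  = CHUNK_SECTORS   - (unsigned)(lba % CHUNK_SECTORS);
    unsigned n = avail;

    if (n > to_cluster_end)
        n = to_cluster_end;                  /* never cross a 32-LBA boundary   */
    if ((lba % CHUNK_SECTORS) != 0 && n > to_next_chunk)
        n = to_next_chunk;                   /* realign to a 16-LBA multiple    */
    if (n > CHUNK_SECTORS)
        n = CHUNK_SECTORS;                   /* cap at the 16-sector write size */
    return n;
}

Called in a loop, this turns an unaligned start such as 32779 into a
5-sector write followed by a 16-sector write, finishing the cluster.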

The delay is such that when I have two such streams running (to different
FAT clusters, using a 32-sector cluster size) I cannot meet my deadlines.

My conclusion from this is that the traffic hitting the card is causing a
lot of internal management to occur, which results in the delays I am
seeing. I know these cards ought to be able to sustain writes of the order
of several MB/sec; I am seeing them fail at 176KB/sec, an order of
magnitude below my expectation.

So what I need to figure out is the following:
---
1) Is there a minimum write size I should be using to optimize performance?
I am using 512MB and 1GB cards. I am guessing that aligning to a 32-sector
boundary may improve things on average - but not knowing the underlying
'erase unit' structure of the cards and the internal management algorithms,
I do not know if this will help. In my application (which is severely
memory constrained) the buffer space required to perform these larger
writes is at a premium...

2) If 'n' is the internal card erase unit size, am I correct in assuming
that LBAs that are multiples of 'n' align on erase unit boundaries? Could
it be that there is an offset?

3) Are there any suggestions you can give on optimizing write traffic to the
card? If I can do so with small write units (less than 32 sectors at a
time) that would be ideal.

Thanks...

Don Goodeve

---------------------------------------
Posted through http://www.EmbeddedRelated.com
From: Vladimir Vassilevsky


Don Goodeve wrote:

> Hi
>
> I am currently working on upgrading a legacy embedded application using CFC
> for real-time storage.

[...]

> 1) Is there a minimum write size I should be using to optimize performance?

The more sectors you write at once, the better. Practically, there is
little difference once you write ~32 sectors or more.

> I am using 512MB and 1GB cards. I am guessing that aligning to a 32-sector
> boundary may improve things on average - but not knowing the underlying
> 'erase unit' structure of the cards and the internal management algorithms,
> I do not know if this will help. In my application (which is severely
> memory constrained) the buffer space required to perform these larger
> writes is at a premium...
>
> 2) If 'n' is the internal card erase unit size, am I correct in assuming
> that LBAs that are multiples of 'n' align on erase unit boundaries? Could
> it be that there is an offset?

That is not very important. A modern card sustains 20..30 MB/sec
regardless of alignment.

> 3) Are there any suggestions you can give on optimizing write traffic to the
> card? If I can do so with small write units (less than 32 sectors at a
> time) that would be ideal.

* Mind the overhead of the filesystem. It represents a significant load
on the CPU and the bus, so your transfer could very well be limited by that.

* The data stream from/to the CF card can be rather nonuniform; you can
expect sudden delays of ~hundreds of milliseconds. The buffering should be
deep enough to accommodate those delays.
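
For example (taking a 500 ms stall as an assumed worst case, and the
~176 KB/sec two-stream rate from the original post):

/* Back-of-the-envelope buffer sizing.  The 500 ms stall is an assumed
 * worst case; the data rate is the two-stream figure quoted earlier.
 */
#include <stdio.h>

int main(void)
{
    const double rate_bps    = 2.0 * 88200.0;  /* bytes/sec, two streams      */
    const double stall_s     = 0.5;            /* assumed worst-case stall    */
    const double sector_size = 512.0;
    const double in_flight   = 16.0;           /* one 16-sector write pending */

    double sectors = (rate_bps * stall_s) / sector_size + in_flight;
    printf("buffer needed: ~%.0f sectors (~%.0f KB)\n",
           sectors, sectors * sector_size / 1024.0);  /* ~188 sectors, ~94 KB */
    return 0;
}

i.e. closer to 100 KB of buffering than the current 16 KB.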


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
From: Jeffrey Creem
Vladimir Vassilevsky wrote:
>
> [...]
>
> That is not very important. A modern card sustains 20..30 MB/sec
> regardless of alignment.
>

This has not quite been my experience. If one is doing large writes that
are many multiples of the erase unit, that is true. However, if system
limitations mean that this can't be done, and especially if some aspect
of the filesystem causes non-linear writes, performance can easily drop
by a factor of 10.
From: Mark Borgerson
In article <km6h87-7ga.ln1(a)newserver.thecreems.com>, jeff(a)thecreems.com
says...
> [...]
>
> This has not quite been my experience. If one is doing large writes that
> are many multiples of the erase unit, that is true. However, if system
> limitations mean that this can't be done, and especially if some aspect
> of the filesystem causes non-linear writes, performance can easily drop
> by a factor of 10.
>
A good example would be a DOS-like system that updates the FAT with every
write. That will probably cause a block-erase delay with every FAT
write. It may cause other delays if the repeated writes to the FAT
activate a wear-leveling mechanism that translates the address to move
the FAT to a new physical block.
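
One common mitigation (this is a sketch of the general idea, not of any
particular filesystem's code; every function below is a hypothetical hook)
is to pre-allocate the whole cluster chain before recording starts and
defer the FAT and directory updates until the file is closed, so the
real-time path does nothing but sequential data writes:

/* Sketch of the "pre-allocate, defer metadata" idea: the FAT and the
 * directory entry are written only before and after the recording, never
 * during it.  All of the functions below are hypothetical hooks into
 * whatever FAT code is in use.
 */
#include <stdint.h>

extern uint32_t fat_alloc_chain(uint32_t clusters);   /* reserve chain, return first cluster */
extern uint32_t cluster_to_lba(uint32_t cluster);
extern void     fat_flush(void);                      /* write out the cached FAT copies     */
extern void     dir_update(uint32_t first_cluster, uint32_t bytes);
extern int      recording_active(void);
extern uint32_t write_next_chunk(uint32_t *lba);      /* data-only sequential sector writes  */

void record_session(uint32_t max_clusters)
{
    uint32_t first = fat_alloc_chain(max_clusters);   /* metadata touched once, up front */
    uint32_t lba   = cluster_to_lba(first);
    uint32_t bytes = 0;

    while (recording_active())                        /* real-time loop: data only */
        bytes += write_next_chunk(&lba);

    dir_update(first, bytes);                         /* metadata touched once more, at the end */
    fat_flush();
}

The obvious trade-off is that a crash mid-recording leaves the file absent
from the directory until some recovery pass fixes it up.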


Mark Borgerson