Cache line list handling [Computer Architecture]


From: jacko on 15 Jul 2010 12:09

On Jul 15, 4:48 pm, George Neuner <gneun...(a)comcast.net> wrote:
> On Thu, 15 Jul 2010 08:17:44 -0700 (PDT), jacko <jackokr...(a)gmail.com>
> wrote:
>
> >Is
>
> >http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4...
>
> >important for solving the GC problem? Ref FIFOO
>
> There isn't enough detail in that conversation to figure out what
> problem the structure is trying to solve. What exactly are you
> asking?
>
> George

From my own perspective, the mark-and-sweep cache-flushing garbage
collector problem. Also related to the memory leak problem of
reference-counted circular pointers, and the common tail evaluation of
circular structures solution.
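
The reference-counting leak mentioned above can be shown with a minimal sketch. The `Node`, `link`, and `release` names are hypothetical and only model manual reference counting; they are not taken from the FIFOO thread.

```python
class Node:
    """A list node with a manually maintained reference count."""
    def __init__(self, name):
        self.name = name
        self.next = None
        self.refcount = 0
        self.freed = False

def link(src, dst):
    """Point src.next at dst and count the new reference."""
    src.next = dst
    dst.refcount += 1

def release(node):
    """Drop one reference; at zero, free the node and release its successor."""
    node.refcount -= 1
    if node.refcount == 0:
        node.freed = True
        if node.next is not None:
            release(node.next)

# Acyclic chain: root -> a -> b
a, b = Node("a"), Node("b")
a.refcount += 1            # the external root reference to a
link(a, b)
release(a)                 # root lets go
acyclic_freed = (a.freed, b.freed)

# Cycle: root -> c -> d -> c
c, d = Node("c"), Node("d")
c.refcount += 1            # the external root reference to c
link(c, d)
link(d, c)                 # closes the cycle
release(c)                 # root lets go, but d still references c
cyclic_freed = (c.freed, d.freed)
```

In the acyclic case both nodes are freed. In the cyclic case neither count reaches zero, so both nodes leak even though nothing outside the cycle can reach them.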

From: jacko on 15 Jul 2010 12:54

The virtualization through a VM relates to cache size, or more
precisely to I-cache vs. D-cache size. Using a simple virtual machine,
the I-cache can be smaller, or more precisely have more decode applied
in the same area. Introducing a threading cache, or T-cache, to
avoid clogging the D-cache with control flow addresses allows an
individual ip (pc) per T-cache line. If it were not for re-entrant
recursive functions... or maybe this is just an ip push/pop.
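
One way to read the "ip push/pop" remark is as a threaded-code interpreter that keeps control flow addresses on their own return stack, away from the data. The sketch below is a software analogy only, under that assumption. It does not model a T-cache or any hardware, and `run`, `push`, `dup`, and `mul` are hypothetical names.

```python
def run(program, entry):
    """Execute a tiny threaded program.

    program maps a word name to a list of items; each item is either a
    callable primitive (takes the data stack) or the name of another word.
    """
    data = []      # data stack: operands only
    rstack = []    # return stack: (word, ip) pairs only
    word, ip = entry, 0
    while True:
        body = program[word]
        if ip == len(body):            # end of word: ip pop
            if not rstack:
                return data
            word, ip = rstack.pop()
            continue
        item = body[ip]
        ip += 1
        if callable(item):
            item(data)                 # primitive: runs directly
        else:
            rstack.append((word, ip))  # nested word: ip push
            word, ip = item, 0

def push(n):
    return lambda data: data.append(n)

def dup(data):
    data.append(data[-1])

def mul(data):
    b = data.pop()
    a = data.pop()
    data.append(a * b)

program = {
    "square": [dup, mul],
    "main": [push(3), "square", "square"],
}
result = run(program, "main")
```

Here `main` squares 3 twice. The data stack never holds a return address, which is the separation the T-cache idea seems to be after.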

To avoid the problem of entering and leaving threading at code entry
and exit, all that is needed is for a code entry thread address to be
made special, plus an opcode to re-enter threading mode, or to execute
only one opcode before threading re-entry.

In fact, with good design an I-cache can be decoded to an RTL
encompassing a cache line unit of execution, knowing that certain
'registers' are temporary stack pointers not requiring global memory
write-through, and supporting a multithreaded pipeline with thread
count equal to cache line size. The common code base of a threading
engine prevents I-cache invalidations. The problem is then deferred to
the T-cache, as the I-cache becomes a sort of microcode. The T-cache
can then be divided into n parts as the cache line becomes n
instructions. Each part then has n cycles to achieve an indirection
lookup or line load/save/copy.

The D-cache then becomes the bottleneck, especially with the
write-back/write-through/snoop requirement. This could be effectively
split by only allowing reads/writes at certain address-modulo
alignments on certain instructions, based on the same modulo in the
I-cache line.

Any thoughts?

Cheers Jacko

From: George Neuner on 15 Jul 2010 14:35

On Thu, 15 Jul 2010 09:09:52 -0700 (PDT), jacko <jackokring(a)gmail.com>
wrote:

>On Jul 15, 4:48 pm, George Neuner <gneun...(a)comcast.net> wrote:
>> On Thu, 15 Jul 2010 08:17:44 -0700 (PDT), jacko <jackokr...(a)gmail.com>
>> wrote:
>>
>> >Is
>>
>> >http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4...
>>
>> >important for solving the GC problem? Ref FIFOO
>>
>> There isn't enough detail in that conversation to figure out what
>> problem the structure is trying to solve. What exactly are you
>> asking?
>>
>> George
>
>From my own perspective the mark and sweep cache flushing garbage
>collector problem. Also related to the circular pointer counted
>reference memory leak problem, and the common tail evaluation of
>circular structures solution.

I don't see how the "FIFOO" described in the link would help cache
pollution in GC (any GC). Since marking and copying are both
read-only (no modify) operations, the best solution is to fetch the
value(s) into the L1 cache bypassing the other levels, and for copying
or compacting GC to write it out again bypassing the other levels. I
don't know of a workable solution for sweeping which does modify
structures.

Likewise, I can't see how it fixes circular linking ... all FIFOO
seems intended to do is provide a way to find the roots of structures
that share common sub-structure. I didn't see any provision to
prevent circular linking. It seems to me to be a heavy-weight
solution to the problem - in the absence of tracing or copying GC,
ISTM that using an object table to indirectly reference objects with
multiple parents is a better solution ... delete/sweep of an object
stops where it chases a pointer into the object table.
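
A minimal sketch of the object-table idea, under stated assumptions: the per-handle referrer count and the `sweep_table` pass are guesses at how shared objects would eventually be freed, and all names (`Handle`, `share`, `ref`, `delete`) are hypothetical.

```python
class Handle:
    """Object-table entry: a shared object plus a referrer count (assumed)."""
    def __init__(self, target):
        self.target = target
        self.count = 0

class Node:
    def __init__(self, name, children=()):
        self.name = name
        # Each child is a Node (owned directly) or a Handle (shared).
        self.children = list(children)
        self.freed = False

object_table = []

def share(node):
    """Register a node that will have multiple parents."""
    handle = Handle(node)
    object_table.append(handle)
    return handle

def ref(handle):
    """Take one more reference through the table."""
    handle.count += 1
    return handle

def delete(node):
    """Free node and its owned children; stop at object-table handles."""
    node.freed = True
    for child in node.children:
        if isinstance(child, Handle):
            child.count -= 1      # drop the reference, do not descend
        else:
            delete(child)

def sweep_table():
    """Free shared objects with no referrers; repeat until nothing changes."""
    changed = True
    while changed:
        changed = False
        for handle in list(object_table):
            if handle.count == 0:
                object_table.remove(handle)
                delete(handle.target)
                changed = True

tail = Node("tail")
h = share(tail)
p1 = Node("p1", [Node("p1-private"), ref(h)])
p2 = Node("p2", [ref(h)])

delete(p1)
after_first = (tail.freed, h.count)
delete(p2)
sweep_table()
after_second = (tail.freed, h.count, len(object_table))
```

Deleting `p1` frees its private child but stops at the handle, so the shared tail survives until `p2` is gone as well.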

George

From: Benny Amorsen on 15 Jul 2010 14:45

Morten Reistad <first(a)last.name> writes:

> You cannot identify the session before you have done a bit of
> identification on the packet, and by then there is a thread on
> a processor in the cluster that has read the packet and is
> acting on it.

I would have expected a modern multi-queue NIC to be able to identify
the session. It can then place the packet in the right queue and
interrupt the right CPU. Is this not possible for UDP traffic?

/Benny

From: jacko on 15 Jul 2010 15:29

On Jul 15, 7:35 pm, George Neuner <gneun...(a)comcast.net> wrote:
> On Thu, 15 Jul 2010 09:09:52 -0700 (PDT), jacko <jackokr...(a)gmail.com>
> wrote:
>
> >On Jul 15, 4:48 pm, George Neuner <gneun...(a)comcast.net> wrote:
> >> On Thu, 15 Jul 2010 08:17:44 -0700 (PDT), jacko <jackokr...(a)gmail.com>
> >> wrote:
>
> >> >Is
>
> >> >http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4....
>
> >> >important for solving the GC problem? Ref FIFOO
>
> >> There isn't enough detail in that conversation to figure out what
> >> problem the structure is trying to solve. What exactly are you
> >> asking?
>
> >> George
>
> >From my own perspective the mark and sweep cache flushing garbage
> >collector problem. Also related to the circular pointer counted
> >reference memory leak problem, and the common tail evaluation of
> >circular structures solution.
>
> I don't see how the "FIFOO" described in the link would help cache
> pollution in GC (any GC). Since marking and copying are both
> read-only (no modify) operations, the best solution is to fetch the
> value(s) into the L1 cache bypassing the other levels, and for copying
> or compacting GC to write it out again bypassing the other levels. I
> don't know of a workable solution for sweeping which does modify
> structures.

Marking involves a WRITE.

> Likewise, I can't see how it fixes circular linking ... all FIFOO
> seems intended to do is provide a way to find the roots of structures
> that share common sub-structure. I didn't see any provision to
> prevent circular linking. It seems to me to be a heavy-weight
> solution to the problem - in the absence of tracing or copying GC,

Tail self-linking becomes 'conceptually' impossible/not supported, and
circular linking can be found by common tail, i.e. it will be one of
the heads, and so will share a common tail. So an object traversal of
all heads to check for no common tail ... gives a reduction in
computation for self-referencing structures, and an easy active head
count = 0 test for complete free, which will keep you in list elements
for quite a while before any complex GC code has to be run. FIFOO does
not prevent self references, but makes them easier to detect
computationally if necessary.

Mark sweep involves marking the whole active list node set. Three
total traversals or two if a binary bool marker is used.
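
A minimal mark-sweep sketch, assuming one mark bit stored in each node. It is a generic textbook form, not the FIFOO scheme. It shows both that marking writes to every live node and that an unreachable cycle is still reclaimed. `Node`, `mark`, and `sweep` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False

def mark(roots):
    """Mark everything reachable from roots; return the mark-bit write count."""
    writes = 0
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node.marked:
            continue
        node.marked = True        # the write to the node itself
        writes += 1
        stack.extend(node.refs)
    return writes

def sweep(heap):
    """Keep marked nodes, clearing their marks for next time; drop the rest."""
    live = []
    for node in heap:
        if node.marked:
            node.marked = False
            live.append(node)
    return live

a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.refs.append(b)
b.refs.append(a)                  # reachable cycle a <-> b
c.refs.append(d)
d.refs.append(c)                  # unreachable cycle c <-> d
heap = [a, b, c, d]

writes = mark([a])
live_names = [node.name for node in sweep(heap)]
```

This particular sketch makes one traversal of the live set to mark and one pass over the whole heap to sweep, with the sweep also resetting the marks.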

Allocate-and-copy involves handles to every list node, or a referrer
node list for back-poking, and involves a full traversal of the active
node list with reallocation and copying of the nodes.
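
A minimal sketch of the copying approach, assuming a forwarding table stands in for the handles or back-poking list: each old node is mapped to its copy, so references are fixed up as they are encountered. `copy_live` and `forward` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []

def copy_live(roots):
    """Copy every node reachable from roots; return (new_roots, forwarding)."""
    forwarding = {}   # id(old node) -> its copy
    worklist = []     # old nodes whose refs still need fixing up

    def forward(old):
        """Return the copy of old, allocating it on first visit."""
        key = id(old)
        if key not in forwarding:
            forwarding[key] = Node(old.name)
            worklist.append(old)
        return forwarding[key]

    new_roots = [forward(root) for root in roots]
    while worklist:
        old = worklist.pop()
        forwarding[id(old)].refs = [forward(child) for child in old.refs]
    return new_roots, forwarding

a, b, c = Node("a"), Node("b"), Node("c")
a.refs.append(b)
b.refs.append(a)      # live cycle a <-> b
c.refs.append(a)      # c is unreachable from the root

new_roots, forwarding = copy_live([a])
new_a = new_roots[0]
new_b = new_a.refs[0]
```

Only the two live nodes are copied, the cycle is preserved in the copies, and the unreachable `c` is simply never visited.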

> ISTM that using an object table to indirectly reference objects with
> multiple parents is a better solution ... delete/sweep of an object
> stops where it chases a pointer into the object table.

Apart from TABLE being associated with heap compaction/fragmentation
issues, could you elaborate? To me it seems like any node which has
more than one referrer is referred to through a handle in the table.
What of just one referrer and the extra bool? How is 'in table'
detected? How are circular references handled or avoided? (By a handle,
of course, but...) How is self reference handled or avoided? If I
understand you, it's a kind of join node inserted transparently,
aggregated in the table with all the other join nodes, and including a
referrer count and a node class indication. So that forces some kind of
count and class per node rather than per structure. So more than 2
machine words per node?

Cheers Jacko
