Arithmetic Coder Probability Model $100 [Cryptography]

Prev: Scalable Key Cryptography - Make_Moduli_Program_Mark_0
Next: recover the password from a .PFX file

From: LawCounsels on 12 Aug 2010 04:41

do you already knows Arithmetic Codings algorithm .... if so , has a
straight forward modification of Aritmetic coding job for you to
complete (prefers in C# source codes)... US$100 by Paypal/ Western
Union on delivery

This presents a basic arithmetic coding implementation, if you have
never implemented an arithmetic coder, this is the article which suits
your needs, ...
www.arturocampos.com/ac_arithmetic.html - Cached - Similar

Our modification to Arithmetic Coding algorithm belows is very
simple : the probability of certain unique prefix length ('0' or '10'
or '110' or '111...10' of bits_length P -1 or '111...11' of
bits_length P -1) at each remaning as yet to be encoded bits_length
position of R , depends on the progressive smaller tabulated
Table's P value

.... so the probability Table shown in algorithm below , is now made to
vary depending on present bits position R

Arithmetic has 'internal' 0 costs model of binomial power series
distributions of unique prefix lengths '0' '10' '110' '111...10' &
'constraint' unique prefix lengths at bits_length R , since this
'static' Model needs not be transmitted :

when subsequent stage is at bits_length R , from look-up Table's P
value Arithmetic Coder knows the max unique prefix length will be = P
# of bits ( eg '1110' consists of 4 bits , '0' consists of 1 bit ) ,
thus at this stage with bits_length of R the probabilities of
occurrences of '0' or '10' or '111...10' (of max possible.... P # of
bits long) is known to continue to be binomial power series EXCEPT
that look-up Table's P value determines the max possible unique prefix
length here .... so the present 'interval' will now be divided
according to present probabilities of these P # of unique prefix
lengths

check out
http://groups.google.com/group/comp.compression/browse_thread/thread/0d5f1d2a447adffd/d98a76f4b5349a86?hl=en&lnk=raot#d98a76f4b5349a86
for details

are you able deliver this :

1. What kind of files do you plan to compress text files, pictures,
movies ...?

only for file of few thousand bits to tens of thousand bits ,
consisting only of Unique Prefixes '0' or '10' or '110' or ...
'111...10' (max bits length = P# of bits long ... '0' has bits_length
1 , '1110' has bits_length 4)

max P value = log(base2)R where is the the current bits position
length ... eg 8 bits long bits_string '110 0 10 10' initially at
bit
position R of 8 can have valid selections '0' or '10' or '110' only
(P
= log(base2)8 = 3) , after the 1st selection of '110' there remains
only 5 bits so next only valid selection next at R = 5 bits length
is '0' or '10' ( P=log(base2)5 = 2) .... so forth

NOTE : the distributions of unique prefixes lengths(symbols) is
binomial power series (ie '0' occurs twice as frequent as '10 &
'10' occurs twice as frequent as '110' .... so forth , AC able use
this as Model) , but with our further constraint here that max
possible unique prefix length when at remaining R bits_length =
log(base2)R then AC can fine-tune this , at remaininmg bits_length R
it can select only unique prefix of lengths =< log(base2)R & these
'restricted' valid possible selections(symbols) together added made
up
100% of symbols probabilities [ even though earlier larger R
remaining bits_length have a much larger possible unique prefix
lengths possible ] .... this is 'source' of further encoding savings
over 'normal' AC Model which unnecessary allows the earlier much
larger unique prefix lengths to occur anywhere else (even at R=4
which
admits of only '0' or '10' possible valid)

you may generate your own such test input 'constraint' unique prefix
lengths ,
seeded with binomial power series & constraint

2. Do you want only the compressor part or the decompressing tool
also ?

both

> are you able deliver this :

its straight forward , 1st sets up Arithmetic Coder's symbols
probabilities Model to provide usual binomial power series
distributions of unique prefix lengths ('0' twice as frequent as '10.
'10' twice as frequent as '110' ... so forth) , then simply adjust
Model to provide the same usual binomial power series symbols
probabilities BUT with max unique prefix length(symbol length) when
remaining bits_length is R bits long now limited tobe max
log(base2)R bits long

==========================================================================

Arithmetic coding
The idea behind arithmetic coding is to have a probability line, 0-1,
and assign to every symbol a range in this line based on its
probability, the higher the probability, the higher range which
assigns to it. Once we have defined the ranges and the probability
line, start to encode symbols, every symbol defines where the output
floating point number lands. Let's say we have:
Symbol Probability Range
a 2 [0.0 , 0.5)
b 1 [0.5 , 0.75)
c 1 [0.7.5 , 1.0)

Note that the "[" means that the number is also included, so all the
numbers from 0 to 5 belong to "a" but 5. And then we start to code the
symbols and compute our output number. The algorithm to compute the
output number is:

Low = 0
High = 1 Loop. For all the symbols.
Range = high - low
High = low + range * high_range of the symbol being coded
Low = low + range * low_range of the symbol being coded
Where:
Range, keeps track of where the next range should be.
High and low, specify the output number.
And now let's see an example: Symbol Range Low value High value
0 1
b 1 0.5 0.75
a 0.25 0.5 0.625
c 0.125 0.59375 0.625
a 0.03125 0.59375 0.609375
Symbol Probability Range
a 2 [0.0 , 0.5)
b 1 [0.5 , 0.75)
c 1 [0.75 , 1.0)

The output number will be 0.59375. The way of decoding is first to see
where the number lands, output the corresponding symbol, and then
extract the range of this symbol from the floating point number. The
algorithm for extracting the ranges is: Loop. For all the symbols.
Range = high_range of the symbol - low_range of the symbol
Number = number - low_range of the symbol
Number = number / range
And this is how decoding is performed: Symbol Range Number
b 0.25 0.59375
a 0.5 0.375
c 0.25 0.75
a 0.5 0
Symbol Probability Range
a 2 [0.0 , 0.5)
b 1 [0.5 , 0.75)
c 1 [0.75 , 1.0)

You may reserve a little range for an EoF symbol, but in the case of
an entropy coder you'll not need it (the main compressor will know
when to stop), with and stand-alone codec you can pass to the
decompressor the length of the file, so it knows when to stop. (I
never liked having an special EoF ;-)

|
Pages: 1
Prev: Scalable Key Cryptography - Make_Moduli_Program_Mark_0
Next: recover the password from a .PFX file