From: LawCounsels on 12 Aug 2010 04:45

Do you already know the Arithmetic Coding algorithm? If so, there is a straightforward modification of arithmetic coding for you to complete (preferably in C# source code)... US$100 by PayPal / Western Union on delivery.

> This presents a basic arithmetic coding implementation; if you have never implemented an arithmetic coder, this is the article which suits your needs...
> www.arturocampos.com/ac_arithmetic.html

Our modification to the Arithmetic Coding algorithm below is very simple: the probability of each unique prefix ('0' or '10' or '110' or '111...10' of bit length P - 1, or '111...11' of bit length P - 1) at each remaining yet-to-be-encoded bit length R depends on the progressively smaller tabulated P value, so the probability table shown in the algorithm below is now made to vary depending on the present remaining bit length R.

The arithmetic coder has an 'internal' zero-cost model of the binomial power-series distribution of unique prefix lengths '0', '10', '110', ..., '111...10', together with the constraint on unique prefix lengths at remaining bit length R, since this 'static' model need not be transmitted: when a subsequent stage is at bit length R, the Arithmetic Coder knows from the look-up table's P value that the maximum unique prefix length will be P bits (e.g. '1110' consists of 4 bits, '0' consists of 1 bit). Thus at this stage, with remaining bit length R, the probabilities of occurrence of '0' or '10' or ... '111...10' (at most P bits long) are known to continue to follow the binomial power series, EXCEPT that the look-up table's P value determines the maximum possible unique prefix length here. So the present 'interval' will now be divided according to the present probabilities of these P unique prefix lengths.

See http://groups.google.com/group/comp.compression/browse_thread/thread/0d5f1d2a447adffd/d98a76f4b5349a86?hl=en&lnk=raot#d98a76f4b5349a86 for details.

Are you able to deliver this?

1.
What kind of files do you plan to compress — text files, pictures, movies?

Only files of a few thousand to tens of thousands of bits, consisting only of unique prefixes '0' or '10' or '110' or ... '111...10' (max bit length = P bits; '0' has bit length 1, '1110' has bit length 4). The maximum P value = log2(R), where R is the current remaining bit length. E.g. the 8-bit string '110 0 10 10', initially at remaining bit length R = 8, can have valid selections '0', '10' or '110' only (P = log2(8) = 3); after the first selection of '110' there remain only 5 bits, so the next valid selection, at R = 5 bits, is '0' or '10' only (P = log2(5), rounded down = 2), and so forth.

NOTE: the distribution of unique prefix lengths (symbols) is a binomial power series (i.e. '0' occurs twice as frequently as '10', '10' twice as frequently as '110', and so forth — the AC is able to use this as its model), but with our further constraint here that the maximum possible unique prefix length at remaining bit length R is log2(R). The AC can fine-tune this: at remaining bit length R it can select only unique prefixes of length <= log2(R), and these 'restricted' valid selections (symbols) together add up to 100% of the symbol probabilities [even though earlier, larger remaining bit lengths R admit much longer unique prefixes]. This is the 'source' of further encoding savings over the 'normal' AC model, which unnecessarily allows the earlier, much longer unique prefixes to occur anywhere else (even at R = 4, which admits only '0' or '10' as valid). You may generate your own such test input of 'constrained' unique prefixes, seeded with the binomial power series and the constraint.

2. Do you want only the compressor part, or the decompressing tool also?
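A minimal sketch (in Python rather than the requested C#) of the constrained symbol model described in the answer to question 1, assuming P = floor(log2 R) and binomial power-series weights 1/2, 1/4, ... renormalized over the valid prefixes; the all-ones prefix '111...11' mentioned earlier is omitted here for simplicity:

```python
import math

def constrained_model(r):
    """Return {prefix: probability} for the unique prefixes still valid
    when r bits remain, assuming P = floor(log2 r).  Weights follow the
    binomial power series 2^-1, 2^-2, ..., renormalized to sum to 1."""
    p = int(math.log2(r))                            # max allowed prefix bit length
    prefixes = ['1' * k + '0' for k in range(p)]     # '0', '10', ..., up to p bits
    weights = [2.0 ** -(k + 1) for k in range(p)]    # binomial power series
    total = sum(weights)
    return {s: w / total for s, w in zip(prefixes, weights)}

# At R = 8 only '0', '10', '110' are valid (P = 3); probabilities 4/7, 2/7, 1/7.
print(constrained_model(8))
# After consuming '110', R = 5 admits only '0' and '10' (P = 2): 2/3 and 1/3.
print(constrained_model(5))
```

The renormalization step is exactly the "restricted valid selections together add up to 100%" requirement above: the interval at each stage is divided among only the prefixes the constraint still allows.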
Both.

> Are you able to deliver this?

It is straightforward: first set up the Arithmetic Coder's symbol-probability model to provide the usual binomial power-series distribution of unique prefix lengths ('0' twice as frequent as '10', '10' twice as frequent as '110', and so forth), then simply adjust the model so that it provides the same usual binomial power-series symbol probabilities BUT with the maximum unique prefix length (symbol length), when the remaining bit length is R bits, now limited to at most log2(R) bits.

==========================================================================

Arithmetic coding

The idea behind arithmetic coding is to have a probability line, 0-1, and assign to every symbol a range on this line based on its probability: the higher the probability, the larger the range assigned to it. Once we have defined the ranges and the probability line, we start to encode symbols; every symbol narrows down where the output floating-point number lands. Let's say we have:

Symbol  Probability  Range
a       2            [0.0 , 0.5)
b       1            [0.5 , 0.75)
c       1            [0.75 , 1.0)

Note that "[" means the endpoint is included and ")" means it is excluded, so all the numbers from 0.0 up to (but not including) 0.5 belong to "a". And then we start to code the symbols and compute our output number. The algorithm to compute the output number is:

Low = 0
High = 1
Loop. For all the symbols:
    Range = High - Low
    High  = Low + Range * high_range of the symbol being coded
    Low   = Low + Range * low_range of the symbol being coded

Where Range keeps track of how wide the next interval should be, and High and Low bracket the output number. And now let's see an example:

Symbol  Range    Low value  High value
                 0          1
b       1        0.5        0.75
a       0.25     0.5        0.625
c       0.125    0.59375    0.625
a       0.03125  0.59375    0.609375

The output number will be 0.59375. The way of decoding is first to see where the number lands, output the corresponding symbol, and then extract the effect of this symbol's range from the floating-point number.
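Before moving on to decoding, the encoding loop above can be sketched directly (a minimal illustration using the a/b/c ranges from the example, not a production coder — a real implementation would use integer arithmetic to avoid floating-point precision loss):

```python
# Ranges from the example table: symbol -> (low_range, high_range)
RANGES = {'a': (0.0, 0.5), 'b': (0.5, 0.75), 'c': (0.75, 1.0)}

def encode(symbols):
    """Narrow the interval [low, high) once per symbol; any number in the
    final interval (here we return low) identifies the whole sequence."""
    low, high = 0.0, 1.0
    for s in symbols:
        rng = high - low
        lo_r, hi_r = RANGES[s]
        high = low + rng * hi_r        # shrink the top of the interval
        low = low + rng * lo_r         # shrink the bottom of the interval
    return low

print(encode('baca'))   # 0.59375, matching the worked example
```

Note that High must be updated before Low (or the old Low saved first), since both updates use the old value of Low.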
The algorithm for extracting the ranges is:

Loop. For all the symbols:
    Range  = high_range of the symbol - low_range of the symbol
    Number = Number - low_range of the symbol
    Number = Number / Range

And this is how decoding is performed:

Symbol  Range  Number
b       0.25   0.59375
a       0.5    0.375
c       0.25   0.75
a       0.5    0

You may reserve a little range for an EoF symbol, but in the case of an entropy coder you'll not need it (the main compressor will know when to stop); with a stand-alone codec you can pass the length of the file to the decompressor, so it knows when to stop. (I never liked having a special EoF ;-)
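A matching sketch of the decoding loop (same illustrative a/b/c ranges as before; the symbol count is passed in explicitly, standing in for the file length mentioned above):

```python
# Same illustrative ranges as in the encoding example: symbol -> (low_range, high_range)
RANGES = {'a': (0.0, 0.5), 'b': (0.5, 0.75), 'c': (0.75, 1.0)}

def decode(number, count):
    """See which range the number lands in, output that symbol, then
    rescale the number to undo that symbol's range; repeat count times."""
    out = []
    for _ in range(count):
        for sym, (lo_r, hi_r) in RANGES.items():
            if lo_r <= number < hi_r:
                out.append(sym)
                number = (number - lo_r) / (hi_r - lo_r)
                break
    return ''.join(out)

print(decode(0.59375, 4))   # 'baca' -- recovers the encoded sequence
```

Each iteration reproduces one row of the decoding table above: 0.59375 lands in b's range, rescales to 0.375 (a), then 0.75 (c), then 0 (a).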