CAPS Universe documentation  1.0.4
All you need to know to be successful
Functions
samsung-spl2-mono-algo-0x11.c File Reference

A Samsung printer specific compression routine. More...

Functions

static unsigned drv_spl2_checksum_calc (size_t sz, const uint8_t src[sz])
 
uint8_t drv_spl2_band_rotate (struct spl2_band_buffer *info)
 
static int sort_offsets (const void *n1, const void *n2)
 
static size_t offset_dictionary_limit (size_t element_cnt, struct dictionary_collection dc[element_cnt])
 
static int sort_appearances (const void *n1, const void *n2)
 
static void offset_dictionary_calc (struct spl2_band_buffer *info, size_t cnt, const uint8_t data[cnt])
 
static void compressed_entry_write (uint8_t buffer[2], size_t cnt, size_t offset_idx)
 
static size_t pattern_find_longest (struct spl2_band_buffer *info, size_t cur_idx, size_t remain_length, size_t *offset_idx)
 
int drv_spl2_band_compress (struct spl2_band_buffer *info)
 

Detailed Description

Printer's data is encoded with some kind of dictionary. This dictionary contains the most used pattern found in the first 2 kiB of raw printing data. Since the printer prints in bands of 128 lines, this defines one data set to encode for the printer.

Don't know exactly why, but this algorithm is called 0x11.

Todo:

Count zero bits in all bands of a paper to calculate the print coverage

Check https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform for a better algorithm.

Function Documentation

◆ drv_spl2_checksum_calc()

static unsigned drv_spl2_checksum_calc ( size_t  sz,
const uint8_t  src[sz] 
)
static

Calculate a simple checksum over the given count of bytes

Parameters
[in]szByte count in src
[in]srcBuffer to calculate the sum from
Returns
Simple checksum

According to the documentation it is a simple sum over all bytes including overrun in a single 32 bit number.

◆ drv_spl2_band_rotate()

uint8_t drv_spl2_band_rotate ( struct spl2_band_buffer info)

Rotate the input data into the band buffer for further processing.

Parameters
[in]infoAll we must know about the raw data and the resulting band
Returns
0xff if the whole band is empty, any value else

We got our data line by line. The band data must be in collumn by collumn. So, rotate the bytes (not the pixels!) for the band first.

This copies the data from the (line based) input buffer to the (collumn based) band buffer.

Precondition
band size must be large enough to hold all data from the input buffer e.g. at least 128 * line size

◆ sort_offsets()

static int sort_offsets ( const void *  n1,
const void *  n2 
)
static

Compare function#2 for the qsort() call

Parameters
[in]n1Entry to compare with n2
[in]n2Entry to compare with n1
Return values
negativeif n2 < n1
zeroif n1 == n2
positiveif n2 > n1

Return values are selected to let qsort() move entries with lower offset to the beginning of the dictionary.

◆ offset_dictionary_limit()

static size_t offset_dictionary_limit ( size_t  element_cnt,
struct dictionary_collection  dc[element_cnt] 
)
static

Limit the entry count in the dictionary

Parameters
[in]element_cntThe element count in the dictionary
[in,out]dcThe dictionary
Returns
Valid element count in this dictionay

The element count of the compressed header's dictionary is limited to 64 entries. This routine limits the element count to these 64 entries or less than that, if dc has less than 64 valid entries. An entry is valid if its dictionary_collection::appearance member isn't 0. Since the dictionary is already sorted, the first member with dictionary_collection::appearance member equal to 0 defines the max. used entries.

While at it, all valid dictionary_collection::offset members get incremented by one to define the real negative offset.

In order to detect and compress repeating bytes, the dictionary_collection::offset '1' needs to be in the dictionary. This function ensures this offset is always present and moves it to the beginning of the dictionary to improve its detection.

Precondition
The dictionary must be already sorted, highest offset appearance at index '0'.

◆ sort_appearances()

static int sort_appearances ( const void *  n1,
const void *  n2 
)
static

Compare function for the qsort() call

Parameters
[in]n1Entry to compare with n2
[in]n2Entry to compare with n1
Return values
negativeif n2 < n1
zeroif n1 == n2
positiveif n2 > n1

Return values are selected to let qsort() move entries with larger appearance to the beginning of the dictionary.

◆ offset_dictionary_calc()

static void offset_dictionary_calc ( struct spl2_band_buffer info,
size_t  cnt,
const uint8_t  data[cnt] 
)
static

Calculate some kind of dictionary for one band of printer data

Parameters
[in,out]infoInformation about the current band
[in]cntByte count in data
[in]dataThe data buffer used for the calculation

This routine searches in the first COMPRESS_SAMPLE_SIZE bytes of data for repeating characters. It starts with the character at offset (COMPRESS_SAMPLE_SIZE - 1) and searches for matches of this character backwards. Then it starts again at (COMPRESS_SAMPLE_SIZE - 2), until all characters are searched in such way. In the dictionary for each match it increments the appearance counter of the corresponding offset where the match happens. The dictionary then gets sorted, the offset with the most appearances first. At the end it gets limited to the first 64 entries (or less, if less valid entries are available).

Note
Measurements show, it doesn't improve the compression, if more than COMPRESS_SAMPLE_SIZE bytes are probed. With this dictionary it was possible to reduce a band data of size 79376 (DIN A4 at 600 DPI) down to 915 bytes.

Experiences show: it makes no sense to check every byte value again and again. So one improvement is to check every occuring byte value only once.

Precondition
cnt must be larger than COMPRESS_SAMPLE_SIZE

◆ compressed_entry_write()

static void compressed_entry_write ( uint8_t  buffer[2],
size_t  cnt,
size_t  offset_idx 
)
static

Append a two byte repeating entry into the output buffer

Parameters
[in]bufferWhere to store the entry
[in]cntDuplicate byte count
[in]offset_idxThe index into the header's offset table dictionary

A repeating entry just defines the amount of bytes to duplicate/repeat and an offset to copy this byte/these bytes from. The offset comes from the header's offset table. The offset defines the backward position into the area with already reconstructed bytes to copy the duplicated/repeated bytes from.

Encoding:

  • r(epeat counter): r8…r0, e.g. values of 0…511
  • i(ndex): i5…i0, e.g. values of 0…63
Bit 7  6  5  4  3  2  1  0   7  6  5  4  3  2  1  0
    1 r6 r5 r4 r3 r2 r1 r0  r8 r7 i5 i4 i3 i2 i1 i0

i(ndex) points into one entry in the header's dictionary_collection

Precondition
cnt must be in range 3...514
offset_idx must in range 0...63
buffer must be at least of size 2 bytes

◆ pattern_find_longest()

static size_t pattern_find_longest ( struct spl2_band_buffer info,
size_t  cur_idx,
size_t  remain_length,
size_t *  offset_idx 
)
static

Find the longest matching pattern in the band data based on the offsets in the dictionary

Parameters
[in,out]infoInformation about the current band
[in]cur_idxCurrent data position to encode in info->band
[in]remain_lengthRemaining bytes after cur_idx to compare
[out]offset_idxDictionary index of the best (e.g. longest) matching pattern
Return values
Below3, no useable pattern found
Above3, matching and useable pattern found (offset_idx is then valid)
Precondition
remain_length must be greater than MIN_COMPRESSED_BYTES to make sense to call this function
spl2_band_buffer::dictionary_size must be greater than 0 to make sense to call this function

◆ drv_spl2_band_compress()

int drv_spl2_band_compress ( struct spl2_band_buffer info)

Compress/encode one band of printer data

Parameters
[in,out]infoInformation about the current band
Returns
0 on success (currently always)