| |
| JPEG SYSTEM ARCHITECTURE 1-DEC-92 |
| |
| |
| This file provides an overview of the "architecture" of the portable JPEG |
| software; that is, the functions of the various modules in the system and the |
| interfaces between modules. For more precise details about any data structure |
| or calling convention, see the header files. |
| |
| Important note: when I say "module" I don't mean "a C function", which is what |
| some people seem to think the term means. A separate C source file is closer |
| to the mark. Also, it is frequently the case that several different modules |
| present a common interface to callers; the term "object" or "method" refers to |
| this common interface (see "Poor man's object-oriented programming", below). |
| |
| JPEG-specific terminology follows the JPEG standard: |
| A "component" means a color channel, e.g., Red or Luminance. |
| A "sample" is a pixel component value (i.e., one number in the image data). |
| A "coefficient" is a frequency coefficient (a DCT transform output number). |
| The term "block" refers to an 8x8 group of samples or coefficients. |
| "MCU" (minimum coded unit) is the same as "MDU" of the R8 draft; i.e., an |
| interleaved set of blocks of size determined by the sampling factors, |
| or a single block in a noninterleaved scan. |
| |
| |
| *** System requirements *** |
| |
| We must support compression and decompression of both Huffman and |
| arithmetic-coded JPEG files. Any set of compression parameters allowed by the |
| JPEG spec should be readable for decompression. (We can be more restrictive |
| about what formats we can generate.) (Note: for legal reasons no arithmetic |
| coding implementation is currently included in the publicly available sources. |
| However, the architecture still supports it.) |
| |
| We need to be able to handle both raw JPEG files (more specifically, the JFIF |
| format) and JPEG-in-TIFF (C-cubed's format, and perhaps Kodak's). Even if we |
| don't implement TIFF ourselves, other people will want to use our code for |
| that. This means that generation and scanning of the file header has to be |
| separated out. |
| |
| Perhaps we should be prepared to support the JPEG lossless mode (also referred |
| to in the spec as spatial DPCM coding). A lot of people seem to believe they |
| need this... whether they really do is debatable, but the customer is always |
| right. On the other hand, there will not be much sharable code between the |
| lossless and lossy modes! At best, a lossless program could be derived from |
| parts of the lossy version. For now we will only worry about the lossy mode. |
| |
| I see no real value in supporting the JPEG progressive modes (note that |
| spectral selection and successive approximation are two different progressive |
| modes). These are only of interest when painting the decompressed image in |
| real-time, which nobody is going to do with a pure software implementation. |
| |
| There is some value in supporting the hierarchical mode, which allows for |
| successive frames of higher resolution. This could be of use for including |
| "thumbnail" representations. However, this appears to add a lot more |
| complexity than it is worth. |
| |
| A variety of uncompressed image file formats and user interfaces must be |
| supported. These aspects therefore have to be kept separate from the rest of |
| the system. A particularly important issue is whether color quantization of |
| the output is needed (i.e., whether a colormap is used). We should be able to |
| support both adaptive quantization (which requires two or more passes over the |
| image) and nonadaptive (quantization to a prespecified colormap, which can be |
| done in one pass). |
| |
| Memory usage is an important concern, since we will port this code to 80x86 |
| and other limited-memory machines. For large intermediate structures, we |
| should be able to use either virtual memory or temporary files. |
| |
| It should be possible to build programs that handle compression only, |
| decompression only, or both, without much duplicate or unused code in any |
| version. (In particular, a decompression-only version should have no extra |
| baggage.) |
| |
| |
| *** Compression overview *** |
| |
| The *logical* steps needed in (non-lossless) JPEG compression are: |
| |
| 1. Conversion from incoming image format to a standardized internal form |
| (either RGB or grayscale). |
| |
| 2. Color space conversion (e.g., RGB to YCbCr). This is a null step for |
| grayscale (unless we support mapping color inputs to grayscale, which |
| would most easily be done here). Gamma adjustment may also be needed here. |
| |
| 3. Downsampling (reduction of number of samples in some color components). |
| This step operates independently on each color component. |
| |
| 4. MCU extraction (creation of a single sequence of 8x8 sample blocks). |
| This step and the following ones are performed once for each scan |
| in the output JPEG file, i.e., once if making an interleaved file and more |
| than once for a noninterleaved file. |
| Note: both this step and the previous one must deal with edge conditions |
| for pictures that aren't a multiple of the MCU dimensions. Alternately, |
| we could expand the picture to a multiple of an MCU before doing these |
| two steps. (The latter seems better and has been adopted below.) |
| |
| 5. DCT transformation of each 8x8 block. |
| |
| 6. Quantization scaling and zigzag reordering of the elements in each 8x8 |
| block. |
| |
| 7. Huffman or arithmetic encoding of the transformed block sequence. |
| |
| 8. Output of the JPEG file with whatever headers/markers are wanted. |
| |
| Of course, the actual implementation will combine some of these logical steps |
| for efficiency. The trick is to keep these logical functions as separate as |
| possible without losing too much performance. |
| |
| In addition to these logical pipeline steps, we need various modules that |
| aren't part of the data pipeline. These are: |
| |
| A. Overall control (sequencing of other steps & management of data passing). |
| |
| B. User interface; this will determine the input and output files, and supply |
| values for some compression parameters. Note that this module is highly |
| platform-dependent. |
| |
| C. Compression parameter selection: some parameters should be chosen |
| automatically rather than requiring the user to find a good value. |
| The prototype only does this for the back-end (Huffman or arithmetic) |
| parameters, but further in the future, more might be done. A |
| straightforward approach to selection is to try several values; this |
| requires being able to repeatedly apply some portion of the pipeline and |
| inspect the results (without actually outputting them). Probably only |
| entropy encoding parameters can reasonably be done this way; optimizing |
| earlier steps would require too much data to be reprocessed (not to mention |
| the problem of interactions between parameters for different steps). |
| What other facilities do we need to support automatic parameter selection? |
| |
| D. A memory management module to deal with small-memory machines. This must |
| create the illusion of virtual memory for certain large data structures |
| (e.g., the downsampled image or the transformed coefficients). |
| The interface to this must be defined to minimize the overhead incurred, |
| especially on virtual-memory machines where the module won't do much. |
| |
| In many cases we can arrange things so that a data stream is produced in |
| segments by one module and consumed by another without the need to hold it all |
| in (virtual) memory. This is obviously not possible for any data that must be |
| scanned more than once, so it won't work everywhere. |
| |
| The major variable at this level of detail is whether the JPEG file is to be |
| interleaved or not; that affects the order of processing so fundamentally that |
| the central control module must know about it. Some of the other modules may |
| need to know it too. It would simplify life if we didn't need to support |
| noninterleaved images, but that is not reasonable. |
| |
| Many of these steps operate independently on each color component; the |
| knowledge of how many components there are, and how they are interleaved, |
| ought to be confined to the central control module. (Color space conversion |
| and MCU extraction probably have to know it too.) |
| |
| |
| *** Decompression overview *** |
| |
| Decompression is roughly the inverse process from compression, but there are |
| some additional steps needed to produce a good output image. |
| |
| The *logical* steps needed in (non-lossless) JPEG decompression are: |
| |
| 1. Scanning of the JPEG file, decoding of headers/markers etc. |
| |
| 2. Huffman or arithmetic decoding of the coefficient sequence. |
| |
| 3. Quantization descaling and zigzag reordering of the elements in each 8x8 |
| block. |
| |
| 4. MCU disassembly (conversion of a possibly interleaved sequence of 8x8 |
| blocks back to separate components in pixel map order). |
| |
| 5. (Optional) Cross-block smoothing per JPEG section K.8 or a similar |
| algorithm. (Steps 5-8 operate independently on each component.) |
| |
| 6. Inverse DCT transformation of each 8x8 block. |
| |
| 7. Upsampling. At this point a pixel image of the original dimensions |
| has been recreated. |
| |
| 8. Post-upsampling smoothing. This can be combined with upsampling, |
| by using a convolution-like calculation to generate each output pixel |
| directly from one or more input pixels. |
| |
| 9. Cropping to the original pixel dimensions (throwing away duplicated |
| pixels at the edges). It is most convenient to do this now, as the |
| preceding steps are simplified by not having to worry about odd picture |
| sizes. |
| |
| 10. Color space reconversion (e.g., YCbCr to RGB). This is a null step for |
| grayscale. (Note that mapping a color JPEG to grayscale output is most |
| easily done in this step.) Gamma adjustment may also be needed here. |
| |
| 11. Color quantization (only if a colormapped output format is requested). |
| NOTE: it is probably preferable to perform quantization in the internal |
| (JPEG) colorspace rather than the output colorspace. Doing it that way, |
| color conversion need only be applied to the colormap entries, not to |
| every pixel; and quantization gets to operate in a non-gamma-corrected |
| space. But the internal space may not be suitable for some algorithms. |
| The system design is such that only the color quantizer module knows |
| whether color conversion happens before or after quantization. |
| |
| 12. Writing of the desired image format. |
| |
| As before, some of these will be combined into single steps. When dealing |
| with a noninterleaved JPEG file, steps 2-9 will be performed once for each |
| scan; the resulting data will need to be buffered up so that steps 10-12 can |
| process all the color components together. |
| |
| The same auxiliary modules are needed as before, except for compression |
| parameter selection. Note that rerunning a pipeline stage should never be |
| needed during decompression. This may allow a simpler control module. The |
| user interface might also be simpler since it need not supply any compression |
| parameters. |
| |
| As before, not all of these steps require the whole image to be stored. |
| Actually, two-pass color quantization is the only step that logically requires |
| this; everything else could be done a few raster lines at a time (at least for |
| interleaved images). We might want to make color quantization be a separate |
| program because of this fact. |
| |
| Again, many of the steps should be able to work on one color component in |
| ignorance of the other components. |
| |
| |
| *** Implications of noninterleaved formats *** |
| |
| Much of the work can be done in a single pass if an interleaved JPEG file |
| format is used. With a noninterleaved JPEG file, separating or recombining |
| the components will force use of virtual memory (on a small-memory machine, |
| we probably would want one temp file per color component). |
| |
| If any of the image formats we read or write are noninterleaved, the opposite |
| condition might apply: processing a noninterleaved JPEG file would be more |
| efficient. Offhand, though, I can't think of any popular image formats that |
| work that way; besides the win would only come if the same color space were |
| used in JPEG and non-JPEG files. It's not worth the complexity to make the |
| system design accommodate that case efficiently. |
| |
| An argument against interleaving is that it makes the decompressor need more |
| memory for cross-block smoothing (since the minimum processable chunk of the |
| image gets bigger). With images more than 1000 pixels across, 80x86 machines |
| are likely to have difficulty in handling this feature. |
| |
| Another argument against interleaving is that the noninterleaved format allows |
| a wider range of sampling factors, since the limit of ten blocks per MCU no |
| longer applies. We could get around this by blithely ignoring the spec's |
| limit of ten blocks, but that seems like a bad idea (especially since it makes |
| the above problem worse). |
| |
| The upshot is that we need to support both interleaved and noninterleaved JPEG |
| formats, since for any given machine and picture size one may be much more |
| efficient than the other. However, the non-JPEG format we convert to or from |
| will be assumed to be an interleaved format (i.e., it produces or stores all |
| the components of a pixel together). |
| |
| I do not think it is necessary for the compressor to be able to output |
| partially-interleaved formats (multiple scans, some of which interleave a |
| subset of the components). However, the decompressor must be able to read |
| such files to conform to the spec. |
| |
| |
| *** Data formats *** |
| |
| Pipeline steps that work on pixel sample values will use the following data |
| structure: |
| |
| typedef something JSAMPLE; a pixel component value, 0..MAXJSAMPLE |
| typedef JSAMPLE *JSAMPROW; ptr to a row of samples |
| typedef JSAMPROW *JSAMPARRAY; ptr to a list of rows |
| typedef JSAMPARRAY *JSAMPIMAGE; ptr to a list of color-component arrays |
| |
| The basic element type JSAMPLE will be one of unsigned char, (signed) char, or |
| unsigned short. Unsigned short will be used if samples wider than 8 bits are |
| to be supported (this is a compile-time option). Otherwise, unsigned char is |
| used if possible. If the compiler only supports signed chars, then it is |
| necessary to mask off the value when reading. Thus, all reads of sample |
| values should be coded as "GETJSAMPLE(value)", where the macro will be defined |
| as "((value)&0xFF)" on signed-char machines and "(value)" elsewhere. |
| |
| With these conventions, JSAMPLE values can be assumed to be >= 0. This should |
| simplify correct rounding during downsampling, etc. The JPEG draft's |
| specification that sample values run from -128..127 will be accommodated by |
| subtracting 128 just as the sample value is copied into the source array for |
| the DCT step (this will be an array of signed shorts or longs). Similarly, |
| during decompression the output of the IDCT step will be immediately shifted |
| back to 0..255. (NB: different values are required when 12-bit samples are in |
| use. The code should be written in terms of MAXJSAMPLE and CENTERJSAMPLE, |
| which will be #defined as 255 and 128 respectively in an 8-bit implementation, |
| and as 4095 and 2048 in a 12-bit implementation.) |
| |
| On compilers that don't support "unsigned short", signed short can be used for |
| a 12-bit implementation. To support lossless coding (which allows up to |
| 16-bit data precision) masking with 0xFFFF in GETJSAMPLE might be necessary. |
| (But if "int" is 16 bits then using "unsigned int" is the best solution.) |
| |
| Notice that we use a pointer per row, rather than a two-dimensional JSAMPLE |
| array. This choice costs only a small amount of memory and has several |
| benefits: |
| |
| * Code using the data structure doesn't need to know the allocated width of |
| the rows. This will simplify edge expansion/compression, since we can work |
| in an array that's wider than the logical picture width. |
| |
| * The rows forming a component array may be allocated at different times |
| without extra copying. This will simplify working a few scanlines at a time, |
| especially in smoothing steps that need access to the previous and next rows. |
| |
| * Indexing doesn't require multiplication; this is a performance win on many |
| machines. |
| |
| Note that each color component is stored in a separate array; we don't use the |
| traditional structure in which the components of a pixel are stored together. |
| This simplifies coding of steps that work on each component independently, |
| because they don't need to know how many components there are. Furthermore, |
| we can read or write each component to a temp file independently, which is |
| helpful when dealing with noninterleaved JPEG files. |
| |
| A specific sample value will be accessed by code such as |
| GETJSAMPLE(image[colorcomponent][row][col]) |
| where col is measured from the image left edge, but row is measured from the |
| first sample row currently in memory. Either of the first two indexings can |
| be precomputed by copying the relevant pointer. |
| |
| |
| Pipeline steps that work on frequency-coefficient values will use the |
| following data structure: |
| |
| typedef short JCOEF; a 16-bit signed integer |
| typedef JCOEF JBLOCK[64]; an 8x8 block of coefficients |
| typedef JBLOCK *JBLOCKROW; ptr to one horizontal row of 8x8 blocks |
| typedef JBLOCKROW *JBLOCKARRAY; ptr to a list of such rows |
| typedef JBLOCKARRAY *JBLOCKIMAGE; ptr to a list of color component arrays |
| |
| The underlying type is always a 16-bit signed integer (this is "short" on all |
| machines of interest, but let's use the typedef name anyway). These are |
| grouped into 8x8 blocks (we should use #defines DCTSIZE and DCTSIZE2 rather |
| than "8" and "64"). The contents of a block may be either in "natural" or |
| zigzagged order, and may be true values or divided by the quantization |
| coefficients, depending on where the block is in the pipeline. |
| |
| Notice that the allocation unit is now a row of 8x8 blocks, corresponding to |
| eight rows of samples. Otherwise the structure is much the same as for |
| samples, and for the same reasons. |
| |
| On machines where malloc() can't handle a request bigger than 64Kb, this data |
| structure limits us to rows of less than 512 JBLOCKs, which would be a picture |
| width of 4000 pixels. This seems an acceptable restriction. |
| |
| |
| On 80x86 machines, the bottom-level pointer types (JSAMPROW and JBLOCKROW) |
| must be declared as "far" pointers, but the upper levels can be "near" |
| (implying that the pointer lists are allocated in the DS segment). |
| To simplify sharing code, we'll have a #define symbol FAR, which expands to |
| the "far" keyword when compiling on 80x86 machines and to nothing elsewhere. |
| |
| |
| The data arrays used as input and output of the DCT transform subroutine will |
| be declared using a separate typedef; they could be arrays of "short", "int" |
| or "long" independently of the above choices. This would depend on what is |
| needed to make the compiler generate correct and efficient multiply/add code |
| in the DCT inner loops. No significant speed or memory penalty will be paid |
| to have a different representation than is used in the main image storage |
| arrays, since some additional value-by-value processing is done at the time of |
| creation or extraction of the DCT data anyway (e.g., add/subtract 128). |
| |
| |
| *** Poor man's object-oriented programming *** |
| |
| It should be pretty clear by now that we have a lot of quasi-independent |
| steps, many of which have several possible behaviors. To avoid cluttering the |
| code with lots of switch statements, we'll use a simple form of object-style |
| programming to separate out the different possibilities. |
| |
| For example, Huffman and arithmetic coding will be implemented as two separate |
| modules that present the same external interface; at runtime, the calling code |
| will access the proper module indirectly through an "object". |
| |
| We can get the limited features we need while staying within portable C. The |
| basic tool is a function pointer. An "object" is just a struct containing one |
| or more function pointer fields, each of which corresponds to a method name in |
| real object-oriented languages. During initialization we fill in the function |
| pointers with references to whichever module we have determined we need to use |
| in this run. Then invocation of the module is done by indirecting through a |
| function pointer; on most architectures this is no more expensive (and |
| possibly cheaper) than a switch, which would be the only other way of making |
| the required run-time choice. The really significant benefit, of course, is |
| keeping the source code clean and well structured. |
| |
| For example, the interface for entropy decoding (Huffman or arithmetic |
| decoding) might look like this: |
| |
| struct function_ptr_struct { |
| ... |
| /* Entropy decoding methods */ |
| void (*prepare_for_scan) (); |
| void (*get_next_mcu) (); |
| ... |
| }; |
| |
| typedef struct function_ptr_struct * function_ptrs; |
| |
| The struct pointer is what will actually be passed around. A call site might |
| look like this: |
| |
| some_function (function_ptrs fptrs) |
| { |
| ... |
| (*fptrs->get_next_mcu) (...); |
| ... |
| } |
| |
| (It might be worth inventing some specialized macros to hide the rather ugly |
| syntax for method definition and call.) Note that the caller doesn't know how |
| many different get_next_mcu procedures there are, what their real names are, |
| nor how to choose which one to call. |
| |
| An important benefit of this scheme is that it is easy to provide multiple |
| versions of any method, each tuned to a particular case. While a lot of |
| precalculation might be done to select an optimal implementation of a method, |
| the cost per invocation is constant. For example, the MCU extraction step |
| might have a "generic" method, plus one or more "hardwired" methods for the |
| most popular sampling factors; the hardwired methods would be faster because |
| they'd use straight-line code instead of for-loops. The cost to determine |
| which method to use is paid only once, at startup, and the selection criteria |
| are hidden from the callers of the method. |
| |
| This plan differs a little bit from usual object-oriented structures, in that |
| only one instance of each object class will exist during execution. The |
| reason for having the class structure is that on different runs we may create |
| different instances (choose to execute different modules). |
| |
| To minimize the number of object pointers that have to be passed around, it |
| will be easiest to have just a few big structs containing all the method |
| pointers. We'll actually use two such structs, one for "system-dependent" |
| methods (memory allocation and error handling) and one for everything else. |
| |
| Because of this choice, it's best not to think of an "object" as a specific |
| data structure. Rather, an "object" is just a group of related methods. |
| There would typically be one or more C modules (source files) providing |
| concrete implementations of those methods. You can think of the term |
| "method" as denoting the common interface presented by some set of functions, |
| and "object" as denoting a group of common method interfaces, or the total |
| shared interface behavior of a group of modules. |
| |
| |
| *** Data chunk sizes *** |
| |
| To make the cost of this object-oriented style really minimal, we should make |
| sure that each method call does a fair amount of computation. To do that we |
| should pass large chunks of data around; for example, the colorspace |
| conversion method should process much more than one pixel per call. |
| |
| For many steps, the most natural unit of data seems to be an "MCU row". |
| This consists of one complete horizontal strip of the image, as high as an |
| MCU. In a noninterleaved scan, an MCU row is always eight samples high (when |
| looking at samples) or one 8x8 block high (when looking at coefficients). In |
| an interleaved scan, an MCU row consists of all the data for one horizontal |
| row of MCUs; this may be from one to four blocks high (eight to thirty-two |
| samples) depending on the sampling factors. The height and width of an MCU |
| row may be different in each component. (Note that the height and width of an |
| MCU row changes at the downsampling and upsampling steps. An unsubsampled |
| image has the same size in each component. The preceding statements apply to |
| the downsampled dimensions.) |
| |
| For example, consider a 1024-pixel-wide image using (2h:2v)(1h:1v)(1h:1v) |
| subsampling. In the noninterleaved case, an MCU row of Y would contain 8x1024 |
| samples or the same number of frequency coefficients, so it would occupy |
| 8K bytes (samples) or 16K bytes (coefficients). An MCU row of Cb or Cr would |
| contain 8x512 samples and occupy half as much space. In the interleaved case, |
| an MCU row would contain 16x1024 Y samples, 8x512 Cb and 8x512 Cr samples, so |
| a total of 24K (samples) or 48K (coefficients) would be needed. This is a |
| reasonable amount of data to expect to retain in memory at one time. (Bear in |
| mind that we'll usually need to have several MCU rows resident in memory at |
| once, at the inputs and outputs to various pipeline steps.) |
| |
| The worst case is probably (2h:4v)(1h:1v)(1h:1v) interleaving (this uses 10 |
| blocks per MCU, which is the maximum allowed by the spec). An MCU will then |
| contain 32 sample rows worth of Y, so it would occupy 40K or 80K bytes for a |
| 1024-pixel-wide image. The most memory-intensive step is probably cross-block |
| smoothing, for which we'd need 3 MCU rows of coefficients as input and another |
| one as output; that would be 320K of working storage. Anything much larger |
| would not fit in an 80x86 machine. (To decompress wider pictures on an 80x86, |
| we'll have to skip cross-block smoothing or else use temporary files.) |
| |
| This unit is thus a reasonable-sized chunk for passing through the pipeline. |
| Of course, its major advantage is that it is a natural chunk size for the MCU |
| assembly and disassembly steps to work with. |
| |
| For the entropy (Huffman or arithmetic) encoding/decoding steps, the most |
| convenient chunk is a single MCU: one 8x8 block if not interleaved, three to |
| ten such blocks if interleaved. The advantage of this is that when handling |
| interleaved data, the blocks have the same sequence of component membership on |
| each call. (For example, Y,Y,Y,Y,Cb,Cr when using (2h:2v)(1h:1v)(1h:1v) |
| subsampling.) The code needs to know component membership so that it can |
| apply the right set of compression coefficients to each block. A prebuilt |
| array describing this membership can be used during each call. This chunk |
| size also makes it easy to handle restart intervals: just count off one MCU |
| per call and reinitialize when the count reaches zero (restart intervals are |
| specified in numbers of MCU). |
| |
| For similar reasons, one MCU is also the best chunk size for the frequency |
| coefficient quantization and dequantization steps. |
| |
| For downsampling and upsampling, the best chunk size is to have each call |
| transform Vk sample rows from or to Vmax sample rows (Vk = this component's |
| vertical sampling factor, Vmax = largest vertical sampling factor). There are |
| eight such chunks in each MCU row. Using a whole MCU row as the chunk size |
| would reduce function call overhead a fraction, but would imply more buffering |
| to provide context for cross-pixel smoothing. |
| |
| |
| *** Compression object structure *** |
| |
| I propose the following set of objects for the compressor. Here an "object" |
| is the common interface for one or more modules having comparable functions. |
| |
| Most of these objects can be justified as information-hiding modules. |
| I've indicated what information is private to each object/module. |
| |
| Note that in all cases, the caller of a method is expected to have allocated |
| any storage needed for it to return its result. (Typically this storage can |
| be re-used in successive calls, so malloc'ing and free'ing once per call is |
| not reasonable.) Also, much of the context required (compression parameters, |
| image size, etc) will be passed around in large common data structures, which |
| aren't described here; see the header files. Notice that any object that |
| might need to allocate working storage receives an "init" and a "term" call; |
| "term" should be careful to free all allocated storage so that the JPEG system |
| can be used multiple times during a program run. (For the same reason, |
| depending on static initialization of variables is a no-no. The only |
| exception to the free-all-allocated-storage rule is that storage allocated for |
| the entire processing of an image need not be explicitly freed, since the |
| memory manager's free_all cleanup will free it.) |
| |
| 1. Input file conversion to standardized form. This provides these methods: |
| input_init: read the file header, report image size & component count. |
| get_input_row: read one pixel row, return it in our standard format. |
| input_term: finish up at the end. |
| In implementations that support multiple input formats, input_init could |
| set up an appropriate get_input_row method depending on the format it |
| finds. Note that in most applications, the selection and opening of the |
| input file will be under the control of the user interface module; and |
| indeed the user interface may have already read the input header, so that |
| all that input_init may have to do is return previously saved values. The |
| behind-the-scenes interaction between this object and the user interface is |
| not specified by this architecture. |
| (Hides format of input image and mechanism used to read it. This code is |
| likely to vary considerably from one implementation to another. Note that |
| the color space and number of color components of the source are not hidden; |
| but they are used only by the next object.) |
| |
| 2. Gamma and color space conversion. This provides three methods: |
| colorin_init: initialization. |
| get_sample_rows: read, convert, and return a specified number of pixel |
| rows (not more than remain in the picture). |
| colorin_term: finish up at the end. |
| The most efficient approach seems to be for this object to call |
| get_input_row directly, rather than being passed the input data; that way, |
| any intermediate storage required can be local to this object. |
| (get_sample_rows might tell get_input_row to read directly into its own |
| output area and then convert in place; or it may do something different. |
| For example, conversion in place wouldn't work if it is changing the number |
| of color components.) The output of this step is in the standardized |
| sample array format shown previously. |
| (Hides all knowledge of color space semantics and conversion. Remaining |
| modules only need to know the number of JPEG components.) |
| |
| 3. Edge expansion: needs only a single method. |
| edge_expand: Given an NxM sample array, expand to a desired size (a |
| multiple of the MCU dimensions) by duplicating the last |
| row or column. Repeat for each component. |
| Expansion will occur in place, so the caller must have pre-allocated enough |
| storage. (I'm assuming that it is easier and faster to do this expansion |
| than it is to worry about boundary conditions in the next two steps. |
| Notice that vertical expansion will occur only once, at the bottom of the |
| picture, so only horizontal expansion by a few pixels is speed-critical.) |
| (This doesn't really hide any information, so maybe it could be a simple |
| subroutine instead of a method. Depends on whether we want to be able to |
| use alternative, optimized methods.) |
| |
| 4. Downsampling: this will be applied to one component at a time. |
| downsample_init: initialize (precalculate convolution factors, for |
| example). This will be called once per scan. |
| downsample: Given a sample array, reduce it to a smaller number of |
| samples using specified sampling factors. |
| downsample_term: clean up at the end of a scan. |
| If the current component has vertical sampling factor Vk and the largest |
| sampling factor is Vmax, then the input is always Vmax sample rows (whose |
| width is a multiple of Hmax) and the output is always Vk sample rows. |
| Vmax additional rows above and below the nominal input rows are also passed |
| for use by partial-pixel-averaging sampling methods. (Is this necessary?) |
| At the top and bottom of the image, these extra rows are copies of the |
| first or last actual input row. |
| (This hides whether and how cross-pixel averaging occurs.) |
| |
| 5. MCU extraction (creation of a single sequence of 8x8 sample blocks). |
| extract_init: initialize as needed. This will be called once per scan. |
| extract_MCUs: convert a sample array to a sequence of MCUs. |
| extract_term: clean up at the end of a scan. |
| Given one or more MCU rows worth of image data, extract sample blocks in the |
| appropriate order; pass these off to subsequent steps one MCU at a time. |
| The input must be a multiple of the MCU dimensions. It will probably be |
| most convenient for the DCT transform, frequency quantization, and zigzag |
| reordering of each block to be done as simple subroutines of this step. |
| Once a transformed MCU has been completed, it'll be passed off to a |
| method call, which will be passed as a parameter to extract_MCUs. |
| That routine might either encode and output the MCU immediately, or buffer |
| it up for later output if we want to do global optimization of the entropy |
| encoding coefficients. Note: when outputting a noninterleaved file this |
| object will be called separately for each component. Direct output could |
| be done for the first component, but the others would have to be buffered. |
| (Again, an object mainly on the grounds that multiple instantiations might |
| be useful.) |
| |
| 6. DCT transformation of each 8x8 block. This probably doesn't have to be a |
| full-fledged method, but just a plain subroutine that will be called by MCU |
| extraction. One 8x8 block will be processed per call. |
| |
| 7. Quantization scaling and zigzag reordering of the elements in each 8x8 |
| block. (This can probably be a plain subroutine called once per block by |
| MCU extraction; hard to see a need for multiple instantiations here.) |
| |
| 8. Entropy encoding (Huffman or arithmetic). |
| entropy_encode_init: prepare for one scan. |
| entropy_encode: accepts an MCU's worth of quantized coefficients, |
| encodes and outputs them. |
| entropy_encode_term: finish up at end of a scan (dump any buffered |
| bytes, for example). |
| The data output by this module will be sent to the entropy_output method |
| provided by the pipeline controller. (It will probably be worth using |
| buffering to pass multiple bytes per call of the output method.) The |
| output method could be just write_jpeg_data, but might also be a dummy |
| routine that counts output bytes (for use during cut-and-try coefficient |
| optimization). |
| (This hides which entropy encoding method is in use.) |
| |
| 9. JPEG file header construction. This will provide these methods: |
| write_file_header: output the initial header. |
| write_scan_header: output scan header (called once per component |
| if noninterleaved mode). |
| write_jpeg_data: the actual data output method for the preceding step. |
| write_scan_trailer: finish up after one scan. |
| write_file_trailer: finish up at end of file. |
| Note that compressed data is passed to the write_jpeg_data method, in case |
| a simple fwrite isn't appropriate for some reason. |
| (This hides which variant JPEG file format is being written. Also, the |
| actual mechanism for writing the file is private to this object and the |
| user interface.) |
| |
| 10. Pipeline control. This object will provide the "main loop" that invokes |
| all the pipeline objects. Note that we will need several different main |
| loops depending on the situation (interleaved output or not, global |
| optimization of encoding parameters or not, etc). This object will do |
| most of the memory allocation, since it will provide the working buffers |
| that are the inputs and outputs of the pipeline steps. |
| (An object mostly to support multiple instantiations; however, overall |
| memory management and sequencing of operations are known only here.) |
| |
| 11. Overall control. This module will provide at least two routines: |
| jpeg_compress: the main entry point to the compressor. |
| per_scan_method_selection: called by pipeline controllers for |
| secondary method selection passes. |
| jpeg_compress is invoked from the user interface after the UI has selected |
| the input and output files and obtained values for all compression |
| parameters that aren't dynamically determined. jpeg_compress performs |
| basic initialization (e.g., calculating the size of MCUs), does the |
| "global" method selection pass, and finally calls the selected pipeline |
| control object. (Per-scan method selections will be invoked by the |
| pipeline controller.) |
| Note that jpeg_compress can't be a method since it is invoked prior to |
| method selection. |
| |
| 12. User interface; this is the architecture's term for "the rest of the |
| application program", i.e., that which invokes the JPEG compressor. In a |
| standalone JPEG compression program the UI need be little more than a C |
| main() routine and argument parsing code; but we can expect that the JPEG |
| compressor may be incorporated into complex graphics applications, wherein |
| the UI is much more complex. Much of the UI will need to be written |
| afresh for each non-Unix-like platform the compressor is ported to. |
| The UI is expected to supply input and output files and values for all |
| non-automatically-chosen compression parameters. (Hence defaults are |
| determined by the UI; we should provide helpful routines to fill in |
| the recommended defaults.) The UI must also supply error handling |
| routines and some mechanism for trace messages. |
| (This module hides the user interface provided --- command line, |
| interactive, etc. Except for error/message handling, the UI calls the |
| portable JPEG code, not the other way around.) |
| |
| 13. (Optional) Compression parameter selection control. |
| entropy_optimize: given an array of MCUs ready to be fed to entropy |
| encoding, find optimal encoding parameters. |
| The actual optimization algorithm ought to be separated out as an object, |
| even though a special pipeline control method will be needed. (The |
| pipeline controller only has to understand that the output of extract_MCUs |
| must be built up as a virtual array rather than fed directly to entropy |
| encoding and output. This pipeline behavior may also be useful for future |
| implementation of hierarchical modes, etc.) |
| To minimize the amount of control logic in the optimization module, the |
| pipeline control doesn't actually hand over big-array pointers, but rather |
| an "iterator": a function which knows how to scan the stored image. |
| (This hides the details of the parameter optimization algorithm.) |
| |
| The present design doesn't allow for multiple passes at earlier points |
| in the pipeline, but allowing that would only require providing some |
| new pipeline control methods; nothing else need change. |
| |
| 14. A memory management object. This will provide methods to allocate "small" |
| things and "big" things. Small things have to fit in memory and you get |
| back direct pointers (this could be handled by direct calls to malloc, but |
| it's cleaner not to assume malloc is the right routine). "Big" things |
| mean buffered images for multiple passes, noninterleaved output, etc. |
| In this case the memory management object will give you room for a few MCU |
| rows and you have to ask for access to the next few; dumping and reloading |
| in a temporary file will go on behind the scenes. (All big objects are |
| image arrays containing either samples or coefficients, and will be |
| scanned top-to-bottom some number of times, so we can apply this access |
| model easily.) On a platform with virtual memory, the memory manager can |
| treat small and big things alike: just malloc up enough virtual memory for |
| the whole image, and let the operating system worry about swapping the |
| image to disk. |
| |
| Most of the actual calls on the memory manager will be made from pipeline |
| control objects; changing any data item from "small" to "big" status would |
| require a new pipeline control object, since it will contain the logic to |
| ask for a new chunk of a big thing. Thus, one way in which pipeline |
| controllers will vary is in which structures they treat as big. |
| |
| The memory manager will need to be told roughly how much space is going to |
| be requested overall, so that it can figure out how big a buffer is safe |
| to allocate for a "big" object. (If it happens that you are dealing with |
| a small image, you'd like to decide to keep it all in memory!) The most |
| flexible way of doing this is to divide allocation of "big" objects into |
| two steps. First, there will be one or more "request" calls that indicate |
| the desired object sizes; then an "instantiate" call causes the memory |
| manager to actually construct the objects. The instantiation must occur |
| before the contents of any big object can be accessed. |
| |
| For 80x86 CPUs, we would like the code to be compilable under small or |
| medium model, meaning that pointers are 16 bits unless explicitly declared |
| FAR. Hence space allocated by the "small" allocator must fit into the |
| 64Kb default data segment, along with stack space and global/static data. |
| For normal JPEG operations we seem to need only about 32Kb of such space, |
| so we are within the target (and have a reasonable slop for the needs of |
| a surrounding application program). However, some color quantization |
| algorithms need 64Kb or more of all-in-memory space in order to create |
| color histograms. For this purpose, we will also support "medium" size |
| things. These are semantically the same as "small" things but are |
| referenced through FAR pointers. |
| |
| The following methods will be needed: |
| alloc_small: allocate an object of given size; use for any random |
| data that's not an image array. |
| free_small: release same. |
| alloc_medium: like alloc_small, but returns a FAR pointer. Use for |
| any object bigger than a couple kilobytes. |
| free_medium: release same. |
| alloc_small_sarray: construct an all-in-memory image sample array. |
| free_small_sarray: release same. |
| alloc_small_barray, |
| free_small_barray: ditto for block (coefficient) arrays. |
| request_big_sarray: request a virtual image sample array. The size |
| of the in-memory buffer will be determined by the |
| memory manager, but it will always be a multiple |
| of the passed-in MCU height. |
| request_big_barray: ditto for block (coefficient) arrays. |
| alloc_big_arrays: instantiate all the big arrays previously requested. |
| This call will also pass some info about future |
| memory demands, so that the memory manager can |
| figure out how much space to leave unallocated. |
| access_big_sarray: obtain access to a specified portion of a virtual |
| image sample array. |
| free_big_sarray: release a virtual sample array. |
| access_big_barray, |
| free_big_barray: ditto for block (coefficient) arrays. |
| free_all: release any remaining storage. This is called |
| before normal or error termination; the main reason |
| why it must exist is to ensure that any temporary |
| files will be deleted upon error termination. |
| |
| alloc_big_arrays will be called by the pipeline controller, which does |
| most of the memory allocation anyway. The only reason for having separate |
| request calls is to allow some of the other modules to get big arrays. |
| The pipeline controller is required to give an upper bound on total future |
| small-array requests, so that this space can be discounted. (A fairly |
| conservative estimate will be adequate.) Future small-object requests |
| aren't counted; the memory manager has to use a slop factor for those. |
| 10K or so seems to be sufficient. (In an 80x86, small objects aren't an |
| issue anyway, since they don't compete for far-heap space. "Medium"-size |
| objects will have to be counted separately.) |
| |
| The distinction between sample and coefficient array routines is annoying, |
| but it has to be maintained for machines in which "char *" is represented |
| differently from "int *". On byte-addressable machines some of these |
| methods could perhaps point to the same code. |
| |
| The array routines will operate on only 2-D arrays (one component at a |
| time), since different components may require different-size arrays. |
| |
| (This object hides the knowledge of whether virtual memory is available, |
| as well as the actual interface to OS and library support routines.) |
| |
| Note that any given implementation will presumably contain only one |
| instantiation of input file header reading, overall control, user interface, |
| and memory management. Thus these could be called as simple subroutines, |
| without bothering with an object indirection. This is essential for overall |
| control (which has to initialize the object structure); for consistency we |
| will impose objectness on the other three. |
| |
| |
| *** Decompression object structure *** |
| |
| I propose the following set of objects for decompression. The general |
| comments at the top of the compression object section also apply here. |
| |
| 1. JPEG file scanning. This will provide these methods: |
| read_file_header: read the file header, determine which variant |
| JPEG format is in use, read everything through SOF. |
| read_scan_header: read scan header (up through SOS). This is called |
| after read_file_header and again after each scan; |
| it returns TRUE if it finds SOS, FALSE if EOI. |
| read_jpeg_data: fetch data for entropy decoder. |
| resync_to_restart: try to recover from bogus data (see below). |
| read_scan_trailer: finish up after one scan, prepare for another call |
| of read_scan_header (may be a no-op). |
| read_file_trailer: finish up at end of file (probably a no-op). |
| The entropy decoder must deal with restart markers, but all other JPEG |
| marker types will be handled in this object; useful data from the markers |
| will be extracted into data structures available to subsequent routines. |
| Note that on exit from read_file_header, only the SOF-marker data should be |
| assumed valid (image size, component IDs, sampling factors); other data |
| such as Huffman tables may not appear until after the SOF. The overall |
| image size and colorspace can be determined after read_file_header, but not |
| whether or how the data is interleaved. (This hides which variant JPEG |
| file format is being read. In particular, for JPEG-in-TIFF the read_header |
| routines might not be scanning standard JPEG markers at all; they could |
| extract the data from TIFF tags. The user interface will already have |
| opened the input file and possibly read part of the header before |
| read_file_header is called.) |
| |
| When reading a file with a nonzero restart interval, the entropy decoder |
| expects to see a correct sequence of restart markers. In some cases, these |
| markers may be synthesized by the file-format module (a TIFF reader might |
| do so, for example, using tile boundary pointers to determine where the |
| restart intervals fall). If the incoming data is corrupted, the entropy |
| decoder will read as far as the next JPEG marker, which may or may not be |
| the expected next restart marker. If it isn't, resync_to_restart is called |
| to try to locate a good place to resume reading. We make this heuristic a |
| file-format-dependent operation since some file formats may have special |
| info that's not available to the entropy decoder (again, TIFF is an |
| example). Note that resync_to_restart is NOT called at the end of a scan; |
| it is read_scan_trailer's responsibility to resync there. |
| |
| NOTE: for JFIF/raw-JPEG file format, the read_jpeg_data routine is actually |
| supplied by the user interface; the jrdjfif module uses read_jpeg_data |
| internally to scan the input stream. This makes it possible for the user |
| interface module to single-handedly implement special applications like |
| reading from a non-stdio source. For JPEG-in-TIFF format, the need for |
| random access will make it impossible for this to work; hence the TIFF |
| header module will override the UI-supplied read_jpeg_data routine. |
| Non-stdio input from a TIFF file will require extensive surgery to the TIFF |
| header module, if indeed it is practical at all. |
| |
| 2. Entropy (Huffman or arithmetic) decoding of the coefficient sequence. |
| entropy_decode_init: prepare for one scan. |
| entropy_decode: decodes and returns an MCU's worth of quantized |
| coefficients per call. |
| entropy_decode_term: finish up after a scan (may be a no-op). |
| This will read raw data by calling the read_jpeg_data method (I don't see |
| any reason to provide a further level of indirection). |
| (This hides which entropy encoding method is in use.) |
| |
| 3. Quantization descaling and zigzag reordering of the elements in each 8x8 |
| block. This will be folded into entropy_decode for efficiency reasons: |
| many of the coefficients are zeroes, and this can be exploited most easily |
| within entropy_decode since the encoding explicitly skips zeroes. |
| |
| 4. MCU disassembly (conversion of a possibly interleaved sequence of 8x8 |
| blocks back to separate components in pixel map order). |
| disassemble_init: initialize. This will be called once per scan. |
| disassemble_MCU: Given an MCU's worth of dequantized blocks, |
| distribute them into the proper locations in a |
| coefficient image array. |
| disassemble_term: clean up at the end of a scan. |
| Probably this should be called once per MCU row and should call the |
| entropy decoder repeatedly to obtain the row's data. The output is |
| always a multiple of an MCU's dimensions. |
| (An object on the grounds that multiple instantiations might be useful.) |
| |
| 5. Cross-block smoothing per JPEG section K.8 or a similar algorithm. |
| smooth_coefficients: Given three block rows' worth of a single |
| component, emit a smoothed equivalent of the |
| middle row. The "above" and "below" pointers |
| may be NULL if at top/bottom of image. |
| The pipeline controller will do the necessary buffering to provide the |
| above/below context. Smoothing will be optional since a good deal of |
| extra memory is needed to buffer the additional block rows. |
| (This object hides the details of the smoothing algorithm.) |
| |
| 6. Inverse DCT transformation of each 8x8 block. |
| reverse_DCT: given an MCU row's worth of blocks, perform inverse |
| DCT on each block and output the results into an array |
| of samples. |
| We put this method into the jdmcu module for symmetry with the division of |
| labor in compression. Note that the actual IDCT code is a separate source |
| file. |
| |
| 7. Upsampling and smoothing: this will be applied to one component at a |
| time. Note that cross-pixel smoothing, which was a separate step in the |
| prototype code, will now be performed simultaneously with expansion. |
| upsample_init: initialize (precalculate convolution factors, for |
| example). This will be called once per scan. |
| upsample: Given a sample array, enlarge it by specified sampling |
| factors. |
| upsample_term: clean up at the end of a scan. |
| If the current component has vertical sampling factor Vk and the largest |
| sampling factor is Vmax, then the input is always Vk sample rows (whose |
| width is a multiple of Hk) and the output is always Vmax sample rows. |
| Vk additional rows above and below the nominal input rows are also passed |
| for use in cross-pixel smoothing. At the top and bottom of the image, |
| these extra rows are copies of the first or last actual input row. |
| (This hides whether and how cross-pixel smoothing occurs.) |
| |
| 8. Cropping to the original pixel dimensions (throwing away duplicated |
| pixels at the edges). This won't be a separate object, just an |
| adjustment of the nominal image size in the pipeline controller. |
| |
| 9. Color space reconversion and gamma adjustment. |
| colorout_init: initialization. This will be passed the component |
| data from read_file_header, and will determine the |
| number of output components. |
| color_convert: convert a specified number of pixel rows. Input and |
| output are image arrays of same size but possibly |
| different numbers of components. |
| colorout_term: cleanup (probably a no-op except for memory dealloc). |
| In practice will usually be given an MCU row's worth of pixel rows, except |
| at the bottom where a smaller number of rows may be left over. Note that |
| this object works on all the components at once. |
| When quantizing colors, color_convert may be applied to the colormap |
| instead of actual pixel data. color_convert is called by the color |
| quantizer in this case; the pipeline controller calls color_convert |
| directly only when not quantizing. |
| (Hides all knowledge of color space semantics and conversion. Remaining |
| modules only need to know the number of JPEG and output components.) |
| |
| 10. Color quantization (used only if a colormapped output format is requested). |
| We use two different strategies depending on whether one-pass (on-the-fly) |
| or two-pass quantization is requested. Note that the two-pass interface |
| is actually designed to let the quantizer make any number of passes. |
| color_quant_init: initialization, allocate working memory. In 1-pass |
| quantization, should call put_color_map. |
| color_quantize: convert a specified number of pixel rows. Input |
| and output are image arrays of same size, but input |
| is N coefficients and output is only one. (Used only |
| in 1-pass quantization.) |
| color_quant_prescan: prescan a specified number of pixel rows in |
| 2-pass quantization. |
| color_quant_doit: perform multi-pass color quantization. Input is a |
| "big" sample image, output is via put_color_map and |
| put_pixel_rows. (Used only in 2-pass quantization.) |
| color_quant_term: cleanup (probably a no-op except for memory dealloc). |
| The input to the color quantizer is always in the unconverted colorspace; |
| its output colormap must be in the converted colorspace. The quantizer |
| has the choice of which space to work in internally. It must call |
| color_convert either on its input data or on the colormap it sends to the |
| output module. |
| For one-pass quantization the image is simply processed by color_quantize, |
| a few rows at a time. For two-pass quantization, the pipeline controller |
| accumulates the output of steps 1-8 into a "big" sample image. The |
| color_quant_prescan method is invoked during this process so that the |
| quantizer can accumulate statistics. (If the input file has multiple |
| scans, the prescan may be done during the final scan or as a separate |
| pass.) At the end of the image, color_quant_doit is called; it must |
| create and output a colormap, then rescan the "big" image and pass mapped |
| data to the output module. Additional scans of the image could be made |
| before the output pass is done (in fact, prescan could be a no-op). |
| As with entropy parameter optimization, the pipeline controller actually |
| passes an iterator function rather than direct access to the big image. |
| (Hides color quantization algorithm.) |
| |
| 11. Writing of the desired image format. |
| output_init: produce the file header given data from read_file_header. |
| put_color_map: output colormap, if any (called by color quantizer). |
| If used, must be called before any pixel data is output. |
| put_pixel_rows: output image data in desired format. |
| output_term: finish up at the end. |
| The actual timing of I/O may differ from that suggested by the routine |
| names; for instance, writing of the file header may be delayed until |
| put_color_map time if the actual number of colors is needed in the header. |
| Also, the colormap is available to put_pixel_rows and output_term as well |
| as put_color_map. |
| Note that whether colormapping is needed will be determined by the user |
| interface object prior to method selection. In implementations that |
| support multiple output formats, the actual output format will also be |
| determined by the user interface. |
| (Hides format of output image and mechanism used to write it. Note that |
| several other objects know the color model used by the output format. |
| The actual mechanism for writing the file is private to this object and |
| the user interface.) |
| |
| 12. Pipeline control. This object will provide the "main loop" that invokes |
| all the pipeline objects. Note that we will need several different main |
| loops depending on the situation (interleaved input or not, whether to |
| apply cross-block smoothing or not, etc). We may want to divvy up the |
| pipeline controllers into two levels, one that retains control over the |
| whole file and one that is invoked per scan. |
| This object will do most of the memory allocation, since it will provide |
| the working buffers that are the inputs and outputs of the pipeline steps. |
| (An object mostly to support multiple instantiations; however, overall |
| memory management and sequencing of operations are known only here.) |
| |
| 13. Overall control. This module will provide at least two routines: |
| jpeg_decompress: the main entry point to the decompressor. |
| per_scan_method_selection: called by pipeline controllers for |
| secondary method selection passes. |
| jpeg_decompress is invoked from the user interface after the UI has |
| selected the input and output files and obtained values for all |
| user-specified options (e.g., output file format, whether to do block |
| smoothing). jpeg_decompress calls read_file_header, performs basic |
| initialization (e.g., calculating the size of MCUs), does the "global" |
| method selection pass, and finally calls the selected pipeline control |
| object. (Per-scan method selections will be invoked by the pipeline |
| controller.) |
| Note that jpeg_decompress can't be a method since it is invoked prior to |
| method selection. |
| |
| 14. User interface; this is the architecture's term for "the rest of the |
| application program", i.e., that which invokes the JPEG decompressor. |
| The UI is expected to supply input and output files and values for all |
| operational parameters. The UI must also supply error handling routines. |
| (This module hides the user interface provided --- command line, |
| interactive, etc. Except for error handling, the UI calls the portable |
| JPEG code, not the other way around.) |
| |
| 15. A memory management object. This will be identical to the memory |
| management for compression (and will be the same code, in combined |
| programs). See above for details. |
| |
| |
| *** Initial method selection *** |
| |
| The main ugliness in this design is the portion of startup that will select |
| which of several instantiations should be used for each of the objects. (For |
| example, Huffman or arithmetic for entropy encoding; one of several pipeline |
| controllers depending on interleaving, the size of the image, etc.) It's not |
| really desirable to have a single chunk of code that knows the names of all |
| the possible instantiations and the conditions under which to select each one. |
| |
| The best approach seems to be to provide a selector function for each object |
| (group of related method calls). This function knows about each possible |
| instantiation of its object and how to choose the right one; but it doesn't |
| know about any other objects. |
| |
| Note that there will be several rounds of method selection: at initial startup, |
| after overall compression parameters are determined (after the file header is |
| read, if decompressing), and one in preparation for each scan (this occurs |
| more than once if the file is noninterleaved). Each object method will need |
| to be clearly identified as to which round sets it up. |
| |
| |
| *** Implications of DNL marker *** |
| |
| Some JPEG files may use a DNL marker to postpone definition of the image |
| height (this would be useful for a fax-like scanner's output, for instance). |
| In these files the SOF marker claims the image height is 0, and you only |
| find out the true image height at the end of the first scan. |
| |
| We could handle these files as follows: |
| 1. Upon seeing zero image height, replace it by 65535 (the maximum allowed). |
| 2. When the DNL is found, update the image height in the global image |
| descriptor. |
| This implies that pipeline control objects must avoid making copies of the |
| image height, and must re-test for termination after each MCU row. This is |
| no big deal. |
| |
| In situations where image-size data structures are allocated, this approach |
| will result in very inefficient use of virtual memory or |
| much-larger-than-necessary temporary files. This seems acceptable for |
| something that probably won't be a mainstream usage. People might have to |
| forgo use of memory-hogging options (such as two-pass color quantization or |
| noninterleaved JPEG files) if they want efficient conversion of such files. |
| (One could improve efficiency by demanding a user-supplied upper bound for the |
| height, less than 65536; in most cases it could be much less.) |
| |
| Alternately, we could insist that DNL-using files be preprocessed by a |
| separate program that reads ahead to the DNL, then goes back and fixes the SOF |
| marker. This is a much simpler solution and is probably far more efficient. |
| Even if one wants piped input, buffering the first scan of the JPEG file |
| needs a lot smaller temp file than is implied by the maximum-height method. |
| For this approach we'd simply treat DNL as a no-op in the decompressor (at |
| most, check that it matches the SOF image height). |
| |
| We will not worry about making the compressor capable of outputting DNL. |
| Something similar to the first scheme above could be applied if anyone ever |
| wants to make that work. |
| |
| |
| *** Memory manager internal structure *** |
| |
| The memory manager contains the most potential for system dependencies. |
| To isolate system dependencies as much as possible, we have broken the |
| memory manager into two parts. There is a reasonably system-independent |
| "front end" (jmemmgr.c) and a "back end" that contains only the code |
| likely to change across systems. All of the memory management methods |
| outlined above are implemented by the front end. The back end provides |
| the following routines for use by the front end (none of these routines |
| are known to the rest of the JPEG code): |
| |
| jmem_init, jmem_term system-dependent initialization/shutdown |
| |
| jget_small, jfree_small interface to malloc and free library routines |
| |
| jget_large, jfree_large interface to FAR malloc/free in MS-DOS machines; |
| otherwise same as jget_small/jfree_small |
| |
| jmem_available estimate available memory |
| |
| jopen_backing_store create a backing-store object |
| |
| read_backing_store, manipulate a backing store object |
| write_backing_store, |
| close_backing_store |
| |
| On some systems there will be more than one type of backing-store object |
| (specifically, in MS-DOS a backing store file might be an area of extended |
| memory as well as a disk file). jopen_backing_store is responsible for |
| choosing how to implement a given object. The read/write/close routines |
| are method pointers in the structure that describes a given object; this |
| lets them be different for different object types. |
| |
| It may be necessary to ensure that backing store objects are explicitly |
| released upon abnormal program termination. (For example, MS-DOS won't free |
| extended memory by itself.) To support this, we will expect the main program |
| or surrounding application to arrange to call the free_all method upon |
| abnormal termination; this may require a SIGINT signal handler, for instance. |
| (We don't want to have the system-dependent module install its own signal |
| handler, because that would pre-empt the surrounding application's ability |
| to control signal handling.) |
| |
| |
| *** Notes for MS-DOS implementors *** |
| |
| The standalone cjpeg and djpeg applications can be compiled in "small" memory |
| model, at least at the moment; as the code grows we may be forced to switch to |
| "medium" model. (Small = both code and data pointers are near by default; |
| medium = far code pointers, near data pointers.) Medium model will slow down |
| calls through method pointers, but I don't think this will amount to any |
| significant speed penalty. |
| |
| When integrating the JPEG code into a larger application, it's a good idea to |
| stay with a small-data-space model if possible. An 8K stack is much more than |
| sufficient for the JPEG code, and its static data requirements are less than |
| 1K. When executed, it will typically malloc about 10K-20K worth of near heap |
| space (and lots of far heap, but that doesn't count in this calculation). |
| This figure will vary depending on image size and other factors, but figuring |
| 30K should be more than sufficient. Thus you have about 25K available for |
| other modules' static data and near heap requirements before you need to go to |
| a larger memory model. The C library's static data will account for several K |
| of this, but that still leaves a good deal for your needs. (If you are tight |
| on space, you could reduce JPEG_BUF_SIZE from 4K to 1K to save 3K of near heap |
| space.) |
| |
| As the code is improved, we will endeavor to hold the near data requirements |
| to the range given above. This does imply that certain data structures will |
| be allocated as FAR although they would fit in near space if we assumed the |
| JPEG code is stand-alone. (The LZW tables in jrdgif/jwrgif are examples.) |
| To make an optimal implementation, you might want to move these structures |
| back to near heap if you know there is sufficient space. |
| |
| FAR data space may also be a tight resource when you are dealing with large |
| images. The most memory-intensive case is decompression with two-pass color |
| quantization. This requires a 128Kb color histogram plus strip buffers |
| amounting to about 150 bytes per column for typical sampling ratios (eg, about |
| 96000 bytes for a 640-pixel-wide image). You may not be able to process wide |
| images if you have large data structures of your own. |
| |
| |
| *** Potential optimizations *** |
| |
| For colormapped input formats it might be worthwhile to merge the input file |
| reading and the colorspace conversion steps; in other words, do the colorspace |
| conversion by hacking up the colormap before inputting the image body, rather |
| than doing the conversion on each pixel independently. Not clear if this is |
| worth the uglification involved. In the above design for the compressor, only |
| the colorspace conversion step ever sees the output of get_input_row, so this |
| sort of thing could be done via private agreement between those two modules. |
| |
| Level shift from 0..255 to -128..127 may be done either during colorspace |
| conversion, or at the moment of converting an 8x8 sample block into the format |
| used by the DCT step (which will be signed short or long int). This could be |
| selectable by a compile-time flag, so that the intermediate steps can work on |
| either signed or unsigned chars as samples, whichever is most easily handled |
| by the platform. However, making sure that rounding is done right will be a |
| lot easier if we can assume positive values. At the moment I think that |
| benefit is worth the overhead of "& 0xFF" when reading out sample values on |
| signed-char-only machines. |