Constant Memory Mode

By default libxlsxwriter holds all cell data in memory to allow non-sequential data storage. The effect of this is that for large files libxlsxwriter can consume a lot of memory.

Fortunately, this memory usage can be reduced almost completely by using workbook_new_opt() and the lxw_workbook_options constant_memory property:

#include "xlsxwriter.h"
 
int main() {
 
    lxw_row_t row;
    lxw_col_t col;
    lxw_row_t max_row = 1000;
    lxw_col_t max_col = 50;
 
    /* Set the worksheet options. */
    lxw_workbook_options options = {.constant_memory = LXW_TRUE,
                                    .tmpdir = NULL,
                                    .use_zip64 = LXW_FALSE,
                                    .output_buffer = NULL,
                                    .output_buffer_size = NULL};
 
    /* Create a new workbook with options. */
    lxw_workbook  *workbook  = workbook_new_opt("constant_memory.xlsx", &options);
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);
 
    for (row = 0; row < max_row; row++) {
        for (col = 0; col < max_col; col++) {
            worksheet_write_number(worksheet, row, col, 123.45, NULL);
        }
    }
 
    return workbook_close(workbook);
}

This optimization works by flushing each row after a subsequent row is written. In this way the largest amount of data held in memory for a worksheet is the amount of data required to hold a single row of data.

If required, this memory usage can be reduced even more by setting LXW_COL_MAX in worksheet.c from 16384 down to a value that matches the maximum column that is likely to be encountered.

The trade-off when using constant_memory mode is that data must be added sequentially in row order and you won't be able to take advantage of any functions that manipulate cell data after it is written. For example:

In constant_memory mode worksheet_set_row() can only be used when writing data to the current row. This has an additional knock on effect that images won't scale properly over row heights adjusted with worksheet_set_row().
A merged range set with worksheet_merge_range() can only be applied to the current row (which in general isn't very useful).

Row Column Order

Since each new row flushes the previous row, data must be written in sequential row order when constant_memory mode is on:

lxw_workbook  *workbook  = workbook_new_opt("constant_memory.xlsx", &options);
...
 
// !! Don't use "column x row" order in 'constant_memory' mode. Only
//    the first column of data will be written.
for (col = 0; col < max_col; col++) {
    for (row = 0; row < max_row; row++) {
        worksheet_write_number(worksheet, row, col, 123.45, NULL);
    }
}

Constant memory mode and the /tmp directory

The libxlsxwriter library uses temporary files stored in the system /tmp directory prior to creating the final xlsx file. In constant_memory mode the library uses additional temporary file storage for worksheet data. This can lead to an issue on OSes that map the /tmp directory into memory since it is possible for a libxlsxwriter application to consume the "system" memory via disk usage even though the "process" memory remains constant.

This is generally only an issue with embedded Linux systems with limited amounts of system memory. In these cases you should use an alternative temporary file location by using the tmpdir option of lxw_workbook_options and workbook_new_opt().

Inline strings

Another optimization that is used to reduce memory usage in constant_memory mode is that cell strings aren't stored in an Excel structure call "shared strings" and instead are written "in-line".

This is a documented Excel feature that is supported by most spreadsheet applications. However, it isn't supported by some some spreadsheet viewer applications. Also, the size of the output file can increase by 20%-100% depending on the amount of repeated string data.

Performance

Currently the library is optimized but not highly optimized. Also, the library is currently single threaded.

Compiling with the embedded but option dtoa library is 40-50% faster for raw numeric data. See Using a double formatting library.

Next: Working with VBA Macros

Table of Contents

Constant Memory Mode

Row Column Order

Constant memory mode and the /tmp directory

Inline strings

Performance