By default libxlsxwriter holds all cell data in memory. This is to allow non-sequential data storage and also to allow future features where formatting is applied separately from the data.
The effect of this is that for large files libxlsxwriter can consume a lot of memory.
This optimization works by flushing each row after a subsequent row is written. In this way the largest amount of data held in memory for a worksheet is the amount of data required to hold a single row of data.
If required, this memory usage can be reduced even more by setting
LXW_COL_MAX in worksheet.c from 16384 down to a value that matches the maximum column that is likely to be encountered.
The trade-off when using
constant_memory mode is that data must be added sequentially in row order and you won't be able to take advantage of any functions that manipulate cell data after it is written. One such feature is worksheet_set_row() which in
constant_memory mode can only be used when writing data to the current row. Another is worksheet_merge_range() which can only be applied to a merged range in the current row.
Since each new row flushes the previous row, data must be written in sequential row order when
constant_memory mode is on:
Another optimization that is used to reduce memory usage in
constant_memory mode is that cell strings aren't stored in an Excel structure call "shared
strings" and instead are written "in-line". This is a documented Excel feature that is supported by most spreadsheet applications. One known exception is Apple Numbers for Mac where the string data isn't displayed.
Currently the library isn't highly optimized. Also, the library is currently single threaded.
Next: Example Programs