libxlsxwriter
|
By default libxlsxwriter holds all cell data in memory to allow non-sequential data storage. The effect of this is that for large files libxlsxwriter can consume a lot of memory.
Fortunately, this memory usage can be reduced almost completely by using workbook_new_opt() and the lxw_workbook_options constant_memory
property:
This optimization works by flushing each row after a subsequent row is written. In this way the largest amount of data held in memory for a worksheet is the amount of data required to hold a single row of data.
If required, this memory usage can be reduced even more by setting LXW_COL_MAX
in worksheet.c from 16384 down to a value that matches the maximum column that is likely to be encountered.
The trade-off when using constant_memory
mode is that data must be added sequentially in row order and you won't be able to take advantage of any functions that manipulate cell data after it is written. For example:
constant_memory
mode worksheet_set_row()
can only be used when writing data to the current row. This has an additional knock on effect that images won't scale properly over row heights adjusted with worksheet_set_row()
.worksheet_merge_range()
can only be applied to the current row (which in general isn't very useful).Since each new row flushes the previous row, data must be written in sequential row order when constant_memory
mode is on:
The libxlsxwriter library uses temporary files stored in the system /tmp
directory prior to creating the final xlsx file. In constant_memory
mode the library uses additional temporary file storage for worksheet data. This can lead to an issue on OSes that map the /tmp
directory into memory since it is possible for a libxlsxwriter application to consume the "system" memory via disk usage even though the "process" memory remains constant.
This is generally only an issue with embedded Linux systems with limited amounts of system memory. In these cases you should use an alternative temporary file location by using the tmpdir
option of lxw_workbook_options and workbook_new_opt().
Another optimization that is used to reduce memory usage in constant_memory
mode is that cell strings aren't stored in an Excel structure call "shared
strings" and instead are written "in-line".
This is a documented Excel feature that is supported by most spreadsheet applications. However, it isn't supported by some some spreadsheet viewer applications. Also, the size of the output file can increase by 20%-100% depending on the amount of repeated string data.
Currently the library is optimized but not highly optimized. Also, the library is currently single threaded.
Compiling with the embedded but option dtoa library is 40-50% faster for raw numeric data. See Using a double formatting library.
Next: Working with VBA Macros