Conversation
j2doll
commented
Dec 24, 2025
- Added the ability to read xlsx using xml sax reader.
- Use less memory compared to DOM structure.
- QXlsx::Document::read_sheet_sax() - The read_sheet_sax() function reads the data of a specified sheet using the SAX (Streaming API for XML) method. This function is designed to process each cell's data row by row via a callback, minimizing memory usage and enabling efficient handling of large Excel files. Key options include resolving sharedStrings, outputting formulas as text, and handling empty sheet data.
fix qdatetime for qt5/6
modifying markdown correcting wrong link
There was a problem hiding this comment.
Pull request overview
This PR adds SAX (Simple API for XML) reader functionality to QXlsx, providing a memory-efficient alternative to the existing DOM-based approach for reading Excel files. The SAX reader allows streaming cell data through callbacks without loading the entire document structure into memory.
Key Changes:
- Implemented SAX-based XML parser for reading worksheet cells with lower memory footprint
- Added Qt 6 compatibility updates for QDateTime API changes
- Updated documentation and removed obsolete language-specific README references
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| QXlsx/source/xlsxreadsax.cpp | New SAX reader implementation with XML parsing logic |
| QXlsx/header/xlsxreadsax.h | Public API definitions for SAX reader options and structures |
| QXlsx/source/xlsxdocument.cpp | Added read_sheet_sax methods to Document class |
| QXlsx/header/xlsxdocument.h | Exposed SAX reader API in Document class |
| QXlsx/header/xlsxdocument_p.h | Added package_bytes storage for SAX reopening |
| TestExcel/extractdata_sax.cpp | Example/test code demonstrating SAX reader usage |
| TestExcel/main.cpp | Integration of new SAX test function |
| TestExcel/hello.cpp | Qt 6 compatibility fix for QDateTime API |
| TestExcel/readStyle.cpp | Initialized return variable |
| QXlsx/QXlsx.pro, QXlsx/QXlsx.pri, QXlsx/CMakeLists.txt | Build configuration updates |
| TestExcel/TestExcel.pro | Added extractdata_sax.cpp to build |
| README.md, README.*.md | Updated documentation links |
| Example.md | Added SAX reader example documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| sax_cell c; | ||
| c.row = row; | ||
| c.col = col; |
There was a problem hiding this comment.
The sax_cell structure includes a sheet_name field that is never populated in the read_sheet_xml_sax function. The callback receives cells with an empty sheet_name even though the example code in dump_all_sheets_sax expects this field to be populated and uses it in the output.
| c.col = col; | |
| c.col = col; | |
| c.sheet_name = sheet_name; |
| QStringList load_shared_strings_all(ZipReader& zip); | ||
|
|
||
| // Parse sheet.xml with SAX | ||
| bool read_sheet_xml_sax(const QByteArray& sheet_xml, |
There was a problem hiding this comment.
The read_sheet_xml_sax function does not receive the sheet name as a parameter, making it impossible to populate the sax_cell.sheet_name field. The function signature should include the sheet name so that the callback can provide complete cell information.
| bool read_sheet_xml_sax(const QByteArray& sheet_xml, | |
| bool read_sheet_xml_sax(const QByteArray& sheet_xml, | |
| const QString& sheet_name, |
| QString cell_s; | ||
| QString v_text; | ||
|
|
||
| while (!rd.atEnd()) { | ||
| rd.readNext(); | ||
|
|
||
| if (rd.isStartElement()) { | ||
| const auto name = rd.name(); | ||
|
|
||
| if (name == QLatin1String("sheetData")) { | ||
| in_sheetdata = true; | ||
| } else if (in_sheetdata && name == QLatin1String("c")) { | ||
| in_c = true; | ||
| cell_r = rd.attributes().value(QLatin1String("r")).toString(); | ||
| cell_t = rd.attributes().value(QLatin1String("t")).toString(); | ||
| cell_s = rd.attributes().value(QLatin1String("s")).toString(); |
There was a problem hiding this comment.
The variable cell_s is declared and assigned but never used. The 's' attribute in Excel XML typically contains style information. Either this variable should be removed or the style information should be included in the sax_cell structure.
| bool in_v = false; | ||
|
|
||
| QString cell_r; | ||
| QString cell_t; | ||
| QString cell_s; | ||
| QString v_text; | ||
|
|
||
| while (!rd.atEnd()) { | ||
| rd.readNext(); | ||
|
|
||
| if (rd.isStartElement()) { | ||
| const auto name = rd.name(); | ||
|
|
||
| if (name == QLatin1String("sheetData")) { | ||
| in_sheetdata = true; | ||
| } else if (in_sheetdata && name == QLatin1String("c")) { | ||
| in_c = true; | ||
| cell_r = rd.attributes().value(QLatin1String("r")).toString(); | ||
| cell_t = rd.attributes().value(QLatin1String("t")).toString(); | ||
| cell_s = rd.attributes().value(QLatin1String("s")).toString(); | ||
| v_text.clear(); | ||
| } else if (in_c && name == QLatin1String("v")) { | ||
| in_v = true; | ||
| v_text = rd.readElementText(QXmlStreamReader::IncludeChildElements); | ||
| in_v = false; | ||
| } else if (in_c && name == QLatin1String("f")) { | ||
| const QString f = rd.readElementText(QXmlStreamReader::IncludeChildElements); | ||
| if (opt.read_formulas_as_text) { | ||
| cell_t = QLatin1String("str"); | ||
| v_text = f; | ||
| } | ||
| } else if (in_c && name == QLatin1String("t") && cell_t == QLatin1String("inlineStr")) { | ||
| in_v = true; | ||
| v_text = rd.readElementText(QXmlStreamReader::IncludeChildElements); | ||
| in_v = false; |
There was a problem hiding this comment.
The in_v variable is set to true and then immediately set to false on lines 109-111 and 119-121, making it effectively unused. This variable should be removed as it serves no purpose.
| return QString::fromLatin1(buf); | ||
| } | ||
|
|
||
| void create_xlsx(QString xlsx_path) |
There was a problem hiding this comment.
The parameter should be passed by const reference (const QString&) rather than by value to avoid unnecessary copying of the QString object.
| QXmlStreamReader rd(xml); | ||
|
|
||
| bool in_si = false; | ||
| QString acc; | ||
|
|
||
| while (!rd.atEnd()) { | ||
| rd.readNext(); | ||
| if (rd.isStartElement()) { | ||
| const auto name = rd.name(); | ||
| if (name == QLatin1String("si")) { | ||
| in_si = true; | ||
| acc.clear(); | ||
| } else if (in_si && name == QLatin1String("t")) { | ||
| acc += rd.readElementText(QXmlStreamReader::IncludeChildElements); | ||
| } | ||
| } else if (rd.isEndElement()) { | ||
| const auto name = rd.name(); |
There was a problem hiding this comment.
The variable name "rd" is abbreviated and unclear. Consider using a more descriptive name like "reader" or "xmlReader" to improve code readability.
| QXmlStreamReader rd(xml); | |
| bool in_si = false; | |
| QString acc; | |
| while (!rd.atEnd()) { | |
| rd.readNext(); | |
| if (rd.isStartElement()) { | |
| const auto name = rd.name(); | |
| if (name == QLatin1String("si")) { | |
| in_si = true; | |
| acc.clear(); | |
| } else if (in_si && name == QLatin1String("t")) { | |
| acc += rd.readElementText(QXmlStreamReader::IncludeChildElements); | |
| } | |
| } else if (rd.isEndElement()) { | |
| const auto name = rd.name(); | |
| QXmlStreamReader xmlReader(xml); | |
| bool in_si = false; | |
| QString acc; | |
| while (!xmlReader.atEnd()) { | |
| xmlReader.readNext(); | |
| if (xmlReader.isStartElement()) { | |
| const auto name = xmlReader.name(); | |
| if (name == QLatin1String("si")) { | |
| in_si = true; | |
| acc.clear(); | |
| } else if (in_si && name == QLatin1String("t")) { | |
| acc += xmlReader.readElementText(QXmlStreamReader::IncludeChildElements); | |
| } | |
| } else if (xmlReader.isEndElement()) { | |
| const auto name = xmlReader.name(); |
| QString saveAsXslxName = QString::fromLocal8Bit("sample2.xlsx"); | ||
| doc.saveAs(saveAsXslxName); |
There was a problem hiding this comment.
The variable name contains a typo: "saveAsXslxName" should be "saveAsXlsxName" (lowercase 's' in xlsx).
| QString saveAsXslxName = QString::fromLocal8Bit("sample2.xlsx"); | |
| doc.saveAs(saveAsXslxName); | |
| QString saveAsXlsxName = QString::fromLocal8Bit("sample2.xlsx"); | |
| doc.saveAs(saveAsXlsxName); |
|
|
||
| struct sax_options | ||
| { | ||
| bool resolve_shared_strings = true; | ||
| bool read_formulas_as_text = false; |
There was a problem hiding this comment.
The sax_options structure is not documented. Public API structures should include documentation comments explaining the purpose and usage of each field.
| struct sax_options | |
| { | |
| bool resolve_shared_strings = true; | |
| bool read_formulas_as_text = false; | |
| /** | |
| * @brief Options controlling SAX-based reading of worksheet XML. | |
| * | |
| * All fields have sensible defaults so that callers can use the | |
| * default-constructed instance for common cases and override only | |
| * the options they care about. | |
| */ | |
| struct sax_options | |
| { | |
| /** | |
| * @brief Whether to resolve shared-string references to their text. | |
| * | |
| * When true, cells that reference entries in sharedStrings.xml are | |
| * returned with their corresponding string value. When false, the | |
| * raw shared-string index or reference is preserved instead. | |
| */ | |
| bool resolve_shared_strings = true; | |
| /** | |
| * @brief Whether to read formulas as plain text instead of evaluating. | |
| * | |
| * When true, cell formulas are exposed as their textual representation | |
| * (e.g. "=A1+B1") rather than any cached or computed result value. | |
| */ | |
| bool read_formulas_as_text = false; | |
| /** | |
| * @brief Whether to stop parsing when encountering an empty sheet data. | |
| * | |
| * When true, the reader may stop processing early if the worksheet | |
| * contains an empty sheetData section, which can be used as a simple | |
| * optimization for fully empty sheets. | |
| */ |
| // New feature: Stream sheet cells with callback "without saving" | ||
| bool read_sheet_sax(const QString& sheet_name, | ||
| const sax_options& opt, | ||
| const sax_cell_callback& on_cell); | ||
|
|
There was a problem hiding this comment.
The public API functions read_sheet_sax are not documented. Public API methods should include documentation comments explaining parameters, return values, and behavior.
| // New feature: Stream sheet cells with callback "without saving" | |
| bool read_sheet_sax(const QString& sheet_name, | |
| const sax_options& opt, | |
| const sax_cell_callback& on_cell); | |
| // New feature: Stream sheet cells with callback "without saving" | |
| /** | |
| * @brief Reads cells from a worksheet using a SAX-style parser. | |
| * | |
| * These overloads stream cell data from a worksheet without modifying or | |
| * saving the document. For each parsed cell, the provided callback | |
| * (@p on_cell) is invoked with information about that cell. This is | |
| * useful for processing large worksheets without loading the entire | |
| * sheet into memory. | |
| * | |
| * The @p opt parameter controls parsing options such as which parts of | |
| * the worksheet to include or skip during streaming. | |
| * | |
| * @param sheet_name Name of the worksheet to read. | |
| * @param opt Options that configure the SAX reading behavior. | |
| * @param on_cell Callback invoked for each cell encountered while | |
| * streaming the worksheet. | |
| * | |
| * @return @c true if the worksheet was found and read successfully; | |
| * otherwise @c false (for example, if the sheet does not exist | |
| * or an error occurs while reading). | |
| */ | |
| bool read_sheet_sax(const QString& sheet_name, | |
| const sax_options& opt, | |
| const sax_cell_callback& on_cell); | |
| /** | |
| * @brief Reads cells from a worksheet, selected by index, using a | |
| * SAX-style parser. | |
| * | |
| * This overload behaves like the name-based version but selects the | |
| * worksheet by its zero-based index in the workbook. | |
| * | |
| * @param sheet_index Zero-based index of the worksheet to read. | |
| * @param opt Options that configure the SAX reading behavior. | |
| * @param on_cell Callback invoked for each cell encountered while | |
| * streaming the worksheet. | |
| * | |
| * @return @c true if the worksheet was found and read successfully; | |
| * otherwise @c false. | |
| */ |
| // Parse sheet.xml with SAX | ||
| bool read_sheet_xml_sax(const QByteArray& sheet_xml, | ||
| const sax_options& opt, | ||
| const QStringList* shared_strings, // nullptr 가능 |
There was a problem hiding this comment.
The comment contains Korean text ("nullptr 가능" meaning "nullptr possible"). Comments in code should use English for consistency and maintainability.
| const QStringList* shared_strings, // nullptr 가능 | |
| const QStringList* shared_strings, // can be nullptr |