TOC for documents #204
Replies: 10 comments 13 replies
-
Performance of serialization
I made some first experiments with serialization to JSON and CBOR. The results for a 2000-page document are as follows:
So the gain in compactness from CBOR is not much, as most of the data are strings, which have the same size in JSON and CBOR. Of course serialization is faster with CBOR, but JSON is still not a problem. Note that this is for compact JSON (single line, no spaces); pretty-printed JSON would be larger (141 kByte with an indentation of 2 spaces per level).
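To illustrate why strings dominate the size in both formats, here is a rough sketch that computes the encoded size of a single text string. It assumes definite-length CBOR text strings and a minimal JSON encoding with short escapes; the function names are made up for this example and are not part of OpenBoard or Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Size of a definite-length CBOR text string: a header carrying the
// byte length, followed by the raw UTF-8 bytes.
std::size_t cborTextSize(const std::string& utf8) {
    const std::size_t n = utf8.size();
    std::size_t header = 1;                    // lengths 0..23 fit into the initial byte
    if (n >= 24) header = 2;                   // one extra length byte
    if (n >= 256) header = 3;                  // two extra length bytes
    if (n >= 65536) header = 5;                // four extra length bytes
    if (static_cast<unsigned long long>(n) > 0xFFFFFFFFull) header = 9;
    return header + n;
}

// Size of the same string in compact JSON: two quotes plus the bytes,
// where quotes, backslashes and control characters need escaping.
std::size_t jsonStringSize(const std::string& utf8) {
    std::size_t size = 2;                      // opening and closing quote
    for (unsigned char c : utf8) {
        if (c == '"' || c == '\\') size += 2;  // \" and \\ escapes
        else if (c == '\b' || c == '\f' || c == '\n' || c == '\r' || c == '\t') size += 2;
        else if (c < 0x20) size += 6;          // \u00XX form
        else size += 1;
    }
    return size;
}
```

For a short file name, CBOR saves a single byte; for a 36-character UUID string both encodings need 38 bytes, which matches the observation that the overall gain is small.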
-
Class responsibilities and sequence of refactoring
Work split of
-
Good idea. However, in our context, where the files are stored on a shared disk, it doesn't help, because multiple versions of OpenBoard can access the same folder. With this change, I'll have to make clear that we can no longer accept older versions remaining on some machines. It will be OK after that, but the release containing this change will force our schools to transition quickly and on every machine. I'll discuss it internally with the deploy team to see how much of an issue it could be.
-
Concurrently with these considerations I'm also experimenting with some refactorings to see what consequences they have for other code. During this work, I repeatedly stumble across #198, the multiple and slightly different implementations for copying a scene. One example is the deletion of pages. Here we have again an implementation to copy pages in ? I have the feeling that we should first address that problem before we go deeper into the TOC work.
-
Hi Martin, Thank you for all the work you put into the analysis of this issue. You had very good ideas to mitigate the issues a user could encounter when working with different versions of OpenBoard on the same document after the TOC implementation. Your recent idea of using the document metadata version number (currently 4.8) could be the simplest and best solution: we check the document version. If it is not 4.9, we update the document to the new structure (with a We can even introduce an empty file in the folder, for example Do you think it could work? What are your thoughts on basing this on the 4.8/4.9 version number?
-
Here is a further brain dump:
Do not change media asset UUIDs
Note:
Use hash based UUIDs for media files
Additional idea: We could also use a hash based UUID, e.g. created by This will not work for web widgets represented by a directory. We could hash the
Increment document version number
To distinguish documents processed by the new version of OpenBoard, we increment the document version number to 4.9.0. We also have to write the version number as the last element of When opening a document in the new version of OpenBoard, we can detect
We should trigger a (re-)scan of the document and create or update the TOC whenever the version number is < 4.9.0.
Store assets as part of the TOC
The TOC entry for a page shall contain an "assets" list with the names of all assets used. The list is updated when saving a scene. When deleting a
Store the scene UUID as part of the TOC
Storing the scene UUID as part of the TOC allows us to verify the integrity of the TOC when a document was modified by a previous version of OpenBoard. See "scanning" below.
Scanning a document
A document has to be scanned or re-scanned when the document version is < 4.9.0. Here is a proposal for the algorithm, which works both for an initial scan and for a re-scan. Scanning is started when we create a
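The version check that triggers a (re-)scan could look roughly like the sketch below. It assumes the version is available as a dotted string such as "4.8.0"; `parseVersion` and `needsScan` are hypothetical names for this example, not existing OpenBoard functions. Missing or unparsable components are treated as 0, so a document without a usable version is scanned.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse "major.minor.patch" into three integers.
// Missing or unparsable components default to 0.
std::array<int, 3> parseVersion(const std::string& text) {
    std::array<int, 3> version{0, 0, 0};
    std::istringstream in(text);
    std::string part;
    for (std::size_t i = 0; i < version.size() && std::getline(in, part, '.'); ++i) {
        try {
            version[i] = std::stoi(part);
        } catch (const std::exception&) {
            version[i] = 0;  // conservative: treat as old
        }
    }
    return version;
}

// A document needs a (re-)scan when its version is below 4.9.0.
// std::array compares lexicographically, so 4.10.0 is correctly
// treated as newer than 4.9.0, unlike a plain string comparison.
bool needsScan(const std::string& documentVersion) {
    return parseVersion(documentVersion) < std::array<int, 3>{4, 9, 0};
}
```

Comparing numeric components instead of strings avoids surprises once a minor version reaches two digits.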
-
Question to @kaamui: In In my scanner to create a TOC, I want to use If yes, then I need other ways to scan for all scene files even if they are not in consecutive order and might have quite large gaps.
Edit: I assume this was not the problem. In
Edit 2: Found this comment in So a directory listing of a newly created document may not return all files. This would just mean that we should not create the first TOC of a new document by scanning, but just by creating it. That should be easy.
-
Just created branch feat-toc with all the current work. It is not ready for merging, but for testing. The work previously done in refactor-scene-copy is also contained (and improved) here, so that branch will be deleted. Still missing is the cleanup of no longer referenced media asset files. Also the scanner for scanning pre-4.9.0 documents will be further improved to convert the asset files to use the new content-based UUIDv5 identifiers. Here is the comparison of all changes: OpenBoard-org/OpenBoard@dev...letsfindaway:OpenBoard:feat-toc With (currently) 11 commits containing 1,415 additions and 7,291 deletions it is more extensive than I had initially expected. The large number of deletions comes from dropping the UBForeignObjectsHandler and CFF support, as well as some unused functions. I have also collected some TODOs in #215
-
After a longer pause I have now added one more commit to address most of the points in #215. Renumbering pages for export is still open, as is some more documentation.
-
Now also renumbering pages for export is complete. I will create a PR which can be used for testing.
-
Problem
Currently, the table of contents (TOC) of an OpenBoard document only consists of the file names, which must always be numbered in ascending order without gaps. When inserting or deleting pages in a huge document, lots of files need to be renamed, which leads to serious delays, especially when using a network drive.
Proposal
The idea is now to decouple the file names from the sequence of pages by using a TOC. The individual page files never get renamed. Instead, just the entries in the TOC are updated.
The TOC also offers additional benefits for future features:
The structure of the TOC is just a list of pages, containing various attributes. In a first step the only attribute might be the real file name of a page. Other attributes can be added later.
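As a minimal sketch of that structure, assuming a plain in-memory list where the position in the list is the page index: the names `TocEntry` and `Toc` are invented for this example and are not existing OpenBoard classes. Inserting, removing or moving a page only changes the list; no file name is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One TOC entry. In this first step it only holds the real file name;
// further attributes could be added later.
struct TocEntry {
    std::string fileName;  // file name of the page, never renamed
};

// Hypothetical TOC: the position in the vector is the page index.
class Toc {
public:
    std::size_t pageCount() const { return entries_.size(); }

    const std::string& fileName(std::size_t index) const {
        assert(index < entries_.size());
        return entries_[index].fileName;
    }

    // Insert a page at `index`; later pages shift in the TOC only.
    void insertPage(std::size_t index, std::string fileName) {
        assert(index <= entries_.size());
        entries_.insert(entries_.begin() + static_cast<std::ptrdiff_t>(index),
                        TocEntry{std::move(fileName)});
    }

    // Remove the page at `index`; no other file is affected.
    void removePage(std::size_t index) {
        assert(index < entries_.size());
        entries_.erase(entries_.begin() + static_cast<std::ptrdiff_t>(index));
    }

    // Move a page from `from` to position `to` in the resulting sequence.
    void movePage(std::size_t from, std::size_t to) {
        assert(from < entries_.size() && to < entries_.size());
        TocEntry entry = std::move(entries_[from]);
        entries_.erase(entries_.begin() + static_cast<std::ptrdiff_t>(from));
        entries_.insert(entries_.begin() + static_cast<std::ptrdiff_t>(to),
                        std::move(entry));
    }

private:
    std::vector<TocEntry> entries_;
};
```

With this shape, inserting a page in the middle of a huge document costs one list insertion and one TOC write instead of renaming every following file.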
Serialization
As the TOC is relevant for understanding the document, it must be flushed to disk after any modification. That means that it is beneficial when serializing the TOC is fast and the data format is compact. Besides that, there are many data formats which could be used. Just to name a few, which are natively supported by Qt Core:
See also https://doc.qt.io/qt-6/qtserialization.html for an overview of Qt serialization mechanisms including a discussion on advantages.
Version compatibility
To provide upward compatibility, OpenBoard can simply add a TOC when it opens a document without that file. It just enumerates the existing pages in numerical order and adds no other page attributes.
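A sketch of that enumeration, under the assumption that page files follow a "page<digits>.svg" naming pattern and that the directory listing is already available as a list of names: both function names are hypothetical. Other files are ignored and gaps in the numbering are tolerated.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Extract the page number from a name like "page012.svg".
// The "page<digits>.svg" pattern is an assumption of this sketch.
std::optional<int> pageNumber(const std::string& name) {
    const std::string prefix = "page";
    const std::string suffix = ".svg";
    if (name.size() <= prefix.size() + suffix.size()) return std::nullopt;
    if (name.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0)
        return std::nullopt;
    const std::string digits =
        name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
    if (digits.size() > 9) return std::nullopt;  // keep std::stoi in range
    for (unsigned char c : digits)
        if (!std::isdigit(c)) return std::nullopt;
    return std::stoi(digits);
}

// Build the initial page sequence: all page files in numerical order.
std::vector<std::string> buildInitialToc(const std::vector<std::string>& names) {
    std::vector<std::pair<int, std::string>> pages;
    for (const auto& name : names)
        if (auto number = pageNumber(name)) pages.emplace_back(*number, name);
    std::sort(pages.begin(), pages.end());  // by number, then by name
    std::vector<std::string> toc;
    for (auto& page : pages) toc.push_back(std::move(page.second));
    return toc;
}
```

Sorting by the parsed number rather than by the string keeps the order correct even if the zero padding is not uniform.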
Providing downward compatibility is more difficult:
We could however provide an "Export as..." option for UBZ files that creates the current format without a TOC. This will definitely help with the page sequence. However, additional attributes stored in the TOC will be lost. Alternatively, we could try to make a compatible export by renumbering the pages during creation of the compressed file AND updating the TOC accordingly, and store both in the exported UBZ file. The resulting file will be compatible without losing additional attributes.
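The renumbering for such a compatible export could be planned as in the following sketch. It assumes the exported pages use a zero-padded "pageNNN.svg" name; `exportedPageName` and `planCompatibleExport` are hypothetical helpers. The result maps each stored file to its name inside the exported archive, and the second column is also what the exported TOC would list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Name of the page at `index` in the exported archive.
// The zero-padded "pageNNN.svg" format is an assumption of this sketch.
std::string exportedPageName(std::size_t index) {
    char buffer[32];
    const int written = std::snprintf(buffer, sizeof buffer, "page%03zu.svg", index);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    (void)written;  // silence unused warning when asserts are disabled
    return std::string(buffer);
}

// Map every file in TOC order to a consecutive name without gaps.
// Files on disk stay untouched; only the archive entries are renamed.
std::vector<std::pair<std::string, std::string>>
planCompatibleExport(const std::vector<std::string>& tocOrder) {
    std::vector<std::pair<std::string, std::string>> plan;
    plan.reserve(tocOrder.size());
    for (std::size_t i = 0; i < tocOrder.size(); ++i)
        plan.emplace_back(tocOrder[i], exportedPageName(i));
    return plan;
}
```

An older version opening the archive then sees consecutive page files in the right order, while a newer version can still read the attributes from the exported TOC.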
Implementation
The PR OpenBoard-org#1162 offers a good base for adding a TOC, as it introduces a UBDocument class, which handles adding, removing and moving of pages. The TOC will however not directly be attached to that class, but to the UBPersistenceManager, which provides functions to retrieve file names for a page index. Other classes which need the file names will always use those functions.
Edit: After longer consideration I now tend to attach the TOC to UBDocument:
We will also move the scene cache to the document. There is no need to access a scene when we don't have the document.
Scene cache remains at the persistence manager, as there are functions (especially deleting all pages of a document) which do not require an existing UBDocument instance.
An implementation of the TOC should separate the TOC itself from the serialization of the TOC, so that the serialization format can easily be switched. No serialization-specific classes may appear in the API of the TOC.
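One possible shape for that separation is sketched below: the TOC data uses only plain types, and the format lives behind a small interface. All names (`TocData`, `TocSerializer`, `LineSerializer`) are invented for this example, and the line-based format is just a stand-in; a JSON or CBOR implementation would implement the same interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Plain TOC data without any serialization-specific types.
struct TocData {
    std::vector<std::string> fileNames;  // page files in page order
};

// Interface that hides the concrete serialization format.
class TocSerializer {
public:
    virtual ~TocSerializer() = default;
    virtual std::string serialize(const TocData& toc) const = 0;
    virtual TocData deserialize(const std::string& bytes) const = 0;
};

// Stand-in format for this sketch: one file name per line.
class LineSerializer : public TocSerializer {
public:
    std::string serialize(const TocData& toc) const override {
        std::string out;
        for (const auto& name : toc.fileNames) {
            assert(name.find('\n') == std::string::npos);  // names must be single-line
            out += name;
            out += '\n';
        }
        return out;
    }

    TocData deserialize(const std::string& bytes) const override {
        TocData toc;
        std::istringstream in(bytes);
        std::string line;
        while (std::getline(in, line))
            if (!line.empty()) toc.fileNames.push_back(line);
        return toc;
    }
};
```

Code that works with the TOC only sees `TocData`; switching the format means providing another `TocSerializer` implementation.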