cfile / ddio: use most of functions to std::filesystem::path by winterheart · Pull Request #475 · DescentDevelopers/Descent3

winterheart · 2024-06-29T01:22:33Z

Pull Request Type

Description

This PR updates cfile and partially ddio modules to use C++17's std::filesystem::path feature, which simplifies overall filesystem operations.

Some parts of code was migrated, and there some performance improvements was achieved. Internally cfile maintains some sort search paths. Old code was excessively added some non-existent paths, and on opening files these paths was pointlessly used. Also there some code optimization in cf_FindRealFileNameCaseInsenstive() was performed.

Added unittest framework for cfile.

Related Issues

Screenshots (if applicable)

Checklist

I have tested my changes locally and verified that they work as intended.
I have documented any new or modified functionality.
I have reviewed the changes to ensure they do not introduce any unnecessary complexity or duplicate code.
I understand that by submitting this pull request, I am agreeing to license my contributions under the project's license.

Additional Comments

Make cfopen(), open_file_in_directory() and cfexist() use std::filesystem::path arguments.

Simplify related code in `cf_SetSearchPath()`.

Refactor paths and extensions to use std::map. Rewriting related code in manage.cpp.

Convert cf_IsFileInHog(), cf_OpenFileInLibrary() and cf_GetfileCRC() functions to use std::fs::path.

Refactor paths and extensions to use std::map. Rewriting related code in manage.cpp.

Convert cf_CopyFile() cf_CopyFileTime() and underlying functions to use std::filesystem::path

Convert cf_Diff() and underlying functions to use std::filesystem::path

Use std::fs::path on path construction, simplify code.

Simplifies code related to UpdatePrimitive().

Remove ddio_CreateDir(), ddio_RemoveDir(), ddio_SaveWorkingDir(), ddio_RestoreWorkingDir(), ddio_DirExists(). Replace them in code with std::fs alternatives.

Lgt2x · 2024-07-01T21:39:01Z

You can do something like that to fix mac os CI (but cleaner). strncpy is problematic Lgt2x@15a50ca#diff-be34739991a103c73033e6358b31b2a8d7859b7a5c84616dcf1409176168f55d

winterheart · 2024-07-02T17:10:38Z

After some refactoring cf_FindRealFileNameCaseInsensitive() now works much faster and more reliable. Here some benchmarks:

Release:

2024-07-02T20:08:56+03:00
Running ./cfile_bench
Run on (12 X 4300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 2.82, 1.81, 1.42
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
cfile_no_exist_old            11384 ns        11065 ns        63154
cfile_no_exist_new             9812 ns         9613 ns        74990
cfile_exist_relative_old       2682 ns         2452 ns       258570
cfile_exist_relative_new        926 ns          925 ns       784549
cfile_exist_absolute_old       3422 ns         3132 ns       222331
cfile_exist_absolute_new       1222 ns         1220 ns       585813

Debug:

2024-07-02T20:09:47+03:00
Running ./cfile_bench
Run on (12 X 4300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 2.21, 1.81, 1.45
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
cfile_no_exist_old            11768 ns        11426 ns        61555
cfile_no_exist_new            13682 ns        13423 ns        53839
cfile_exist_relative_old       3147 ns         2907 ns       224424
cfile_exist_relative_new       1591 ns         1588 ns       427936
cfile_exist_absolute_old       4179 ns         3893 ns       183101
cfile_exist_absolute_new       2132 ns         2129 ns       319614

Jayman2000 · 2024-07-03T22:48:34Z

cfile/cfile.cpp

  tmode[1] = (mode[0] == 'w') ? mode[1] : 'b';
  // try to open file
-  fp = fopen(path, tmode);
+  fp = fopen(using_filename.u8string().c_str(), tmode);


How do you know that UTF-8 is the correct encoding to use for fopen()’s filename parameter?

This way how std::filesystem::path represent native encoding on systems. POSIX filesystem is native char representation, but Windows' native representation is wchar. If you want use paths with C functions, you need convert them with c_str() using u8string().

This way how std::filesystem::path represent native encoding on systems. POSIX filesystem is native char representation, but Windows' native representation is wchar. If you want use paths with C functions, you need convert them with c_str() using u8string().

I’m a little bit confused. According to cppreference.com, std::filesystem::path::u8string() returns a std::u8string. std::u8string is the same as std::basic_string<char8_t>. std::basic_string::<whatever>::c_str() returns a whatever * which means that my_u8string.c_str() should return a char8_t *, right? It seems to me like using_filename.u8string().c_str() is always going to evaluate to a char8_t *. I just don’t see how it could evaluate to a wchar_t *.

(The std::u8string type was added in C++20. In C++17, a similar argument can be made, but you have to replace std::u8string with std::string and char8_t with char.)

Passing a UTF-8 encoded string to fopen() is potentially a problem. According to Microsoft’s documentation,

The fopen function opens the file specified by filename. By default, a narrow filename string is interpreted using the ANSI codepage (CP_ACP). In Windows Desktop applications, it can be changed to the OEM codepage (CP_OEMCP) by using the SetFileApisToOEM function. You can use the AreFileApisANSI function to determine whether filename is interpreted using the ANSI or the system default OEM codepage.

Is there anything that’s going to guarantee that we’re using a UTF-8 codepage? Windows won’t use a UTF-8 codepage by default unless the user has “Beta: Use Unicode UTF-8 for worldwide language support” enabled, or the application explicitly specifies a UTF-8 codepage.

Additionally, if using_filename.u8string().c_str() sometimes does evaluate to a wchar_t *, then I think that we have a problem. According to Microsoft’s documentation, you’re supposed to pass a char * to fopen(), not a wchar_t *. If you want to use a wchar_t * for the filename parameter, then you’re supposed to call _wfopen().

We don't even interested here what encoding is actually in filesystem. We effectively receive C-string (and don't bother in what encoding is it, it's just sequence of chars) and we ensuring that C-string is actually valid path in native filesystem. We do not using UTF-8 here at all. It's just way to convert native encoding to C-string.

We don't even interested here what encoding is actually in filesystem.

If we’re going to call fopen() on Windows, then I think that we need to be interested in the filename parameter’s encoding because Windows is interested in the filename parameter’s encoding.

We effectively receive C-string (and don't bother in what encoding is it, it's just sequence of chars) and we ensuring that C-string is actually valid path in native filesystem. We do not using UTF-8 here at all.

That last sentence is very surprising to me. I thought that std::filesystem::path::u8string was supposed to return UTF-8, that std::filesystem::path::u16string was supposed to return UTF-16 and that std::filesystem::path::u32string was supposed to return UTF-32. cppreference.com says the following about those functions:

Returns the internal pathname in native pathname format, converted to specific string type. Conversion, if any, is performed as follows:

[…]

3) The result encoding in the case of u8string() is always UTF-8.

Additionally, Microsoft’s documentation for std::filesystem::path::u8string says:

Converts the sequence stored in mypath to UTF-8 and returns it stored in an object of type u8string.

What makes you say “We do not using UTF-8 here at all”?

It's just way to convert native encoding to C-string.

I agree. u8string().c_str() does indeed convert a path (which uses its native encoding internally) into a C-string. My concern is about the encoding of the resulting C-string.

Lgt2x

Really good job, tested successfully on Linux. Will also test Windows before merging. If someone can do OSX don't hesitate

Descent3/gamesave.cpp

cfile/tests/cfile_tests.cpp

Descent3/gamesave.cpp

cfile/cfile.cpp

Refactoring platform-dependent parts of code into common part, greatly reduce LOC.

Convert some of the paths to use std::fs::path.

Moved directory parameter to 3rd position and make it optional.

Implementing cf_FindRealFileNameCaseInsensitive_new() based on std::fs::directory_iterator, which runs faster and more reliable. cf_FindRealFileNameCaseInsensitive_new() works exactly as cf_FindRealFileNameCaseInsensitive(). Benchmark on Release build: ``` ------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------- cfile_no_exist_old 11607 ns 11399 ns 63035 cfile_no_exist_new 9302 ns 9196 ns 70041 cfile_exist_relative_old 2701 ns 2480 ns 282695 cfile_exist_relative_new 867 ns 866 ns 798478 cfile_exist_absolute_old 3415 ns 3167 ns 228323 cfile_exist_absolute_new 1182 ns 1181 ns 574782 ```

A little speed-up for our slowpoke.

Surprisingly on macOS FS is case-insensitive.

Lgt2x · 2024-07-06T23:25:10Z

No problems in game in either Linux or Windows, we're good to merge. I hope we did not break anything for MacOS, but I don't have the hardware to test it. If UTF-8 proves to be a problem in some situations we can fix it later.

Consider this program: #include <cstdio> int main(void) { const char *filename = u8"ディセント3.txt"; auto fp = std::fopen(filename, "r"); if (fp) { std::fclose(fp); return 0; } else { return 1; }; } If a file named ディセント3.txt exists, then will that program successfully open it? The answer is: it depends. filename is going to point to these bytes: Raw bytes: e3 83 87 e3 82 a3 e3 82 bb e3 83 b3 e3 83 88 33 2e 74 78 74 00 Characters: ディセント3.txt␀ Internally, Windows uses UTF-16. When you call fopen(), Windows will convert the filename parameter into UTF-16 [1]. If the program is run with a UTF-8 Windows code page, then the above bytes will be correctly interpreted as UTF-8 when being converted into UTF-16 [2]. The final UTF-16 string will be this*: Raw bytes: ff fe c7 30 a3 30 bb 30 f3 30 c8 30 33 00 2e 00 74 00 78 00 74 00 Characters: ディセント3.txt On the other hand, if the program is run with code page 932, then the original bytes will be incorrectly interpreted as code page 932 when being converted into UTF-16. The final UTF-16 string will be this*: Raw bytes: ff fe 5d 7e fd ff 67 7e 63 ff 67 7e 7b ff 5d 7e 73 ff 5d 7e fd ff 33 00 2e 00 74 00 78 00 74 00 Characters: 繝�繧｣繧ｻ繝ｳ繝�3.txt In other words, if that program gets compiled on Windows with a UTF-8 execution character set, then it needs to be run with a UTF-8 Windows code page. Otherwise, mojibake might happen. *Unlike the first string, this one does not have a null terminator. This is because the Windows kernel doesn’t use null terminated strings for paths [3][4]. --- Before this commit, Descent 3 would pass UTF-8 to fopen(), even if Descent 3 is run with a non-UTF-8 Windows code page [5]. This commit makes sure that Descent 3 gets run with a UTF-8 Windows code page. The Windows code page isn’t just used by fopen(). It also gets used by many other functions in the Windows API [6]. I don’t know if Descent 3 uses any of those other functions, but if it does, then this commit will also help make sure that those functions receive strings with the correct character encoding. Descent 3 uses UTF-8 for strings by default [7]. Making sure that Descent 3 uses UTF-8 everywhere will make encoding-related mistakes less likely in the future. Fixes DescentDevelopers#483. [1]: <https://stackoverflow.com/a/7950569/7593853> [2]: <https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170#remarks> [3]: <https://stackoverflow.com/a/52372115/7593853> [4]: <https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html> [5]: <DescentDevelopers#475 (comment)> [6]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis> [7]: adf58ec (Explicitly declare execution character set, 2024-07-07)

winterheart force-pushed the cfile-std-filesystem branch from 5eb204f to 946affc Compare June 29, 2024 09:36

winterheart added 11 commits July 1, 2024 12:50

Convert some of cfile functions to use std::filesystem

82084e9

Make cfopen(), open_file_in_directory() and cfexist() use std::filesystem::path arguments.

CFILE: make path_entry struct using std::fs::path

33e8802

Simplify related code in `cf_SetSearchPath()`.

CFILE: refactor internal paths and extensions

0ea7580

Refactor paths and extensions to use std::map. Rewriting related code in manage.cpp.

CFILE: migrate more functions to use std::fs::path

8d14ae0

Convert cf_IsFileInHog(), cf_OpenFileInLibrary() and cf_GetfileCRC() functions to use std::fs::path.

CFILE: refactor internal paths and extensions

d1f0ddf

Refactor paths and extensions to use std::map. Rewriting related code in manage.cpp.

CFILE: convert cf_CopyFile() cf_CopyFileTime() to use std::fs::path

c47210b

Convert cf_CopyFile() cf_CopyFileTime() and underlying functions to use std::filesystem::path

CFILE: convert cf_Diff() to use std::fs::path

81555af

Convert cf_Diff() and underlying functions to use std::filesystem::path

MANAGE: convert UpdatePrimitive() to use std::fs::path

c33d99a

MANAGE: refactor UpdatePrimitive() related code

a00be9f

Use std::fs::path on path construction, simplify code.

MANAGE: convert some global path variables to std::fs::path

3af18da

Simplifies code related to UpdatePrimitive().

DDIO: Remove unused platform-dependent functions

de49da0

Remove ddio_CreateDir(), ddio_RemoveDir(), ddio_SaveWorkingDir(), ddio_RestoreWorkingDir(), ddio_DirExists(). Replace them in code with std::fs alternatives.

winterheart force-pushed the cfile-std-filesystem branch from 946affc to 3fc6dd5 Compare July 1, 2024 09:51

winterheart force-pushed the cfile-std-filesystem branch 6 times, most recently from 504dd2d to 82e6c2f Compare July 2, 2024 16:27

Lgt2x self-assigned this Jul 3, 2024

Jayman2000 reviewed Jul 3, 2024

View reviewed changes

Lgt2x reviewed Jul 4, 2024

View reviewed changes

winterheart added 6 commits July 5, 2024 01:50

CFILE: Add unittest framework

dad20b0

CFILE: Change cf_FindRealFileNameCaseInsenstive() to use std::fs::path

15ec76d

CFILE: Additional unit tests for cfile functions

c571220

CFILE: Remove duplicated code in open_file_in_directory()

6552037

Refactoring platform-dependent parts of code into common part, greatly reduce LOC.

Update cinematics API to use std::fs::path

5222bb2

Convert bm_SaveFileBitmap() and SaveVClip() to std::fs::path

41ec941

winterheart added 6 commits July 5, 2024 01:50

Fix compilation of Descent3Editor

c287a2d

Convert some of the paths to use std::fs::path.

CFILE: change signature of cf_FindRealFileNameCaseInsenstive()

f44f6a4

Moved directory parameter to 3rd position and make it optional.

Use new cf_FindRealFileNameCaseInsensitive() function in code

b3affdb

A little speed-up for our slowpoke.

Fix unittests on macOS

7a1f08e

Surprisingly on macOS FS is case-insensitive.

Use case-insensitive search for cf_OpenLibrary() on all platforms

2fce643

winterheart force-pushed the cfile-std-filesystem branch from 82e6c2f to 2fce643 Compare July 4, 2024 22:51

Lgt2x approved these changes Jul 5, 2024

View reviewed changes

bryanperris approved these changes Jul 5, 2024

View reviewed changes

Lgt2x merged commit de75a80 into DescentDevelopers:main Jul 6, 2024

winterheart mentioned this pull request Jul 8, 2024

Fix runtime error on creating multiplayer game #484

Merged

13 tasks

Lgt2x mentioned this pull request Jul 22, 2024

fix path creation in gamesave.cpp #499

Merged

winterheart deleted the cfile-std-filesystem branch August 12, 2024 20:58

Jayman2000 mentioned this pull request Sep 3, 2024

Add -additionaldir command-line option #471

Merged

13 tasks

Conversation

winterheart commented Jun 29, 2024

Pull Request Type

Description

Related Issues

Screenshots (if applicable)

Checklist

Additional Comments

Uh oh!

Lgt2x commented Jul 1, 2024

Uh oh!

winterheart commented Jul 2, 2024

Uh oh!

Jayman2000 Jul 3, 2024

Choose a reason for hiding this comment

Uh oh!

winterheart Jul 4, 2024

Choose a reason for hiding this comment

Uh oh!

Jayman2000 Jul 4, 2024

Choose a reason for hiding this comment

Uh oh!

winterheart Jul 4, 2024

Choose a reason for hiding this comment

Uh oh!

Jayman2000 Jul 5, 2024

Choose a reason for hiding this comment

Uh oh!

Lgt2x left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Lgt2x commented Jul 6, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants