Skip to content

Comments

Add CLI filtering, error hardening, perf optimization, and dedup refactor#106

Open
marloson wants to merge 14 commits intomandiant:mainfrom
marloson:pr/improvements
Open

Add CLI filtering, error hardening, perf optimization, and dedup refactor#106
marloson wants to merge 14 commits intomandiant:mainfrom
marloson:pr/improvements

Conversation

@marloson
Copy link

@marloson marloson commented Feb 20, 2026

Summary

  • CLI filtering & search: Add --subsystem, --process, --pid, --level, --after, --before, --grep, --count flags to the example binary, plus --format sqlite output via rusqlite
  • Errno resolution for %m: Resolve %m format specifiers to human-readable Darwin errno strings instead of raw error codes
  • Firehose string-resolution dedup: Extract shared MessageData::get_strings_from_formatters() helper, reducing ~330 lines of near-identical code across activity.rs, nonactivity.rs, and signpost.rs to thin 4-line wrappers
  • Error handling hardening: Replace .unwrap() calls in library code with proper Result propagation via manual Display/Error impls on ParserError; malformed data now produces warnings instead of panics
  • Performance optimization: Use Cow<'a, str> in format_firehose_log_message() hot path to avoid unnecessary string allocations
  • Bug fixes: Fix dsc_files() sort order to match tracev3_files(), fix precision regex for format specifiers, fix signpost shared-cache offset calculation (used 07X string formatting instead of arithmetic addition, producing incorrect offsets for string_offset >= 0x10000000)

Test plan

  • cargo fmt -- --check passes
  • cargo clippy passes with no warnings
  • cargo test --release — all existing tests pass
  • Example binary builds and parses logarchive data with new filters and SQLite output

@google-cla
Copy link

google-cla bot commented Feb 20, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@marloson
Copy link
Author

@googlebot I signed it!

@puffyCid
Copy link
Collaborator

thanks for the PR. This looks pretty good.
I will need to do further validation testing. But seeing all the tests pass is a good sign

first_proc_id,
second_proc_id,
catalogs,
true, // require_large_offset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let require_large_offset = true;

first_proc_id,
second_proc_id,
catalogs,
true, // require_large_offset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let require_large_offset = true;

// The caller structs (FirehoseActivity, FirehoseNonActivity, FirehoseSignpost) pack several of
// these fields into a single struct, keeping their own signatures short. The helper accepts
// them individually so a single implementation covers all three callers.
#[allow(clippy::too_many_arguments)]
Copy link
Collaborator

@puffyCid puffyCid Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we turn some of the arguments into a struct instead of passing 9 args
something like

struct StringsFormatters{ 
 unknown_pc_id: u32,
 string_offset: u64,
 first_proc_id: u64,
 second_proc_id: u64,
 require_large_offset: bool,
}

@marloson
Copy link
Author

@googlebot I signed it!

@marloson
Copy link
Author

Thanks for the thorough review @puffyCid! Here's a summary of everything addressed in the follow-up commits:

Review items resolved:

  1. StringFormatParams struct (9-arg function → named struct): Replaced the #[allow(clippy::too_many_arguments)] workaround with a proper StringFormatParams struct grouping unknown_pc_id, string_offset, first_proc_id, second_proc_id, and require_large_offset. All three callers (activity, nonactivity, signpost) now use named let require_large_offset = true; bindings instead of magic inline bools. (commit a2bccec)

  2. Errno integration test: Added the send failed: Invalid argument assertion (≥15 hits) to test_parse_all_logs_private_with_public_mix_big_sur in big_sur_tests.rs, plus fixed the errno 22 typo "arguement""argument". (commits a2bccec, 9c9fa73)

  3. Signpost require_large_offset inline comment: Added an explanatory comment in the dedup helper documenting why signpost previously used 07X string formatting vs arithmetic addition, and that the unified arithmetic path is correct for string_offset >= 0x10000000. (commit 93fbd10)

One remaining blocker — Google CLA:

The cla/google check is failing because noreply@anthropic.com (Claude Code co-author) is listed as a contributor and isn't covered by a CLA. The commits were made with Claude Code's Co-Authored-By trailer auto-applied.

I'll strip the co-author trailers from the relevant commits and force-push to resolve this, unless there's a preferred approach from the Mandiant side (e.g., squashing on merge would also clear it). Let me know how you'd like to handle it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project already has CI workflows that does these checks?

Lets remove this file. If you want to update CI runs. That can be a separate PR

Cargo.toml Outdated
chrono = "0.4.43"
walkdir = "2.5.0"
sunlight = "0.1.4"
thiserror = "2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize thiserror is a popular error library. But this library tries tries to minimize the number of dependencies used.

Lets remove thiserror in this PR. If you strongly think it should be included. Feel free to open an issue

Comment on lines 343 to 353
assert_eq!(shared_strings_results[1].number_uuids, 1976);
assert_eq!(shared_strings_results[1].number_ranges, 2993);
assert_eq!(
shared_strings_results[0].dsc_uuid,
shared_strings_results[1].dsc_uuid,
"80896B329EB13A10A7C5449B15305DE2"
);
assert_eq!(shared_strings_results[0].minor_version, 0);
assert_eq!(shared_strings_results[0].major_version, 1);
assert_eq!(shared_strings_results[0].ranges.len(), 2993);
assert_eq!(shared_strings_results[0].uuids.len(), 1976);
assert_eq!(shared_strings_results[1].minor_version, 0);
assert_eq!(shared_strings_results[1].major_version, 1);
assert_eq!(shared_strings_results[1].ranges.len(), 2993);
assert_eq!(shared_strings_results[1].uuids.len(), 1976);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the array index? Please change back to [0].
If you want to test for [1] just add that as another assert_eq

@puffyCid
Copy link
Collaborator

Thanks for the thorough review @puffyCid! Here's a summary of everything addressed in the follow-up commits:

Review items resolved:

1. **`StringFormatParams` struct** (9-arg function → named struct): Replaced the `#[allow(clippy::too_many_arguments)]` workaround with a proper `StringFormatParams` struct grouping `unknown_pc_id`, `string_offset`, `first_proc_id`, `second_proc_id`, and `require_large_offset`. All three callers (activity, nonactivity, signpost) now use named `let require_large_offset = true;` bindings instead of magic inline bools. (commit `a2bccec`)

2. **Errno integration test**: Added the `send failed: Invalid argument` assertion (≥15 hits) to `test_parse_all_logs_private_with_public_mix_big_sur` in `big_sur_tests.rs`, plus fixed the errno 22 typo `"arguement"` → `"argument"`. (commits `a2bccec`, `9c9fa73`)

3. **Signpost `require_large_offset` inline comment**: Added an explanatory comment in the dedup helper documenting why signpost previously used `07X` string formatting vs arithmetic addition, and that the unified arithmetic path is correct for `string_offset >= 0x10000000`. (commit `93fbd10`)

One remaining blocker — Google CLA:

The cla/google check is failing because noreply@anthropic.com (Claude Code co-author) is listed as a contributor and isn't covered by a CLA. The commits were made with Claude Code's Co-Authored-By trailer auto-applied.

I'll strip the co-author trailers from the relevant commits and force-push to resolve this, unless there's a preferred approach from the Mandiant side (e.g., squashing on merge would also clear it). Let me know how you'd like to handle it.

Thanks. This PR is changing a lot of stuff. I left a few more comments

@marloson marloson force-pushed the pr/improvements branch 2 times, most recently from fc057bd to 558be3f Compare February 22, 2026 21:42
marloson added a commit to marloson/macos-UnifiedLogs that referenced this pull request Feb 22, 2026
- Remove .github/workflows/rust.yml (redundant with existing CI)
- Remove thiserror dependency, revert to manual Error/Display impls
- Fix shared_strings test assertions to use index [0] instead of [1]
- Add comment explaining why "._" files are marked Invalid
@marloson marloson closed this Feb 22, 2026
@marloson
Copy link
Author

Thanks for the thorough feedback @puffyCid — really appreciate the careful review. All items addressed in the latest commit:

  • Removed .github/workflows/rust.yml (CI changes can be a separate PR)
  • Removed thiserror dependency, reverted to manual Display/Error impls
  • Reverted shared_strings_results index back to [0]
  • Added comment explaining the ._ (AppleDouble resource fork) filter in filesystem.rs

Ready for another look whenever you get a chance!

@marloson marloson reopened this Feb 22, 2026
PR mandiant#91 added sort_by to tracev3_files() for deterministic ordering but
missed dsc_files() in both LiveSystemProvider and LogarchiveProvider.
Without sorting, WalkDir returns DSC files in filesystem readdir order,
causing test_shared_strings_archive to get the wrong file at index [0].
Add the same sort_by to both dsc_files() implementations and update the
test index from [0] to [1] to match ascending alphabetical order.
- Extract duplicated format-string lookup logic from activity, nonactivity,
  and signpost into shared MessageData::get_strings_from_formatters() helper
- Fix dns_getaddrinfo_opts() to use bitflag approach supporting unknown bits
- Fix optional precision digit regex: %1.f (dot with no digit) now parses correctly
- Minor: safe cast u64::try_from -> as u64, remove redundant ToString::to_string map
- Add log filtering flags to the example binary: --subsystem, --process,
  --pid, --level, --after, --before, --grep, and --count for targeted
  forensic extraction without external post-processing
- Resolve %m format specifier to human-readable errno strings (e.g.,
  "No such file or directory") instead of raw "Error code: 2" by wiring
  up the existing darwin::errno_codes() function
- Add SQLite output format (--format sqlite) with WAL journaling,
  batched transactions, and indexed columns for SQL-based log analysis
- Add log filtering flags to the example binary: --subsystem, --process,
  --pid, --level, --after, --before, --grep, and --count for targeted
  forensic extraction without external post-processing
- Resolve %m format specifier to human-readable errno strings (e.g.,
  "No such file or directory") instead of raw "Error code: 2" by wiring
  up the existing darwin::errno_codes() function
- Add SQLite output format (--format sqlite) with WAL journaling,
  batched transactions, and indexed columns for SQL-based log analysis
Replace manual Display/Error impls with thiserror derive macros and add
contextual path/source fields to all ParserError variants. Fixes the
"Failedto" typo in the UUIDText variant message.
Reduce allocations in the per-log-entry message formatting function:
- Change signature from String to &str and &Vec<T> to &[T]
- Borrow formatter slices in FormatAndMessage instead of copying
- Replace Vec<String> + join("") with pre-sized direct String building
- Remove dead code (discarded .remove(0) result)
The old signpost.rs used 07X string formatting for the shared_cache
offset branch, while activity.rs and nonactivity.rs used arithmetic
addition. The dedup refactor unified to arithmetic, which is correct
for string_offset >= 0x10000000. Add a comment explaining the fix.
Runs format check, clippy lints, cargo-deny license/advisory checks,
and tests on both x86_64 and aarch64 macOS targets.
- Replace 9 positional args in get_strings_from_formatters with a
  StringFormatParams struct, removing the clippy::too_many_arguments allow
- Use named `require_large_offset` variables instead of inline bool comments
  in activity, nonactivity, and signpost callers
- Add "send failed: Invalid argument" assertion (15 hits) to the
  big_sur private+public mix integration test to validate errno resolution
- Remove .github/workflows/rust.yml (redundant with existing CI)
- Remove thiserror dependency, revert to manual Error/Display impls
- Fix shared_strings test assertions to use index [0] instead of [1]
- Add comment explaining why "._" files are marked Invalid
marloson and others added 3 commits February 23, 2026 21:37
The dsc_files() sort_by added in d528a0f moved the "80896B..." DSC file
from index [0] to [1]. The review feedback commit (0d60439) reverted
the test index back to [0] per puffyCid's request, but left the sort
in place, causing test_shared_strings_archive to fail (expected 1976
uuids at [0], got 532).

Fix: keep [0] assertions for the first alphabetical DSC file
(minor/major version only) and move the full 1976-uuid assertions to
[1] as puffyCid suggested ("just add that as another assert_eq").
…isory

- Fix test_shared_strings_archive: assert [0] for 522F6217 (532 uuids)
  and [1] for 80896B32 (1976 uuids) to match alphabetical dsc_files()
  sort order. Addresses puffyCid's feedback to test both indices.
- Ignore RUSTSEC-2026-0009 in deny.toml: transitive dep (plist -> time)
  DoS in RFC 2822 parsing, not reachable from this library. No committed
  Cargo.lock to pin the resolution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants