Add CLI filtering, error hardening, perf optimization, and dedup refactor#106
Add CLI filtering, error hardening, perf optimization, and dedup refactor#106marloson wants to merge 14 commits intomandiant:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
@googlebot I signed it! |
|
thanks for the PR. This looks pretty good. |
src/chunks/firehose/signpost.rs
Outdated
| first_proc_id, | ||
| second_proc_id, | ||
| catalogs, | ||
| true, // require_large_offset |
There was a problem hiding this comment.
let require_large_offset = true;
src/chunks/firehose/activity.rs
Outdated
| first_proc_id, | ||
| second_proc_id, | ||
| catalogs, | ||
| true, // require_large_offset |
There was a problem hiding this comment.
let require_large_offset = true;
src/chunks/firehose/message.rs
Outdated
| // The caller structs (FirehoseActivity, FirehoseNonActivity, FirehoseSignpost) pack several of | ||
| // these fields into a single struct, keeping their own signatures short. The helper accepts | ||
| // them individually so a single implementation covers all three callers. | ||
| #[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
Can we turn some of the arguments into a struct instead of passing 9 args
something like
struct StringsFormatters{
unknown_pc_id: u32,
string_offset: u64,
first_proc_id: u64,
second_proc_id: u64,
require_large_offset: bool,
}|
@googlebot I signed it! |
|
Thanks for the thorough review @puffyCid! Here's a summary of everything addressed in the follow-up commits: Review items resolved:
One remaining blocker — Google CLA: The I'll strip the co-author trailers from the relevant commits and force-push to resolve this, unless there's a preferred approach from the Mandiant side (e.g., squashing on merge would also clear it). Let me know how you'd like to handle it. |
.github/workflows/rust.yml
Outdated
There was a problem hiding this comment.
This project already has CI workflows that does these checks?
Lets remove this file. If you want to update CI runs. That can be a separate PR
Cargo.toml
Outdated
| chrono = "0.4.43" | ||
| walkdir = "2.5.0" | ||
| sunlight = "0.1.4" | ||
| thiserror = "2" |
There was a problem hiding this comment.
I realize thiserror is a popular error library. But this library tries tries to minimize the number of dependencies used.
Lets remove thiserror in this PR. If you strongly think it should be included. Feel free to open an issue
| assert_eq!(shared_strings_results[1].number_uuids, 1976); | ||
| assert_eq!(shared_strings_results[1].number_ranges, 2993); | ||
| assert_eq!( | ||
| shared_strings_results[0].dsc_uuid, | ||
| shared_strings_results[1].dsc_uuid, | ||
| "80896B329EB13A10A7C5449B15305DE2" | ||
| ); | ||
| assert_eq!(shared_strings_results[0].minor_version, 0); | ||
| assert_eq!(shared_strings_results[0].major_version, 1); | ||
| assert_eq!(shared_strings_results[0].ranges.len(), 2993); | ||
| assert_eq!(shared_strings_results[0].uuids.len(), 1976); | ||
| assert_eq!(shared_strings_results[1].minor_version, 0); | ||
| assert_eq!(shared_strings_results[1].major_version, 1); | ||
| assert_eq!(shared_strings_results[1].ranges.len(), 2993); | ||
| assert_eq!(shared_strings_results[1].uuids.len(), 1976); | ||
| } |
There was a problem hiding this comment.
Why change the array index? Please change back to [0].
If you want to test for [1] just add that as another assert_eq
Thanks. This PR is changing a lot of stuff. I left a few more comments |
fc057bd to
558be3f
Compare
- Remove .github/workflows/rust.yml (redundant with existing CI) - Remove thiserror dependency, revert to manual Error/Display impls - Fix shared_strings test assertions to use index [0] instead of [1] - Add comment explaining why "._" files are marked Invalid
|
Thanks for the thorough feedback @puffyCid — really appreciate the careful review. All items addressed in the latest commit:
Ready for another look whenever you get a chance! |
PR mandiant#91 added sort_by to tracev3_files() for deterministic ordering but missed dsc_files() in both LiveSystemProvider and LogarchiveProvider. Without sorting, WalkDir returns DSC files in filesystem readdir order, causing test_shared_strings_archive to get the wrong file at index [0]. Add the same sort_by to both dsc_files() implementations and update the test index from [0] to [1] to match ascending alphabetical order.
- Extract duplicated format-string lookup logic from activity, nonactivity, and signpost into shared MessageData::get_strings_from_formatters() helper - Fix dns_getaddrinfo_opts() to use bitflag approach supporting unknown bits - Fix optional precision digit regex: %1.f (dot with no digit) now parses correctly - Minor: safe cast u64::try_from -> as u64, remove redundant ToString::to_string map
- Add log filtering flags to the example binary: --subsystem, --process, --pid, --level, --after, --before, --grep, and --count for targeted forensic extraction without external post-processing - Resolve %m format specifier to human-readable errno strings (e.g., "No such file or directory") instead of raw "Error code: 2" by wiring up the existing darwin::errno_codes() function - Add SQLite output format (--format sqlite) with WAL journaling, batched transactions, and indexed columns for SQL-based log analysis
- Add log filtering flags to the example binary: --subsystem, --process, --pid, --level, --after, --before, --grep, and --count for targeted forensic extraction without external post-processing - Resolve %m format specifier to human-readable errno strings (e.g., "No such file or directory") instead of raw "Error code: 2" by wiring up the existing darwin::errno_codes() function - Add SQLite output format (--format sqlite) with WAL journaling, batched transactions, and indexed columns for SQL-based log analysis
Replace manual Display/Error impls with thiserror derive macros and add contextual path/source fields to all ParserError variants. Fixes the "Failedto" typo in the UUIDText variant message.
Reduce allocations in the per-log-entry message formatting function:
- Change signature from String to &str and &Vec<T> to &[T]
- Borrow formatter slices in FormatAndMessage instead of copying
- Replace Vec<String> + join("") with pre-sized direct String building
- Remove dead code (discarded .remove(0) result)
The old signpost.rs used 07X string formatting for the shared_cache offset branch, while activity.rs and nonactivity.rs used arithmetic addition. The dedup refactor unified to arithmetic, which is correct for string_offset >= 0x10000000. Add a comment explaining the fix.
Runs format check, clippy lints, cargo-deny license/advisory checks, and tests on both x86_64 and aarch64 macOS targets.
- Replace 9 positional args in get_strings_from_formatters with a StringFormatParams struct, removing the clippy::too_many_arguments allow - Use named `require_large_offset` variables instead of inline bool comments in activity, nonactivity, and signpost callers - Add "send failed: Invalid argument" assertion (15 hits) to the big_sur private+public mix integration test to validate errno resolution
- Remove .github/workflows/rust.yml (redundant with existing CI) - Remove thiserror dependency, revert to manual Error/Display impls - Fix shared_strings test assertions to use index [0] instead of [1] - Add comment explaining why "._" files are marked Invalid
1176660 to
0d60439
Compare
The dsc_files() sort_by added in d528a0f moved the "80896B..." DSC file from index [0] to [1]. The review feedback commit (0d60439) reverted the test index back to [0] per puffyCid's request, but left the sort in place, causing test_shared_strings_archive to fail (expected 1976 uuids at [0], got 532). Fix: keep [0] assertions for the first alphabetical DSC file (minor/major version only) and move the full 1976-uuid assertions to [1] as puffyCid suggested ("just add that as another assert_eq").
…isory - Fix test_shared_strings_archive: assert [0] for 522F6217 (532 uuids) and [1] for 80896B32 (1976 uuids) to match alphabetical dsc_files() sort order. Addresses puffyCid's feedback to test both indices. - Ignore RUSTSEC-2026-0009 in deny.toml: transitive dep (plist -> time) DoS in RFC 2822 parsing, not reachable from this library. No committed Cargo.lock to pin the resolution.
Summary
--subsystem,--process,--pid,--level,--after,--before,--grep,--countflags to the example binary, plus--format sqliteoutput viarusqlite%m: Resolve%mformat specifiers to human-readable Darwin errno strings instead of raw error codesMessageData::get_strings_from_formatters()helper, reducing ~330 lines of near-identical code acrossactivity.rs,nonactivity.rs, andsignpost.rsto thin 4-line wrappers.unwrap()calls in library code with properResultpropagation via manualDisplay/Errorimpls onParserError; malformed data now produces warnings instead of panicsCow<'a, str>informat_firehose_log_message()hot path to avoid unnecessary string allocationsdsc_files()sort order to matchtracev3_files(), fix precision regex for format specifiers, fix signpost shared-cache offset calculation (used07Xstring formatting instead of arithmetic addition, producing incorrect offsets forstring_offset >= 0x10000000)Test plan
cargo fmt -- --checkpassescargo clippypasses with no warningscargo test --release— all existing tests pass