Rosetta

Rosetta is a Python library for generating realistic security telemetry and alerts at scale. It can:

Generate observables/indicators (IPs, URLs, hashes, CVEs, MITRE ATT&CK techniques)
Emit synthetic logs in multiple formats (SYSLOG, CEF, LEEF, JSON, Windows Event XML)
Produce incident bundles composed of multiple event types
Convert one log format to another (e.g., CEF to JSON/LEEF)
Send synthetic logs to TCP/UDP/HTTP/HTTPS endpoints
Validate fields against a schema and generate missing values heuristically
Simulate database queries including SQL injection patterns
Generate Kubernetes and cloud-native telemetry

Installation

Install from PyPI:

pip install rosetta-ce

Install from source:

git clone https://github.com/ayman-m/rosetta.git
cd rosetta
python setup.py install

Quick start

from rosetta import Events, Observables, ObservableType, ObservableKnown

# Generate observables
bad_ips = Observables.generator(count=3, observable_type=ObservableType.IP, known=ObservableKnown.BAD)

# Inject custom observables and extra fields
observables = Observables(
    src_host=["web-01"],
    user=["alex"],
    url=["https://example.org"],
    custom_field=["custom_value"],
)

# Create events in different formats
syslog_events = Events.syslog(count=2, observables=observables)
cef_events = Events.cef(count=2, observables=observables)
leef_events = Events.leef(count=2, observables=observables)
json_events = Events.json(count=2, observables=observables)
win_events = Events.winevent(count=2, observables=observables)

Observables

Observable types

Type	Description	Known Values
`IP`	IPv4 addresses	BAD (malicious), GOOD (benign)
`URL`	Web URLs	BAD (malicious), GOOD (benign)
`SHA256`	File hashes	BAD (malicious), GOOD (benign)
`CVE`	CVE identifiers	N/A
`TERMS`	MITRE ATT&CK techniques (280+ IDs)	N/A

Fetch or generate indicators

from rosetta import Observables, ObservableType, ObservableKnown

bad_urls = Observables.generator(count=2, observable_type=ObservableType.URL, known=ObservableKnown.BAD)
good_hashes = Observables.generator(count=2, observable_type=ObservableType.SHA256, known=ObservableKnown.GOOD)
cves = Observables.generator(count=2, observable_type=ObservableType.CVE)
terms = Observables.generator(count=2, observable_type=ObservableType.TERMS)
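
A quick way to sanity-check generated indicators is to validate their shape with the standard library. The sketch below is not part of Rosetta. It assumes the generator returns plain strings, and the helper name `looks_valid` is hypothetical.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical helper, not part of Rosetta: checks only the *shape* of an indicator.
SHA256_RE = re.compile(r"^[0-9a-fA-F]{64}$")
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")


def looks_valid(kind: str, value: str) -> bool:
    """Return True if `value` has the expected format for `kind`."""
    if kind == "IP":
        try:
            return ipaddress.ip_address(value).version == 4
        except ValueError:
            return False
    if kind == "URL":
        parsed = urlparse(value)
        return parsed.scheme in ("http", "https") and bool(parsed.netloc)
    if kind == "SHA256":
        return bool(SHA256_RE.match(value))
    if kind == "CVE":
        return bool(CVE_RE.match(value))
    raise ValueError(f"unknown kind: {kind}")
```

For example, `all(looks_valid("URL", u) for u in bad_urls)` would confirm that every returned URL parses, under the assumption that `bad_urls` is a list of strings.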

Provide your own observables

Observables accepts known fields and arbitrary extra fields via **kwargs.

from rosetta import Observables

observables = Observables(
    local_ip=["192.168.10.10"],
    remote_ip=["1.1.1.1"],
    src_host=["abc"],
    dst_host=["xyz"],
    user=["ayman"],
    file_name=["test.zip"],
    custom_field=["custom_value"],
)

Built-in observable fields

Category	Fields
Network (IPv4/IPv6)	`local_ip`, `remote_ip`, `local_ip_v6`, `remote_ip_v6`, `local_port`, `remote_port`, `protocol`
Hosts & Domains	`src_host`, `dst_host`, `src_domain`, `dst_domain`, `url`
Users & Email	`user`, `sender_email`, `recipient_email`, `email_subject`, `email_body`
Files	`file_name`, `file_hash`
Processes	`win_process`, `win_child_process`, `unix_process`, `unix_child_process`, `win_cmd`, `unix_cmd`
Security	`severity`, `action`, `event_id`, `error_code`, `technique`, `cve`, `terms`
Alerts & Incidents	`alert_types`, `alert_name`, `incident_types`, `analysts`, `action_status`
Database	`query_type`, `database_name`, `query`
Other	`app`, `os`, `sensor`, `entry_type`, `inbound_bytes`, `outbound_bytes`

Events

Rosetta supports generating events in multiple industry-standard log formats:

Format	Description	Use Case
SYSLOG	RFC 5424 syslog format	Unix/Linux system logs, network devices
CEF	Common Event Format	SIEM integration (ArcSight, Splunk)
LEEF	Log Event Extended Format	IBM QRadar integration
JSON	Structured JSON format	Modern SIEM, Elasticsearch, cloud platforms
Windows Event XML	Windows Event Log format	Windows security monitoring, Sysmon
Incidents	Bundled multi-format events	Incident response testing, SOC training

SYSLOG

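from rosetta import Events, Observables

# Observables to inject into the generated events
observables = Observables(src_host=["web-01"], user=["alex"])

Events.syslog(count=1)  # generated values only
Events.syslog(count=1, observables=observables)  # inject the observables above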
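
RFC 5424 lines start with a `<PRI>` header that encodes facility and severity as `facility * 8 + severity`. The sketch below decodes it with the standard library. Whether Rosetta's syslog output carries this header is an assumption, and `decode_pri` is a hypothetical helper, not a Rosetta function.

```python
import re
from typing import Tuple

PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(line: str) -> Tuple[int, int]:
    """Return (facility, severity) from a syslog line's <PRI> header."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> header")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    return divmod(int(match.group(1)), 8)
```

With the example line from RFC 5424, `decode_pri("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - msg")` yields facility 4 and severity 2.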

CEF

Events.cef(count=1, observables=observables)
Events.cef(count=1, observables=observables, required_fields="local_ip,local_port,remote_ip,remote_port,protocol,rule_id,action")

LEEF

Events.leef(count=1, observables=observables)

Windows Event Log (XML)

Events.winevent(count=1, observables=observables)
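
Windows Event XML can be inspected with `xml.etree.ElementTree`. The sketch below assumes each generated event is an XML string in the standard Windows event namespace. The sample is hand-written for illustration, so Rosetta's templates may contain different elements, and `summarize_event` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows Event Log XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hand-written sample for illustration; not actual Rosetta output.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>4624</EventID>
    <Computer>web-01</Computer>
  </System>
  <EventData>
    <Data Name="TargetUserName">alex</Data>
    <Data Name="IpAddress">192.168.10.10</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text: str) -> dict:
    """Extract the event ID, computer name, and named EventData values."""
    root = ET.fromstring(xml_text)
    summary = {
        "event_id": root.findtext("e:System/e:EventID", namespaces=NS),
        "computer": root.findtext("e:System/e:Computer", namespaces=NS),
    }
    for data in root.findall("e:EventData/e:Data", NS):
        summary[data.get("Name")] = data.text
    return summary
```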

JSON

Events.json(count=1, observables=observables)

Incidents (bundled events)

Events.incidents(count=1, fields="id,type,duration,analyst,description,events", observables=observables)

Supported incident types

Rosetta includes 11 predefined incident categories:

Malware
Phishing
Access Violation
Lateral Movement
Port Scan
SQL Injection
Brute Force
Control Avoidance
Rogue Device
Denial of Service
Account Compromised

Required fields and presets

Rosetta can require specific fields per event. You can pass required_fields directly, or rely on presets.

Preset file: rosetta/schema/required_presets.json
Keys: syslog, cef, leef, json, winevent

# Explicit override
Events.syslog(count=1, required_fields="timestamp,hostname,username")

# Use presets (default behavior)
Events.syslog(count=1)

If the preset file is missing or empty, Rosetta falls back to built-in defaults.
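
The fallback behavior can be pictured with a small standalone sketch. It is not Rosetta's implementation: it assumes the preset file is a JSON object keyed by format whose values are either comma-separated strings or lists, and `load_required_fields` and `BUILTIN_DEFAULTS` are hypothetical names with placeholder values.

```python
import json
from pathlib import Path

# Placeholder defaults for illustration; Rosetta's real defaults are not listed here.
BUILTIN_DEFAULTS = {"syslog": ["timestamp", "hostname", "username"]}


def load_required_fields(preset_path: str, event_format: str) -> list:
    """Return required fields for a format, falling back when the file is missing or empty."""
    try:
        text = Path(preset_path).read_text(encoding="utf-8")
        presets = json.loads(text or "{}")
    except (OSError, json.JSONDecodeError):
        presets = {}  # unreadable or malformed file: behave as if no presets exist
    fields = presets.get(event_format) if isinstance(presets, dict) else None
    if isinstance(fields, str):  # accept "a,b,c" as well as ["a", "b", "c"]
        fields = [f.strip() for f in fields.split(",") if f.strip()]
    return fields or BUILTIN_DEFAULTS.get(event_format, [])
```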

Schema validation

Rosetta checks required fields and observables against a supported-fields list and emits warnings for unknown fields.

Schema file: rosetta/schema/supported_fields.json
Behavior: non-blocking warnings only

from rosetta import Events, Observables

Events.syslog(count=1, observables=Observables(), required_fields="unknown_field")
# Warning: Field 'unknown_field' is not in schema/supported_fields.json
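
Non-blocking validation of this kind can be sketched with the standard `warnings` module. This is an illustration rather than Rosetta's code: `check_fields` is a hypothetical helper, and `SUPPORTED_FIELDS` is a tiny stand-in for the contents of the schema file.

```python
import warnings

# Tiny stand-in for schema/supported_fields.json (assumed to hold field names).
SUPPORTED_FIELDS = {"timestamp", "hostname", "username", "local_ip", "remote_ip"}


def check_fields(required_fields: str, supported=SUPPORTED_FIELDS) -> list:
    """Warn about unknown fields and return them; never raises for unknown names."""
    requested = [f.strip() for f in required_fields.split(",") if f.strip()]
    unknown = [f for f in requested if f not in supported]
    for field in unknown:
        warnings.warn(f"Field '{field}' is not in schema/supported_fields.json")
    return unknown
```

Because the check only warns, event generation can continue with the unknown field, which matches the non-blocking behavior described above.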

Supported schema fields (1000+ fields)

Identity & Authentication

username, user, user_id, user_sid, user_dn, user_ou, user_type, user_role, user_group, actor_username, actor_sid, actor_id, actor_uid, actor_arn, actor_ip, target_username, target_user_sid, target_user_id, target_uid, admin_username, admin_ip, analyst_username, creator_username, creator_ip, display_name, full_name, email, department, title, manager

Authentication & Sessions

authentication_method, authentication_result, authentication_package, authentication_status, authorization_status, session_id, session_type, session_start, session_end, session_duration, session_timeout, token_id, token_expiry, token_elevation_type, mfa_method, mfa_result, logon_type, logon_process, logon_guid, logon_id, logon_time, logoff_time, login_type, login_time, last_login, last_logon, last_password_change

Network & Connectivity

client_ip, client_port, client_hostname, client_mac, server_ip, server_port, server_hostname, source_ip, source_port, source_mac, source_hostname, destination_ip, destination_port, destination_mac, destination_hostname, local_ip, local_port, remote_ip, remote_port, remote_host, assigned_ip, public_ip, private_ip, nat_source_ip, nat_destination_ip, scanner_ip, target_ip, target_port, target_hostname

DNS & DHCP

dns_server, dns_servers, dns_query, dns_response, dns_flags, dns_name, dnssec_validated, query_name, query_class, query_time_ms, query_count, response_data, response_ip, response_count, response_ttl, authoritative, recursion_desired, recursion_available, dhcp_*, lease_duration, lease_start, lease_expiry, lease_state, scope_name, scope_id

HTTP & Web

http_method, http_uri, http_host, http_status_code, http_protocol, http_referer, http_user_agent, http_query_string, request_id, request_size, request_body_sample, request_headers, response_code, response_size, response_time_ms, response_body_sample, response_headers, content_type, content_length, user_agent, referer, cookie, cookies, url, url_category, url_categories

API Gateway

gateway_name, api_key, api_name, api_endpoint, api_operation, api_version, api_parameters, api_call, oauth_client_id, oauth_scope, rate_limit_policy, rate_limit_remaining, quota_policy, quota_remaining, backend_server, backend_response_time_ms, backend_status_code, cache_status, cache_hit

Files & Storage

file_name, file_path, file_type, file_size, file_hash, file_hash_md5, file_hash_sha1, file_hash_sha256, file_hash_imphash, file_owner, file_group, file_permissions, file_attributes, file_version, original_filename, creation_time, modification_time, deletion_time, access_time, old_hash, new_hash, old_size, new_size, old_permissions, new_permissions

Processes & Execution

process_id, process_name, process_guid, parent_process_name, parent_process_guid, parent_command_line, parent_image, pid, ppid, executable_path, command_line, command, arguments, args, working_directory, cwd, image, image_path, image_loaded, start_time, stop_time, exit_code, cpu_time, thread_count, handle_count

Windows Events

event_id, event_type, event_record_id, event_category, logon_id, linked_logon_id, virtual_account, elevated_token, mandatory_label, integrity_level, terminal_session_id, current_directory, source_pid, source_process_name, source_image, source_user, target_pid, target_process_name, target_image, granted_access, call_trace

Registry

registry_key, registry_value_name, registry_value_type, registry_value_data, old_value_type, old_value_data, new_value_type, new_value_data, target_object, details, new_name

Services & Scheduled Tasks

service_name, service_type, service_state, service_path, service_file_name, service_start_type, service_unit, service_account, task_name, task_content, task_id, task_status, task_result, trigger_type, trigger_value, run_level, enabled, schedule, last_run_time, next_run_time

Modules & Drivers

module_name, module_path, module_base_address, module_size, module_version, module_parameters, module_hash, driver_name, signature_status, signature_level, signed, signed_by, signer, load_reason, load_result, load_address, load_time, is_kernel_mode

PowerShell & Scripts

script_block_text, script_path, script_content, script_hash, script_content_hash, script_block_id, script_engine, host_application, engine_version, runspace_id, pipeline_id, interpreter, obfuscation_score

Containers & Kubernetes

container_id, container_name, container_image, namespace, pod_name, pod_uid, node_name, cluster, labels, annotations, resource_limits, security_context, service_account, restart_count, exit_code_previous, environment_variables, cgroup, namespace_pid, capabilities

Cloud & Infrastructure

cloud_provider, region, instance_id, instance_name, instance_type, ami_id, vpc_id, subnet_id, security_groups, iam_role, resource_type, resource_id, resource_name, resource_arn, bucket_name, bucket_arn, volume_id, volume_name, volume_type, volume_size, snapshot_id, snapshot_name, tags

Virtual Machines

hypervisor_type, vm_id, vm_name, vm_uuid, cpu_usage, memory_usage, cpu_count, memory_mb, disk_size_gb, network_adapters, template_name, resource_pool, datastore, target_vm, target_host, boot_time_ms, uptime_seconds, previous_state

Database

database_name, database_role, query_type, query_text, query, command_type, command_text, object_name, schema_name, execution_status, execution_time_ms, affected_rows, transaction_id, privilege, error_code, error_message

Email & Messaging

sender, recipient, sender_email, recipient_email, sender_domain, recipient_domain, subject, message_id, message_size, message_count, attachment_name, attachment_type, attachment_size, attachment_hash, attachment_count, attachment_names, attachment_types, attachment_hashes, spam_score, phishing_score, spf_result, dkim_result, dmarc_result

Firewall & Network Security

firewall_name, rule_id, rule_name, rule_type, rule_number, rule_action, acl_name, acl_type, action, action_taken, zone_source, zone_destination, interface_in, interface_out, input_interface, output_interface, source_network, destination_network, port_range, tcp_flags, packets, bytes, bytes_sent, bytes_received

IDS/IPS & Threat Detection

signature_id, signature_name, signature_category, attack_type, attack_vector, attack_category, attack_severity, threat_type, threat_name, threat_category, threat_score, threat_level, threat_severity, threat_indicator, threat_detected, detection_name, detection_type, mitre_tactic, mitre_technique, cve_id, cvss_score, cvss_vector

Endpoint Detection

agent_id, agent_version, scan_id, scan_type, scan_result, scan_status, scan_start, scan_end, scan_duration, finding_id, vulnerability_id, vulnerability_name, vulnerability_description, remediation, quarantine_id, quarantine_status, quarantine_path, quarantined, blocked

SIEM & Incident Response

incident_id, incident_name, incident_type, incident_severity, incident_status, alert_id, alert_type, alert_name, playbook_id, playbook_name, analyst_notes, confidence, risk_score, risk_level, severity, priority

SSL/TLS

ssl_protocol, ssl_version, ssl_cipher, ssl_subject, ssl_issuer, ssl_client_cert_cn, ssl_ja3_hash, ssl_ja3s_hash, tls_version, tls_cipher, cipher_suite, certificate_cn, certificate_serial, certificate_issuer, certificate_subject, certificate_validity_start, certificate_validity_end, certificate_chain_valid, certificate_revocation_status, ja3_hash, ja3s_hash

VPN & Remote Access

vpn_group, tunnel_type, tunnel_id, encryption_algorithm, idle_timeout, session_timeout, bytes_quota, client_version

Wireless

ssid, ap_name, ap_mac, bssid, eap_type, vlan_assigned, radio_type, channel, rssi, snr, roam_count, association_time, data_rate, power_save_mode

Network Access Control

identity_group, policy_matched, nas_ip, nas_port, calling_station_id, called_station_id, radius_attributes, switch_ip, switch_port, vlan_id, vlan_name, posture_status, endpoint_policy

Data Loss Prevention

data_classification, sensitive_data_flag, sensitive_data_types, sensitive_data_detected, sensitive_data_added, sensitive_data_removed, pattern_matched, bytes_inspected, dlp_verdict, dlp_violation, dlp_scan_result, masked_fields, channel_type

Vulnerability Management

scanner_ip, target_os, target_os_version, service_detected, service_version, banner, vulnerability_checks, vulnerabilities_found, vulnerabilities_critical, vulnerabilities_high, vulnerabilities_medium, vulnerabilities_low, vulnerabilities_info, compliance_score, exploit_available, patch_available, first_detected, last_detected

Mobile Device Management

device_type, device_id, device_name, enrollment_status, enrollment_method, enrollment_time, serial_number, imei, jailbreak_status, passcode_compliant, installed_apps_count, managed_apps_count, certificates_installed, profiles_installed

Privileged Access Management

vault_name, checkout_id, checkout_reason, checkout_time, checkin_time, session_duration_limit, recording_enabled, recording_id, target_account, target_account_type, target_system, credential_type, credential_name

Application Logs

application, application_name, application_version, environment, log_level, logger_name, message, exception_type, exception_message, stack_trace, thread_name, thread_id, span_id, trace_id, custom_fields

Audit & Compliance

audit_id, operation, operation_type, modification_type, change_type, change_description, change_reason, old_value, new_value, justification, approval_id, approval_status, approver, workflow_id, compliance_status, policy_name, policy_violation

Metrics & Performance

metric_name, metric_value, threshold, cpu_usage, memory_usage, disk_usage, throughput_bps, connection_count, response_time_ms, execution_time_ms, duration, jitter_ms, offset_ms, latency

Heuristic value generation

When a field has no explicit value, Rosetta infers a reasonable value based on name patterns. This makes large schemas usable without hardcoding every field.

Supported field patterns

Category	Patterns
Network	`*_ip`, `*_ipv6`, `*_port`, `*_mac`, `*_domain`, `*_hostname`, `*_url`
Identity	`*_email`, `*_user`, `*_sid`, `*_arn`
Identifiers	`*_id`, `*_uuid`, `*_guid`
Hashing	`*_hash`, `*_md5`, `*_sha1`, `*_sha256`
Status	`*_status`, `*_result`, `*_outcome`, `*_verdict`, `*_action`
Metrics	`*_size`, `*_bytes`, `*_count`, `*_duration`, `*_ms`, `*_score`, `*_percent`
Time	`*_time`, `*_timestamp`, `*_date`
HTTP/API	`http_*`, `request_*`, `response_*`, `api_*`
DNS/DHCP	`dns_*`, `dhcp_*`
Authentication	`auth_*`, `mfa_*`, `token_*`, `session_*`, `role_*`, `permission_*`
Kubernetes	`namespace`, `pod_*`, `container_*`, `node_*`, `cluster`, `labels`, `annotations`, `service_account`
Threats	`vulnerability_*`, `cve`, `cvss_*`, `threat_*`, `mitre_*`, `ioc_*`
Email/SMTP	`sender_*`, `recipient_*`, `smtp_*`, `dkim_*`, `spf_*`, `dmarc_*`
Boolean	`is_*`, `*_enabled`, `*_flag`
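
Name-based inference of this kind can be sketched as an ordered list of suffix and prefix rules. The code below is a standalone illustration under that assumption, not Rosetta's implementation: `RULES`, `infer_value`, and the generated value shapes are all hypothetical, and only a handful of the patterns above are covered.

```python
import random
import uuid
from datetime import datetime, timezone

# Illustrative rules only: (kind, pattern, generator). The first match wins.
RULES = [
    ("suffix", "_ip", lambda: ".".join(str(random.randint(1, 254)) for _ in range(4))),
    ("suffix", "_port", lambda: random.randint(1, 65535)),
    ("suffix", "_uuid", lambda: str(uuid.uuid4())),
    ("suffix", "_guid", lambda: str(uuid.uuid4())),
    ("suffix", "_sha256", lambda: "%064x" % random.getrandbits(256)),
    ("suffix", "_timestamp", lambda: datetime.now(timezone.utc).isoformat()),
    ("suffix", "_enabled", lambda: random.choice([True, False])),
    ("prefix", "is_", lambda: random.choice([True, False])),
]


def infer_value(field_name: str):
    """Pick a plausible value for a field based only on its name."""
    for kind, pattern, make in RULES:
        if kind == "suffix":
            matched = field_name.endswith(pattern)
        else:
            matched = field_name.startswith(pattern)
        if matched:
            return make()
    return f"{field_name}_value"  # generic fallback for unmatched names
```

Rule order matters in a design like this: a more specific suffix such as `_sha256` has to be tried before a broader one such as `_hash` if both are present.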

Sender

Send synthetic events to TCP/UDP/HTTP/HTTPS endpoints using multi-threaded workers.

Supported data types

SYSLOG
CEF
LEEF
WINEVENT
JSON
INCIDENT

Destination formats

UDP: udp:127.0.0.1:514
TCP: tcp:127.0.0.1:514
HTTP: http://127.0.0.1:8000/endpoint
HTTPS: https://127.0.0.1:8000/endpoint
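
The two destination styles (`transport:host:port` and a full URL) can be told apart with a few lines of standard-library code. The sketch below is only an illustration of the string formats listed above. `parse_destination` is a hypothetical helper, not Rosetta's parser, and it does not handle IPv6 literals.

```python
from urllib.parse import urlparse


def parse_destination(destination: str) -> dict:
    """Split a destination string into transport, host, port, and path."""
    if destination.startswith(("http://", "https://")):
        url = urlparse(destination)
        default_port = 443 if url.scheme == "https" else 80
        return {
            "transport": url.scheme,
            "host": url.hostname,
            "port": url.port or default_port,
            "path": url.path or "/",
        }
    # Raises ValueError if the string does not have exactly three parts.
    transport, host, port = destination.split(":")
    if transport not in ("udp", "tcp"):
        raise ValueError(f"unsupported transport: {transport}")
    return {"transport": transport, "host": host, "port": int(port), "path": None}
```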

Example

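from rosetta import Observables, Sender, WorkerTypeEnum

# Observables to inject into the generated events
observables = Observables(src_host=["web-01"], user=["alex"])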

# UDP syslog
udp_worker = Sender(
    data_type=WorkerTypeEnum.SYSLOG,
    destination="udp:127.0.0.1:514",
    observables=observables,
    count=5,
    interval=2
)
udp_worker.start()

# HTTP JSON
http_worker = Sender(
    data_type=WorkerTypeEnum.JSON,
    destination="http://127.0.0.1:8000/logs",
    observables=observables,
    count=5,
    interval=2
)
http_worker.start()
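
To watch UDP events arrive locally, a throwaway listener is enough. The sketch below uses only the standard library and is not part of Rosetta. `receive_udp` and the port 5514 are assumptions chosen for illustration: binding to ports below 1024 such as 514 usually requires elevated privileges, so the worker's destination would need to be changed to `udp:127.0.0.1:5514` to match.

```python
import socket


def receive_udp(host: str = "127.0.0.1", port: int = 5514,
                count: int = 5, timeout: float = 10.0) -> list:
    """Collect up to `count` datagrams; stop early if `timeout` seconds pass without one."""
    messages = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(timeout)
        try:
            while len(messages) < count:
                data, _addr = sock.recvfrom(65535)
                messages.append(data.decode("utf-8", errors="replace"))
        except socket.timeout:
            pass  # return whatever arrived before the timeout
    return messages

# Usage: start the listener first, then start the worker from another process:
#   for line in receive_udp(port=5514, count=5):
#       print(line)
```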

Converter

from rosetta import Converter, ConverterToEnum, ConverterFromEnum

cef_log = "CEF:0|Security|IDS|1.0|Alert|10|src=192.168.0.1 dst=192.168.0.2 act=blocked"
converted = Converter.convert(from_type=ConverterFromEnum.CEF, to_type=ConverterToEnum.JSON, data=cef_log)
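
For intuition about what a CEF to JSON conversion involves, here is a minimal standard-library sketch. It is not how `Converter.convert` is implemented. `cef_to_dict` and the output key names are hypothetical, the sample line is hand-written with the seven header fields the CEF format defines, and escaped `|` or `=` characters are not handled.

```python
import json
import re

CEF_HEADER_KEYS = ["version", "device_vendor", "device_product", "device_version",
                   "signature_id", "name", "severity"]
# key=value pairs; a value runs until the next " key=" or the end of the line.
EXTENSION_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def cef_to_dict(line: str) -> dict:
    """Parse a CEF line with the standard seven-field header into a dict."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF line")
    parts = line[len("CEF:"):].split("|", 7)
    if len(parts) < 7:
        raise ValueError("expected seven pipe-delimited header fields")
    event = dict(zip(CEF_HEADER_KEYS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    event["extension"] = dict(EXTENSION_RE.findall(extension))
    return event


sample = ("CEF:0|Security|IDS|1.0|100|Alert|10|"
          "src=192.168.0.1 dst=192.168.0.2 act=blocked msg=Port scan detected")
print(json.dumps(cef_to_dict(sample), indent=2))
```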

Testing

python3 -m unittest discover -s tests

Database telemetry

Rosetta can generate realistic database activity logs including normal operations and attack patterns.

Supported query types

SELECT, INSERT, UPDATE, DELETE, ALTER, CREATE, DROP, TRUNCATE, GRANT, REVOKE, MERGE, CALL
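
When consuming generated database logs, the query type can be cross-checked against the query text by looking at the leading SQL keyword. The sketch below is a standalone illustration. `query_type_of` is a hypothetical helper, and it assumes the query text starts with its statement keyword.

```python
QUERY_TYPES = {"SELECT", "INSERT", "UPDATE", "DELETE", "ALTER", "CREATE",
               "DROP", "TRUNCATE", "GRANT", "REVOKE", "MERGE", "CALL"}


def query_type_of(query: str) -> str:
    """Return the leading SQL keyword if it is a supported query type, else 'UNKNOWN'."""
    words = query.strip().split(None, 1)
    keyword = words[0].upper() if words else ""
    return keyword if keyword in QUERY_TYPES else "UNKNOWN"
```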

Attack patterns included

SQL injection queries
Unauthorized data manipulation
Privilege escalation attempts

OWASP Top 10 attack simulation

Rosetta includes built-in OWASP Top 10 attack technique indicators:

Injection (SQL, Command)
Broken Authentication and Session Management
Cross-Site Scripting (XSS)
Broken Access Control
Security Misconfiguration
Insecure Cryptographic Storage
Insufficient Transport Layer Protection
Unvalidated Redirects and Forwards
Using Components with Known Vulnerabilities
Insufficient Logging and Monitoring

Network protocols

Supported protocols for telemetry generation: TCP, UDP, HTTP, SSL, SQL, SSH, FTP, RTP, RDP

Windows telemetry

Rosetta generates realistic Windows endpoint data including:

18 common Windows processes (explorer.exe, svchost.exe, lsass.exe, etc.)
PowerShell commands for attack simulation
Windows Event Log XML templates (Sysmon, Security events)

Examples

See the examples/ directory for complete usage examples:

observables.py - Generate indicators
events_formats.py - Create events in different formats
incidents.py - Build incident bundles
sender_tcp_udp_http.py - Send events to endpoints
converter.py - Convert between formats
k8s_fields.py - Kubernetes field generation
presets_schema.py - Schema validation

Notes

Some observable generators fetch from public sources. When offline, Rosetta falls back to synthetic values.
Preset and schema files are generated from the CSV mapping in the project root and can be updated as your schema evolves.
