Rosetta is a Python library for generating realistic security telemetry and alerts at scale. It can:
- Generate observables/indicators (IPs, URLs, hashes, CVEs, MITRE ATT&CK techniques)
- Emit synthetic logs in multiple formats (SYSLOG, CEF, LEEF, JSON, Windows Event XML)
- Produce incident bundles composed of multiple event types
- Convert one log format to another (e.g., CEF to JSON/LEEF)
- Send synthetic logs to TCP/UDP/HTTP/HTTPS endpoints
- Validate fields against a schema and generate missing values heuristically
- Simulate database queries including SQL injection patterns
- Generate Kubernetes and cloud-native telemetry
- Install from PyPI:
pip install rosetta-ce- Install from source:
git clone https://github.com/ayman-m/rosetta.git
cd rosetta
python setup.py installfrom rosetta import Events, Observables, ObservableType, ObservableKnown
# Generate observables
bad_ips = Observables.generator(count=3, observable_type=ObservableType.IP, known=ObservableKnown.BAD)
# Inject custom observables and extra fields
observables = Observables(
src_host=["web-01"],
user=["alex"],
url=["https://example.org"],
custom_field=["custom_value"],
)
# Create events in different formats
syslog_events = Events.syslog(count=2, observables=observables)
cef_events = Events.cef(count=2, observables=observables)
leef_events = Events.leef(count=2, observables=observables)
json_events = Events.json(count=2, observables=observables)
win_events = Events.winevent(count=2, observables=observables)| Type | Description | Known Values |
|---|---|---|
IP |
IPv4 addresses | BAD (malicious), GOOD (benign) |
URL |
Web URLs | BAD (malicious), GOOD (benign) |
SHA256 |
File hashes | BAD (malicious), GOOD (benign) |
CVE |
CVE identifiers | N/A |
TERMS |
MITRE ATT&CK techniques (280+ IDs) | N/A |
from rosetta import Observables, ObservableType, ObservableKnown
bad_urls = Observables.generator(count=2, observable_type=ObservableType.URL, known=ObservableKnown.BAD)
good_hashes = Observables.generator(count=2, observable_type=ObservableType.SHA256, known=ObservableKnown.GOOD)
cves = Observables.generator(count=2, observable_type=ObservableType.CVE)
terms = Observables.generator(count=2, observable_type=ObservableType.TERMS)Observables accepts known fields and arbitrary extra fields via **kwargs.
from rosetta import Observables
observables = Observables(
local_ip=["192.168.10.10"],
remote_ip=["1.1.1.1"],
src_host=["abc"],
dst_host=["xyz"],
user=["ayman"],
file_name=["test.zip"],
custom_field=["custom_value"],
)| Category | Fields |
|---|---|
| Network (IPv4/IPv6) | local_ip, remote_ip, local_ip_v6, remote_ip_v6, local_port, remote_port, protocol |
| Hosts & Domains | src_host, dst_host, src_domain, dst_domain, url |
| Users & Email | user, sender_email, recipient_email, email_subject, email_body |
| Files | file_name, file_hash |
| Processes | win_process, win_child_process, unix_process, unix_child_process, win_cmd, unix_cmd |
| Security | severity, action, event_id, error_code, technique, cve, terms |
| Alerts & Incidents | alert_types, alert_name, incident_types, analysts, action_status |
| Database | query_type, database_name, query |
| Other | app, os, sensor, entry_type, inbound_bytes, outbound_bytes |
Rosetta supports generating events in multiple industry-standard log formats:
| Format | Description | Use Case |
|---|---|---|
| SYSLOG | RFC 5424 syslog format | Unix/Linux system logs, network devices |
| CEF | Common Event Format | SIEM integration (ArcSight, Splunk) |
| LEEF | Log Event Extended Format | IBM QRadar integration |
| JSON | Structured JSON format | Modern SIEM, Elasticsearch, cloud platforms |
| Windows Event XML | Windows Event Log format | Windows security monitoring, Sysmon |
| Incidents | Bundled multi-format events | Incident response testing, SOC training |
from rosetta import Events
Events.syslog(count=1)
Events.syslog(count=1, observables=observables)Events.cef(count=1, observables=observables)
Events.cef(count=1, observables=observables, required_fields="local_ip,local_port,remote_ip,remote_port,protocol,rule_id,action")Events.leef(count=1, observables=observables)Events.winevent(count=1, observables=observables)Events.json(count=1, observables=observables)Events.incidents(count=1, fields="id,type,duration,analyst,description,events", observables=observables)Rosetta includes 11 predefined incident categories:
- Malware
- Phishing
- Access Violation
- Lateral Movement
- Port Scan
- SQL Injection
- Brute Force
- Control Avoidance
- Rogue Device
- Denial of Service
- Account Compromised
Rosetta can require specific fields per event. You can pass required_fields directly, or rely on presets.
- Preset file:
rosetta/schema/required_presets.json - Keys:
syslog,cef,leef,json,winevent
# Explicit override
Events.syslog(count=1, required_fields="timestamp,hostname,username")
# Use presets (default behavior)
Events.syslog(count=1)If the preset file is missing or empty, Rosetta falls back to built-in defaults.
Rosetta checks required fields and observables against a supported-fields list and emits warnings for unknown fields.
- Schema file:
rosetta/schema/supported_fields.json - Behavior: non-blocking warnings only
from rosetta import Events, Observables
Events.syslog(count=1, observables=Observables(), required_fields="unknown_field")
# Warning: Field 'unknown_field' is not in schema/supported_fields.jsonClick to expand full field list by category
username, user, user_id, user_sid, user_dn, user_ou, user_type, user_role, user_group, actor_username, actor_sid, actor_id, actor_uid, actor_arn, actor_ip, target_username, target_user_sid, target_user_id, target_uid, admin_username, admin_ip, analyst_username, creator_username, creator_ip, display_name, full_name, email, department, title, manager
authentication_method, authentication_result, authentication_package, authentication_status, authorization_status, session_id, session_type, session_start, session_end, session_duration, session_timeout, token_id, token_expiry, token_elevation_type, mfa_method, mfa_result, logon_type, logon_process, logon_guid, logon_id, logon_time, logoff_time, login_type, login_time, last_login, last_logon, last_password_change
client_ip, client_port, client_hostname, client_mac, server_ip, server_port, server_hostname, source_ip, source_port, source_mac, source_hostname, destination_ip, destination_port, destination_mac, destination_hostname, local_ip, local_port, remote_ip, remote_port, remote_host, assigned_ip, public_ip, private_ip, nat_source_ip, nat_destination_ip, scanner_ip, target_ip, target_port, target_hostname
dns_server, dns_servers, dns_query, dns_response, dns_flags, dns_name, dnssec_validated, query_name, query_class, query_time_ms, query_count, response_data, response_ip, response_count, response_ttl, authoritative, recursion_desired, recursion_available, dhcp_*, lease_duration, lease_start, lease_expiry, lease_state, scope_name, scope_id
http_method, http_uri, http_host, http_status_code, http_protocol, http_referer, http_user_agent, http_query_string, request_id, request_size, request_body_sample, request_headers, response_code, response_size, response_time_ms, response_body_sample, response_headers, content_type, content_length, user_agent, referer, cookie, cookies, url, url_category, url_categories
gateway_name, api_key, api_name, api_endpoint, api_operation, api_version, api_parameters, api_call, oauth_client_id, oauth_scope, rate_limit_policy, rate_limit_remaining, quota_policy, quota_remaining, backend_server, backend_response_time_ms, backend_status_code, cache_status, cache_hit
file_name, file_path, file_type, file_size, file_hash, file_hash_md5, file_hash_sha1, file_hash_sha256, file_hash_imphash, file_owner, file_group, file_permissions, file_attributes, file_version, original_filename, creation_time, modification_time, deletion_time, access_time, old_hash, new_hash, old_size, new_size, old_permissions, new_permissions
process_id, process_name, process_guid, parent_process_name, parent_process_guid, parent_command_line, parent_image, pid, ppid, executable_path, command_line, command, arguments, args, working_directory, cwd, image, image_path, image_loaded, start_time, stop_time, exit_code, cpu_time, thread_count, handle_count
event_id, event_type, event_record_id, event_category, logon_id, linked_logon_id, virtual_account, elevated_token, mandatory_label, integrity_level, terminal_session_id, current_directory, source_pid, source_process_name, source_image, source_user, target_pid, target_process_name, target_image, granted_access, call_trace
registry_key, registry_value_name, registry_value_type, registry_value_data, old_value_type, old_value_data, new_value_type, new_value_data, target_object, details, new_name
service_name, service_type, service_state, service_path, service_file_name, service_start_type, service_unit, service_account, task_name, task_content, task_id, task_status, task_result, trigger_type, trigger_value, run_level, enabled, schedule, last_run_time, next_run_time
module_name, module_path, module_base_address, module_size, module_version, module_parameters, module_hash, driver_name, signature_status, signature_level, signed, signed_by, signer, load_reason, load_result, load_address, load_time, is_kernel_mode
script_block_text, script_path, script_content, script_hash, script_content_hash, script_block_id, script_engine, host_application, engine_version, runspace_id, pipeline_id, interpreter, obfuscation_score
container_id, container_name, container_image, namespace, pod_name, pod_uid, node_name, cluster, labels, annotations, resource_limits, security_context, service_account, restart_count, exit_code_previous, environment_variables, cgroup, namespace_pid, capabilities
cloud_provider, region, instance_id, instance_name, instance_type, ami_id, vpc_id, subnet_id, security_groups, iam_role, resource_type, resource_id, resource_name, resource_arn, bucket_name, bucket_arn, volume_id, volume_name, volume_type, volume_size, snapshot_id, snapshot_name, tags
hypervisor_type, vm_id, vm_name, vm_uuid, cpu_usage, memory_usage, cpu_count, memory_mb, disk_size_gb, network_adapters, template_name, resource_pool, datastore, target_vm, target_host, boot_time_ms, uptime_seconds, previous_state
database_name, database_role, query_type, query_text, query, command_type, command_text, object_name, schema_name, execution_status, execution_time_ms, affected_rows, transaction_id, privilege, error_code, error_message
sender, recipient, sender_email, recipient_email, sender_domain, recipient_domain, subject, message_id, message_size, message_count, attachment_name, attachment_type, attachment_size, attachment_hash, attachment_count, attachment_names, attachment_types, attachment_hashes, spam_score, phishing_score, spf_result, dkim_result, dmarc_result
firewall_name, rule_id, rule_name, rule_type, rule_number, rule_action, acl_name, acl_type, action, action_taken, zone_source, zone_destination, interface_in, interface_out, input_interface, output_interface, source_network, destination_network, port_range, tcp_flags, packets, bytes, bytes_sent, bytes_received
signature_id, signature_name, signature_category, attack_type, attack_vector, attack_category, attack_severity, threat_type, threat_name, threat_category, threat_score, threat_level, threat_severity, threat_indicator, threat_detected, detection_name, detection_type, mitre_tactic, mitre_technique, cve_id, cvss_score, cvss_vector
agent_id, agent_version, scan_id, scan_type, scan_result, scan_status, scan_start, scan_end, scan_duration, finding_id, vulnerability_id, vulnerability_name, vulnerability_description, remediation, quarantine_id, quarantine_status, quarantine_path, quarantined, blocked
incident_id, incident_name, incident_type, incident_severity, incident_status, alert_id, alert_type, alert_name, playbook_id, playbook_name, analyst_notes, confidence, risk_score, risk_level, severity, priority
ssl_protocol, ssl_version, ssl_cipher, ssl_subject, ssl_issuer, ssl_client_cert_cn, ssl_ja3_hash, ssl_ja3s_hash, tls_version, tls_cipher, cipher_suite, certificate_cn, certificate_serial, certificate_issuer, certificate_subject, certificate_validity_start, certificate_validity_end, certificate_chain_valid, certificate_revocation_status, ja3_hash, ja3s_hash
vpn_group, tunnel_type, tunnel_id, encryption_algorithm, idle_timeout, session_timeout, bytes_quota, client_version
ssid, ap_name, ap_mac, bssid, eap_type, vlan_assigned, radio_type, channel, rssi, snr, roam_count, association_time, data_rate, power_save_mode
identity_group, policy_matched, nas_ip, nas_port, calling_station_id, called_station_id, radius_attributes, switch_ip, switch_port, vlan_id, vlan_name, posture_status, endpoint_policy
data_classification, sensitive_data_flag, sensitive_data_types, sensitive_data_detected, sensitive_data_added, sensitive_data_removed, pattern_matched, bytes_inspected, dlp_verdict, dlp_violation, dlp_scan_result, masked_fields, channel_type
scanner_ip, target_os, target_os_version, service_detected, service_version, banner, vulnerability_checks, vulnerabilities_found, vulnerabilities_critical, vulnerabilities_high, vulnerabilities_medium, vulnerabilities_low, vulnerabilities_info, compliance_score, exploit_available, patch_available, first_detected, last_detected
device_type, device_id, device_name, enrollment_status, enrollment_method, enrollment_time, serial_number, imei, jailbreak_status, passcode_compliant, installed_apps_count, managed_apps_count, certificates_installed, profiles_installed
vault_name, checkout_id, checkout_reason, checkout_time, checkin_time, session_duration_limit, recording_enabled, recording_id, target_account, target_account_type, target_system, credential_type, credential_name
application, application_name, application_version, environment, log_level, logger_name, message, exception_type, exception_message, stack_trace, thread_name, thread_id, span_id, trace_id, custom_fields
audit_id, operation, operation_type, modification_type, change_type, change_description, change_reason, old_value, new_value, justification, approval_id, approval_status, approver, workflow_id, compliance_status, policy_name, policy_violation
metric_name, metric_value, threshold, cpu_usage, memory_usage, disk_usage, throughput_bps, connection_count, response_time_ms, execution_time_ms, duration, jitter_ms, offset_ms, latency
When a field has no explicit value, Rosetta infers a reasonable value based on name patterns. This makes large schemas usable without hardcoding every field.
| Category | Patterns |
|---|---|
| Network | *_ip, *_ipv6, *_port, *_mac, *_domain, *_hostname, *_url |
| Identity | *_email, *_user, *_sid, *_arn |
| Identifiers | *_id, *_uuid, *_guid |
| Hashing | *_hash, *_md5, *_sha1, *_sha256 |
| Status | *_status, *_result, *_outcome, *_verdict, *_action |
| Metrics | *_size, *_bytes, *_count, *_duration, *_ms, *_score, *_percent |
| Time | *_time, *_timestamp, *_date |
| HTTP/API | http_*, request_*, response_*, api_* |
| DNS/DHCP | dns_*, dhcp_* |
| Authentication | auth_*, mfa_*, token_*, session_*, role_*, permission_* |
| Kubernetes | namespace, pod_*, container_*, node_*, cluster, labels, annotations, service_account |
| Threats | vulnerability_*, cve, cvss_*, threat_*, mitre_*, ioc_* |
| Email/SMTP | sender_*, recipient_*, smtp_*, dkim_*, spf_*, dmarc_* |
| Boolean | is_*, *_enabled, *_flag |
Send synthetic events to TCP/UDP/HTTP/HTTPS endpoints using multi-threaded workers.
SYSLOGCEFLEEFWINEVENTJSONINCIDENT
- UDP:
udp:127.0.0.1:514 - TCP:
tcp:127.0.0.1:514 - HTTP:
http://127.0.0.1:8000/endpoint - HTTPS:
https://127.0.0.1:8000/endpoint
from rosetta import Sender, WorkerTypeEnum
# UDP syslog
udp_worker = Sender(
data_type=WorkerTypeEnum.SYSLOG,
destination="udp:127.0.0.1:514",
observables=observables,
count=5,
interval=2
)
udp_worker.start()
# HTTP JSON
http_worker = Sender(
data_type=WorkerTypeEnum.JSON,
destination="http://127.0.0.1:8000/logs",
observables=observables,
count=5,
interval=2
)
http_worker.start()from rosetta import Converter, ConverterToEnum, ConverterFromEnum
cef_log = "CEF:0|Security|IDS|1.0|Alert|10|src=192.168.0.1 dst=192.168.0.2 act=blocked"
converted = Converter.convert(from_type=ConverterFromEnum.CEF, to_type=ConverterToEnum.JSON, data=cef_log)python3 -m unittest discover -s testsRosetta can generate realistic database activity logs including normal operations and attack patterns.
SELECT, INSERT, UPDATE, DELETE, ALTER, CREATE, DROP, TRUNCATE, GRANT, REVOKE, MERGE, CALL
- SQL injection queries
- Unauthorized data manipulation
- Privilege escalation attempts
Rosetta includes built-in OWASP Top 10 attack technique indicators:
- Injection (SQL, Command)
- Broken Authentication and Session Management
- Cross-Site Scripting (XSS)
- Broken Access Control
- Security Misconfiguration
- Insecure Cryptographic Storage
- Insufficient Transport Layer Protection
- Unvalidated Redirects and Forwards
- Using Components with Known Vulnerabilities
- Insufficient Logging and Monitoring
Supported protocols for telemetry generation:
TCP, UDP, HTTP, SSL, SQL, SSH, FTP, RTP, RDP
Rosetta generates realistic Windows endpoint data including:
- 18 common Windows processes (explorer.exe, svchost.exe, lsass.exe, etc.)
- PowerShell commands for attack simulation
- Windows Event Log XML templates (Sysmon, Security events)
See the examples/ directory for complete usage examples:
observables.py- Generate indicatorsevents_formats.py- Create events in different formatsincidents.py- Build incident bundlessender_tcp_udp_http.py- Send events to endpointsconverter.py- Convert between formatsk8s_fields.py- Kubernetes field generationpresets_schema.py- Schema validation
- Some observable generators fetch from public sources. When offline, Rosetta falls back to synthetic values.
- Preset and schema files are generated from the CSV mapping in the project root and can be updated as your schema evolves.