allow splitting out indexes by other field values #450

Closed
mmguero opened this issue Mar 19, 2024 · 1 comment

@mmguero
Collaborator

mmguero commented Mar 19, 2024

As of release v24.01.0, the MALCOLM_NETWORK_INDEX_PATTERN and MALCOLM_NETWORK_INDEX_SUFFIX environment variables allow splitting out Suricata and Zeek to a different index pattern from the one Arkime creates.

It may be useful to support an additional placeholder in one of these variables (probably the suffix?) so that indices can be split out further based on the value of the event.provider field (suricata vs. zeek, etc.).

@mmguero mmguero added enhancement New feature or request opensearch Relating to Malcolm's use of OpenSearch logstash Relating to Malcolm's use of Logstash performance Related to speed/performance elastic Related to issue with external ElasticSearch/Kibana output labels Mar 19, 2024
@mmguero mmguero added this to the v24.03.1 milestone Mar 19, 2024
@mmguero mmguero self-assigned this Mar 19, 2024
@mmguero mmguero added this to Malcolm Mar 19, 2024
@mmguero mmguero moved this to Todo (develop) in Malcolm Mar 19, 2024
@mmguero mmguero modified the milestones: v24.03.1, v24.04.0 Mar 19, 2024
@mmguero mmguero removed their assignment Mar 27, 2024
@mmguero mmguero added the falcon label Apr 2, 2024
@mmguero mmguero modified the milestones: v24.04.0, v24.05.0 Apr 8, 2024
@mmguero mmguero removed the falcon label May 7, 2024
@mmguero mmguero modified the milestones: v24.05.0, v24.06.0 May 13, 2024
@mmguero mmguero modified the milestones: v24.06.0, v24.07.0 Jun 17, 2024
@mmguero mmguero modified the milestones: v24.07.0, v24.08.0 Jun 27, 2024
@mmguero mmguero modified the milestones: v24.08.0, v24.07.0 Jul 19, 2024
@mmguero mmguero self-assigned this Jul 19, 2024
@mmguero mmguero moved this from Todo (develop) to In Progress in Malcolm Jul 19, 2024
@mmguero mmguero moved this from In Progress to Todo (develop) in Malcolm Jul 24, 2024
@mmguero mmguero modified the milestones: v24.07.0, v24.08.0 Jul 29, 2024
@mmguero mmguero removed this from the v24.08.0 milestone Aug 12, 2024
@mmguero mmguero added this to the v24.09.0 milestone Aug 12, 2024
@mmguero mmguero modified the milestones: v24.09.0, z.staging Aug 20, 2024
@mmguero mmguero assigned mmguero and unassigned mmguero Aug 26, 2024
@mmguero mmguero moved this from Todo (develop) to In Progress in Malcolm Aug 28, 2024
@mmguero mmguero changed the title allow splitting out indexes by event.provider allow splitting out indexes by other field values Aug 28, 2024
@mmguero
Collaborator Author

mmguero commented Aug 28, 2024

Values from event fields will now be expanded in the Logstash filter that determines which index to use when writing documents. In opensearch.env, MALCOLM_NETWORK_INDEX_SUFFIX and MALCOLM_OTHER_INDEX_SUFFIX now support expanding dot-separated field names to their values.

So, for example:

MALCOLM_NETWORK_INDEX_SUFFIX={{event.provider}}-{{event.dataset}}-%{%y%m%d}

could produce something like this:

$ docker compose exec api curl -sSL 'localhost:5000/mapi/indices' | jq -r '.indices[].index' | sort | grep sessions
arkime_sessions3-200428
arkime_sessions3-suricata-alert-200428
arkime_sessions3-zeek-analyzer-200428
arkime_sessions3-zeek-bestguess-200428
arkime_sessions3-zeek-cip-200428
arkime_sessions3-zeek-conn-200428
arkime_sessions3-zeek-cotp-200428
arkime_sessions3-zeek-dce_rpc-200428
arkime_sessions3-zeek-dhcp-200428
arkime_sessions3-zeek-dns-200428
arkime_sessions3-zeek-dpd-200428
arkime_sessions3-zeek-enip-200428
arkime_sessions3-zeek-files-200428
arkime_sessions3-zeek-http-200428
arkime_sessions3-zeek-ja4ssh-200428
arkime_sessions3-zeek-kerberos-200428
arkime_sessions3-zeek-known_hosts-200428
arkime_sessions3-zeek-known_modbus-200428
arkime_sessions3-zeek-known_services-200428
arkime_sessions3-zeek-login-200428
arkime_sessions3-zeek-modbus-200428
arkime_sessions3-zeek-modbus_detailed-200428
arkime_sessions3-zeek-notice-200428
arkime_sessions3-zeek-ntlm-200428
arkime_sessions3-zeek-pe-200428
arkime_sessions3-zeek-rdp-200428
arkime_sessions3-zeek-rfb-200428
arkime_sessions3-zeek-s7comm-200428
arkime_sessions3-zeek-s7comm_read_szl-200428
arkime_sessions3-zeek-s7comm_upload_download-200428
arkime_sessions3-zeek-signatures-200428
arkime_sessions3-zeek-signatures-240828
arkime_sessions3-zeek-smb_files-200428
arkime_sessions3-zeek-smb_filessmb_cmd-200428
arkime_sessions3-zeek-smb_mapping-200428
arkime_sessions3-zeek-software-200428
arkime_sessions3-zeek-ssh-200428
arkime_sessions3-zeek-ssl-200428
arkime_sessions3-zeek-tds-200428
arkime_sessions3-zeek-weird-200428
arkime_sessions3-zeek-x509-200428

Of course, this ONLY applies to events that are indexed through the Logstash pipeline (not those indexed by Arkime capture itself) and for which the user isn't overriding the index value in some other way.
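
To illustrate the expansion described above, here is a minimal Python sketch of how a suffix template could be resolved for a single event. It is not Malcolm's actual code (the real logic lives in a Logstash filter); the names lookup and expand_suffix are hypothetical, the arkime_sessions3- prefix is assumed from the default index pattern, and replacing a missing field with an empty string is an assumption, not confirmed behavior.

```python
import re
from datetime import datetime

# {{dot.separated.field}} placeholders and %{strftime} placeholders
FIELD_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")
TIME_RE = re.compile(r"%\{([^{}]+)\}")


def lookup(event, dotted):
    """Walk a nested dict along a dot-separated path; None if any key is missing."""
    value = event
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def expand_suffix(template, event, when):
    """Expand {{field}} placeholders from the event, then %{...} via strftime."""
    def field_sub(match):
        value = lookup(event, match.group(1))
        # Assumption: a missing field becomes an empty string.
        return "" if value is None else str(value)

    expanded = FIELD_RE.sub(field_sub, template)
    return TIME_RE.sub(lambda m: when.strftime(m.group(1)), expanded)


event = {"event": {"provider": "zeek", "dataset": "conn"}}
suffix = expand_suffix(
    "{{event.provider}}-{{event.dataset}}-%{%y%m%d}",
    event,
    datetime(2020, 4, 28),
)
# Assumed prefix, taken from the default pattern arkime_sessions3-*
print("arkime_sessions3-" + suffix)
```

With the example event, the printed name corresponds to the arkime_sessions3-zeek-conn-200428 entry in the listing above.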

Also, updated the documentation on environment variables:

  • The following variables control the OpenSearch indices to which network traffic metadata are written. Changing them from their defaults may cause logs from non-Arkime data sources (i.e., Zeek, Suricata) to not show up correctly in Arkime.
    • MALCOLM_NETWORK_INDEX_PATTERN - Index pattern for network traffic logs written via Logstash (default is arkime_sessions3-*)
    • MALCOLM_NETWORK_INDEX_TIME_FIELD - Default time field to use for network traffic logs in Logstash and Dashboards (default is firstPacket)
    • MALCOLM_NETWORK_INDEX_SUFFIX - Suffix used to create index to which network traffic logs are written
      • supports Ruby strftime strings in %{} (e.g., hourly: %{%y%m%dh%H}, twice daily: %{%P%y%m%d}, daily (default): %{%y%m%d}, weekly: %{%yw%U}, monthly: %{%ym%m})
      • supports expanding dot-delimited field names in {{ }} (e.g., {{event.provider}}%{%y%m%d})
  • The following variables control the OpenSearch indices to which other logs (third-party logs, resource utilization reports from network sensors, etc.) are written.
    • MALCOLM_OTHER_INDEX_PATTERN - Index pattern for other logs written via Logstash (default is malcolm_beats_*)
    • MALCOLM_OTHER_INDEX_TIME_FIELD - Default time field to use for other logs in Logstash and Dashboards (default is @timestamp)
    • MALCOLM_OTHER_INDEX_SUFFIX - Suffix used to create index to which other logs are written (with the same rules as MALCOLM_NETWORK_INDEX_SUFFIX above) (default is %{%y%m%d})
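
To get a feel for what the strftime suffixes listed above produce, the following sketch renders a few of them for a fixed timestamp. It uses Python's datetime.strftime as a stand-in for Ruby's, on the assumption that the directives used here (%y, %m, %d, %H, %U) behave the same in both; the twice-daily pattern is left out because %P is not supported by Python on every platform. The SUFFIXES table and render_suffixes are names made up for this illustration.

```python
from datetime import datetime

# strftime bodies taken from the documented %{...} examples
SUFFIXES = {
    "hourly": "%y%m%dh%H",
    "daily": "%y%m%d",
    "weekly": "%yw%U",
    "monthly": "%ym%m",
}


def render_suffixes(when):
    """Return each named suffix pattern rendered for the given timestamp."""
    return {name: when.strftime(fmt) for name, fmt in SUFFIXES.items()}


rendered = render_suffixes(datetime(2024, 8, 28, 14, 0))
for name, value in rendered.items():
    print(f"{name:8s} {value}")
# hourly   240828h14
# daily    240828
# weekly   24w34
# monthly  24m08
```

Coarser suffixes such as weekly or monthly mean fewer, larger indices, while hourly suffixes combined with {{ }} field expansion can multiply the number of indices quickly.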

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Aug 28, 2024
@mmguero mmguero moved this from In Progress to Testing in Malcolm Aug 28, 2024
@mmguero mmguero closed this as completed Aug 28, 2024
@mmguero mmguero moved this from Testing to Done in Malcolm Aug 28, 2024
This was referenced Sep 18, 2024
@mmguero mmguero moved this from Done to Released in Malcolm Sep 19, 2024