Skip to content

Latest commit

 

History

History
204 lines (147 loc) · 9.63 KB

specification.md

File metadata and controls

204 lines (147 loc) · 9.63 KB

ua-parser Specification

Version 0.2 Draft

This document describes the specification on how a parser must implement the regexes.yaml file for correctly parsing user-agent strings on basis of that file.

This specification intends to help maintainers and contributors to correctly use the provided information within the regexes.yaml file for obtaining information from the different user-agent strings. Furthermore this specification tries to be the basis for discussions on evolving the projects and the needed parsing algorithms.

This document will not provide any information on how to implement the ua-parser project on your server and how to retrieve the user-agent string for further processing.

regexes.yaml

Any information which can be obtained from a user-agent string may contain information on:

  • User-Agent aka “the browser”
  • OS (Operating System) the User-Agent currently uses (or runs on)
  • Device information by means of the physical device the User-Agent is using

This information is provided within the regexes.yaml file. Each kind of information requires a different parser which extracts the related type. These are:

  • user_agent_parser
  • os_parsers
  • device_parsers

Each parser contains a list of regular-expressions which are named regex. For each regex replacements specific to the parser can be named to attribute or change information. A replacement may require a match from the regular-expression which is extracted by an expression enclosed in parenthesis "()". Each match can be addressed with $1 to $9 and used in a parser specific replacement.

TODO: Provide some insights into the used chars. E.g. escape "." as "\." and "(" as "\(". "/" does not need to be escaped.

user_agent_parsers

The user_agent_parsers returns information of the family type of the User-Agent. If available the version information specifying the family may be extracted as well if available. Here major, minor and patch version information can be addressed or overwritten.

match in regex default replacement placeholder in replacement note
1 family_replacement $1 specifies the User-Agents family
2 v1_replacement $2 major version number/info of the family
3 v2_replacement $3 minor version number/info of the family
4 v3_replacement $4 patch version number/info of the family

In case that no replacement is specified, the association is given by order of the match. If in the regex no first match (within parenthesis) is given, the family_replacement shall be returned. To overwrite the respective value the replacement value needs to be named for a regex-item.

Parser Implementation:

The list of regular-expressions regex shall be evaluated for a given user-agent string beginning with the first regex-item in the list to the last item. The first matching regex stops processing the list. Regex-matching shall be case sensitive but not anchored.

In case that no replacement for a match is specified for a regex-item, the first match defines the family, the second major, the third minorand the fourth patch information. If a *_replacement string is specified it shall overwrite or replace the match.

As placeholder for inserting matched characters use within

  • family_replacement: $1
  • v1_replacement: $2
  • v2_replacement: $3
  • v3_replacement: $4

If no matching regex is found the value for family shall be “Other”. The version information major, minor and patch shall not be defined.

Example:

For the User-Agent: Mozilla/5.0 (Windows; Windows NT 5.1; rv:2.0b3pre) Gecko/20100727 Minefield/4.0.1pre the matching regex:

  - regex: '(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre)?)'
    family_replacement: 'Firefox ($1)'

resolves to:

  family: Firefox (Minefield)
  major : 4
  minor : 0
  patch : 1pre

os_parsers

The os_parsers return information of the os type of the Operating System (OS) the User-Agent runs. If available the version information specifying the os may be extracted as well if available. Here major, minor and patch version information can be addressed or overwritten.

match in regex default replacement placeholder in replacement note
1 os_replacement $1 specifies the OS
2 os_v1_replacement $2 major version number/info of OS
3 os_v2_replacement $3 minor version number/info of the OS
4 os_v3_replacement $4 patch version number/info of the OS
5 os_v4_replacement $5 patchMinor version number/info of the OS

In case that no replacement is specified, the association is given by order of the match. If in the regex no first match (within normal brackets) is given, the os_replacement shall be specified! To overwrite the respective value the replacement value needs to be named for a regex-item.

Parser Implementation:

The list of regular-expressions regex shall be evaluated for a given user-agent string beginning with the first regex-item in the list to the last item. The first matching regex stops processing the list. Regex-matching shall be case sensitive.

In case that no replacement for a match is specified for a regex-item, the first match defines the os family, the second major, the third minor, the forth patch and the fifth patchMinor version information. If a *_replacement string is specified it shall overwrite or replace the match.

As placeholder for inserting matched characters use within

  • os_replacement: $1
  • os_v1_replacement: $2
  • os_v2_replacement: $3
  • os_v3_replacement: $4
  • os_v4_replacement: $5

In case that no matching regex is found the value for os shall be “Other”. The version information major, minor, patch and patchMinor shall not be defined.

Example:

For the User-Agent: Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.1) Gecko/20020826 the matching regex:

  - regex: 'Win(95|98|3.1|NT|ME|2000)'
    os_replacement: 'Windows $1'

resolves to:

  os: Windows 95

device_parsers

The device_parsers return information of the device family the User-Agent runs on. Furthermore brand and model of the device can be specified. brand names the manufacturer of the device, where model specifies the model of the device.

match in regex default replacement placeholder in replacement note
1 device_replacement $1...$9 specifies the device family
any brand_replacement $1...$9 major version number/info of OS
1 model_replacement $1...$9 minor version number/info of the OS

In case that no replacement is specified the association is given by order of the match. If in the regex no first match (within normal brackets) is given the device_replacement together with the model_replacement shall be specified! To overwrite the respective value the replacement value needs to be named for a given regex.

For the device_parsers some regex require case insensitive parsing for proper matching. (E.g. Generic Feature Phones). To distinguish this from the case sensitive default case, the value regex_flag: 'i' is used to indicate that the regular-expression matching shall be case-insensitive for this regular expression.

Parser Implementation:

The list of regular-expressions regex shall be evaluated for a given user-agent string beginning with the first regex-item in the list to the last item. The first matching regex stops processing the list. Regex-matching shall be case sensitive.

In case that no replacement for a match is given, the first match defines the family and the model. If a *_replacement string is specified it shall overwrite or replace the match.

As placeholder for inserting matched characters $1 to $9 can be used to insert the matched characters from the regex into the replacement string.

In case that no matching regex is found the value for family shall be “Other”. brand and model shall not be defined. Leading and tailing whitespaces shall be trimmed from the result.

Example:

For the User-Agent: Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30 the matching regex:

  - regex: '; *(PEDI)_(PLUS)_(W) Build'
    device_replacement: 'Odys $1 $2 $3'
    brand_replacement: 'Odys'
    model_replacement: '$1 $2 $3'

resolves to:

  family: 'Odys PEDI PLUS W'
  brand: 'Odys'
  model: 'PEDI PLUS W'

Parser Output

To allow interoperability with code that builds upon ua-parser, it is recommended to provide the parser output in a standardized way. The structure defined in WebIDL may follow:

interface ua-parser-output {
  attribute string string;      // The "user-agent" string
  object ua: {                  // The "user_agent_parsers" result
    attribute string family;
    attribute string major;
    attribute string minor;
    attribute string patch;
  };
  object os: {                 	// The "os_parsers" result
    attribute string family;
    attribute string major;
    attribute string minor;
    attribute string patch;
    attribute string patchMinor;
  };
  object device: {              // The "device_parsers" result
    attribute string family;
    attribute string brand;
    attribute string model;
  };
};