Version 0.2 Draft
This document specifies how a parser must interpret the regexes.yaml file in order to parse user-agent strings correctly.
This specification is intended to help maintainers and contributors use the information provided in the regexes.yaml file correctly when extracting information from user-agent strings. Furthermore, it is meant to serve as a basis for discussions on evolving the project and the required parsing algorithms.
This document does not describe how to deploy the ua-parser project on your server or how to retrieve the user-agent string for further processing.
A user-agent string may contain information on:
- User-Agent, aka “the browser”
- OS (Operating System) the User-Agent currently uses (or runs on)
- Device, meaning the physical device the User-Agent is running on
The rules for extracting this information are provided within the regexes.yaml file. Each kind of information requires a different parser which extracts the related type. These are:
- user_agent_parsers
- os_parsers
- device_parsers
Each parser contains a list of regular expressions, each named regex. For each regex, parser-specific replacements can be named to assign or change information. A replacement may use a match from the regular expression, which is captured by an expression enclosed in parentheses "()". Each match can be addressed with $1 to $9 and used in a parser-specific replacement.
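The placeholder mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function name expand_placeholders is hypothetical, and treating a placeholder that refers to a missing group as an empty string is an assumption that this specification does not state.

```python
import re

def expand_placeholders(template, match):
    """Replace $1..$9 in template with the corresponding captured match."""
    def substitute(placeholder):
        index = int(placeholder.group(1))
        # Assumption: a group that does not exist or did not take part
        # in the match expands to an empty string.
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$([1-9])", substitute, template)

m = re.search(
    r"(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre)?)",
    "Gecko/20100727 Minefield/4.0.1pre",
)
print(expand_placeholders("Firefox ($1)", m))  # Firefox (Minefield)
```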
TODO: Provide some insights into the used chars. E.g. escape "." as "\." and "(" as "\(". "/" does not need to be escaped.
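The effect of escaping can be illustrated with Python's re module. This is a small sketch: the helper name matches and the sample strings are made up for illustration, and other regex engines may differ in detail.

```python
import re

def matches(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# An unescaped "." matches any single character, so "3x1" is accepted too.
print(matches(r"Win3.1", "Win3x1"))    # True
# An escaped "\." only matches a literal dot.
print(matches(r"Win3\.1", "Win3x1"))   # False
print(matches(r"Win3\.1", "Win3.1"))   # True
# "(" is escaped to match a literal parenthesis; "/" is used as-is.
print(matches(r"Mozilla/5\.0 \(Windows", "Mozilla/5.0 (Windows; U)"))  # True
```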
The user_agent_parsers return the family of the User-Agent.
If available, the version information for that family may be extracted as well.
Here the major, minor and patch version information can be addressed or overwritten.
match in regex | default replacement | placeholder in replacement | note |
---|---|---|---|
1 | family_replacement | $1 | specifies the User-Agents family |
2 | v1_replacement | $2 | major version number/info of the family |
3 | v2_replacement | $3 | minor version number/info of the family |
4 | v3_replacement | $4 | patch version number/info of the family |
If no replacement is specified, the association is given by the order of the matches. If the regex contains no first match (within parentheses), the family_replacement shall be returned.
To overwrite a value, the corresponding replacement must be named for the regex item.
Parser Implementation:
The list of regular expressions (the regex items) shall be evaluated for a given user-agent string from the first regex item in the list to the last. The first matching regex stops processing of the list. Regex matching shall be case sensitive but not anchored.
If no replacement is specified for a regex item, the first match defines the family, the second the major, the third the minor and the fourth the patch information.
If a *_replacement string is specified, it shall overwrite or replace the match.
As placeholders for inserting matched characters, use within:
- family_replacement: $1
- v1_replacement: $2
- v2_replacement: $3
- v3_replacement: $4
If no matching regex is found, the value for family shall be “Other”. The version information major, minor and patch shall not be defined.
Example:
For the User-Agent: Mozilla/5.0 (Windows; Windows NT 5.1; rv:2.0b3pre) Gecko/20100727 Minefield/4.0.1pre
the matching regex:
    - regex: '(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre)?)'
      family_replacement: 'Firefox ($1)'
resolves to:
    family: Firefox (Minefield)
    major: 4
    minor: 0
    patch: 1pre
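The user-agent rules above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the reference implementation: the names USER_AGENT_PARSERS, expand and parse_user_agent are hypothetical, the list is a hand-written stand-in for the parsed regexes.yaml content, the second entry is made up, and "not defined" is represented as None.

```python
import re

# Hypothetical in-memory form of user_agent_parsers entries.
USER_AGENT_PARSERS = [
    {"regex": r"(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre)?)",
     "family_replacement": "Firefox ($1)"},
    {"regex": r"(Firefox)/(\d+)\.(\d+)"},  # made-up entry without replacements
]

# Output field and replacement key, listed in match order (match 1, 2, 3, 4).
FIELDS = [("family", "family_replacement"), ("major", "v1_replacement"),
          ("minor", "v2_replacement"), ("patch", "v3_replacement")]

def expand(template, match):
    """Substitute $1..$9 with captured matches; missing groups become ""."""
    def substitute(placeholder):
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$([1-9])", substitute, template)

def parse_user_agent(ua_string, parsers=USER_AGENT_PARSERS):
    for entry in parsers:
        # re.search is unanchored and case sensitive by default.
        match = re.search(entry["regex"], ua_string)
        if match is None:
            continue
        result = {}
        for position, (field, key) in enumerate(FIELDS, start=1):
            if key in entry:
                # A named replacement overrides the matched value.
                result[field] = expand(entry[key], match)
            elif position <= match.re.groups:
                # Default: the n-th match fills the n-th field.
                result[field] = match.group(position)
            else:
                result[field] = None
        return result  # the first matching regex stops processing
    return {"family": "Other", "major": None, "minor": None, "patch": None}

ua = ("Mozilla/5.0 (Windows; Windows NT 5.1; rv:2.0b3pre) "
      "Gecko/20100727 Minefield/4.0.1pre")
print(parse_user_agent(ua))
```

For the example string this yields the family "Firefox (Minefield)" with major "4", minor "0" and patch "1pre".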
The os_parsers return the os family of the Operating System (OS) the User-Agent runs on.
If available, the version information for that os may be extracted as well.
Here the major, minor, patch and patchMinor version information can be addressed or overwritten.
match in regex | default replacement | placeholder in replacement | note |
---|---|---|---|
1 | os_replacement | $1 | specifies the OS |
2 | os_v1_replacement | $2 | major version number/info of OS |
3 | os_v2_replacement | $3 | minor version number/info of the OS |
4 | os_v3_replacement | $4 | patch version number/info of the OS |
5 | os_v4_replacement | $5 | patchMinor version number/info of the OS |
If no replacement is specified, the association is given by the order of the matches. If the regex contains no first match (within parentheses), the os_replacement must be specified.
To overwrite a value, the corresponding replacement must be named for the regex item.
Parser Implementation:
The list of regular expressions (the regex items) shall be evaluated for a given user-agent string from the first regex item in the list to the last. The first matching regex stops processing of the list. Regex matching shall be case sensitive.
If no replacement is specified for a regex item, the first match defines the os family, the second the major, the third the minor, the fourth the patch and the fifth the patchMinor version information.
If a *_replacement string is specified, it shall overwrite or replace the match.
As placeholders for inserting matched characters, use within:
- os_replacement: $1
- os_v1_replacement: $2
- os_v2_replacement: $3
- os_v3_replacement: $4
- os_v4_replacement: $5
If no matching regex is found, the value for os shall be “Other”. The version information major, minor, patch and patchMinor shall not be defined.
Example:
For the User-Agent: Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.1) Gecko/20020826
the matching regex:
    - regex: 'Win(95|98|3.1|NT|ME|2000)'
      os_replacement: 'Windows $1'
resolves to:
    os: Windows 95
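The OS rules differ from the user-agent rules mainly in the replacement names and the additional fifth field, which a table-driven sketch makes visible. Again this is only an illustration: OS_PARSERS, OS_FIELDS, expand and parse_os are hypothetical names, the Android entry is made up, and undefined values are represented as None by assumption.

```python
import re

# Output field and replacement key, listed in match order (match 1..5).
OS_FIELDS = [
    ("os", "os_replacement"),
    ("major", "os_v1_replacement"),
    ("minor", "os_v2_replacement"),
    ("patch", "os_v3_replacement"),
    ("patchMinor", "os_v4_replacement"),
]

# Hypothetical in-memory form of os_parsers; the first entry is the
# example above, the second one is made up for illustration.
OS_PARSERS = [
    {"regex": r"Win(95|98|3.1|NT|ME|2000)", "os_replacement": "Windows $1"},
    {"regex": r"(Android) (\d+)\.(\d+)(?:\.(\d+))?"},
]

def expand(template, match):
    """Substitute $1..$9 with captured matches; missing groups become ""."""
    def substitute(placeholder):
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$([1-9])", substitute, template)

def parse_os(ua_string, parsers=OS_PARSERS):
    for entry in parsers:
        match = re.search(entry["regex"], ua_string)  # case sensitive
        if match is None:
            continue
        result = {}
        for position, (field, key) in enumerate(OS_FIELDS, start=1):
            if key in entry:
                result[field] = expand(entry[key], match)
            elif position <= match.re.groups:
                # None if the optional group did not take part in the match.
                result[field] = match.group(position)
            else:
                result[field] = None
        return result  # the first matching regex stops processing
    return {field: ("Other" if field == "os" else None)
            for field, _ in OS_FIELDS}

ua = "Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.1) Gecko/20020826"
print(parse_os(ua)["os"])  # Windows 95
```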
The device_parsers return the family of the device the User-Agent runs on.
Furthermore, the brand and model of the device can be specified.
brand names the manufacturer of the device, whereas model specifies the model of the device.
match in regex | default replacement | placeholder in replacement | note |
---|---|---|---|
1 | device_replacement | $1...$9 | specifies the device family |
any | brand_replacement | $1...$9 | specifies the brand (manufacturer) of the device |
1 | model_replacement | $1...$9 | specifies the model of the device |
If no replacement is specified, the association is given by the order of the matches.
If the regex contains no first match (within parentheses), the device_replacement together with the model_replacement must be specified.
To overwrite a value, the corresponding replacement must be named for the given regex.
For the device_parsers, some regex items require case-insensitive matching (e.g. Generic Feature Phones). To distinguish this from the case-sensitive default, the value regex_flag: 'i' indicates that matching shall be case insensitive for that regular expression.
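One way to honour regex_flag when compiling an entry is sketched below. The helper name compile_device_regex and both sample entries are hypothetical and not taken from regexes.yaml; only the flag value 'i' comes from this specification.

```python
import re

def compile_device_regex(entry):
    """Compile one device_parsers entry, honouring an optional regex_flag."""
    # Case sensitive by default; 'i' switches to case-insensitive matching.
    flags = re.IGNORECASE if entry.get("regex_flag") == "i" else 0
    return re.compile(entry["regex"], flags)

# Made-up entries for illustration.
sensitive = compile_device_regex({"regex": r"(Nokia)\d+"})
insensitive = compile_device_regex({"regex": r"(Nokia)\d+", "regex_flag": "i"})

print(sensitive.search("NOKIA6300") is None)      # True
print(insensitive.search("NOKIA6300").group(1))   # NOKIA
```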
Parser Implementation:
The list of regular expressions (the regex items) shall be evaluated for a given user-agent string from the first regex item in the list to the last. The first matching regex stops processing of the list. Regex matching shall be case sensitive unless regex_flag: 'i' is set for the regex item.
If no replacement for a match is given, the first match defines both the family and the model.
If a *_replacement string is specified, it shall overwrite or replace the match.
The placeholders $1 to $9 can be used to insert the matched characters from the regex into the replacement string.
If no matching regex is found, the value for family shall be “Other”. brand and model shall not be defined.
Leading and trailing whitespace shall be trimmed from the result.
Example:
For the User-Agent: Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30
the matching regex:
    - regex: '; *(PEDI)_(PLUS)_(W) Build'
      device_replacement: 'Odys $1 $2 $3'
      brand_replacement: 'Odys'
      model_replacement: '$1 $2 $3'
resolves to:
    family: 'Odys PEDI PLUS W'
    brand: 'Odys'
    model: 'PEDI PLUS W'
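The device rules, including the default association, the optional regex_flag and the whitespace trimming, can be sketched as follows. This is a minimal illustration rather than the project's implementation: DEVICE_PARSERS, expand, resolve and parse_device are hypothetical names, the second entry is made up, and undefined values are represented as None by assumption.

```python
import re

# Hypothetical in-memory form of device_parsers; the first entry is the
# example above, the second one is made up for illustration.
DEVICE_PARSERS = [
    {"regex": r"; *(PEDI)_(PLUS)_(W) Build",
     "device_replacement": "Odys $1 $2 $3",
     "brand_replacement": "Odys",
     "model_replacement": "$1 $2 $3"},
    {"regex": r"; *(nexus \d+) build", "regex_flag": "i"},
]

def expand(template, match):
    """Substitute $1..$9 with captured matches; missing groups become ""."""
    def substitute(placeholder):
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$([1-9])", substitute, template)

def resolve(entry, key, match, default):
    """Use the named replacement if present, else the default; trim it."""
    value = expand(entry[key], match) if key in entry else default
    return value.strip() if value is not None else None

def parse_device(ua_string, parsers=DEVICE_PARSERS):
    for entry in parsers:
        flags = re.IGNORECASE if entry.get("regex_flag") == "i" else 0
        match = re.search(entry["regex"], ua_string, flags)
        if match is None:
            continue
        # Default: the first match defines both family and model.
        first = match.group(1) if match.re.groups >= 1 else None
        return {
            "family": resolve(entry, "device_replacement", match, first),
            "brand": resolve(entry, "brand_replacement", match, None),
            "model": resolve(entry, "model_replacement", match, first),
        }
    return {"family": "Other", "brand": None, "model": None}

ua = ("Mozilla/5.0 (Linux; U; Android 4.2.2; de-de; PEDI_PLUS_W Build/JDQ39) "
      "AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30")
print(parse_device(ua))
```

For the example string this yields the family "Odys PEDI PLUS W", the brand "Odys" and the model "PEDI PLUS W".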
To allow interoperability with code that builds upon ua-parser, it is recommended to provide the parser output in a standardized way. The output may follow the structure below, written in WebIDL-like notation:
    interface ua-parser-output {
      attribute string string; // The "user-agent" string
      object ua: { // The "user_agent_parsers" result
        attribute string family;
        attribute string major;
        attribute string minor;
        attribute string patch;
      };
      object os: { // The "os_parsers" result
        attribute string family;
        attribute string major;
        attribute string minor;
        attribute string patch;
        attribute string patchMinor;
      };
      object device: { // The "device_parsers" result
        attribute string family;
        attribute string brand;
        attribute string model;
      };
    };
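As an illustration, the same structure could be mirrored in Python with dataclasses. The class names are hypothetical; the field names follow the structure above. Defaulting family to "Other" and every other value to None is an assumption drawn from the "no matching regex" rules of each parser.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class UserAgent:            # the "user_agent_parsers" result
    family: str = "Other"
    major: Optional[str] = None
    minor: Optional[str] = None
    patch: Optional[str] = None

@dataclass
class OperatingSystem:      # the "os_parsers" result
    family: str = "Other"
    major: Optional[str] = None
    minor: Optional[str] = None
    patch: Optional[str] = None
    patchMinor: Optional[str] = None

@dataclass
class Device:               # the "device_parsers" result
    family: str = "Other"
    brand: Optional[str] = None
    model: Optional[str] = None

@dataclass
class ParserOutput:
    string: str             # the original user-agent string
    ua: UserAgent = field(default_factory=UserAgent)
    os: OperatingSystem = field(default_factory=OperatingSystem)
    device: Device = field(default_factory=Device)

output = ParserOutput(
    string="Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.1) Gecko/20020826",
    os=OperatingSystem(family="Windows 95"),
)
print(asdict(output)["os"]["family"])  # Windows 95
```

asdict converts the nested result into plain dictionaries, which is convenient for serializing the output, for example to JSON.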