Regular Expression Support
Document version: 5.6
Date: September 1, 2009
Overview
The OVAL Language supports a common subset of the regular expression character classes, operations, expressions and other lexical tokens defined within Perl 5's regular expression specification. This common subset was identified through a survey of several regular expression libraries in an effort to ensure that the regular expression elements supported by OVAL will be compatible with a wide variety of regular expression libraries. A listing of the surveyed regular expression libraries is provided later in this document.
Supported Regular Expression Syntax
Perl regular expression modifiers (m, i, s, x) are not supported. These modifiers should be considered to always be 'OFF' unless specifically permitted by documentation on an OVAL Language construct.
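As an illustration of what "always OFF" means in practice, the following minimal sketch uses Python's `re` module (one of the surveyed libraries) with no flags set. The sample text and variable names are hypothetical, and other libraries may require explicit configuration to reach the same defaults.

```python
import re

text = "Alpha\nbeta"

# No flags are passed, so the i, s, m and x behaviours are all off.

# 'i' off: matching is case sensitive, so "alpha" does not match "Alpha".
case_sensitive = re.search(r"alpha", text) is None

# 's' off: '.' does not match the newline between the two lines.
dot_stops_at_newline = re.search(r"Alpha.beta", text) is None

# 'm' off: '^' matches only at the very start of the text, not after "\n".
caret_only_at_start = re.search(r"^beta", text) is None

# 'x' off: whitespace inside the pattern is significant, not ignored.
space_is_literal = re.search(r"Alpha beta", "Alphabeta") is None

print(case_sensitive, dot_stops_at_newline, caret_only_at_start, space_is_literal)
```

Each of the four checks evaluates to `True` when the corresponding modifier is off.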
Character matching assumes a Unicode character set. Note that no syntax is supplied for specifying code points in hex; actual Unicode characters must be used instead.
The following regular expression elements are specifically identified as supported in the OVAL Language. For more detailed definitions of the regular expression elements listed below, refer to their descriptions in the Perl 5.004 Regular Expression documentation. A copy of this documentation has been preserved here for reference purposes. Regular expression elements that are not listed below should be avoided, as they are likely to be incompatible with, or have different meanings in, commonly used regular expression libraries.
Metacharacters
    \     Quote the next metacharacter
    ^     Match the beginning of the line
    .     Match any character (except newline)
    $     Match the end of the line (or before newline at the end)
    |     Alternation
    ()    Grouping
    []    Character class
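The metacharacters can be combined as in the following minimal sketch, written with Python's `re` module. The pattern and the input strings are hypothetical examples, not taken from any OVAL content.

```python
import re

# ^ and $ anchor the match, \. quotes the dot, () groups,
# | alternates, and [] is a character class.
pattern = r"^(version|release)=[0-9]\.[0-9]$"

matched = re.search(pattern, "version=5.6") is not None

# The quoted dot matches only a literal ".", so "5x6" is rejected.
rejected = re.search(pattern, "version=5x6") is None

# An unquoted dot matches any character, so "5x6" is accepted.
unquoted = re.search(r"^version=5.6$", "version=5x6") is not None

print(matched, rejected, unquoted)
```

Forgetting to quote a literal dot is a common source of patterns that match more than intended.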
Greedy Quantifiers
    *        Match 0 or more times
    +        Match 1 or more times
    ?        Match 1 or 0 times
    {n}      Match exactly n times
    {n,}     Match at least n times
    {n,m}    Match at least n but not more than m times
Reluctant Quantifiers
    *?        Match 0 or more times
    +?        Match 1 or more times
    ??        Match 0 or 1 time
    {n}?      Match exactly n times
    {n,}?     Match at least n times
    {n,m}?    Match at least n but not more than m times
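The difference between greedy and reluctant quantifiers lies in how much text they consume when more than one match length is possible. A minimal sketch using Python's `re` module, with hypothetical input strings:

```python
import re

text = "<a><b>"

# Greedy: '+' takes as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group(0)

# Reluctant: '+?' takes as little as possible, so the match stops at the first '>'.
reluctant = re.search(r"<.+?>", text).group(0)

# The same distinction applies to bounded repetition.
bounded_greedy = re.search(r"a{2,3}", "aaaa").group(0)
bounded_reluctant = re.search(r"a{2,3}?", "aaaa").group(0)

print(greedy, reluctant, bounded_greedy, bounded_reluctant)
```

Here the greedy forms yield `<a><b>` and `aaa`, while the reluctant forms yield `<a>` and `aa`.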
Escape Sequences
    \t      tab (HT, TAB)
    \n      newline (LF, NL)
    \r      return (CR)
    \f      form feed (FF)
    \033    octal char (think of a PDP-11)
    \x1B    hex char
    \c[     control char
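The octal and hex forms listed above denote the same character (ESC, decimal 27). The following minimal sketch checks this with Python's `re` module. The `\c[` form is omitted because Python's `re` module does not appear to accept it, which is an observation about that one library rather than about OVAL.

```python
import re

ESC = chr(27)  # the escape character, decimal code 27

# \033 (octal) and \x1B (hex) both denote the escape character.
octal_match = re.search(r"\033", ESC) is not None
hex_match = re.search(r"\x1B", ESC) is not None

# \t, \r and \n inside the pattern match the corresponding control characters.
tab_match = re.search(r"a\tb", "a\tb") is not None
crlf_match = re.search(r"a\r\nb", "a\r\nb") is not None

print(octal_match, hex_match, tab_match, crlf_match)
```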
Character Classes
    \w    Match a "word" character (alphanumeric plus "_")
    \W    Match a non-word character
    \s    Match a whitespace character
    \S    Match a non-whitespace character
    \d    Match a digit character
    \D    Match a non-digit character
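A minimal sketch of the character classes, using Python's `re` module on hypothetical ASCII input. Libraries differ in which non-ASCII characters they treat as word or digit characters, so the sample text deliberately avoids them.

```python
import re

text = "user_1 logged in at 09:45"

# \w+ collects runs of word characters (letters, digits and "_").
words = re.findall(r"\w+", text)

# \d+ collects runs of digits.
digits = re.findall(r"\d+", text)

# \S+ collects runs of non-whitespace, so "09:45" stays in one piece.
non_space = re.findall(r"\S+", text)

# \D+ collects runs of non-digit characters.
non_digits = re.findall(r"\D+", "a1b22c")

print(words, digits, non_space, non_digits)
```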
Zero Width Assertions
    \b    Match a word boundary
    \B    Match a non-(word boundary)
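Zero-width assertions match a position rather than a character. The following minimal sketch, using Python's `re` module and a hypothetical input string, shows how `\b` and `\B` select different occurrences of the same letters.

```python
import re

text = "cat concat catalog"

# \bcat\b matches "cat" only as a whole word.
whole_word = re.findall(r"\bcat\b", text)

# \bcat matches "cat" at the start of a word: "cat" and "catalog".
word_start = re.findall(r"\bcat", text)

# \Bcat matches "cat" only when it is preceded by a word character: "concat".
inside_word = re.findall(r"\Bcat", text)

print(len(whole_word), len(word_start), len(inside_word))
```

The counts are 1, 2 and 1 respectively.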
Extensions
    (?:regexp) - Group without capture
    (?=regexp) - Zero-width positive lookahead assertion
    (?!regexp) - Zero-width negative lookahead assertion
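A minimal sketch of the three extensions, using Python's `re` module. The patterns and input strings are hypothetical.

```python
import re

# (?:...) groups without capturing, so the host is capturing group 1.
m = re.search(r"(?:http|https)://(\w+)", "https://example")
host = m.group(1)
group_count = m.re.groups  # number of capturing groups in the pattern

# (?=...) requires the following text to match, without consuming it.
xml_names = re.findall(r"\w+(?=\.xml)", "a.xml b.txt")

# (?!...) requires the following text NOT to match.
allowed = re.search(r"^foo(?!bar)", "foobaz") is not None
blocked = re.search(r"^foo(?!bar)", "foobar") is None

print(host, group_count, xml_names, allowed, blocked)
```

Because lookahead assertions are zero-width, the text they inspect is not part of the match: `xml_names` contains `a`, not `a.xml`.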
Version 8 Regular Expressions
    [chars] - Match any of the specified characters
    [^chars] - Match anything that is not one of the specified characters
    [a-b] - Match any character in the range between "a" and "b", inclusive
    a|b - Alternation; match either the left side of the "|" or the right side
    \n - When 'n' is a single digit: the nth capturing group matched.
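A minimal sketch of these elements, using Python's `re` module on hypothetical input. The last example uses `\1` as an instance of the single-digit `\n` form, referring back to the text matched by the first capturing group.

```python
import re

# [chars] and [^chars]: match, or exclude, a set of characters.
vowels = re.findall(r"[aeiou]", "oval")
non_vowels = re.findall(r"[^aeiou]", "oval")

# [a-b]: an inclusive range of characters.
a_to_f = re.findall(r"[a-f]", "cafe9z")

# a|b: alternation between the left and right sides.
either = re.findall(r"yes|no", "yes no maybe")

# \1: matches the same text that capturing group 1 matched (a doubled character).
doubled = re.search(r"(\w)\1", "letter").group(0)

print(vowels, non_vowels, a_to_f, either, doubled)
```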
Surveyed Regular Expression Libraries
The following regular expression libraries were surveyed to identify a common subset of the Perl 5 regular expression specification that is widely supported:
- .NET
- BOOST
- Java
- JavaScript
- PCRE
- Perl 5.004 (as a baseline and reference for the detailed definitions of regular expression elements)
- Python