All About Using Regular Expressions (RegExp) in Dart/Flutter
Introduction
Regular expressions are patterns used to match character combinations in strings. They are available in
most programming languages, so once you understand them in one language you can usually apply them in
others, although the syntax may differ in places.
In Dart, they come in the form of the RegExp class.
Creating RegExp Patterns
A regular expression is created by passing a pattern string to the RegExp constructor. The pattern is
usually written as a raw string (a string prefixed by r, e.g. r'I am a raw string', r'12345String') so
that backslashes reach the regular expression unchanged. For example: RegExp(r'I am a RegExp'),
RegExp(r'12345String'), RegExp(r'[0-9]\d+ab99'), etc. A RegExp matches as many characters as possible
so long as the pattern is satisfied.
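As a minimal sketch of why raw strings are the usual choice, the snippet below builds the same pattern twice, once as a raw string and once as an ordinary string with the backslash doubled. The variable names are illustrative only.

```dart
// Both patterns are the three characters \d+ once Dart has parsed the string.
final RegExp rawPattern = RegExp(r'\d+');
final RegExp escapedPattern = RegExp('\\d+');

void main() {
  // The pattern getter returns the source text of the regular expression.
  print(rawPattern.pattern == escapedPattern.pattern); // true
  print(rawPattern.hasMatch('abc123')); // true
  print(RegExp(r'[0-9]\d+ab99').hasMatch('x123ab99')); // true
}
```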
Character Groups \d, \w, \s, \D, \W, \S, .
A character group is a symbol that stands for any one character from a set, e.g.
\d: any one of the digits 0-9
\w: any word character, i.e. a letter, a digit or an underscore (a-z, A-Z, 0-9, _)
\s: any whitespace character, e.g. space ' ', tab '\t', newline '\n', etc.
\D: the opposite of `\d`; any character that is not a digit
\W: the opposite of `\w`; any character that is not a word character
\S: the opposite of `\s`; any character that is not a whitespace character
.: any character that is not a line terminator (such as the newline character `\n`, i.e. the end of a line)
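The groups above can be tried out with a short sketch like the following. The variable names and test strings are made up for illustration.

```dart
final RegExp digit = RegExp(r'\d');
final RegExp wordChar = RegExp(r'\w');
final RegExp whitespace = RegExp(r'\s');
final RegExp anyChar = RegExp(r'.');

void main() {
  print(digit.hasMatch('room 7')); // true
  print(wordChar.hasMatch('_')); // true: underscore counts as a word character
  print(wordChar.hasMatch('-+!')); // false
  print(whitespace.hasMatch('a\tb')); // true: the input contains a tab
  print(RegExp(r'\D').hasMatch('123')); // false: every character is a digit
  print(anyChar.hasMatch('\n')); // false: . does not match a newline by default
}
```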
Character Range [], [^]
You might also want to create your own `character group`; you do this by putting all the characters you
want to match inside square brackets ([]), e.g. RegExp(r'[abfky]') matches exactly one of the characters
in the brackets (i.e. 'a' or 'b' or 'f' or 'k' or 'y').
If the characters you want to include follow each other on the `ASCII table`, you can use the `-`
character to write them as a range. For example, a regular expression that matches any digit except '9'
as well as any lowercase English letter would be: RegExp(r'[0-8a-z]')
You might also want a character group that matches only characters that are not in the brackets; you do
this by adding a caret ('^') right after the opening bracket, e.g. RegExp(r'[^abfky]') matches exactly
the characters that RegExp(r'[abfky]') does not, i.e. any character that is not 'a', 'b', 'f', 'k' or
'y'. Likewise, RegExp(r'[^0-8a-z]') is the exact opposite of RegExp(r'[0-8a-z]').
Special characters like `.`, `+`, `*`, `?`, `{}` lose their special meaning inside square brackets and are matched like ordinary characters.
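A small sketch of ranges, negated ranges and special characters inside brackets, using the patterns from this section. The variable names are illustrative.

```dart
final RegExp allowed = RegExp(r'[0-8a-z]');
final RegExp notAllowed = RegExp(r'[^0-8a-z]');
final RegExp literalPunctuation = RegExp(r'[.+*?]');

void main() {
  print(allowed.hasMatch('k')); // true
  print(allowed.hasMatch('9')); // false: 9 is outside the range 0-8
  print(notAllowed.hasMatch('9')); // true
  // Inside brackets the dot is a literal dot, so plain letters do not match.
  print(literalPunctuation.hasMatch('abc')); // false
  print(literalPunctuation.hasMatch('a+b')); // true
}
```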
Repeating and Optional Characters *, +, ?, {}
Optional character ?:
This indicates that a character may or may not occur, i.e. it occurs once or not at all. For example,
you may be trying to match the word 'color' or 'colour', the American and British spellings of the word;
the letter 'u' is optional. The regular expression that would match either word is RegExp(r'colou?r').
One or more +:
This matches one or more occurrences of a character. For example, to match any number of digits (one or
more), the regular expression would be RegExp(r'\d+').
Zero or more *:
This indicates zero or more occurrences of a character, e.g. RegExp(r'\w*') would match zero or more
word characters.
Precise range {i,j}:
This is useful for matching a character that occurs a number of times within a particular range, e.g.
for between 8 and 15 digits the regular expression would be RegExp(r'\d{8,15}'); the minimum number of
occurrences comes first and the maximum second (both inclusive).
Do not put a space between the bounds in the curly braces.
Open-ended range {i,}:
This is useful for matching a character that occurs i or more times, e.g. RegExp(r'\d{10,}')
would match 10 or more digits.
Precise number {i}:
This is useful for matching an exact number of occurrences, e.g. to match exactly 10 digits, let's say
for a phone number, the regular expression would be RegExp(r'\d{10}').
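The repetition symbols above can be sketched as follows. The ^ and $ anchors used in two of the patterns are described in the section on string boundaries; they are added here, as an assumption about the intended use, so that the digit counts apply to the whole input.

```dart
final RegExp colour = RegExp(r'colou?r');
final RegExp phone = RegExp(r'^\d{10}$');
final RegExp eightToFifteen = RegExp(r'^\d{8,15}$');

void main() {
  print(colour.hasMatch('color')); // true
  print(colour.hasMatch('colour')); // true
  // stringMatch returns the text of the first match, or null if there is none.
  print(RegExp(r'\d+').stringMatch('abc 2024 def')); // 2024
  print(phone.hasMatch('0123456789')); // true
  print(phone.hasMatch('012345678')); // false: only 9 digits
  print(eightToFifteen.hasMatch('1234567')); // false: only 7 digits
  print(eightToFifteen.hasMatch('12345678')); // true
}
```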
Grouping Subexpressions ()
Subexpressions are grouped by putting their characters in parentheses ('()'). Grouping has multiple
uses, but the most important one is probably applying the repeating and optional symbols (the previous
subtopic) to more than one character. Instead of a single character, a whole group can be repeated or
made optional, e.g. RegExp(r'(ha)+') would match one or more occurrences of the subexpression 'ha'.
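To illustrate the difference a group makes, the sketch below compares repeating the group 'ha' with repeating only the letter 'a'. The input strings are invented for the example.

```dart
final RegExp groupedLaugh = RegExp(r'(ha)+');
final RegExp ungroupedLaugh = RegExp(r'ha+');

void main() {
  // With parentheses, + repeats the whole group 'ha'.
  print(groupedLaugh.stringMatch('hahaha!')); // hahaha
  // Without parentheses, + repeats only the preceding character 'a'.
  print(ungroupedLaugh.stringMatch('haaa hahaha')); // haaa
}
```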
Choice Patterns |
Sometimes your intention might be to match any one of a few options of characters or subexpressions; in
this case you use the pipe ('|') symbol.
The choice pattern is scoped, i.e. the options extend as far as the enclosing group, or the whole
pattern if there is no group, e.g.
RegExp(r'pig|chicken|cow') matches one of 'pig', 'chicken' or 'cow'
RegExp(r'1 (pig|chicken|cow)') matches one of '1 pig', '1 chicken' or '1 cow'
RegExp(r'23 animals|1 (pig|chicken|cow)') matches either '23 animals' or one of '1 pig', '1 chicken', '1 cow'
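The scoping of the pipe symbol can be checked with a sketch like this one. The last pattern, without parentheses, is added for contrast and is not from the text above.

```dart
final RegExp animals = RegExp(r'23 animals|1 (pig|chicken|cow)');
final RegExp unscoped = RegExp(r'1 pig|chicken|cow');

void main() {
  print(animals.stringMatch('I own 1 chicken')); // 1 chicken
  print(animals.stringMatch('23 animals')); // 23 animals
  print(animals.hasMatch('1 goat')); // false
  // Without parentheses, '1 ' belongs only to the first option.
  print(unscoped.hasMatch('cow')); // true
  print(animals.hasMatch('cow')); // false
}
```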
Word and String Boundaries \b, ^, $
By default, a regular expression finds a match wherever it first occurs, whether at the beginning, middle or end of the string, or at the start, middle or end of a word. Word and string boundaries let you decide where a match may occur.
Word Boundary \b:
A word boundary (\b) is placed at the beginning and/or end of a pattern to ensure the match only
occurs at the start and/or end of a word in the string.
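A minimal sketch of word boundaries, using the made-up word 'cat' as the pattern:

```dart
final RegExp wholeWordCat = RegExp(r'\bcat\b');
final RegExp startsWordCat = RegExp(r'\bcat');
final RegExp endsWordCat = RegExp(r'cat\b');

void main() {
  print(wholeWordCat.hasMatch('the cat sat')); // true
  print(wholeWordCat.hasMatch('concatenate')); // false: 'cat' is inside a word
  print(startsWordCat.hasMatch('catalog')); // true
  print(startsWordCat.hasMatch('bobcat')); // false
  print(endsWordCat.hasMatch('bobcat')); // true
}
```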
String boundaries ^, $:
String boundaries indicate that the match occurs at the start or end of a string. ^ indicates the start
of the string while $ indicates the end of the string.
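String boundaries can be sketched as follows; using both together forces the pattern to cover the entire string. The names and inputs are illustrative.

```dart
final RegExp startsWithHello = RegExp(r'^Hello');
final RegExp endsWithWorld = RegExp(r'world$');
final RegExp onlyDigits = RegExp(r'^\d+$');

void main() {
  print(startsWithHello.hasMatch('Hello world')); // true
  print(startsWithHello.hasMatch('Say Hello')); // false
  print(endsWithWorld.hasMatch('Hello world')); // true
  print(onlyDigits.hasMatch('12345')); // true
  print(onlyDigits.hasMatch('123a45')); // false
}
```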
Named Group Subexpressions (?<>)
Dart lets you name a group subexpression with the (?<name>...) syntax. Naming a group subexpression has no effect on what is matched; its purpose is to let you access that group's match by name.
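As a sketch of named groups, the hypothetical date pattern below names its three parts and reads them back with the namedGroup method of RegExpMatch. The helper function and group names are assumptions made for the example.

```dart
final RegExp datePattern =
    RegExp(r'(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})');

// Returns the named part of the first date found in text, or null if
// the text contains no date.
String? datePart(String text, String name) =>
    datePattern.firstMatch(text)?.namedGroup(name);

void main() {
  const String text = 'Released on 2021-03-15.';
  print(datePart(text, 'year')); // 2021
  print(datePart(text, 'month')); // 03
  print(datePart(text, 'day')); // 15
  print(datePart('no date here', 'year')); // null
}
```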
Parameters for Creating a RegExp Object (multiLine, caseSensitive, unicode, dotAll)
Until now, we have been creating `RegExp` objects with only a raw string parameter (RegExp(r'example')),
but implicitly, 4 other parameters have been set: multiLine, caseSensitive, unicode and dotAll.
The signature of the RegExp constructor is:
RegExp(String source, {bool multiLine = false, bool caseSensitive = true, bool unicode = false, bool dotAll = false})
So when you create a RegExp object with RegExp(r'example'), you get: RegExp(r'example', multiLine: false,
caseSensitive: true, unicode: false, dotAll: false).
`multiLine` parameter, default value: false
When false, the string boundary characters, ^ and $, match the beginning and end of the whole string.
When true, ^ and $ match the beginning and end of each line.
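The effect of multiLine can be sketched with a two-line string. The text and variable names are invented for the example.

```dart
const String poem = 'roses are red\nviolets are blue';

final RegExp wholeStringStart = RegExp(r'^violets');
final RegExp lineStart = RegExp(r'^violets', multiLine: true);

void main() {
  // 'violets' starts the second line, not the whole string.
  print(wholeStringStart.hasMatch(poem)); // false
  print(lineStart.hasMatch(poem)); // true
  // 'red' ends the first line, not the whole string.
  print(RegExp(r'red$').hasMatch(poem)); // false
  print(RegExp(r'red$', multiLine: true).hasMatch(poem)); // true
}
```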
`caseSensitive` parameter, default value: true
When true, lowercase letters match only lowercase letters and uppercase letters match only uppercase letters, i.e. 'a' would match 'a' and not 'A', 'B' would match 'B' and not 'b', etc. When false, a letter matches both its lowercase and uppercase forms: 'a' would match both 'a' and 'A', 'B' would match both 'b' and 'B', etc.
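A short sketch of the caseSensitive parameter, with illustrative names and inputs:

```dart
final RegExp strict = RegExp(r'dart');
final RegExp relaxed = RegExp(r'dart', caseSensitive: false);

void main() {
  print(strict.hasMatch('I love dart')); // true
  print(strict.hasMatch('I love DART')); // false
  print(relaxed.hasMatch('I love DART')); // true
  print(relaxed.hasMatch('I love Dart')); // true
}
```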
`unicode` parameter, default value: false
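When true, the regular expression becomes Unicode-aware, i.e. Unicode code point escapes, e.g. \u{0221}, are parsed and matched, and a character outside the basic multilingual plane is treated as a single character rather than as two UTF-16 code units.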
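One way to observe the unicode parameter is with a character that needs two UTF-16 code units, as in this sketch. The emoji code point is an arbitrary choice for the example.

```dart
final RegExp oneCharPlain = RegExp(r'^.$');
final RegExp oneCharUnicode = RegExp(r'^.$', unicode: true);
final RegExp codePointEscape = RegExp(r'\u{0221}', unicode: true);

void main() {
  const String smiley = '\u{1F600}'; // one code point, two UTF-16 code units
  print(smiley.length); // 2
  // Without unicode, . consumes a single code unit, so ^.$ cannot cover both.
  print(oneCharPlain.hasMatch(smiley)); // false
  print(oneCharUnicode.hasMatch(smiley)); // true
  print(codePointEscape.hasMatch('\u{0221}')); // true
}
```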
`dotAll` parameter, default value: false
When false, the . character matches any character except line terminators such as the newline
character (\n); when true, . matches any character, including the newline character.
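The dotAll parameter can be sketched as follows, with a made-up three-character input:

```dart
final RegExp dot = RegExp(r'a.b');
final RegExp dotAcrossLines = RegExp(r'a.b', dotAll: true);

void main() {
  print(dot.hasMatch('a-b')); // true
  print(dot.hasMatch('a\nb')); // false: . skips the newline by default
  print(dotAcrossLines.hasMatch('a\nb')); // true
}
```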
Escaping Special Characters
Special characters include \, ^, $, ?, *, +, <, >, [, ], {, }, (, ), | and the dot (.). Sometimes you
may want to match one of these characters literally; to do that, you escape the special meaning of the
character by prefixing it with a backslash (\), e.g. to match $, you instead write \$, i.e.
RegExp(r'\$') would match the dollar character ('$') in a string.
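Escaping can be done by hand, as described above, or with the static RegExp.escape method, which escapes the special characters in a piece of text so it matches literally. The sketch below shows both; the literal helper is a hypothetical convenience function.

```dart
final RegExp price = RegExp(r'\$\d+\.\d{2}');

// Builds a RegExp that matches text literally, whatever characters it holds.
RegExp literal(String text) => RegExp(RegExp.escape(text));

void main() {
  // The input is a raw string so that Dart does not treat $ as interpolation.
  print(price.stringMatch(r'Total: $19.99')); // $19.99
  print(RegExp(r'a.b').hasMatch('axb')); // true: the dot matches any character
  print(literal('a.b').hasMatch('axb')); // false: the dot is now literal
  print(literal('a.b').hasMatch('a.b')); // true
}
```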
Greedy Matching
By default, the repetition symbols (+, *, ?, {}) match as many characters as possible; this is called
'greedy matching'. For example, RegExp(r'\w+').stringMatch('Hippopotamus') returns the whole string,
'Hippopotamus', even though the first letter, 'H', is enough to satisfy the regular expression.
Matching as few characters as possible (while still making the match successful) is called non-greedy
matching; you only need to append a question mark ('?') to the repetition symbol you want to match
non-greedily, e.g. RegExp(r'\w+?').stringMatch('Hippopotamus') returns only 'H', since that is enough to
make the match successful.
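A common illustration of the difference is matching a tag in a piece of markup, sketched below. The input string and variable names are made up for the example.

```dart
final RegExp greedyTag = RegExp(r'<.+>');
final RegExp lazyTag = RegExp(r'<.+?>');

void main() {
  const String html = '<b>bold</b>';
  // Greedy: .+ runs to the last '>' in the string.
  print(greedyTag.stringMatch(html)); // <b>bold</b>
  // Non-greedy: .+? stops at the first '>' that completes the match.
  print(lazyTag.stringMatch(html)); // <b>
  print(RegExp(r'\d+').stringMatch('12345')); // 12345
  print(RegExp(r'\d+?').stringMatch('12345')); // 1
}
```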
It is useful to be intentional about whether you are trying to match as few or as many characters as
possible. For "as many as possible" use: +, *, ?, {}, and for "as few as possible" use: +?, *?, ??, {}?.
Methods for RegExp and String
RegExp Methods
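RegExp.hasMatch
bool RegExp.hasMatch(String testString) -> This checks whether the regular expression has a match in
the testString and returns a bool: true if a match exists, false otherwise.
RegExp.firstMatch
RegExpMatch? RegExp.firstMatch(String testString) -> This returns the first match of the regular
expression in the testString as a RegExpMatch, or null if there is no match.
RegExp.allMatches
Iterable<RegExpMatch> RegExp.allMatches(String testString, [int startIndex = 0]) -> This returns an
iterable of every RegExpMatch found in the testString, searching from startIndex. (An iterable can be
converted into a list by its toList() method.)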
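The three methods can be compared side by side in a sketch like this one. The findNumbers helper and the sample sentence are hypothetical.

```dart
final RegExp number = RegExp(r'\d+');

// Collects the text of every number in text, searching from index start.
List<String> findNumbers(String text, [int start = 0]) =>
    number.allMatches(text, start).map((m) => m[0]!).toList();

void main() {
  const String text = 'Order 66 shipped 3 items on day 12';
  print(number.hasMatch(text)); // true
  print(number.firstMatch(text)?[0]); // 66
  print(number.firstMatch('no digits')); // null
  print(findNumbers(text)); // [66, 3, 12]
  // Index 8 is just past 'Order 66', so the first number is skipped.
  print(findNumbers(text, 8)); // [3, 12]
}
```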
RegExpMatch
RegExpMatch (a subclass of Match) results from the firstMatch and allMatches methods of RegExp. You can
use it to extract the string matched by a named group subexpression, to index named or unnamed group
subexpressions, and to get the whole match.
If you intend to use the firstMatch or allMatches methods to extract named subexpressions, you should
name the group subexpressions you plan to extract using ?<name>, as shown in the section Named Group
Subexpressions.
RegExpMatch also shares the indexing operation with its superclass Match: index 0 gets the whole match,
index 1 gets the first grouped subexpression, index 2 the next and so on, in the order of their opening
parentheses (an outer group comes before the groups nested inside it). For this, there is no need to
name the group subexpressions.
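Group indexing can be sketched with a hypothetical time pattern that nests two groups inside a third. The start, end and groupCount members come from the Match superclass.

```dart
// Group 1 wraps groups 2 and 3; group 4 is the am/pm suffix.
final RegExp timePattern = RegExp(r'((\d{2}):(\d{2}))(am|pm)');

void main() {
  final RegExpMatch? match = timePattern.firstMatch('Meet at 09:30am');
  if (match == null) return;
  print(match[0]); // 09:30am (the whole match)
  print(match[1]); // 09:30 (outer group)
  print(match[2]); // 09
  print(match[3]); // 30
  print(match[4]); // am
  print(match.groupCount); // 4
  print('${match.start}-${match.end}'); // 8-15
}
```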
String Methods
String.replaceFirst:
String replaceFirst(
RegExp pattern,
String replace,
[int startIndex = 0],
)
This finds the first match of the pattern, searching from startIndex, and replaces it with the replace string.
String.replaceAll:
String replaceAll(
RegExp pattern,
String replace,
)
This replaces all matches of the pattern with the replace string.
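A sketch of replaceFirst and replaceAll using the same pattern. The helper names and the sample phone string are made up for illustration.

```dart
final RegExp anyDigit = RegExp(r'\d');

String maskFirstDigit(String text) => text.replaceFirst(anyDigit, '#');
String maskAllDigits(String text) => text.replaceAll(anyDigit, '#');

void main() {
  print(maskFirstDigit('tel: 555-1234')); // tel: #55-1234
  print(maskAllDigits('tel: 555-1234')); // tel: ###-####
  // With startIndex 8, the search skips the digits before the hyphen.
  print('tel: 555-1234'.replaceFirst(anyDigit, '#', 8)); // tel: 555-#234
}
```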
String.replaceFirstMapped:
String replaceFirstMapped(
RegExp pattern,
String replace(
Match match
),
[int startIndex = 0],
)
This is like the `replaceFirst` method, except that instead of a replace `String` it takes a function that receives the `Match` object and returns the replacement `String`.
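A minimal sketch of replaceFirstMapped, where the replacement is computed from the matched text. The shoutFirstWord helper is hypothetical.

```dart
// Upper-cases only the first word found in text.
String shoutFirstWord(String text) => text.replaceFirstMapped(
      RegExp(r'\w+'),
      (Match match) => match[0]!.toUpperCase(),
    );

void main() {
  print(shoutFirstWord('hello big world')); // HELLO big world
  print(shoutFirstWord('...')); // ... (no match, so nothing changes)
}
```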
String.replaceAllMapped:
String replaceAllMapped(
RegExp pattern,
String replace(
Match match
),
)
This is just like the `replaceFirstMapped` method, but it replaces every match in the String instead of only the first.
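A minimal sketch of replaceAllMapped, where each number in the text is replaced by its double. The doubleNumbers helper is hypothetical.

```dart
// Replaces every run of digits in text with twice its numeric value.
String doubleNumbers(String text) => text.replaceAllMapped(
      RegExp(r'\d+'),
      (Match match) => '${int.parse(match[0]!) * 2}',
    );

void main() {
  print(doubleNumbers('2 apples and 10 pears')); // 4 apples and 20 pears
}
```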