All About Using Regular Expressions (RegExp) in Dart/Flutter
Introduction
Regular expressions are patterns used to match character combinations in strings. They are available in most programming languages, so once you understand them in one language you can usually apply them in others, although there may be some differences in syntax.
In Dart, they come in the form of the RegExp class.
Creating RegExp Patterns
// hasMatch checks if a pattern matches a string, returns a bool
RegExp regExp = RegExp(r'hello');
regExp.hasMatch('hello'); // true
regExp.hasMatch('hello John'); // true
regExp.hasMatch('hello Amit'); // true
regExp.hasMatch('hell no'); // false
A regular expression is usually created from a raw string (a string prefixed by r, e.g. r'I am a raw string', r'12345String'), so that backslashes in the pattern do not need to be doubled. For example: RegExp(r'I am a RegExp'), RegExp(r'12345String'), RegExp(r'[0-9]\d+ab99'), etc. A RegExp matches as many characters as possible so long as the pattern is satisfied.
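To see why raw strings are the usual choice, here is a minimal sketch that builds the same pattern from a raw string and from an ordinary string. The helper names fromRaw and fromEscaped are hypothetical and exist only for this illustration.

```dart
/// Hypothetical helper: pattern written as a raw string.
RegExp fromRaw() => RegExp(r'\d+');

/// Hypothetical helper: the same pattern written as an ordinary string,
/// where the backslash has to be doubled to survive string escaping.
RegExp fromEscaped() => RegExp('\\d+');

void main() {
  print(fromRaw().pattern == fromEscaped().pattern); // true
  print(fromRaw().hasMatch('room 42')); // true
  print(fromEscaped().hasMatch('room 42')); // true
}
```

Both forms produce the same regular expression; the raw string is simply easier to read and harder to get wrong.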
Character Groups \d, \w, \s, \D, \W, \S, .
A character group is a symbol that stands for any one character from a set, e.g.
\d: any one of the digits 0-9
\w: any alphanumeric character or underscore (a-z, A-Z, 0-9, _)
\s: any whitespace character, e.g. space ' ', tab '\t', newline '\n', etc.
\D: the opposite of `\d`; any character that is not a digit
\W: the opposite of `\w`; any character that is not alphanumeric or an underscore
\S: the opposite of `\s`; any character that is not a whitespace character
.: any character that is not a newline character (`\n`, i.e. the end of a line)
// `\d`
RegExp digitRegExp = RegExp(r'\d'); // Only matches digits
digitRegExp.hasMatch("I'm on cloud 9"); // true
digitRegExp.hasMatch("I'm on cloud nine"); // false
// `\w`
RegExp alphaNumericRegExp = RegExp(r'\w'); // Matches any digit, letter or underscore
alphaNumericRegExp.hasMatch("I'm on cloud 9"); // true
alphaNumericRegExp.hasMatch("I'm on cloud nine"); // true
alphaNumericRegExp.hasMatch(" \n\t"); // false
// `\s`
RegExp whitespaceRegExp = RegExp(r'\s'); // Matches any whitespace character ' ', '\t', '\n'
whitespaceRegExp.hasMatch("I'moncloud9"); // false
whitespaceRegExp.hasMatch("I'moncloudnine"); // false
whitespaceRegExp.hasMatch(" \n\t"); // true
// `\D`
RegExp nonDigitRegExp = RegExp(r'\D'); // Matches any character that is not a digit
nonDigitRegExp.hasMatch("1237854895"); // false
nonDigitRegExp.hasMatch("I'm on cloud nine"); // true
nonDigitRegExp.hasMatch(" \n\t"); // true
// `\W`
RegExp nonAlphaNumericRegExp = RegExp(r'\W'); // Matches any character that is not alphanumeric or an underscore
nonAlphaNumericRegExp.hasMatch("1237854895"); // false
nonAlphaNumericRegExp.hasMatch("Imoncloudnine"); // false
nonAlphaNumericRegExp.hasMatch(" \n\t"); // true
// `\S`
RegExp nonWhitespaceRegExp = RegExp(r'\S'); // Matches any character that is not a whitespace character
nonWhitespaceRegExp.hasMatch("1237854895"); // true
nonWhitespaceRegExp.hasMatch("I'm on cloud nine"); // true
nonWhitespaceRegExp.hasMatch(" \n\t"); // false
// `.`
RegExp dotRegExp = RegExp(r'.'); // Matches any character that is not the newline character (`\n`)
dotRegExp.hasMatch("1237854895"); // true
dotRegExp.hasMatch("I'm on cloud nine"); // true
dotRegExp.hasMatch(" \n\t"); // true
dotRegExp.hasMatch("\n"); // false
Character Range [], [^]
You might also want to create your own `character group`; you do this by putting all the characters you might want to match in square brackets ([]), e.g. RegExp(r'[abfky]'), which matches exactly one of the characters in the brackets (i.e. 'a' or 'b' or 'f' or 'k' or 'y').
If the characters you want to include follow each other in the `ASCII table`, you can use the `-` character to write them as a range, e.g. a regular expression that matches any digit except '9' as well as any lowercase English letter would be:
RegExp(r'[0-8a-z]')
You might also want to create a character group that matches only characters that are not in the brackets; you do this by adding a caret ('^') before the characters, e.g. RegExp(r'[^abfky]') matches only the characters that RegExp(r'[abfky]') does not, i.e. any character that is not 'a', 'b', 'f', 'k' or 'y'. Likewise, RegExp(r'[^0-8a-z]') is the exact opposite of RegExp(r'[0-8a-z]').
// Matches only 'a' or 'Y'
RegExp range1 = RegExp(r'[aY]');
range1.hasMatch('a'); // true
range1.hasMatch('Y'); // true
range1.hasMatch('y'); // false
range1.hasMatch('b'); // false
// Matches all characters except 'a' or 'Y'
RegExp range2 = RegExp(r'[^aY]');
range2.hasMatch('a'); // false
range2.hasMatch('Y'); // false
range2.hasMatch('y'); // true
range2.hasMatch('b'); // true
// Matches only one of 'c', 'd', 'e', 'f', '2', '3', '4', '5'
RegExp range3 = RegExp(r'[c-f2-5]');
range3.hasMatch('b'); // false
range3.hasMatch('f'); // true
range3.hasMatch('4'); // true
// Matches all characters except 'c', 'd', 'e', 'f', '2', '3', '4', '5'
RegExp range4 = RegExp(r'[^c-f2-5]');
range4.hasMatch('b'); // true
range4.hasMatch('f'); // false
range4.hasMatch('4'); // false
Special characters like `.`, `+`, `*`, `?`, `{}` lose their special meaning in the square brackets and are just like ordinary characters to be matched.
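As a small illustration of the point above, the sketch below puts `.`, `+`, `*` and `?` inside square brackets and checks that they are treated literally. The variable names are made up for this example. Note that a few characters still keep a special role inside brackets: the backslash, the closing bracket, a caret in first position and a hyphen between two characters.

```dart
// Matches one literal '.', '+', '*' or '?' character.
final punctuation = RegExp(r'[.+*?]');

// Outside brackets, '.' keeps its "any character" meaning.
final anyCharacter = RegExp(r'.');

void main() {
  print(punctuation.hasMatch('a+b')); // true
  print(punctuation.hasMatch('ab')); // false
  print(anyCharacter.hasMatch('ab')); // true
}
```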
Repeating and Optional Characters *, +, ?, {}
Optional character ?: This indicates that a character may or may not occur, i.e. it occurs once or not at all. For example, you may be trying to match the word 'color' or 'colour', the American and British spellings of the word; the letter 'u' is optional. The regular expression that matches either word is RegExp(r'colou?r').
One or more +: This matches one or more occurrences of a character, e.g. to match any number of digits (one or more), the regular expression would be RegExp(r'\d+').
Zero or more *: This indicates zero or more occurrences of a character, e.g. RegExp(r'\w*') would match zero or more alphanumeric characters.
Precise range {i,j}: This is useful when matching characters that occur a number of times within a particular range, e.g. for between 8 and 15 digits the regular expression would be RegExp(r'\d{8,15}'); the minimum number of occurrences comes first and the maximum second (both inclusive). Do not put a space between the bounds in the curly braces.
Open-ended range {i,}: This is useful to match characters occurring i or more times, e.g. RegExp(r'\d{10,}') would match 10 or more digits.
Precise number {i}: This is useful when matching an exact number of occurrences, e.g. if you are trying to match exactly 10 digits, say for a phone number, the regular expression would be RegExp(r'\d{10}').
RegExp optional = RegExp(r'flavou?r'); // matches 'flavour' or 'flavor'
optional.hasMatch('flavour'); // true
optional.hasMatch('flavor'); // true
optional.hasMatch('flavr'); // false
RegExp oneOrMore = RegExp(r'hi+'); // matches 'hi' or 'hii' or 'hiii'...
oneOrMore.hasMatch('hiiii'); // true
oneOrMore.hasMatch('h'); // false
RegExp zeroOrMore = RegExp(r'Sample\d*'); // matches 'Sample', 'Sample1', 'Sample21', 'Sample456',...
zeroOrMore.hasMatch('Sample'); // true
zeroOrMore.hasMatch('Sample468'); // true
zeroOrMore.hasMatch('Sampld'); // false
RegExp range = RegExp(r'\d{2,4}'); // matches two, three or four digits, e.g '12', '458', '7857'
range.hasMatch('457'); // true
range.hasMatch('4'); // false
RegExp openRange = RegExp(r'\d{5,}'); // matches five or more digits
openRange.hasMatch('4789'); // false
openRange.hasMatch('24789'); // true
RegExp precise = RegExp(r'\w{4}'); // Matches four alphanumeric characters, e.g 'look', 'ball', 'boy1', '2002'
precise.hasMatch('walk'); // true
precise.hasMatch('wal'); // false
Grouping Subexpressions ()
Grouping subexpressions is done by putting the characters of the subexpression in parentheses ('()'). Grouped subexpressions have multiple uses, but the most important one is probably applying the repeating and optional symbols (the previous subtopic) to them. Instead of a single character, multiple characters grouped as a subexpression can be repeated or made optional, e.g. RegExp(r'(ha)+') would match 1 or more occurrences of the subexpression 'ha'.
RegExp sub1 = RegExp(r'boo(hoo)*'); // Matches zero or more occurrences of 'hoo', i.e would match 'boo', 'boohoo', 'boohoohoo',...
sub1.hasMatch('boo'); // true
sub1.hasMatch('boohoo'); // true
sub1.hasMatch('boohoohoo'); // true
sub1.hasMatch('hoo'); // false
RegExp sub2 = RegExp(r'What would you( like to)? say'); // ' like to' is optional. This would match 'What would you like to say' and also 'What would you say'
sub2.hasMatch('What would you like to say'); // true
sub2.hasMatch('What would you say'); // true
sub2.hasMatch('What you say'); // false
RegExp sub3 = RegExp(r'(Meow ?){3,5}'); // This would match between 3 and 5 occurrences of 'Meow', each optionally followed by a space
sub3.hasMatch('Meow'); // false
sub3.hasMatch('Meow Meow Meow'); // true
sub3.hasMatch('Meow Meow Meow Meow'); // true
Choice Patterns |
Sometimes your intention might be to match any one of a few alternative characters or subexpressions; in this case you use the pipe ('|') symbol.
The choice pattern is scoped, i.e. each option extends as far as the enclosing subexpression (or the whole pattern, if there is none), e.g.
RegExp(r'pig|chicken|cow') matches one of 'pig', 'chicken' or 'cow'
RegExp(r'1 (pig|chicken|cow)') matches one of '1 pig', '1 chicken' or '1 cow'
RegExp(r'23 animals|1 (pig|chicken|cow)') matches either '23 animals' or one of '1 pig', '1 chicken', '1 cow'
RegExp fruit = RegExp(r'orange|banana|pineapple|water melon|apple'); // This would match one of 'orange', 'banana', 'pineapple', 'water melon' and 'apple'
RegExp regExp1 = RegExp(r'1 girl|(2|3|4) girls'); // This would match one of '1 girl', '2 girls', '3 girls', '4 girls'
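The scoping of `|` is easy to get wrong, so here is a minimal sketch comparing a pattern with and without parentheses around the options. The patterns and variable names are invented for this illustration.

```dart
// Without parentheses the options are 'I like cats' and 'dogs'.
final ungrouped = RegExp(r'I like cats|dogs');

// With parentheses the options are only 'cats' and 'dogs',
// and 'I like ' is required in both cases.
final grouped = RegExp(r'I like (cats|dogs)');

void main() {
  print(ungrouped.hasMatch('dogs')); // true
  print(grouped.hasMatch('dogs')); // false
  print(grouped.hasMatch('I like dogs')); // true
}
```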
Word and String Boundaries \b, ^, $
By default, a regular expression matches wherever the pattern first occurs, whether at the beginning, middle or end of the string, and whether at the start, middle or end of a word. Word and string boundaries let you control where a match may occur.
Word Boundary \b: Word boundaries (\b) are placed at the beginning and/or end of a word in the pattern to ensure the match only occurs at the start and/or end of a word in the string, e.g.
RegExp word1 = RegExp(r'\bstand\b'); // matches a string that has a separate word 'stand'
word1.hasMatch('I understand'); // false
word1.hasMatch('You have understanding'); // false
word1.hasMatch('I am standing'); // false
word1.hasMatch('I stand'); // true
RegExp word2 = RegExp(r'\bstand'); // matches a string that contains 'stand' at the beginning of a word
word2.hasMatch('I understand'); // false
word2.hasMatch('You have understanding'); // false
word2.hasMatch('I am standing'); // true
word2.hasMatch('I stand'); // true
RegExp word3 = RegExp(r'stand\b'); // matches a string that contains 'stand' at the end of a word
word3.hasMatch('I understand'); // true
word3.hasMatch('You have understanding'); // false
word3.hasMatch('I am standing'); // false
word3.hasMatch('I stand'); // true
String boundaries ^, $: String boundaries indicate that the match occurs at the start or end of a string. ^ indicates the start of the string while $ indicates the end of the string, e.g.
RegExp string1 = RegExp(r'^stand$'); // Matches a string that contains only 'stand'
string1.hasMatch('stand now'); // false
string1.hasMatch('now stand'); // false
string1.hasMatch('stand'); // true
RegExp string2 = RegExp(r'^stand'); // Matches a string that starts with 'stand'
string2.hasMatch('stand now'); // true
string2.hasMatch('now stand'); // false
string2.hasMatch('stand'); // true
RegExp string3 = RegExp(r'stand$'); // Matches a string that ends with 'stand'
string3.hasMatch('stand now'); // false
string3.hasMatch('now stand'); // true
string3.hasMatch('stand'); // true
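String boundaries matter most when validating a whole input, because hasMatch succeeds as soon as any part of the string matches. The sketch below assumes a hypothetical 4-digit PIN check (the function name looksLikePin is not from any library) and contrasts the anchored and unanchored patterns.

```dart
/// Hypothetical validator: true only if the whole input is exactly 4 digits.
bool looksLikePin(String input) => RegExp(r'^\d{4}$').hasMatch(input);

void main() {
  // Unanchored: the first four digits of '123456' are enough for a match.
  print(RegExp(r'\d{4}').hasMatch('123456')); // true

  // Anchored: the pattern must cover the string from start to end.
  print(looksLikePin('123456')); // false
  print(looksLikePin('1234')); // true
  print(looksLikePin('12a4')); // false
}
```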
Named Group Subexpressions (?<>)
Dart's RegExp supports named group subexpressions, written (?<name>...). Naming a group subexpression has no effect on what is matched; its purpose is to let you access that part of the match by name, e.g.
// Here there are two named group subexpressions; 'number' and 'animal' ('s' is optional)
RegExp named = RegExp(r'(?<number>\d+) (?<animal>cat|dog|cow|pig)s?');
// RegExpMatch: regular expression matches, may be null(if no match)
RegExpMatch? match1 = named.firstMatch('5 dogs');
match1?.namedGroup('number'); // '5'
match1?.namedGroup('animal'); // 'dog'
RegExpMatch? match2 = named.firstMatch('10 cats');
match2?.namedGroup('number'); // '10'
match2?.namedGroup('animal'); // 'cat'
RegExpMatch? match3 = named.firstMatch('1 pig');
match3?.namedGroup('number'); // '1'
match3?.namedGroup('animal'); // 'pig'
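Named groups are handy when several parts of a string need to be pulled out at once. As a rough sketch, the hypothetical helper below (parseIsoDate is not a library function) reads a 'YYYY-MM-DD' string into a DateTime. It only checks the shape of the input, not whether the month and day are in a valid range.

```dart
/// Hypothetical helper: parses 'YYYY-MM-DD', returns null if the shape is wrong.
DateTime? parseIsoDate(String input) {
  final pattern = RegExp(r'^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$');
  final match = pattern.firstMatch(input);
  if (match == null) return null;
  // The three groups are not optional, so they are non-null once a match exists.
  return DateTime(
    int.parse(match.namedGroup('year')!),
    int.parse(match.namedGroup('month')!),
    int.parse(match.namedGroup('day')!),
  );
}

void main() {
  print(parseIsoDate('2024-03-09') == DateTime(2024, 3, 9)); // true
  print(parseIsoDate('9 March 2024')); // null
}
```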
Parameters for Creating a RegExp Object (multiLine, caseSensitive, unicode, dotAll)
Until now, we have been creating `RegExp` objects with only a raw string parameter (RegExp(r'example')), but implicitly, 4 other parameters have been set: multiLine, caseSensitive, unicode and dotAll.
The signature of the default RegExp constructor is:
RegExp(String source, {bool multiLine = false, bool caseSensitive = true, bool unicode = false, bool dotAll = false})
So when you create a RegExp object with RegExp(r'example'), you get: RegExp(r'example', multiLine: false, caseSensitive: true, unicode: false, dotAll: false).
`multiLine` parameter, default value: false
When false, the string boundary characters, ^ and $, match the beginning and end of the whole string. When true, ^ and $ match the beginning and end of each line, e.g.
RegExp nonMulti = RegExp(r'end$'); // multiline: false
RegExp multi = RegExp(r'end$', multiLine: true);
String testString = 'This is the end\nThis is another line';
nonMulti.hasMatch(testString); // false
multi.hasMatch(testString); // true
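A typical use of multiLine is processing text line by line without splitting it first. The sketch below assumes a hypothetical helper, firstWords, that collects the first word of every line using allMatches (covered later in this article).

```dart
/// Hypothetical helper: returns the first word of every line in [text].
List<String> firstWords(String text) {
  // With multiLine: true, '^' matches at the start of each line.
  final pattern = RegExp(r'^\w+', multiLine: true);
  return pattern.allMatches(text).map((m) => m[0]!).toList();
}

void main() {
  print(firstWords('alpha beta\ngamma delta\nepsilon')); // [alpha, gamma, epsilon]
}
```

Without multiLine: true, the same pattern would only find 'alpha', because `^` would match at the start of the whole string only.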
`caseSensitive` parameter, default value: true
When true, lowercase letters match only lowercase letters and uppercase letters match only uppercase letters, i.e. 'a' would match 'a' and not 'A', 'B' would match 'B' and not 'b', etc.; when false, a letter matches both its lowercase and uppercase forms, i.e. 'a' would match both 'a' and 'A', 'B' would match both 'b' and 'B', etc. e.g.
RegExp sensitive = RegExp(r'This is a pattern'); // caseSensitive: true
RegExp insensitive = RegExp(r'This is a pattern', caseSensitive: false);
sensitive.hasMatch('This Is A Pattern'); // false
insensitive.hasMatch('This Is A Pattern'); // true
`unicode` parameter, default value: false
When true, the regular expression becomes Unicode-aware, i.e. Unicode code point escapes, e.g. \u{0221}, are parsed and matched.
RegExp nonUnicode = RegExp(r'\u{0221}'); // unicode: false
RegExp unicode = RegExp(r'\u{0221}', unicode: true);
nonUnicode.hasMatch('ȡ'); // false
unicode.hasMatch('ȡ'); // true
`dotAll` parameter, default value: false
When false, the . character matches any character except the newline character (\n); when true, . matches any character including the newline character.
RegExp nonDotAll = RegExp(r'^.+$'); // dotAll: false
RegExp dotAll = RegExp(r'^.+$', dotAll: true);
String testString = 'This is the end\nThis is another line';
nonDotAll.hasMatch(testString); // false
dotAll.hasMatch(testString); // true
Escaping Special Characters
Special characters include \, ^, $, ?, *, +, |, (, ), [, ], {, } and . (< and > are special only inside group syntax such as (?<name>...)). Sometimes you may want to match one of these characters literally; to do that, you escape the special meaning of the character by prefixing it with a backslash (\), e.g. to match $, you instead write \$, i.e. RegExp(r'\$'), which matches the dollar character ('$') in a string.
// This would match a string like '$500.26'
// '\$' escapes the special meaning of '$' (end of string) and matches a literal '$'
// '\d+' means 1 or more digit characters
// '\.' escapes the special meaning of '.' (matching any character except newline) and matches only '.'
// '\d{2}' means exactly 2 digit characters
RegExp money = RegExp(r'\$\d+\.\d{2}');
money.hasMatch('\$10578.43'); // true ('$' is also a special character in dart)
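When the text to match is not known in advance (for example, user input), escaping by hand is not practical. Dart's RegExp class has a static escape method for this case; it is not covered elsewhere in this article, so treat the sketch below as a supplementary illustration. The helper name containsLiteral is hypothetical.

```dart
/// Hypothetical helper: true if [text] contains [literal] exactly as written.
bool containsLiteral(String text, String literal) {
  // RegExp.escape backslash-escapes every special character in the input.
  final pattern = RegExp(RegExp.escape(literal));
  return pattern.hasMatch(text);
}

void main() {
  // Raw strings are used so that '$' is not treated as Dart interpolation.
  print(containsLiteral(r'Total: $5.00', r'$5.00')); // true
  print(containsLiteral(r'Total: $5x00', r'$5.00')); // false

  // Unescaped, '$' means "end of string", so nothing can follow it.
  print(RegExp(r'$5.00').hasMatch(r'Total: $5.00')); // false
}
```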
Greedy Matching
By default, the repetition symbols (+, *, ?, {}) match as many characters as possible; this is called 'greedy matching', e.g. RegExp(r'\w+').firstMatch('Hippopotamus') would match the whole string, 'Hippopotamus', even though the first letter, 'H', is enough to satisfy the regular expression. To match as few characters as possible while still making the match successful, which is called non-greedy matching, append a question mark ('?') to the part of the regular expression you want to match non-greedily, e.g. RegExp(r'\w+?').firstMatch('Hippopotamus') would only match 'H', since that is enough to make the match successful. The following code snippet shows the difference:
String testString = 'Superman45';
final greedy = RegExp(r'^(?<letters>\w+)(?<digits>\d*)$'); // Greedy matching on the 'letters' group subexpression
final greedyMatch = greedy.firstMatch(testString);
greedyMatch?.namedGroup('letters'); // 'Superman45' // Matches whole string
greedyMatch?.namedGroup('digits'); // '' // The digits are not extracted
final nonGreedy = RegExp(r'^(?<letters>\w+?)(?<digits>\d*)$'); // Non-greedy matching on the 'letters' group subexpression
final nonGreedyMatch = nonGreedy.firstMatch(testString);
nonGreedyMatch?.namedGroup('letters'); // 'Superman'
nonGreedyMatch?.namedGroup('digits'); // '45'
It is useful to be intentional about whether you're trying to match as few or as many characters as possible. For "as many as possible" use: +, *, ?, {}, and for "as few as possible" use: +?, *?, ??, {}?.
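A common place where the difference shows up is matching delimited text such as tags. The sketch below uses a hypothetical helper, firstTag, with an invented greedy flag to switch between the two forms. It is only meant to show the matching behaviour and is not a way to parse HTML.

```dart
/// Hypothetical helper: returns the first '<...>' span found in [text].
String? firstTag(String text, {bool greedy = false}) {
  // '.+' runs to the last '>' it can reach; '.+?' stops at the first one.
  final pattern = RegExp(greedy ? r'<.+>' : r'<.+?>');
  return pattern.firstMatch(text)?[0];
}

void main() {
  print(firstTag('<b>bold</b> text', greedy: true)); // <b>bold</b>
  print(firstTag('<b>bold</b> text')); // <b>
}
```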
Methods for RegExp and String
RegExp Methods
RegExp.hasMatch
bool RegExp.hasMatch(String testString)
-> This checks whether the regular expression has a match in testString and returns a bool: true if a match exists, false otherwise.
RegExp regExp = RegExp(r'love');
regExp.hasMatch('I love you.'); // true
regExp.hasMatch('I hate you.'); // false
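RegExp.firstMatch
RegExpMatch? RegExp.firstMatch(String testString)
-> This finds the first match of the regular expression in testString and returns it as a RegExpMatch, or null if there is no match.
RegExp.allMatches
Iterable<RegExpMatch> RegExp.allMatches(String testString, [int start = 0])
-> This finds all matches of the regular expression in testString, beginning the search at index start, and returns them as an Iterable of RegExpMatch. (An iterable can be converted into a list by its toList() method.)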
RegExpMatch
RegExpMatch (a subclass of Match) is the result type of the firstMatch and allMatches methods of RegExp. You can use it to extract the string matched by a named group subexpression, to index named or unnamed subexpressions, and to get the whole match.
If you intend to use the firstMatch or allMatches methods to extract named subexpressions, you should name the group expressions you plan to extract using ?<name>, as shown in the section Named Group Subexpressions.
RegExp regExp = RegExp(r'(?<record>(?<number>\d+) (?<animal>cat|dog|cow|pig)s?)');
String records = '5 cats, 4 dogs, 9 cows and 15 pigs';
RegExpMatch? firstMatch = regExp.firstMatch(records); // Finds only first match
print(firstMatch?.namedGroup('record')); // '5 cats'
print(firstMatch?.namedGroup('number')); // '5'
print(firstMatch?.namedGroup('animal')); // 'cat'
print(firstMatch?[0]); // '5 cats' (whole match)
print(firstMatch?[1]); // '5 cats' (first subexpression)
print(firstMatch?[2]); // '5' (second subexpression)
print(firstMatch?[3]); // 'cat' (third subexpression)
Iterable<RegExpMatch> allMatches = regExp.allMatches(records); // Finds all matches
for (RegExpMatch match in allMatches) {
  print(match.namedGroup('record')); // '5 cats', '4 dogs', '9 cows', '15 pigs'
  print(match.namedGroup('number')); // '5', '4', '9', '15'
  print(match.namedGroup('animal')); // 'cat', 'dog', 'cow', 'pig'
}
Iterable<RegExpMatch> allMatchesWithStartIndex = regExp.allMatches(records, 8); // Finds all matches from index 8
for (RegExpMatch match in allMatchesWithStartIndex) {
  print(match.namedGroup('record')); // '4 dogs', '9 cows', '15 pigs'
  print(match.namedGroup('number')); // '4', '9', '15'
  print(match.namedGroup('animal')); // 'dog', 'cow', 'pig'
}
RegExpMatch also shares the indexing operation with its superclass Match: index 0 gets the whole match, index 1 gets the first grouped subexpression, index 2 the next and so on, with groups numbered by the position of their opening parenthesis (a depth-first order when groups are nested). With indexing there is no need to name group subexpressions, e.g.
RegExp regExp = RegExp(r'\d(\d(\d)(\d))(\d(\d(\d)(\d)))\d');
RegExpMatch? match = regExp.firstMatch('012345678');
print(match?[0]); // '012345678'
print(match?[1]); // '123'
print(match?[2]); // '2'
print(match?[3]); // '3'
print(match?[4]); // '4567'
print(match?[5]); // '567'
print(match?[6]); // '6'
print(match?[7]); // '7'
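Besides indexing, a match also knows how many groups it has (groupCount) and where it sits in the input (start and end). As a rough sketch, the hypothetical helper below (describe is an invented name) loops over every group instead of hard-coding the indices.

```dart
/// Hypothetical helper: summarises a match as 'start-end: 0=..., 1=..., ...'.
String describe(RegExpMatch match) {
  final parts = <String>[];
  // Group 0 is the whole match; groups 1..groupCount are the subexpressions.
  for (var i = 0; i <= match.groupCount; i++) {
    parts.add('$i=${match.group(i)}');
  }
  final groups = parts.join(', ');
  return '${match.start}-${match.end}: $groups';
}

void main() {
  final match = RegExp(r'(\d+)-(\d+)').firstMatch('range 10-25 ok');
  if (match != null) {
    print(describe(match)); // 6-11: 0=10-25, 1=10, 2=25
  }
}
```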
String Methods
String.replaceFirst:
String replaceFirst(
  Pattern pattern,
  String replace,
  [int startIndex = 0],
)
This finds the first match of the pattern and replaces it with the replace string.
String originString = 'This is the origin string';
RegExp pattern = RegExp(r'origin'); // Matches the word 'origin'
String replace = 'first';
print(originString.replaceFirst(pattern, replace)); // 'This is the first string'
String.replaceAll:
String replaceAll(
  Pattern pattern,
  String replace,
)
This replaces all matches of the pattern with the replace string.
String originString = 'Almost all information will be lost';
RegExp pattern = RegExp(r'\b\w+\b'); // Matches all words
String replace = 'lost';
print(originString.replaceAll(pattern, replace)); // 'lost lost lost lost lost lost'
String.replaceFirstMapped:
String replaceFirstMapped(
  Pattern pattern,
  String replace(
    Match match
  ),
  [int startIndex = 0],
)
This is like the `replaceFirst` method except instead of a replace `String`, there is a function that takes a `Match` object and returns a `String` e.g.
RegExp regExp = RegExp(r'(\w+)\s(\w+)');
String lastFirst = 'Smith John';
String firstLast = lastFirst.replaceFirstMapped(regExp,
(Match m) => '${m[2]} ${m[1]}'
);
print(firstLast); // 'John Smith'
String.replaceAllMapped:
String replaceAllMapped(
  Pattern pattern,
  String replace(
    Match match
  ),
)
This is just like the `replaceFirstMapped` method but instead replaces all occurrences of a match in the String. e.g.
RegExp regExp = RegExp(r'(\w+)\s(\w+)'); // Matches a name
String lastFirst = 'Smith John\nNakamura Hikaru\nAdebayo Peter\nMa Long';
String firstLast = lastFirst.replaceAllMapped(regExp,
(Match m) => '${m[2]} ${m[1]}'
);
print(firstLast); // 'John Smith\nHikaru Nakamura\nPeter Adebayo\nLong Ma'
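The mapped variants are useful whenever the replacement depends on what was matched. As one more sketch, the hypothetical helper below (snakeToCamel is an invented name) converts snake_case identifiers to camelCase by upper-casing the letter captured after each underscore.

```dart
/// Hypothetical helper: 'user_first_name' -> 'userFirstName'.
String snakeToCamel(String input) {
  return input.replaceAllMapped(
    RegExp(r'_([a-z])'), // an underscore followed by one lowercase letter
    (Match m) => m[1]!.toUpperCase(), // keep only the letter, upper-cased
  );
}

void main() {
  print(snakeToCamel('user_first_name')); // userFirstName
  print(snakeToCamel('id')); // id
}
```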
Examples of Regular Expressions And Functions
/// Extracts file extension from string
String? extractExt(String fileName) {
  final pattern = RegExp(r'\.(?<ext>[0-9a-zA-Z]+)$');
  final match = pattern.firstMatch(fileName);
  return match?.namedGroup('ext');
}
extractExt('verygoodfile.dart'); // 'dart'
/// Validates email string, *non ascii characters are not accepted*
bool isValidEmail(String email) {
  // A raw double-quoted string is used so the pattern can contain a plain apostrophe
  final pattern = RegExp(r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$");
  return pattern.hasMatch(email.trim());
}
/// Password is valid if it has an uppercase, lowercase, number, symbol and has at least 8 characters
bool isPasswordValid(String? password) {
  if (password == null) return false; // hasMatch requires a non-null String
  final containsUpperCase = RegExp(r'[A-Z]').hasMatch(password);
  final containsLowerCase = RegExp(r'[a-z]').hasMatch(password);
  final containsNumber = RegExp(r'\d').hasMatch(password);
  final containsSymbols = RegExp(r'[`~!@#$%\^&*\(\)_+\\\-={}\[\]\/.,<>;]').hasMatch(password);
  final hasManyCharacters = RegExp(r'^.{8,128}$', dotAll: true).hasMatch(password); // This is variable
  return containsUpperCase && containsLowerCase && containsNumber && containsSymbols && hasManyCharacters;
}