Book of Coding


Regular expressions

Regular expressions are character patterns that are used to check whether a text or character string contains a specific combination of characters. In JavaScript, the RegExp type represents regular expressions, and both RegExp objects and strings provide methods for applying them. Six methods are available in total:

  • test(): A method of type RegExp that checks whether a specific pattern occurs in a string and returns true or false.
  • exec(): A method of type RegExp that searches for the next occurrence of a pattern and returns information about it as an array, or null if there is none.
  • match(): A string method that searches a string for the occurrences that match a pattern and returns them as an array, or null if there are none.
  • replace(): A string method that replaces occurrences within a string that match a pattern with new text.
  • search(): A string method that searches for a pattern and returns the index of the first occurrence, or -1 if there is none.
  • split(): A string method that splits a string at each occurrence of a pattern and returns the individual parts as an array.
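As a quick orientation, the following sketch applies each of the six methods to the same sample string. The pattern (one or more digits) and the sample text are illustrative assumptions, not part of the numbered examples:

```javascript
// Pattern for "one or more digits": without g only the first run is used,
// with the g (global) flag every run of digits is used.
const digits = /\d+/;
const allDigits = /\d+/g;

const sample = 'Room 12, floor 3';

console.log(digits.test(sample));            // true
console.log(digits.exec(sample)[0]);         // 12
console.log(sample.match(allDigits));        // [ '12', '3' ]
console.log(sample.replace(allDigits, '#')); // Room #, floor #
console.log(sample.search(digits));          // 5
console.log(sample.split(/,\s*/));           // [ 'Room 12', 'floor 3' ]
```

Each of these methods is explained in detail in the following sections.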

1. Define regular expressions

Like objects and arrays, regular expressions can be created using either literal notation or a constructor function.


  • Creating a regular expression using literal notation

 const regExp = /abcde/;
                        

  • Creating a regular expression using a constructor function

 const regExp = new RegExp('abcde');
                        

If a regular expression is to be generated dynamically at the runtime of a program, for example based on user input, the constructor function should be used. If the regular expression is fixed and no longer changes at program runtime, the literal notation should be used.
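The dynamic case can be sketched as follows. The sketch assumes the search term comes from user input, so characters such as . or + that have a special meaning in regular expressions are first escaped. Both escapeRegExp() and containsTerm() are hypothetical helper names for illustration, not built-in functions:

```javascript
// Hypothetical helper: escapes every character that has a special meaning
// in regular expressions, so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds the regular expression at runtime.
function containsTerm(text, term) {
  const regExp = new RegExp(escapeRegExp(term));
  return regExp.test(text);
}

console.log(containsTerm('Price: 3.50 EUR', '3.50')); // true
console.log(containsTerm('Price: 3550 EUR', '3.50')); // false, the dot is escaped

// Inside a string passed to the constructor, a backslash must itself be
// escaped: this creates the same pattern as the literal /^\+\d{2}/.
const phonePrefix = new RegExp('^\\+\\d{2}');
console.log(phonePrefix.test('+49 30 1234567'));      // true
```

Without escaping, the search term 3.50 would also match 3550, because an unescaped dot stands for any character.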


2. Test characters against a regular expression

First of all, you need a method that tests a regular expression against a character string and checks whether the string matches it. This is what the test() method does. It expects the character string to be checked as its argument and returns a Boolean value that indicates whether the string matches the regular expression.

Complete Code - Examples/Part_215/main.js...

 const regExp = /abcde/;
 console.log(regExp.test('abcdefghijklmnopqrstuvwxyz')); // output: true
                        

The method returns true because the passed string contains the character sequence abcde described by the regular expression /abcde/.

This example is not particularly complex. Real-world patterns usually are, for example one that extracts all email addresses from a text. The next example is a little more complex and explains step by step how the various building blocks of regular expressions work.

First, we check whether a character string contains at least 13 characters. To express this with a regular expression, simply write 13 dots in a row: /............./. Applied to international telephone numbers, for example, the test() method then returns true if the number has at least 13 characters and false if it has fewer than 13 characters.

Complete Code - Examples/Part_216/main.js...

 const regExp = /............./;
 console.log(regExp.test('49 30 1234567'));    // true
 console.log(regExp.test('61 45 123456789'));  // true
 console.log(regExp.test('84 77 12345'));      // false

 /* International phone numbers consist of a 
 country code (e.g. Germany +49, Australia +61), 
 followed by a space and the area code (here e.g. 30), 
 followed by another space and 
 the actual phone number (here e.g. 12345...).*/
                        

The dot stands for any character, so test() returns true for every string that has at least 13 characters:


 const regExp = /............./;
 console.log(regExp.test('Hello World'));        // false
 console.log(regExp.test('Hello Developer'));    // true
 console.log(regExp.test('Hello JavaScript'));   // true
                        

In addition to the dot, which means that any character matches the regular expression, there are other special characters within regular expressions:

Expression   Meaning
.            any character, except newline
a            the character "a"
ab           the character string "ab"
a|b          the character "a" or the character "b"
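A few illustrative tests show how the dot and the vertical bar behave. The patterns and strings are assumptions chosen for this sketch:

```javascript
// The dot matches any single character except a newline.
console.log(/a.c/.test('abc'));   // true
console.log(/a.c/.test('a-c'));   // true
console.log(/a.c/.test('a\nc'));  // false, '\n' is a newline

// The vertical bar separates alternatives.
console.log(/cat|dog/.test('hotdog')); // true
console.log(/cat|dog/.test('bird'));   // false
```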

To ensure that the regular expression for the international telephone numbers only matches character strings that have digits at the required positions, a so-called character class can be used.


3. Character classes

A character class is defined within a regular expression using square brackets: The opening square bracket starts the character class, the closing square bracket ends the character class.


Definition of a character class:

Complete Code - Examples/Part_217/main.js...

 const regExp = /[abcde]/;
 console.log(regExp.test('a'));      // true
 console.log(regExp.test('f'));      // false
 console.log(regExp.test('afghj'));  // true
 console.log(regExp.test('fghij'));  // false  
                        

The different character classes for regular expressions:

Name           Spelling    Meaning
simple class   [abc]       one of the characters "a", "b" or "c"
negation       [^abc]      none of the characters "a", "b" or "c", but any other character
range          [a-zA-Z]    a lowercase or an uppercase letter, i.e. one of the characters from "a" to "z" or from "A" to "Z"
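The negation and range forms from the table can be tried out with a short sketch; the class names in the variables (vowel, nonVowel, hexDigit) are illustrative choices:

```javascript
const vowel = /[aeiou]/;         // simple class: one of five characters
const nonVowel = /[^aeiou]/;     // negation: any character except those five
const hexDigit = /[0-9a-fA-F]/;  // several ranges combined in one class

console.log(vowel.test('rhythm'));    // false, no a, e, i, o or u
console.log(nonVowel.test('rhythm')); // true
console.log(nonVowel.test('aeiou'));  // false, every character is a vowel
console.log(hexDigit.test('f'));      // true
console.log(hexDigit.test('G'));      // false, G is outside a-f and A-F
```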

With this knowledge, we now return to the telephone number example to adapt the regular expression. The dots at the digit positions are now replaced by the character class [0-9] to express that only digits may appear at those positions in the telephone number:

Complete Code - Examples/Part_218/main.js...

 const regExp = /[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9][0-9]/;
 console.log(regExp.test('49 30 1234567'));      // true
 console.log(regExp.test('61 45 123456789'));    // true
 console.log(regExp.test('84 77 12345'));        // false - fewer than 13 characters
 console.log(regExp.test('Hello World'));        // false
 console.log(regExp.test('Hello Developer'));    // false  
                        

However, this regular expression still allows any character at the positions where the spaces should be:


 const regExp = /[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9][0-9]/;

 console.log(regExp.test('49x30x1234567'));      // true  
                        

This can be fixed by replacing the two remaining dots with spaces:


 const regExp = /[0-9][0-9] [0-9][0-9] [0-9][0-9][0-9][0-9][0-9][0-9][0-9]/;

 console.log(regExp.test('49x30x1234567'));      // false
                        

The spaces now ensure that only a space, and no other character, is accepted between the groups of digits.


Pre-defined character classes

In addition to defining your own character classes, as in the example above, you can use predefined character classes. For example, instead of writing the character class [0-9] for a digit from 0 to 9, it is sufficient to write \d, which has the same meaning but is already predefined.


Spelling   Meaning                                                  Short form for ...
.          any character except a line terminator
\d         a digit from 0 to 9                                      [0-9]
\D         any character that is not a digit                        [^0-9]
\s         a whitespace character (space, tab, line break, etc.)    [ \t\n\v\f\r] plus other Unicode whitespace
\S         any character that is not whitespace                     [^\s]
\w         a word character (ASCII letter, digit or underscore)     [a-zA-Z0-9_]
\W         any character that is not a word character               [^\w]
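A few quick checks illustrate the predefined classes; the test strings are assumptions chosen to show the edge cases:

```javascript
console.log(/\s/.test('a b'));   // true, a space is whitespace
console.log(/\s/.test('a\tb'));  // true, so is a tab
console.log(/\w/.test('_'));     // true, the underscore counts as a word character
console.log(/\w/.test('ä'));     // false, \w only covers ASCII letters, digits and _
console.log(/\D/.test('2024'));  // false, every character is a digit
console.log(/\S/.test('   '));   // false, the string contains only whitespace
```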

The regular expression for the example with the telephone numbers can be reformulated using predefined character classes:

Complete Code - Examples/Part_219/main.js...

 const regExp = /\d\d\s\d\d\s\d\d\d\d\d\d\d/;
 console.log(regExp.test('49 30 1234567'));      // true
 console.log(regExp.test('Hello World'));        // false
 console.log(regExp.test('Hello Developer'));    // false
 console.log(regExp.test('61 45 123456789'));    // true
 console.log(regExp.test('49x30x1234567'));      // false
                            

4. Limit the start and end of the regular expression

In the telephone number example, test() also returns true for character strings that merely contain a matching part somewhere:

Complete Code - Examples/Part_220/main.js...

 const regExp = /\d\d\s\d\d\s\d\d\d\d\d\d\d/;
 console.log(regExp.test('12349 30 12345678'));  // true
 console.log(regExp.test('449 30 123456789'));   // true
                        

If test() should only return true for character strings that match the regular expression completely, rather than merely containing a matching sequence somewhere, you must anchor the start and end of the regular expression. Two special characters do this: ^ (circumflex) marks the beginning of a character string, and $ marks the end of a character string.


  • Limits the start of a string:
Complete Code - Examples/Part_221/main.js...

 const regExp = /^T/;
 console.log(regExp.test('Tortoise'));       // true
 console.log(regExp.test('Baby tortoise'));  // false
                        

The expression /^T/ matches the string Tortoise but not Baby tortoise, because that string begins with "B" (the lowercase "t" in "tortoise" would not match either, because the comparison is case-sensitive).


  • Limits the end of a string:
Complete Code - Examples/Part_222/main.js...

 const regExp = /e$/;
 console.log(regExp.test('Tortoise'));   // true
 console.log(regExp.test('Leonardo'));   // false
                        

The expression /e$/ matches Tortoise but not Leonardo, because here the "e" is not at the end.


Both expressions can also be combined with each other:

Complete Code - Examples/Part_223/main.js...

 const regExp = /^Tortoise$/;
 console.log(regExp.test('Tortoise'));       // true
 console.log(regExp.test('Baby tortoise'));  // false
 console.log(regExp.test('Leonardo'));       // false
                        

In addition to anchoring the entire character string, \b can be used to test word boundaries, i.e. the beginning or end of an individual word:

Complete Code - Examples/Part_224/main.js...

 const regExp = /\bplay\b/;
 console.log(regExp.test('I play the guitar.'));         // true
 console.log(regExp.test('I am a guitar player.'));      // false
                        

The various expressions for describing word boundaries and the beginning and end of strings:

Expression   Meaning
^            beginning of a string
$            end of a string
\b           word boundary (beginning or end of a word)
\B           a position that is not a word boundary
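The difference between \b and \B can be sketched with the word "play" from the previous example; the test strings are illustrative assumptions:

```javascript
// \B asserts a position that is NOT a word boundary.
const playInside = /\Bplay\B/;
console.log(playInside.test('displayed')); // true, "play" sits inside a word
console.log(playInside.test('I play.'));   // false, "play" is a whole word
console.log(playInside.test('player'));    // false, "play" starts the word

// A single \b only requires a boundary before "play".
console.log(/\bplay/.test('player'));      // true
```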

If this knowledge of regular expressions is now applied to the example with the telephone numbers, it would look like this:

Complete Code - Examples/Part_225/main.js...

 const regExp = /^\d\d\s\d\d\s\d\d\d\d\d\d\d$/;
 console.log(regExp.test('49 30 1234567'));        // true
 console.log(regExp.test('Hello World'));          // false
 console.log(regExp.test('Hello Developer'));      // false
 console.log(regExp.test('49x30x1234567'));        // false
 console.log(regExp.test('61 45 123456789'));      // false
 console.log(regExp.test('449 30 1234567'));       // false
                        

5. Quantifiers

Quantifiers can be used to define how many instances of a character or character class must be present in the string for it to be accepted by the regular expression.

The following can be determined:

  • A character or character class can optionally be present exactly once or not at all.
  • A character or character class can be present any number of times, including not at all.
  • A character or character class must be present at least once.
  • A character or character class must be present exactly x times.
  • A character or character class must be present at least x times.
  • A character or character class must be present at least x times and a maximum of y times.
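The six cases in the list correspond to the quantifiers ?, *, +, {x}, {x,} and {x,y}, which are explained one by one below. As an overview, this sketch applies them to the character class [0-9] instead of a single character; the pattern names and sample strings are illustrative assumptions, and every pattern is anchored with ^ and $ so the whole string must match:

```javascript
const patterns = {
  optional:   /^[0-9]?$/,     // zero or one digit
  anyNumber:  /^[0-9]*$/,     // any number of digits, including none
  atLeastOne: /^[0-9]+$/,     // at least one digit
  exactlyTwo: /^[0-9]{2}$/,   // exactly two digits
  atLeastTwo: /^[0-9]{2,}$/,  // at least two digits
  twoToThree: /^[0-9]{2,3}$/, // at least two and at most three digits
};

const samples = ['', '7', '42', '123', '2024'];

// Prints, for each quantifier, which sample strings are accepted.
for (const [name, regExp] of Object.entries(patterns)) {
  console.log(name, samples.map((s) => regExp.test(s)));
}
```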

Define optional instances

To define that a character may optionally occur, a question mark ? is placed after this character in the regular expression:

Complete Code - Examples/Part_226/main.js...

 const regExp = /^abcdef?$/;
 console.log(regExp.test('abcde'));    // true  
 console.log(regExp.test('abcdef'));   // true
 console.log(regExp.test('abcdeff'));  // false
                            

The regular expression returns a true for both the character string abcde and the character string abcdef because the f has been marked as optional.


Define any number of instances

To define that a character may occur any number of times, including not at all, an asterisk * is placed after the character:

Complete Code - Examples/Part_227/main.js...

 const regExp = /^abcdef*$/;
 console.log(regExp.test('abcde'));    // true
 console.log(regExp.test('abcdef'));   // true
 console.log(regExp.test('abcdeff'));  // true
                            

Define at least one instance

To define that a character must occur at least once, but may occur any number of times, a plus sign + is placed after the character:

Complete Code - Examples/Part_228/main.js...

 const regExp = /^abcdef+$/;
 console.log(regExp.test('abcde'));    // false
 console.log(regExp.test('abcdef'));   // true
 console.log(regExp.test('abcdeff'));  // true
                            

The regular expression only returns true if there is at least one f at the end of the string.


Define exact number of instances

It is also possible to define that a character must occur an exact number of times. To do this, the number is written in curly brackets after the corresponding character:

Complete Code - Examples/Part_229/main.js...

 const regExp = /^abcdef{2}$/;
 console.log(regExp.test('abcde'));      // false
 console.log(regExp.test('abcdef'));     // false
 console.log(regExp.test('abcdeff'));    // true
 console.log(regExp.test('abcdefff'));   // false
 console.log(regExp.test('abcdeffff'));  // false
                            

The regular expression only returns true if there are exactly two f characters at the end of the string.


Define minimum number of instances

A minimum number of required instances can also be defined with curly brackets. To do this, a comma is placed after the number inside the curly brackets:

Complete Code - Examples/Part_230/main.js...

 const regExp = /^abcdef{2,}$/;
 console.log(regExp.test('abcde'));      // false
 console.log(regExp.test('abcdef'));     // false
 console.log(regExp.test('abcdeff'));    // true
 console.log(regExp.test('abcdefff'));   // true
 console.log(regExp.test('abcdeffff'));  // true
                            

The regular expression only returns true if the f appears at least twice at the end of the string.


Define minimum and maximum number of instances

Both the minimum and maximum number of instances are defined by adding the maximum number of instances after the comma:

Complete Code - Examples/Part_231/main.js...

 const regExp = /^abcdef{2,3}$/;
 console.log(regExp.test('abcde'));      // false
 console.log(regExp.test('abcdef'));     // false
 console.log(regExp.test('abcdeff'));    // true
 console.log(regExp.test('abcdefff'));   // true
 console.log(regExp.test('abcdeffff'));  // false
                            

The regular expression only returns true if there are exactly two instances of f or exactly three instances of f at the end of the string.


If this knowledge of regular expressions is now applied to the example with the telephone numbers, it would look like this:

Complete Code - Examples/Part_232/main.js...

 const regExp = /^\d{2}\s\d{2}\s\d{5,7}$/;
 console.log(regExp.test('49 30 1234567'));          // true
 console.log(regExp.test('61 45 1234567'));          // true
 console.log(regExp.test('49 30 12345'));            // true
 console.log(regExp.test('Hello World'));            // false
 console.log(regExp.test('Hello Developer'));        // false
 console.log(regExp.test('49x30x1234567'));          // false
 console.log(regExp.test('61 45 123456789'));        // false
 console.log(regExp.test('449 30 1234567'));         // false
 console.log(regExp.test('+49 30 1234567'));         // false
                            

The ? quantifier can also be used to extend the regular expression so that a telephone number may optionally begin with a + character. Because the + character has a special meaning within regular expressions, it must be escaped with a backslash (\):

Complete Code - Examples/Part_233/main.js...

 const regExp = /^\+?\d{2}\s\d{2}\s\d{5,7}$/;
 console.log(regExp.test('+49 30 1234567'));         // true
                            

6. Search for instances

In the previous examples, the test() method was used to check whether a character string matches a regular expression, which always yields a Boolean value. Other methods provide more detail, e.g. the exec() method. Like test(), it expects the character string to which the regular expression is to be applied as its argument. However, exec() does not return a Boolean value but information about the instance found, or null if there is no instance.

More precisely, the return value is an array whose first element is the part of the character string that matches the regular expression. The array also has two properties: index contains the position at which the instance was found in the passed string, and input contains the passed string itself.


Properties and elements of the return value of exec():

Property/Element     Description
index                the position where the instance was found
input                the character string that was passed to the exec() method
[0]                  the instance
[1], [2], ..., [n]   the substrings of the instance

Complete Code - Examples/Part_234/main.js...

 const text = 'The phone number is +49 30 1234567.';
 const regExp = /\+?\d{2}\s\d{2}\s\d{5,7}/;
 const result = regExp.exec(text);
 console.log(
   'Number ' + result[0]
   + ' found at Index ' + result.index
   + '.'
 );

 /* output:
 Number +49 30 1234567 found at Index 20.
 */
                        

Here, exec() applies the regular expression for the telephone number to the text "The phone number is +49 30 1234567.". The resulting array, stored in the variable result, contains the telephone number found at position 0, and its index property contains the position of this instance in the passed character string.
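Because exec() returns null rather than an array when the regular expression finds no instance, code that reads result[0] or result.index should check for that first. A minimal sketch with an illustrative pattern and texts:

```javascript
const regExp = /\d{2}\s\d{2}/;

const hit = regExp.exec('Call 49 30 now');
console.log(hit[0]);    // 49 30
console.log(hit.index); // 5
console.log(hit.input); // Call 49 30 now

// No instance found: exec() returns null, so reading miss[0] would throw.
const miss = regExp.exec('no digits here');
console.log(miss);      // null
```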


7. Search for all instances within a string

If you want to search for multiple instances in a string, you have to add a g after the closing slash of the regular expression. This is a so-called flag (or modifier) that configures the way in which the regular expression works. The g stands for global, which means that all instances are searched for, not only the first one. The exec() method must then be called repeatedly to find the instances one after the other: with the g flag, each call continues where the previous one ended, which the regular expression keeps track of in its lastIndex property.

Complete Code - Examples/Part_235/main.js...

 const text = 'The private phone number is +49 30 7654321,' +
   ' the business telephone number is +49 30 1234567.';
 const regExp = /\+?\d{2}\s\d{2}\s\d{5,7}/g;
 let result;
 while ((result = regExp.exec(text)) !== null) {
   console.log(
     'Number ' + result[0]
     + ' found at Index ' + result.index
     + '.'
   );
 }

 /* output:
 Number +49 30 7654321 found at Index 28.
 Number +49 30 1234567 found at Index 77.
 */
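The following sketch makes the lastIndex bookkeeping behind such a loop visible; the pattern and text are illustrative assumptions:

```javascript
const regExp = /\d+/g;
const text = 'a1 b22 c333';

console.log(regExp.lastIndex);                       // 0
console.log(regExp.exec(text)[0], regExp.lastIndex); // match "1", lastIndex 2
console.log(regExp.exec(text)[0], regExp.lastIndex); // match "22", lastIndex 6
console.log(regExp.exec(text)[0], regExp.lastIndex); // match "333", lastIndex 11

// After the last instance, exec() returns null and resets lastIndex to 0.
console.log(regExp.exec(text), regExp.lastIndex);    // null 0
```

Without the g flag, lastIndex is not updated and exec() always returns the first instance, so a while loop like the one above would never end.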
                        

The most common flags (modifiers) for regular expressions:

Flag/Modifier   Description
g               global: all instances are searched for, not only the first one
i               ignore case: the regular expression is not case-sensitive
m               multiline: ^ and $ match at the beginning and end of each line, not only of the entire string
u               unicode: the pattern is interpreted as a sequence of Unicode code points
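The i and m flags can be illustrated with a short sketch; the strings are illustrative assumptions:

```javascript
// i: uppercase and lowercase letters are treated as equal.
console.log(/^tortoise$/.test('Tortoise'));  // false
console.log(/^tortoise$/i.test('Tortoise')); // true

// m: ^ and $ also match at the start and end of each line.
const lines = 'first line\nsecond line';
console.log(/^second/.test(lines));  // false, ^ only matches at the very start
console.log(/^second/m.test(lines)); // true, ^ also matches after '\n'

// Flags can be combined; the constructor takes them as a second argument.
const both = new RegExp('^SECOND', 'im');
console.log(both.test(lines));       // true
```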

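Since ES2021, strings also provide the replaceAll() method. When a character string is passed as the pattern, it replaces all instances found without requiring a regular expression with the g flag; a regular expression passed to replaceAll() must have the g flag. For many simple replacements, replaceAll() is the more suitable choice.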
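A short comparison of replace() and replaceAll(), using an illustrative string:

```javascript
const text = 'a-b-c';

console.log(text.replace('-', '+'));     // a+b-c, only the first instance
console.log(text.replaceAll('-', '+'));  // a+b+c, all instances, no flag needed
console.log(text.replaceAll(/-/g, '+')); // a+b+c

// A regular expression without the g flag is rejected by replaceAll().
try {
  text.replaceAll(/-/, '+');
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```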


8. Access to individual parts of an instance

Groups are particularly helpful with regular expressions. They can be used to access specific parts of a character string that matches a regular expression. Groups are defined within a regular expression using round brackets: the opening bracket marks the beginning of the group, the closing bracket its end. When groups are used, the array returned by exec() contains, starting at index 1, the characters captured by each group in order.

Complete Code - Examples/Part_236/main.js...

 const pattern = /^(\d{4})-(\d{2})-(\d{2})$/u;
 const result = pattern.exec('2024-05-28');
 console.log(result[0]);    // 2024-05-28
 console.log(result[1]);    // 2024
 console.log(result[2]);    // 05
 console.log(result[3]);    // 28
 console.log(result.index); // 0
 console.log(result.input); // 2024-05-28
                        

Here, three groups are defined, one for each component of the date: the year, the month and the day of the month. These three parts can then be accessed via the indices 1, 2 and 3 of the result of exec(). Characters that do not belong to any group (here the hyphens -) do not appear in these entries; they are only part of the complete instance at index 0.
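In practice, the groups are often unpacked with array destructuring and converted to numbers. The following sketch wraps this in a hypothetical helper parseIsoDate(), which also handles the case where exec() finds no instance:

```javascript
// Hypothetical helper: parses "YYYY-MM-DD" into numbers, or returns null.
function parseIsoDate(text) {
  const result = /^(\d{4})-(\d{2})-(\d{2})$/u.exec(text);
  if (result === null) {
    return null; // the string does not have the expected format
  }
  // Skip index 0 (the complete instance) and take the three groups.
  const [, year, month, day] = result;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('2024-05-28')); // { year: 2024, month: 5, day: 28 }
console.log(parseIsoDate('28.05.2024')); // null
```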

Since ES2018, groups can also be named. The name is written in angle brackets <> after a question mark at the start of the group, i.e. (?<name>...). The individual parts can then also be accessed by the name of the respective group via the groups property of the result:

Complete Code - Examples/Part_237/main.js...

 const pattern = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/u;
 const result = pattern.exec('2024-05-28');
 console.log(result.groups.year);  // 2024
 console.log(result.groups.month); // 05
 console.log(result.groups.day);   // 28
                        

9. Search for specific strings

The match() method, which is available for character strings, works in a similar way to the exec() method of regular expressions. The difference: if the g flag is used, match() returns all instances found in one array, while exec() still returns only one instance per call. Without the g flag, match() returns the same kind of result as exec().

Complete Code - Examples/Part_238/main.js...

 const regExp = /\+?\d{2}\s\d{2}\s\d{5,7}/g;
 const string = 'One phone number: 49 30 1234567, and another: 49 30 7654321';
 const result = string.match(regExp);
 console.log(result[0]);             // 49 30 1234567
 console.log(result[1]);             // 49 30 7654321
 const result2 = regExp.exec(string);
 console.log(result2[0]);            // 49 30 1234567
 console.log(result2[1]);            // undefined - exec() returns one instance per call, and this pattern has no groups
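Without the g flag, match() behaves like exec(): it returns the first instance together with its groups and index, or null. With the g flag, the groups are not included. A sketch with an illustrative pattern and string:

```javascript
const regExp = /(\d{2})\s(\d{2})/;  // no g flag, two groups
const string = 'Call 49 30 or 61 45';

const single = string.match(regExp);
console.log(single[0], single[1], single[2], single.index); // 49 30 49 30 5

console.log(string.match(/(\d{2})\s(\d{2})/g)); // [ '49 30', '61 45' ]
console.log('no number'.match(regExp));         // null
```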
                        

10. Replace instances within a string

The replace() method can be used to replace instances of a pattern with a new character string. Either a regular expression or a character string can be passed as the first argument; the character string that replaces the respective instance is passed as the second argument. If the first argument is a character string or a regular expression without the g flag, only the first instance is replaced; with the g flag, all instances are replaced.

Complete Code - Examples/Part_239/main.js...

 let text = 'The private phone number is +49 30 7654321,' +
   ' the business phone number is +49 30 1234567.';
 const regExp = /(\+?\d{2})\s(\d{2})\s(\d{5,7})/g;
 text = text.replace(regExp, '<Hidden number>');
 console.log(text);

 /* output:
 The private phone number is <Hidden number>, 
 the business phone number is <Hidden number>. */
                        

Here, replace() is used to replace all phone numbers found with the character string <Hidden number>.
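The effect of the g flag on replace() can be seen in a small sketch with an illustrative string:

```javascript
const text = 'one, two, three';

console.log(text.replace(',', ';'));  // one; two, three (string: first instance only)
console.log(text.replace(/,/, ';'));  // one; two, three (no g: first instance only)
console.log(text.replace(/,/g, ';')); // one; two; three (g: all instances)
```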

It is much more flexible to pass a function rather than a string as the second argument. The function receives the instance found as its first argument, and its return value is used to replace that instance.

Complete Code - Examples/Part_240/main.js...

 let text = 'The private phone number is +49 30 7654321,' +
   ' the business phone number is +49 30 1234567.';
 const regExp = /(\+?\d{2})\s(\d{2})\s(\d{5,7})/g;
 text = text.replace(regExp, function(number) {
   return number.substring(0, 9) + 'XXXXX';
 });
 console.log(text);
 /* The private phone number is +49 30 76XXXXX,
 the business phone number is +49 30 12XXXXX. */
                        

In this case, only the last five digits of each telephone number are replaced by XXXXX, not the entire number.
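The call substring(0, 9) only works because both numbers start with a + character. Since the regular expression defines groups, the replacement function also receives each captured group as an additional argument after the instance. The sketch below uses that to mask the subscriber number regardless of the prefix; the masking rule (keep the first two digits) mirrors the example above, and the second number deliberately has no + to show the difference:

```javascript
let text = 'The private phone number is +49 30 7654321,' +
  ' the business phone number is 49 30 1234567.';
const regExp = /(\+?\d{2})\s(\d{2})\s(\d{5,7})/g;

// Arguments: the complete instance, then one argument per group.
text = text.replace(regExp, (match, country, area, number) => {
  // Keep the first two digits of the subscriber number, mask the rest.
  const masked = number.slice(0, 2) + 'X'.repeat(number.length - 2);
  return country + ' ' + area + ' ' + masked;
});
console.log(text);
// The private phone number is +49 30 76XXXXX, the business phone number is 49 30 12XXXXX.
```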


11. Find the position of an instance

The search() method can be used to search within a character string for the first instance that matches the corresponding regular expression.

Complete Code - Examples/Part_241/main.js...

 const text = 'This text contains a phone number: +49 30 1234567.';
 const text2 = 'This text does not contain a phone number.';
 const regExp = /(\+?\d{2})\s(\d{2})\s(\d{5,7})/g;
 console.log(text.search(regExp));   // output: 35
 console.log(text2.search(regExp));  // output: -1
                        

Here, the regular expression describes telephone numbers, so the search() method returns the index at which the first telephone number in the character string begins. If no telephone number is found, the method returns -1.
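Unlike exec(), search() ignores the g flag and always reports the first instance, so repeated calls return the same index. A sketch with an illustrative pattern and text:

```javascript
const regExp = /\d+/g;
const text = 'abc 12 def 34';

console.log(text.search(regExp)); // 4
console.log(text.search(regExp)); // 4 again, search() does not continue from lastIndex
console.log(/\d/.test(text));     // true, test() is enough if only a yes/no answer is needed
```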


12. Split strings

The split() method can be used to split character strings using a regular expression or another character string.

Complete Code - Examples/Part_242/main.js...

 const text = 'Rick,Sample,420,62,176,72';
 const result = text.split(',');
 const firstName = result[0];
 const lastName = result[1];
 const id = result[2];
 const age = result[3];
 const height = result[4];
 const weight = result[5];
 console.log(firstName);   // Rick
 console.log(lastName);    // Sample
 console.log(id);          // 420
 console.log(age);         // 62
 console.log(height);      // 176
 console.log(weight);      // 72
                        

Here, split() divides the string at each comma and returns the individual values as an array.
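A regular expression as separator is useful when the input is less regular, for example when spaces may surround the commas. The following sketch assumes such input and also uses array destructuring instead of individual index accesses; note that all parts are strings:

```javascript
// Commas may be surrounded by optional whitespace.
const text = 'Rick , Sample,420 ,62';
const [firstName, lastName, id, age] = text.split(/\s*,\s*/);
console.log(firstName, lastName, id, age); // Rick Sample 420 62

// An optional second argument limits the number of parts returned.
console.log('a,b,c,d'.split(',', 2));      // [ 'a', 'b' ]
```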