Daniel Lemire's blog
Regular expressions can blow up!

Regular expressions, often abbreviated as regex, are a powerful tool for pattern matching within text. For example, the expression
\d*\.?\d+

would match a positive number such as 1.1 or 12. If designed and tested with care, regular expressions may be used in mission-critical software. However, their power comes with a risk: it is possible to design small regular expressions that are very expensive to run on even small strings.

To make matters more complicated, there are several regular-expression engines, and they differ in their syntax and implementation. Let me consider the regular-expression engine used by the C++ language under Linux (libgc++).

Consider the following program. It uses the string “Everyone loves Lucy.” and d a regex pattern (.*+s}}@w. I am not exactly sure what this pattern is supposed to do, but it is accepted by the engine. The program then uses std::regex_search to look for matches of this pattern within the string, storing potential matches in a std::smatch object, and outputs whether a match was found or not.
#include <iostream>
#include <regex>

int main() {
    std::string text = "Everyone loves Lucy.";
    std::regex pattern(R"(.*+s}}@w)");
    // Perform regex search
    std::smatch match;
    bool found = std::regex_search(text, match, pattern);
    std::cout << "Regex search result: "
          << (found ? "Match found" : "No match") << std::endl;
    return 0;
}

Using GCC 12 and a recent Linux server, this program takes about 7 minutes to run.

In other words, a bad regular expression can crash your systems. It is not just theoretical, the Cloudflare corporation suffered a major outage in 2019 due to a bad regular expression.

Use regular expressions with care.

source
 
 
Back to Top