Daniel Lemire's blog
Regular expressions can blow up!
Regular expressions, often abbreviated as regex, are a powerful tool for pattern matching within text. For example, the expression
would match a positive number such as 1.1 or 12. If designed and tested with care, regular expressions may be used in mission-critical software. However, their power comes with a risk: it is possible to design small regular expressions that are very expensive to run on even small strings.
To make matters more complicated, there are several regular-expression engines, and they differ in their syntax and implementation. Let me consider the regular-expression engine used by the C++ language under Linux (libgc++).
Consider the following program. It uses the string “Everyone loves Lucy.” and d a regex pattern (.*+s}}@w. I am not exactly sure what this pattern is supposed to do, but it is accepted by the engine. The program then uses std::regex_search to look for matches of this pattern within the string, storing potential matches in a std::smatch object, and outputs whether a match was found or not.
Using GCC 12 and a recent Linux server, this program takes about 7 minutes to run.
In other words, a bad regular expression can crash your systems. It is not just theoretical, the Cloudflare corporation suffered a major outage in 2019 due to a bad regular expression.
Use regular expressions with care.
source
Regular expressions can blow up!
Regular expressions, often abbreviated as regex, are a powerful tool for pattern matching within text. For example, the expression
\d*\.?\d+
would match a positive number such as 1.1 or 12. If designed and tested with care, regular expressions may be used in mission-critical software. However, their power comes with a risk: it is possible to design small regular expressions that are very expensive to run on even small strings.
To make matters more complicated, there are several regular-expression engines, and they differ in their syntax and implementation. Let me consider the regular-expression engine used by the C++ language under Linux (libgc++).
Consider the following program. It uses the string “Everyone loves Lucy.” and d a regex pattern (.*+s}}@w. I am not exactly sure what this pattern is supposed to do, but it is accepted by the engine. The program then uses std::regex_search to look for matches of this pattern within the string, storing potential matches in a std::smatch object, and outputs whether a match was found or not.
#include <iostream>
#include <regex>
int main() {
std::string text = "Everyone loves Lucy.";
std::regex pattern(R"(.*+s}}@w)");
// Perform regex search
std::smatch match;
bool found = std::regex_search(text, match, pattern);
std::cout << "Regex search result: "
<< (found ? "Match found" : "No match") << std::endl;
return 0;
}
Using GCC 12 and a recent Linux server, this program takes about 7 minutes to run.
In other words, a bad regular expression can crash your systems. It is not just theoretical, the Cloudflare corporation suffered a major outage in 2019 due to a bad regular expression.
Use regular expressions with care.
source