July 22, 2011

Regex matching any character (.) including line terminators

To match any character with a regex, the following simple piece of code works just fine.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile(".*"); //compile the pattern
Matcher matcher = pattern.matcher(string); //assign the string to be matched
boolean found = matcher.find(); //find whether the pattern exists. Returns a boolean

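But if the string contains line terminators such as \n or \r, the '.' regex will not do the trick: by default '.' matches any character except line terminators (tabs and spaces are matched fine). (There is also the "\s" regex, written "\\s" in a Java string literal, which matches whitespace characters, but it didn't quite address my problem.)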
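
To see the problem in isolation, here is a minimal sketch. The class name DotDefaultDemo and the helper firstMatch are made up for this illustration; only Pattern and Matcher come from the Java API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotDefaultDemo {
    // Returns the first substring matched by ".*" with default flags,
    // or null if nothing matches.
    static String firstMatch(String input) {
        Matcher matcher = Pattern.compile(".*").matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // A tab is an ordinary character for '.', so the whole string matches.
        System.out.println("one\ttwo".equals(firstMatch("one\ttwo"))); // true
        // The match stops right before the newline.
        System.out.println(firstMatch("one\ntwo")); // one
    }
}
```

So find() still returns true on multi-line input, but the match only covers the text up to the first line terminator.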

While wandering through the Java API I found a flag which addresses this problem: Pattern.DOTALL. When the pattern is compiled with this flag, '.' matches line terminators as well. Therefore the new code would be,

Pattern pattern = Pattern.compile(".*", Pattern.DOTALL); //compile the pattern so that '.' also matches line terminators
Matcher matcher = pattern.matcher(string); //assign the string to be matched
boolean found = matcher.find(); //find whether the pattern exists. Returns a boolean
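
The effect of the flag is easiest to see with matches(), which requires the whole string to match. This is only a sketch: the class DotAllDemo and its helper matchesAll are names invented for the example. It also shows the embedded flag (?s), which switches on the same mode from inside the regex.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // True when ".*" matches the entire input under the given flags
    // (pass 0 for the default behaviour).
    static boolean matchesAll(String input, int flags) {
        return Pattern.compile(".*", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "line one\nline two";
        System.out.println(matchesAll(text, 0));               // false
        System.out.println(matchesAll(text, Pattern.DOTALL));  // true
        // (?s) inside the regex is equivalent to passing Pattern.DOTALL.
        System.out.println(Pattern.matches("(?s).*", text));   // true
    }
}
```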

Failed attempt :
  • Tried to include '\n' in the regex.
To match either of two characters in a regex, the usual form is "[ab]*". The pattern "[a\\n]*" also works fine. But "[.\\n]*" doesn't work, because inside a character class '.' loses its special meaning and matches only a literal period. So the DOTALL flag above was the successful solution I found to deal with it.
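
A small sketch of why the character class attempt fails. The class CharClassDemo and the helper matchesWhole are hypothetical names. The last line uses "[\\s\\S]*" (whitespace or non-whitespace, i.e. anything), a common workaround that needs no flag; it is not something I used in the original solution.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the whole input matches the given regex (default flags).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // 'a' or newline, repeated: works as expected.
        System.out.println(matchesWhole("[a\\n]*", "aa\na"));     // true
        // Inside [...] the dot is a literal period, so only '.' and '\n' match.
        System.out.println(matchesWhole("[.\\n]*", "..\n."));     // true
        System.out.println(matchesWhole("[.\\n]*", "ab\ncd"));    // false
        // Whitespace-or-non-whitespace covers every character.
        System.out.println(matchesWhole("[\\s\\S]*", "ab\ncd"));  // true
    }
}
```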

For a very good tutorial on regex refer here.

Tips
  • Match period
    "." is the symbol used for any character. So if you want to match "." specifically, you need to define the regex as "\\."