PHP - Regex - The difference between \\n and \n
Sorry to add another "regex explanation" question to the internet, but I must know the reason for this. I have run this regex through RegexBuddy and regex101.com with no help.
I came across the following regex ("%4d%[^\\n]") while debugging a time-parsing function. Every now and then I receive an 'invalid date' error, but only during the months of January and June. I mocked up some code to recreate exactly what is happening, but I can't figure out why removing one slash fixes it.
    <?php
    $format = '%Y/%b/%d';
    $random_date_strings = array(
        '2015/Jan/03', '1985/Feb/13', '2001/Mar/25', '1948/Apr/02',
        '1948/May/19', '2020/Jun/22', '1867/Jul/09', '1901/Aug/11',
        '1945/Sep/21', '2000/Oct/31', '2009/Nov/24', '2015/Dec/02'
    );
    $year = null;
    $rest_of_string = null;

    echo 'bad regex:';
    echo '<br/><br/>';
    foreach ($random_date_strings as $date_string) {
        // Read a 4-digit year, then everything the scanset allows
        sscanf($date_string, "%4d%[^\\n]", $year, $rest_of_string);
        print_data($date_string, $year, $rest_of_string);
    }

    echo 'good regex:';
    echo '<br/><br/>';
    foreach ($random_date_strings as $date_string) {
        // Same call with a single backslash in the scanset
        sscanf($date_string, "%4d%[^\n]", $year, $rest_of_string);
        print_data($date_string, $year, $rest_of_string);
    }

    // Print the input and both parsed parts, separated by HTML line breaks
    function print_data($d, $y, $r) {
        echo 'date string: ' . $d;
        echo '<br/>';
        echo 'year: ' . $y;
        echo '<br/>';
        echo 'rest of string: ' . $r;
        echo '<br/>';
    }
    ?>
Feel free to run this locally, but the two outputs I'm concerned about are for the months of June and January. "%4d%[^\\n]" truncates $rest_of_string to /Ju and /Ja, while "%4d%[^\n]" displays the rest of the string as expected (/Jan/03 and /Jun/22).
Here's my interpretation of the faulty regex:
%4d% - Get four digits.
[^\\n] - Get all the digits between the beginning of the string and a new line.
Can someone please correct my explanation and/or tell me why removing the slash gives me the result I expect?
I don't care about the how... I need the why.
Like @lucastrzesniewski pointed out, that's sscanf() syntax; it has nothing to do with regex. The format is explained on the sprintf() page.
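This is likely why regex testers such as regex101.com could not reproduce the problem. The sketch below runs the same [^\\n] through PHP's PCRE function preg_match() and through sscanf() side by side, on a single made-up date. In a PCRE pattern the two characters \ and n are read by the regex engine as the newline escape, whereas sscanf() takes them literally. It assumes PHP 7.1+ for the [$a, $b] = ... destructuring; the variable names are illustrative only.

```php
<?php
$date = '2015/Jan/03';

// PCRE: '\\n' in a single-quoted string is the two characters \ and n,
// which the regex engine itself interprets as "newline".
preg_match('/^(\d{4})([^\\n]*)/', $date, $m);
echo $m[2], "\n"; // /Jan/03

// sscanf(): the same two characters are taken literally inside the scanset,
// so the scan stops at the first 'n'.
[$year, $rest] = sscanf($date, '%4d%[^\\n]');
echo $year, ' ', $rest, "\n"; // 2015 /Ja
```

So the identical-looking [^\\n] means "not a newline" to a regex engine but "not a backslash and not n" to sscanf().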
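In your pattern "%4d%[^\\n]", the two backslashes (\\) inside the double-quoted PHP string translate to a single backslash character, so sscanf() receives a backslash followed by the letter "n". Inside a %[...] scanset, sscanf() does not treat the backslash as an escape; every character between the brackets is taken literally. So the correct interpretation of the "faulty" pattern is:
%4d - Get four digits.
%[^\\n] - Get all the characters that are not a backslash or the letter "n".
That's why it matches everything up until the "n" in "Jan" and "Jun".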
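The backslash arithmetic above can be checked directly. This minimal sketch uses only strlen(), substr() and bin2hex() to print the length of each format string and the raw bytes inside each scanset, then feeds the faulty format a made-up input containing a backslash. The variable names and the '2015/a\b' input are hypothetical, chosen only for illustration.

```php
<?php
// Double-quoted PHP strings: "\\" is one backslash, "\n" is one newline byte.
$faulty = "%4d%[^\\n]";
$fixed  = "%4d%[^\n]";

echo strlen($faulty), "\n"; // 9
echo strlen($fixed), "\n";  // 8

// The bytes between "[^" and "]", i.e. what each scanset excludes.
$excludedFaulty = substr($faulty, 6, -1);
$excludedFixed  = substr($fixed, 6, -1);
echo bin2hex($excludedFaulty), "\n"; // 5c6e (backslash, then 'n')
echo bin2hex($excludedFixed), "\n";  // 0a (newline)

// With the faulty format, a backslash stops the scan just like 'n' does.
// '2015/a\\b' in single quotes is the text 2015/a\b
[, $restBackslash] = sscanf('2015/a\\b', $faulty);
echo $restBackslash, "\n"; // /a
```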
The correct pattern is "%4d%[^\n]", where \n translates to a newline character, and its interpretation is:
%4d - Get four digits.
%[^\n] - Get all the characters that are not a newline.
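As a final check, the sketch below runs both formats over the twelve sample dates from the question and collects the strings where the faulty format returns less than the fixed one. It assumes the month names are capitalized (Jan, Feb, ...) and PHP 7.1+ for the [, $rest] destructuring syntax; $dates and $truncated are illustrative names.

```php
<?php
// The twelve sample dates from the question.
$dates = [
    '2015/Jan/03', '1985/Feb/13', '2001/Mar/25', '1948/Apr/02',
    '1948/May/19', '2020/Jun/22', '1867/Jul/09', '1901/Aug/11',
    '1945/Sep/21', '2000/Oct/31', '2009/Nov/24', '2015/Dec/02',
];

$truncated = [];
foreach ($dates as $date) {
    // Discard the year and keep only the "rest of string" from each format.
    [, $restFaulty] = sscanf($date, "%4d%[^\\n]"); // stops at '\' or 'n'
    [, $restFixed]  = sscanf($date, "%4d%[^\n]");  // stops at a newline
    if ($restFaulty !== $restFixed) {
        $truncated[] = $date . ' -> ' . $restFaulty;
    }
}

echo implode("\n", $truncated), "\n";
// 2015/Jan/03 -> /Ja
// 2020/Jun/22 -> /Ju
```

Only the lowercase letter "n" (and the backslash) is excluded, which is why "Nov" survives here; a lowercase "nov" would be truncated as well.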