regex - Retain carriage returns in text filtered through a regular expression
I need to search through a folder of logs and retrieve the most recent logs. Then I need to filter each log, pull out the relevant information, and save it to another file.
The problem is that the regular expression I use to filter the log is dropping the carriage returns and line feeds, so the new file just contains a jumble of text.
    $reg = "(?ms)\*{6}\sbegin(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"
    Get-ChildItem "logfolder" -Filter *.log |
      Where-Object { $_.LastAccessTime -gt [datetime]$test.StartTime } |
      ForEach-Object {
        $a = Get-Content $_
        [regex]::Matches($a, $reg) | ForEach-Object { $_.Groups[0].Value > "myoutfile" }
      }
Log structure:
    ******* begin message *******
    <info line 1>
    date 18.03.2010 15:07:37 18.03.2010
    <info line 2>
    file number: 00000003
    <info line 3>
    *variable number of lines*
    ******* end message *******
Basically, I want to capture everything between begin and end where the dates and file numbers have a certain value. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success using Select-String on a multiline record.
As @Matt pointed out, you need to read the entire file as a single string if you want multiline matches. Otherwise your (multiline) regular expression is applied to single lines one after the other. There are several ways to get the content of a file as a single string:
(Get-Content 'c:\path\to\file.txt') -join "`r`n"
Get-Content 'c:\path\to\file.txt' | Out-String
Get-Content 'c:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
[IO.File]::ReadAllText('c:\path\to\file.txt')
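To make the difference concrete, here is a minimal sketch comparing the default line-by-line read with a single-string read. The sample file name and its three lines are made up for illustration, and the snippet assumes PowerShell v3 or newer because it uses -Raw.

```powershell
# Hypothetical sample file in the temp folder, created for illustration only.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample-multiline.log'
Set-Content -Path $path -Value 'first line', 'second line', 'third line'

$lines = Get-Content $path        # array of strings, one element per line
$text  = Get-Content $path -Raw   # one string, line breaks preserved

# Converting the array to a string joins the elements with spaces,
# which is where the line breaks get lost.
$joined = [string]$lines
$joined                           # first line second line third line

# A pattern that spans a line break only matches the single string.
$pattern     = 'first line\r?\nsecond line'
$rawMatches  = [regex]::IsMatch($text, $pattern)
$lineMatches = [bool]($lines | Where-Object { $_ -match $pattern })
$rawMatches                       # True
$lineMatches                      # False

Remove-Item $path
```

The space-joined string is also what [regex]::Matches receives when it is handed the array returned by a plain Get-Content, which would explain the jumbled output described in the question.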
Also, I'd modify the regular expression a little. Log messages usually vary in length, so matching fixed lengths may fail as soon as a log message changes. It's better to match on the invariant parts of the string and leave the rest to variable-length matches. Personally, I also find it a lot easier to do this kind of content extraction in several steps (it makes for simpler regular expressions). In your case I'd first separate the log entries from each other, and then filter the content:
    $date = [regex]::Escape('13.06.2015')
    $fnum = '00000003'

    # $re1 isolates one log entry; group 1 is the text between the markers.
    $re1 = "(?ms)\*{7} begin message \*{7}\s*([\s\S]*?)\*{7} end message \*{7}"
    # $re2 keeps only entries that contain the wanted date and file number.
    $re2 = "(?ms)[\s\S]*?date\s+$date[\s\S]*?file number:\s+$fnum[\s\S]*"

    Get-ChildItem 'c:\log\folder' -Filter '*.log' |
      Where-Object { $_.LastAccessTime -gt [datetime]$test.StartTime } |
      ForEach-Object {
        Get-Content $_.FullName -Raw |
          Select-String -Pattern $re1 -AllMatches |
          Select-Object -ExpandProperty Matches |
          ForEach-Object {
            $_.Groups[1].Value |
              Select-String -Pattern $re2 |
              Select-Object -ExpandProperty Matches |
              Select-Object -ExpandProperty Groups |
              Select-Object -ExpandProperty Value
          }
      } |
      Set-Content 'c:\path\to\output.txt'
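The two-step idea can be tried without any log files. The following is only a sketch on made-up data: the two sample entries are hypothetical, and it uses [regex]::Matches and the -match operator instead of Select-String to keep the example short.

```powershell
# Hypothetical sample: two log entries in one string, as a single-string read would return them.
$log = @'
******* begin message *******
info A
date 13.06.2015 15:07:37 13.06.2015
file number: 00000003
payload line
******* end message *******
******* begin message *******
info B
date 18.03.2010 15:07:37 18.03.2010
file number: 00000007
******* end message *******
'@

$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1  = '(?ms)\*{7} begin message \*{7}\s*([\s\S]*?)\*{7} end message \*{7}'
$re2  = "date\s+$date[\s\S]*?file number:\s+$fnum"

# Step 1: separate the entries (group 1 is the body between the markers).
$entries = [regex]::Matches($log, $re1) | ForEach-Object { $_.Groups[1].Value }

# Step 2: keep only the entries with the wanted date and file number.
$wanted = @($entries | Where-Object { $_ -match $re2 })

$wanted.Count   # 1
$wanted[0]      # body of the first entry, line breaks intact
```

Because the second pattern is only ever applied to one entry at a time, its lazy [\s\S]*? cannot run across an end marker into the next entry.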
BTW, don't use the redirection operator (>) inside a loop. It overwrites the output file's content with each iteration. If you must write to a file inside a loop, use the append redirection operator (>>) instead. However, performance-wise it's usually better to put the writing of output files at the end of the pipeline (see above).
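A quick way to see the difference is to write the same three strings in each of the three ways. This is a sketch only; the scratch file names are hypothetical and live in the temp folder.

```powershell
# Hypothetical scratch files in the temp folder, for illustration only.
$tmp         = [IO.Path]::GetTempPath()
$overwritten = Join-Path $tmp 'demo-overwrite.txt'
$appended    = Join-Path $tmp 'demo-append.txt'
$piped       = Join-Path $tmp 'demo-pipeline.txt'
Remove-Item $overwritten, $appended, $piped -ErrorAction SilentlyContinue

1..3 | ForEach-Object { "entry $_" >  $overwritten }        # each iteration replaces the file
1..3 | ForEach-Object { "entry $_" >> $appended }           # each iteration adds a line
1..3 | ForEach-Object { "entry $_" } | Set-Content $piped   # written once, at the end

$countOverwritten = @(Get-Content $overwritten).Count
$countAppended    = @(Get-Content $appended).Count
$countPiped       = @(Get-Content $piped).Count
$countOverwritten   # 1
$countAppended      # 3
$countPiped         # 3

Remove-Item $overwritten, $appended, $piped
```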