regex - Retain carriage returns in text filtered through a regular expression


I need to search through a folder of logs and retrieve the most recent ones. Then I need to filter each log, pull out the relevant information, and save it to another file.

The problem is that the regular expression I use to filter the log drops the carriage returns and line feeds, so the new file just contains a jumble of text.

    $reg = "(?ms)\*{6}\sbegin(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"

    Get-ChildItem "logfolder" -Filter *.log |
      Where-Object {$_.LastAccessTime -gt [datetime]$test.StartTime} |
      foreach {
        $a = Get-Content $_
        [regex]::Matches($a, $reg) | foreach {$_.Groups[0].Value > "myoutfile"}
      }

Log structure:

    ******* begin message *******
    <info line 1>
    date   18.03.2010 15:07:37   18.03.2010
    <info line 2>
          file number:  00000003
    <info line 3>
        *variable number of lines*
    ******* end message *******

Basically, I want to capture everything between the begin and end markers for entries whose date and file number have a certain value. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success using Select-String on a multiline record.

As @Matt pointed out, you need to read the entire file as a single string if you want multiline matches. Otherwise your (multiline) regular expression is applied to single lines, one after the other. There are several ways to get the content of a file as a single string:

  • (Get-Content 'c:\path\to\file.txt') -join "`r`n"
  • Get-Content 'c:\path\to\file.txt' | Out-String
  • Get-Content 'c:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
  • [IO.File]::ReadAllText('c:\path\to\file.txt')
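To make the difference concrete, here is a minimal sketch comparing the line array that Get-Content returns by default with the single string returned by -Raw. The temp file name and its three-line content are made up for the demonstration. The sketch relies on the fact that PowerShell joins array elements with $OFS (a single space by default) when an array is converted to a string, which is the likely reason the line breaks disappear in the question's code.

```powershell
# Hypothetical sample file, created only for this demonstration.
$path = Join-Path ([IO.Path]::GetTempPath()) 'multiline-demo.log'
[IO.File]::WriteAllText($path, "first`r`nsecond`r`nthird")

$lines = Get-Content $path        # array of three strings, line breaks stripped
$raw   = Get-Content $path -Raw   # one string, line breaks intact (PowerShell v3+)

# Converting the array to a string joins the elements with $OFS (a space by
# default), so the original line breaks are gone.
$flattened = [string]$lines

# A pattern that spans a line break only matches the raw string.
$pattern = 'first\r?\nsecond'
$matchesInFlattened = [regex]::Matches($flattened, $pattern).Count
$matchesInRaw       = [regex]::Matches($raw, $pattern).Count

Remove-Item $path
```

Here $matchesInFlattened ends up as 0 and $matchesInRaw as 1, because only the raw string still contains the CR/LF pair the pattern looks for.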

Also, I'd modify the regular expression a little. Log messages may vary in length, so matching fixed lengths may fail if a log message changes. It's better to match on the invariant parts of the string and leave the rest to variable-length matches. Personally, I find it a lot easier to do this kind of content extraction in several steps (it makes for simpler regular expressions). In your case I'd first separate the log entries from each other, and then filter the content:

    $date = [regex]::Escape('13.06.2015')
    $fnum = '00000003'

    # $re1 isolates one log entry; $re2 keeps entries with the wanted date and file number
    $re1 = "(?ms)\*{7} begin message \*{7}\s*([\s\S]*?)\*{7} end message \*{7}"
    $re2 = "(?ms)[\s\S]*?date\s+$date[\s\S]*?file number:\s+$fnum[\s\S]*"

    Get-ChildItem 'c:\log\folder' -Filter '*.log' | ? {
      $_.LastAccessTime -gt [DateTime]$test.StartTime
    } | % {
      Get-Content $_.FullName -Raw |
        Select-String -Pattern $re1 -AllMatches |
        select -Expand Matches |
        % {
          $_.Groups[1].Value |
            Select-String -Pattern $re2 |
            select -Expand Matches |
            select -Expand Groups |
            select -Expand Value
        }
    } | Set-Content 'c:\path\to\output.txt'
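The two-step idea can be tried without any log files. The following is a rough sketch on made-up sample data shaped like the log structure from the question: two entries with different dates, plus invented "payload" lines. As a simplification, and an assumption of this sketch rather than part of the pipeline above, the second step uses Where-Object with -match and a shortened $re2 instead of Select-String.

```powershell
# Made-up sample data in the shape of the log structure from the question.
$log = @(
  '******* begin message *******'
  '<info line 1>'
  'date   13.06.2015 15:07:37   13.06.2015'
  '<info line 2>'
  '      file number:  00000003'
  'payload A'
  '******* end message *******'
  '******* begin message *******'
  '<info line 1>'
  'date   18.03.2010 15:07:37   18.03.2010'
  '<info line 2>'
  '      file number:  00000003'
  'payload B'
  '******* end message *******'
) -join "`r`n"

$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1  = "(?ms)\*{7} begin message \*{7}\s*([\s\S]*?)\*{7} end message \*{7}"
$re2  = "(?ms)date\s+$date[\s\S]*?file number:\s+$fnum"   # shortened for -match

# Step 1: separate the entries (group 1 is the text between the markers).
$entries = [regex]::Matches($log, $re1) | ForEach-Object { $_.Groups[1].Value }

# Step 2: keep only entries with the wanted date and file number.
$hits = @($entries | Where-Object { $_ -match $re2 })
```

With this sample, $entries holds two entries and $hits holds only the one containing "payload A", still with its original line breaks.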

BTW, don't use the redirection operator (>) inside a loop. It overwrites the output file's content with each iteration. If you must write to a file inside a loop, use the append redirection operator (>>) instead. However, performance-wise it's usually better to put the writing of output files at the end of the pipeline (see above).
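A small sketch of the three behaviours, using hypothetical temp file names and three dummy items:

```powershell
# Hypothetical temp files, used only for this demonstration.
$tmp       = [IO.Path]::GetTempPath()
$overwrite = Join-Path $tmp 'redirect-overwrite.txt'
$append    = Join-Path $tmp 'redirect-append.txt'
$pipeline  = Join-Path $tmp 'redirect-pipeline.txt'
Remove-Item $overwrite, $append, $pipeline -ErrorAction SilentlyContinue

$items = 'one', 'two', 'three'

$items | ForEach-Object { $_ > $overwrite }              # each iteration replaces the file
$items | ForEach-Object { $_ >> $append }                # each iteration adds a line
$items | ForEach-Object { $_ } | Set-Content $pipeline   # file is written once, at the end

$overwriteLines = @(Get-Content $overwrite)
$appendLines    = @(Get-Content $append)
$pipelineLines  = @(Get-Content $pipeline)

Remove-Item $overwrite, $append, $pipeline
```

After running this, $overwriteLines contains only the last item, while $appendLines and $pipelineLines each contain all three.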

