Haskell - Parsec running out of memory
I wrote a parser for a large CSV file. It works on a smaller subset but runs out of memory at ~1.5M lines (the actual file). Instead of parsing the elements into a list (using manyTill), I then used the parser state to store them in a single binary search tree, and that worked for the large file.
I have since split the "element type" into 3 separate types and want to store each of them in its own tree, resulting in 3 trees of different types. This version, though, works on the small test file but runs out of memory on the larger one.
    import qualified Data.Tree.AVL as AVL
    import qualified Text.ParserCombinators.Parsec as Parsec

    ----

    data ENW = ENW (AVL.AVL Extent) (AVL.AVL Node) (AVL.AVL Way)
    -- This used to be Element = Extent | Node | Way in a (Tree Element), and that worked

    csvParser :: Parsec String ENW ENW
    csvParser = (Parsec.manyTill (parseL) Parsec.eof) >> Parsec.getState

    parseL = parseLine >> ((Parsec.newline >> return ()) <|> Parsec.eof)

    parseLine :: Parsec String ENW ()
    parseLine = parseNode <|> parseWay <|> parseExtents

    parseNode :: Parsec String ENW ()
    parseNode = Parsec.string "node" *> (flip addNode <$> (Node <$> identifier <*> float <*> float)) >>= Parsec.updateState

    identifier = Parsec.tab *> (read <$> Parsec.many1 Parsec.digit)
    float = Parsec.tab *> (read <$> parseFloat)

    addNode :: ENW -> Node -> ENW
    addNode (ENW e n w) node = (ENW e (AVL.push (sndCC node) node n) w)
parseWay and parseExtents follow the same pattern, and the whole thing is started with

    Parsec.runParser csvParser (ENW AVL.empty AVL.empty AVL.empty) "" input
I don't understand how using 3 smaller trees instead of a single large one can cause memory issues.
Do you have a reason not to use cassava? It can be used to stream CSV data and is more robust than an ad hoc CSV parser. My own experience has shown that it has excellent performance, and it can be extended to parse your own types.
Edit: It looks like you're working with tab-separated data, not comma-separated data, but cassava lets you specify the delimiter used to split columns. It appears that your data has a potentially different kind of record on each line, so you may need to use cassava's 'raw' format, which returns a Vector ByteString for each line that you can then parse based on its first element.
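As a rough illustration of that suggestion, here is a minimal sketch that streams tab-separated records with cassava and dispatches on the first field. The `Node` type, the field layout (`node`, id, two floats) and the helper names `tsvOptions`, `parseNode` and `nodesFrom` are assumptions made up for this example; only the "node" tag comes from the question. Note that folding over cassava's streaming `Records` stops at the first malformed record, so real code would probably want to inspect the errors instead.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import Data.Char (ord)
import Data.Csv (DecodeOptions (..), HasHeader (NoHeader), defaultDecodeOptions)
import qualified Data.Csv.Streaming as S
import Data.Foldable (toList)
import Data.Maybe (mapMaybe)
import qualified Data.Vector as V
import Text.Read (readMaybe)

-- Hypothetical stand-in for the question's Node type (id plus two coordinates).
data Node = Node !Int !Double !Double
  deriving (Eq, Show)

-- Default cassava options, but splitting columns on tabs instead of commas.
tsvOptions :: DecodeOptions
tsvOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }

-- Look at the first field of a raw record and parse the rest accordingly.
-- Rows that are not well-formed "node" rows yield Nothing.
parseNode :: V.Vector B.ByteString -> Maybe Node
parseNode row = case V.toList row of
  ["node", i, x, y] -> Node <$> field i <*> field x <*> field y
  _                 -> Nothing
  where
    field :: Read a => B.ByteString -> Maybe a
    field = readMaybe . B.unpack

-- Decode the input record by record into raw vectors, keeping only the nodes.
nodesFrom :: BL.ByteString -> [Node]
nodesFrom = mapMaybe parseNode . toList . S.decodeWith tsvOptions NoHeader

main :: IO ()
main = print (nodesFrom "node\t1\t2.5\t3.5\nway\t7\nnode\t2\t0\t1\n")
```

Ways and extents would get their own `parseWay` / `parseExtents` style functions matching on their own tags; those record layouts are not shown in the question, so they are left out here.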
I've never seen anyone use the AVL tree package before. Is there a reason you aren't using more standard structures? That package is quite old (last updated in 2008), and more recent packages are likely to perform better.
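For comparison, a minimal sketch of what the three-tree state could look like with `Data.Map.Strict` from the containers package. The `Extent`, `Node` and `Way` definitions, the `Int` keys and the `loadNodes` helper are assumptions for illustration, since the question does not show them. The strict fields and the strict left fold are a deliberate choice in this sketch so that each insert is evaluated as it happens; whether laziness is what actually exhausts memory in the original code is not established here.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical element types; the real definitions are not in the question.
data Extent = Extent !Double !Double !Double !Double deriving (Eq, Show)
data Node   = Node !Int !Double !Double              deriving (Eq, Show)
data Way    = Way !Int ![Int]                        deriving (Eq, Show)

-- Three maps of different element types, each field strict (note the bangs),
-- so an updated ENW never holds an unevaluated map.
data ENW = ENW
  { extents :: !(M.Map Int Extent)
  , nodes   :: !(M.Map Int Node)
  , ways    :: !(M.Map Int Way)
  }

emptyENW :: ENW
emptyENW = ENW M.empty M.empty M.empty

-- Same shape as the question's addNode, keyed on the node's identifier.
addNode :: ENW -> Node -> ENW
addNode enw node@(Node key _ _) = enw { nodes = M.insert key node (nodes enw) }

-- Strict left fold: the accumulator is forced after every insert.
loadNodes :: [Node] -> ENW
loadNodes = foldl' addNode emptyENW

main :: IO ()
main = print (M.size (nodes (loadNodes [Node i 0 0 | i <- [1 .. 100000]])))
```

With Parsec, the equivalent of the fold step would be the state update after each parsed line, with `addWay` and `addExtent` written the same way as `addNode`.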