Haskell - Parsec running out of memory


I wrote a parser for a large CSV file. It works on a smaller subset but runs out of memory on the actual file (~1.5M lines). After first parsing the elements into a list (using manyTill), I instead used the parser state to store them in a single binary search tree, and that worked for the large file.

I have since split the "element type" into three separate types and want to store each in its own tree, resulting in three trees of different types. This version, though, works on the small test file but runs out of memory on the larger one.

    import qualified Data.Tree.AVL as AVL
    import qualified Text.ParserCombinators.Parsec as Parsec
    import Text.Parsec (Parsec, (<|>))  -- assumed import; not shown in the original excerpt

    -- Parser state: one tree per element type.
    data ENW = ENW (AVL.AVL Extent) (AVL.AVL Node) (AVL.AVL Way)
    -- This used to be Element = Extent | Node | Way in a single (Tree Element), which worked.

    csvParser :: Parsec String ENW ENW
    csvParser = Parsec.manyTill parseL Parsec.eof >> Parsec.getState
      where
        parseL = parseLine >> ((Parsec.newline >> return ()) <|> Parsec.eof)

    parseLine :: Parsec String ENW ()
    parseLine = parseNode <|> parseWay <|> parseExtents

    parseNode :: Parsec String ENW ()
    parseNode = Parsec.string "node" *> (flip addNode <$> (Node <$> identifier <*> float <*> float)) >>= Parsec.updateState
      where
        identifier = Parsec.tab *> (read <$> Parsec.many1 Parsec.digit)
        float      = Parsec.tab *> (read <$> parseFloat)

    addNode :: ENW -> Node -> ENW
    addNode (ENW e n w) node = ENW e (AVL.push (sndCC node) node n) w

parseWay and parseExtents follow the same pattern, and the whole thing is started with

    Parsec.runParser csvParser (ENW AVL.empty AVL.empty AVL.empty) "" input

I don't understand how using three smaller trees instead of a single large one can cause memory issues.

Do you have a reason not to use cassava? It can be used to stream CSV data and is more robust than an ad hoc CSV parser. My own experience has shown that it has excellent performance, and it can be extended to parse your own types.

Edit: It looks like you're working with tab-separated rather than comma-separated data, but cassava lets you specify the delimiter used to split columns. Since it appears that your data has a potentially different layout on each line, you may need to use cassava's 'raw' format, which returns a Vector ByteString for each line that you can then parse based on its first element.
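As a rough illustration of the raw approach, the sketch below streams tab-separated records with cassava and dispatches on the first field of each row. It only counts rows per kind instead of building trees. The `Counts` type, the `classify` and `countRows` helpers, and the "way" and "extent" tags are assumptions made for the example; only the "node" tag appears in the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL
import           Data.Char (ord)
import           Data.Csv (DecodeOptions (..), HasHeader (NoHeader), defaultDecodeOptions)
import qualified Data.Csv.Streaming as S
import           Data.Foldable (foldl')
import qualified Data.Vector as V

-- Hypothetical accumulator (extents, nodes, ways); strict fields keep the
-- running totals evaluated instead of piling up unevaluated additions.
data Counts = Counts !Int !Int !Int deriving (Eq, Show)

-- Split columns on tabs instead of commas.
tsvOptions :: DecodeOptions
tsvOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }

-- Decide what to do with a raw row by looking at its first field.
-- The "way" and "extent" tags are assumed; the question only shows "node".
classify :: Counts -> V.Vector B.ByteString -> Counts
classify c@(Counts e n w) row =
  case row V.!? 0 of
    Just "node"   -> Counts e (n + 1) w
    Just "way"    -> Counts e n (w + 1)
    Just "extent" -> Counts (e + 1) n w
    _             -> c  -- unknown or empty rows are ignored

-- Fold over the lazily decoded records; rows may have different lengths.
countRows :: BL.ByteString -> Counts
countRows = foldl' classify (Counts 0 0 0) . S.decodeWith tsvOptions NoHeader

main :: IO ()
main = print (countRows "node\t1\t2.5\t3.5\nway\t7\t1\t2\n")
```

In a real parser, `classify` would convert the remaining fields into a Node, Way, or Extent and insert it into the matching container. Note that folding over the streaming result this way skips records that fail to decode rather than reporting them.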

I've never seen anyone use that AVL tree package before. Is there a reason you aren't using more standard structures? The package is quite old (last updated in 2008), and more recent packages are likely to perform better.
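For comparison, here is a minimal sketch of the same three-container state built on Data.Map.Strict from the containers package. The field layouts of Extent, Node, and Way and the use of an Int key for each are assumptions, since the question does not show those definitions. The strict fields are also an addition of this sketch, chosen so that each insertion is performed when the state is updated rather than deferred.

```haskell
module Main where

import           Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical element types; the real definitions are not shown in the question.
data Extent = Extent !Double !Double !Double !Double deriving (Eq, Show)
data Node   = Node !Int !Double !Double              deriving (Eq, Show)
data Way    = Way !Int [Int]                         deriving (Eq, Show)

-- One map per element type, keyed by an assumed integer identifier.
data ENW = ENW
  { extents :: !(Map.Map Int Extent)
  , nodes   :: !(Map.Map Int Node)
  , ways    :: !(Map.Map Int Way)
  } deriving (Eq, Show)

emptyENW :: ENW
emptyENW = ENW Map.empty Map.empty Map.empty

-- Same shape as addNode in the question, with Map.insert in place of AVL.push.
addNode :: ENW -> Node -> ENW
addNode enw node@(Node i _ _) = enw { nodes = Map.insert i node (nodes enw) }

main :: IO ()
main = do
  let enw = foldl' addNode emptyENW [Node 1 52.5 13.4, Node 2 48.1 11.6]
  print (Map.size (nodes enw))  -- prints 2
```

addWay and addExtent would follow the same pattern against their own maps.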

