java - How to extract an unlabelled/untyped dependency tree from a TreeAnnotation using Stanford CoreNLP?

September 15, 2010

The target language is Spanish.

The English pipeline has support for typed dependencies, whereas the Spanish pipeline, to my knowledge, does not.

The goal is to produce a dependency tree from a TreeAnnotation, where the end result is a list of directed edges. Is this possible with CoreNLP 3.4.1 using the Spanish models, and if so, how?

Background

I'm using Stanford CoreNLP 3.4.1 together with the 3.5.0 Spanish models for POS tagging (for compatibility reasons, Java 8 cannot be used yet), with the following configuration:

import java.util.Properties;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, ner, parse");
props.setProperty("tokenize.options", "invertible=true,ptb3Escaping=true");
props.setProperty("tokenize.language", "es");

props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");

// Stanford Parser 3.4.1 shift-reduce models for Spanish.
props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/spanishSR.ser.gz");

props.setProperty("ner.applyNumericClassifiers", "false");
props.setProperty("ner.useSUTime", "false");

This is then used to create the pipeline and run the annotation of a document.

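import java.util.List;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

// 'document' is an edu.stanford.nlp.pipeline.Annotation created elsewhere.
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

for (CoreMap sentence : sentences) {
    // ... extract start, end position of sentence ...

    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // ... extract POS tags, NER annotations, id ...
    }

    // This works, and I get a tree that is not empty.
    Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
}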

By using a debugger I was able to examine both the sentences and the tokens, and to conclude that they have the following content:

Sentence (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
TokensAnnotation
TokenBeginAnnotation
TokenEndAnnotation
SentenceIndexAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

TreeAnnotation

Tokens (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
OriginalTextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
BeforeAnnotation
AfterAnnotation
IndexAnnotation
SentenceIndexAnnotation
PartOfSpeechAnnotation
NamedEntityTagAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

HeadWordAnnotation - in my experiments this one points to itself, i.e. to the token the annotation was retrieved from.
HeadTagAnnotation

Thanks in advance!

There is no support for Spanish dependency parsing in CoreNLP at the moment. This includes typed dependency conversion from constituency parses.

There is a head finder implemented (but not tested). You could hack up an untyped dependency converter using this head finder, but we have no guarantees that it would yield a sensible parse.
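
To make the head-finder idea concrete, here is a minimal, self-contained sketch of how head percolation can turn a constituency tree into a list of directed, unlabelled edges. It deliberately does not use CoreNLP: Node, HeadRule, Edge, TOY_RULE and exampleTree are hypothetical names invented for this illustration, and the toy head rule is an assumption, not the Spanish head finder mentioned above. In a real setup the library's head finder would presumably take the place of TOY_RULE, and the parser's tree would replace Node. The code avoids lambdas so that it compiles on Java 7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Self-contained sketch: none of these types come from CoreNLP.
class UntypedDependencySketch {

    /** Minimal constituency node. Leaves stand for tokens and carry a 1-based index. */
    static final class Node {
        final String label;
        final int tokenIndex;      // 1-based for leaves, 0 for phrases
        final List<Node> children;

        Node(String label, int tokenIndex, Node... children) {
            this.label = label;
            this.tokenIndex = tokenIndex;
            this.children = Arrays.asList(children);
        }
    }

    /** Hypothetical stand-in for a head finder: picks the head child of a phrase. */
    interface HeadRule {
        int headChildIndex(Node phrase);
    }

    /** Directed, unlabelled edge between token indices; governor 0 is the artificial root. */
    static final class Edge {
        final int governor;
        final int dependent;

        Edge(int governor, int dependent) {
            this.governor = governor;
            this.dependent = dependent;
        }

        @Override
        public String toString() {
            return governor + "->" + dependent;
        }
    }

    /** Toy rule (an assumption): in an S the first VP is the head, otherwise the last child. */
    static final HeadRule TOY_RULE = new HeadRule() {
        @Override
        public int headChildIndex(Node phrase) {
            if (phrase.label.equals("S")) {
                for (int i = 0; i < phrase.children.size(); i++) {
                    if (phrase.children.get(i).label.equals("VP")) {
                        return i;
                    }
                }
            }
            return phrase.children.size() - 1;
        }
    };

    /** Converts a tree into edges; the last edge attaches the sentence head to the root (0). */
    static List<Edge> toEdges(Node root, HeadRule rule) {
        List<Edge> edges = new ArrayList<Edge>();
        int sentenceHead = collect(root, rule, edges);
        edges.add(new Edge(0, sentenceHead));
        return edges;
    }

    /** Returns the token index of the lexical head of the node, adding edges on the way up. */
    private static int collect(Node node, HeadRule rule, List<Edge> edges) {
        if (node.children.isEmpty()) {
            return node.tokenIndex;
        }
        int[] heads = new int[node.children.size()];
        for (int i = 0; i < heads.length; i++) {
            heads[i] = collect(node.children.get(i), rule, edges);
        }
        int headChild = rule.headChildIndex(node);
        // Every non-head child's lexical head depends on the head child's lexical head.
        for (int i = 0; i < heads.length; i++) {
            if (i != headChild) {
                edges.add(new Edge(heads[headChild], heads[i]));
            }
        }
        return heads[headChild];
    }

    /** (S (NP (DET El) (NOUN perro)) (VP (VERB ladra))) with tokens numbered 1..3. */
    static Node exampleTree() {
        Node np = new Node("NP", 0, new Node("DET", 1), new Node("NOUN", 2));
        Node vp = new Node("VP", 0, new Node("VERB", 3));
        return new Node("S", 0, np, vp);
    }

    public static void main(String[] args) {
        System.out.println(toEdges(exampleTree(), TOY_RULE)); // prints [2->1, 3->2, 0->3]
    }
}
```

For "El perro ladra" the sketch yields perro -> El, ladra -> perro and root -> ladra. The quality of the edges depends entirely on the head rule, which is why an untested head finder gives no guarantee of a sensible parse.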
