How to extract an unlabelled/untyped dependency tree from a TreeAnnotation using Stanford CoreNLP?
The target language is Spanish.
The English pipeline supports typed dependencies, whereas the Spanish pipeline, to my knowledge, does not.
The goal is to produce a dependency tree from a TreeAnnotation, where the end result is a list of directed edges. Is this possible with CoreNLP 3.4.1 using the Spanish models, and if so, how?
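To make the target concrete, here is one possible representation of that end result. It is an assumption, not something CoreNLP 3.4.1 produces for Spanish: a list of (governor, dependent) edges over 1-based token indices (the same numbering as IndexAnnotation), with 0 standing for an artificial ROOT. The Edge and DependencyTrees classes are hypothetical helpers in plain Java 7 (no lambdas). isTree checks that the edges really form a tree: every token has exactly one governor, exactly one token is attached to ROOT, and walking up from any token reaches ROOT without a cycle.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical representation of one untyped, directed dependency edge.
// Token indices are 1-based (like CoreAnnotations.IndexAnnotation); 0 = ROOT.
final class Edge {
    final int governor;
    final int dependent;

    Edge(int governor, int dependent) {
        this.governor = governor;
        this.dependent = dependent;
    }

    @Override
    public String toString() {
        return governor + " -> " + dependent;
    }
}

// Hypothetical helper: well-formedness check for an edge list over tokens 1..n.
final class DependencyTrees {
    private DependencyTrees() {}

    static boolean isTree(List<Edge> edges, int n) {
        int[] head = new int[n + 1];
        Arrays.fill(head, -1); // -1 = no governor seen yet
        int rootChildren = 0;
        for (Edge e : edges) {
            if (e.dependent < 1 || e.dependent > n || e.governor < 0 || e.governor > n) {
                return false; // index out of range
            }
            if (head[e.dependent] != -1) {
                return false; // token has two governors
            }
            head[e.dependent] = e.governor;
            if (e.governor == 0) {
                rootChildren++;
            }
        }
        if (rootChildren != 1) {
            return false; // exactly one token must hang off ROOT
        }
        for (int i = 1; i <= n; i++) {
            // Walk up from token i; needing more than n steps means a cycle.
            int current = i;
            int steps = 0;
            while (current != 0) {
                if (head[current] == -1 || ++steps > n) {
                    return false; // missing governor or cycle
                }
                current = head[current];
            }
        }
        return true;
    }
}
```

A check like this is useful whenever the edges come from a heuristic converter, since nothing else guarantees a well-formed result.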
Background
I'm using Stanford CoreNLP 3.4.1 plus the 3.5.0 Spanish models for POS tagging (for compatibility reasons, Java 8 cannot be used yet), with the following configuration:
```java
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, ner, parse");
props.setProperty("tokenize.options", "invertible=true,ptb3Escaping=true");
props.setProperty("tokenize.language", "es");
props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");
// Stanford Parser 3.4.1 shift-reduce model for Spanish.
props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/spanishSR.ser.gz");
props.setProperty("ner.applyNumericClassifiers", "false");
props.setProperty("ner.useSUTime", "false");
```

This configuration is used to create the pipeline and run the annotation of the document:

```java
// document is an Annotation built from the input text.
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // ... extract start and end position of the sentence ...
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // ... extract POS tags, NER annotations, id ...
    }
    // This works, and the tree is not empty.
    Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
}
```

By using a debugger, I was able to examine both the sentences and the tokens, and conclude that they have the following content:
Sentence (keys)
From edu.stanford.nlp.ling.CoreAnnotations:
- TextAnnotation
- CharacterOffsetBeginAnnotation
- CharacterOffsetEndAnnotation
- TokensAnnotation
- TokenBeginAnnotation
- TokenEndAnnotation
- SentenceIndexAnnotation
From edu.stanford.nlp.trees.TreeCoreAnnotations:
- TreeAnnotation
Tokens (keys)
From edu.stanford.nlp.ling.CoreAnnotations:
- TextAnnotation
- OriginalTextAnnotation
- CharacterOffsetBeginAnnotation
- CharacterOffsetEndAnnotation
- BeforeAnnotation
- AfterAnnotation
- IndexAnnotation
- SentenceIndexAnnotation
- PartOfSpeechAnnotation
- NamedEntityTagAnnotation
From edu.stanford.nlp.trees.TreeCoreAnnotations:
- HeadWordAnnotation: in my experiments, this points to the token itself, i.e. the token the annotation was retrieved from.
- HeadTagAnnotation
Thanks in advance!
Answer
There is no support for Spanish dependency parsing in CoreNLP at the moment. This includes typed dependency conversion from constituency parses.
There is a head finder implemented (but it has not been tested). You could hack together an untyped dependency converter using this head finder, but I have no guarantees that it will yield a sensible parse.
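A minimal sketch of such a converter, under several assumptions: the tree is a small hypothetical Node class instead of CoreNLP's Tree, and the head rule is a made-up HeadRule interface with a toy implementation (prefer a VERB or VP child, then NOUN or NP, else the last child) over made-up labels. The technique itself is ordinary head percolation: pick the head child of every phrase, let the phrase inherit its head child's lexical head, and attach the lexical head of every non-head child to it; the sentence head is attached to ROOT (index 0). Edges are {governor, dependent} pairs over 1-based token positions. The code is Java 7 compatible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal constituency node (stand-in for CoreNLP's Tree).
final class Node {
    final String label;        // phrase label or POS tag (made-up tag set)
    final String word;         // token text for tokens, null for phrases
    final List<Node> children; // empty for tokens
    int index;                 // 1-based token position, set by numberTokens

    private Node(String label, String word, List<Node> children) {
        this.label = label;
        this.word = word;
        this.children = children;
    }

    static Node token(String tag, String word) {
        return new Node(tag, word, new ArrayList<Node>());
    }

    static Node phrase(String label, Node... kids) {
        return new Node(label, null, new ArrayList<Node>(Arrays.asList(kids)));
    }

    boolean isToken() {
        return word != null;
    }
}

// Hypothetical head rule: returns the position of the head child of a phrase.
interface HeadRule {
    int headChild(Node phrase);
}

final class HeadConverter {
    private HeadConverter() {}

    // Toy rule for illustration only, not a real Spanish head finder.
    static final HeadRule TOY_RULE = new HeadRule() {
        @Override
        public int headChild(Node phrase) {
            String[] priority = {"VERB", "VP", "NOUN", "NP"};
            for (String wanted : priority) {
                for (int i = 0; i < phrase.children.size(); i++) {
                    if (phrase.children.get(i).label.equals(wanted)) {
                        return i;
                    }
                }
            }
            return phrase.children.size() - 1; // fall back to the last child
        }
    };

    // Returns untyped {governor, dependent} edges; governor 0 is ROOT.
    static List<int[]> convert(Node root, HeadRule rule) {
        numberTokens(root, 1);
        List<int[]> edges = new ArrayList<int[]>();
        int sentenceHead = collect(root, rule, edges);
        edges.add(new int[] {0, sentenceHead});
        return edges;
    }

    // Numbers tokens left to right from `next`; returns the next free index.
    private static int numberTokens(Node node, int next) {
        if (node.isToken()) {
            node.index = next;
            return next + 1;
        }
        for (Node child : node.children) {
            next = numberTokens(child, next);
        }
        return next;
    }

    // Returns the lexical head of `node`, adding an edge for each non-head child.
    private static int collect(Node node, HeadRule rule, List<int[]> edges) {
        if (node.isToken()) {
            return node.index;
        }
        if (node.children.isEmpty()) {
            throw new IllegalArgumentException("phrase without children: " + node.label);
        }
        int[] heads = new int[node.children.size()];
        for (int i = 0; i < heads.length; i++) {
            heads[i] = collect(node.children.get(i), rule, edges);
        }
        int h = rule.headChild(node);
        for (int i = 0; i < heads.length; i++) {
            if (i != h) {
                edges.add(new int[] {heads[h], heads[i]}); // head -> dependent
            }
        }
        return heads[h];
    }
}
```

For S(NP(DET el, NOUN gato), VERB duerme) the toy rule yields the edges 2 → 1, 3 → 2 and 0 → 3. To adapt this to CoreNLP, the recursion would presumably walk Tree.children() and ask a HeadFinder (the Spanish one mentioned above, if your release contains it) for the head child via determineHead(Tree); verify those calls against the Javadoc of the version you use.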