Natural Language Parsing


1. Natural Language Parsing
To stimulate ideas, here is sample output from several available NLP (Natural Language Processing) software systems.

The default settings were used throughout. Each system offers many ways to customize and tune its behavior for a particular application domain.

2. Stanford NLP group
The Stanford NLP Group at Stanford University is one of the leading NLP research groups; its site is at the following URL.

Their software is summarized and available from the following URL.

3. NLTK
The NLTK (Natural Language Tool Kit) is available at the following URL.

"NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning."

4. Example sentences
Here are the example sentences.

This article contains a discussion of the history of commercial and academic efforts to automate patent classifications. It also suggests new approaches (adding additional structured language to the text) that (it asserts) lead to statistically meaningful improvements.
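Both taggers and the parser work on tokens rather than raw text, and the Stanford output below writes parentheses as -LRB- and -RRB-, the Penn Treebank convention. The following Python sketch illustrates that preprocessing step under simplifying assumptions: the regular expressions and the helper names `split_sentences` and `tokenize` are hypothetical and are not the tokenizers NLTK or Stanford actually use (those handle abbreviations, contractions, quotes, and many other cases).

```python
import re

# Naive sentence splitter (hypothetical): break after ., ! or ? followed by whitespace.
# It would wrongly split after abbreviations such as "e.g.".
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")

# Simplified token pattern (hypothetical): runs of word characters, optionally joined
# by internal hyphens or apostrophes, or any single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Penn Treebank escapes for round brackets, as seen in the Stanford output.
PTB_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}


def split_sentences(text):
    """Split a paragraph into sentences on end punctuation plus whitespace."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens, escaping brackets."""
    return [PTB_ESCAPES.get(tok, tok) for tok in TOKEN_RE.findall(sentence)]


paragraph = (
    "This article contains a discussion of the history of commercial and academic "
    "efforts to automate patent classifications. It also suggests new approaches "
    "(adding additional structured language to the text) that (it asserts) lead to "
    "statistically meaningful improvements."
)
for sentence in split_sentences(paragraph):
    print(tokenize(sentence))
```

On these two sentences the sketch yields 18 and 25 tokens, and the second sentence comes out as the same token sequence the Stanford tagger reports, including -LRB- and -RRB-.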

5. Parse trees
Here is the parse tree from NLTK for the first sentence.

6. Stanford parser
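Here is the output from the Stanford Parser, described by its authors as "implementations of probabilistic natural language parsers in Java: highly optimized PCFG and dependency parsers, a lexicalized PCFG parser, and a deep learning reranker". For each sentence it gives a constituency parse tree followed by a list of typed dependencies.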
(ROOT
  (S
    (NP (DT This) (NN article))
    (VP (VBZ contains)
      (S
        (NP
          (NP (DT a) (NN discussion))
          (PP (IN of)
            (NP
              (NP (DT the) (NN history))
              (PP (IN of)
                (NP
                  (UCP (JJ commercial)
                    (CC and)
                    (JJ academic))
                  (NNS efforts))))))
        (VP (TO to)
          (VP (VB automate)
            (NP (NN patent) (NNS classifications))))))
    (. .)))

det(article-2, This-1)
nsubj(contains-3, article-2)
root(ROOT-0, contains-3)
det(discussion-5, a-4)
nsubj(automate-15, discussion-5)
det(history-8, the-7)
prep_of(discussion-5, history-8)
amod(efforts-13, commercial-10)
conj_and(commercial-10, academic-12)
amod(efforts-13, academic-12)
prep_of(history-8, efforts-13)
aux(automate-15, to-14)
xcomp(contains-3, automate-15)
nn(classifications-17, patent-16)
dobj(automate-15, classifications-17)

(ROOT
  (S
    (NP (PRP It))
    (ADVP (RB also))
    (VP (VBZ suggests)
      (NP
        (NP (JJ new) (NNS approaches))
        (PRN (-LRB- -LRB-)
          (VP (VBG adding)
            (NP (JJ additional) (JJ structured) (NN language))
            (PP (TO to)
              (NP (DT the) (NN text))))
          (-RRB- -RRB-))
        (SBAR
          (WHNP (WDT that))
          (S
            (PRN (-LRB- -LRB-)
              (S
                (NP (PRP it))
                (VP (VBZ asserts)))
              (-RRB- -RRB-))
            (VP (VBP lead)
              (PP (TO to)
                (NP
                  (ADJP (RB statistically) (JJ meaningful))
                  (NNS improvements))))))))
    (. .)))

nsubj(suggests-3, It-1)
advmod(suggests-3, also-2)
root(ROOT-0, suggests-3)
amod(approaches-5, new-4)
dobj(suggests-3, approaches-5)
nsubj(lead-20, approaches-5)
dep(approaches-5, adding-7)
amod(language-10, additional-8)
amod(language-10, structured-9)
dobj(adding-7, language-10)
det(text-13, the-12)
prep_to(adding-7, text-13)
nsubj(asserts-18, it-17)
parataxis(lead-20, asserts-18)
rcmod(approaches-5, lead-20)
advmod(meaningful-23, statistically-22)
amod(improvements-24, meaningful-23)
prep_to(lead-20, improvements-24)
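The bracketed trees use Penn Treebank notation: each `(LABEL ...)` is a constituent, and a node such as `(NN article)` pairs a part-of-speech tag with a word. In NLTK such a string can be loaded with `nltk.Tree.fromstring`. The dependency-free Python sketch below (the function names `parse_bracketed` and `tagged_leaves` are hypothetical) shows how such a reader can work: a recursive descent over the bracket tokens that builds nested `(label, children)` tuples.

```python
import re


def parse_bracketed(text):
    """Parse '(LABEL child ...)' notation into nested (label, children) tuples.

    Leaf words are kept as plain strings inside the children list.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if pos + 1 >= len(tokens) or tokens[pos] != "(":
            raise ValueError(f"expected '(' and a label at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while True:
            if pos >= len(tokens):
                raise ValueError("unbalanced brackets: missing ')'")
            tok = tokens[pos]
            if tok == ")":
                pos += 1
                return (label, children)
            if tok == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tok)  # a word
                pos += 1

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the final ')'")
    return tree


def tagged_leaves(tree):
    """Collect (word, tag) pairs from preterminal nodes such as (NN article)."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_leaves(child))
    return pairs


# A shortened, hypothetical fragment in the same notation as the output above.
tree = parse_bracketed("(ROOT (S (NP (PRP It)) (ADVP (RB also)) (VP (VBZ suggests)) (. .)))")
print(tree[0])  # ROOT
print(tagged_leaves(tree))  # [('It', 'PRP'), ('also', 'RB'), ('suggests', 'VBZ'), ('.', '.')]
```

Applied to the first tree above, `tagged_leaves` returns 18 pairs, which matches the highest word index (17, plus the final period) in the dependency list.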

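Each item after a tree is a typed dependency written as `relation(head-index, dependent-index)`, where the index is the word's 1-based position in the sentence and `ROOT-0` is an artificial root. Relations such as `prep_of` and `conj_and` fold the preposition or conjunction into the relation name, and a word can have more than one head (for example, `academic-12` is a dependent of both `commercial-10` and `efforts-13`), so the list is best read as a set of edges rather than a strict tree. A minimal Python sketch of reading this format with a regular expression follows; the pattern and the names `Dependency`, `parse_dependencies`, and `dependents_of` are assumptions for illustration, not part of the Stanford tools.

```python
import re
from typing import List, NamedTuple, Tuple

# relation(head-i, dependent-j). Words may contain hyphens, so each lazy ".+?"
# keeps extending until it reaches a "-<digits>" followed by ", " or ")".
DEP_RE = re.compile(r"([\w:]+)\((.+?)-(\d+), (.+?)-(\d+)\)")


class Dependency(NamedTuple):
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


def parse_dependencies(text: str) -> List[Dependency]:
    """Extract every relation(head-i, dependent-j) item from the text."""
    return [
        Dependency(m.group(1), m.group(2), int(m.group(3)), m.group(4), int(m.group(5)))
        for m in DEP_RE.finditer(text)
    ]


def dependents_of(deps: List[Dependency], head_index: int) -> List[Tuple[str, str]]:
    """Return (relation, dependent) pairs for the word at head_index."""
    return [(d.relation, d.dependent) for d in deps if d.head_index == head_index]


sample = ("nsubj(automate-15, discussion-5) aux(automate-15, to-14) "
          "xcomp(contains-3, automate-15) dobj(automate-15, classifications-17)")
deps = parse_dependencies(sample)
print(dependents_of(deps, 15))  # [('nsubj', 'discussion'), ('aux', 'to'), ('dobj', 'classifications')]
```

Keeping the word indices, rather than just the words, matters here because the same word (for example "of" or "to") can occur more than once in a sentence.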

7. Stanford tagger
Here is the output from the Stanford POS Tagger, described as a "maximum-entropy (CMM) part-of-speech (POS) tagger for English, Arabic, Chinese, French, and German, in Java". Each token is written as word_TAG, using Penn Treebank tags.
This_DT article_NN contains_VBZ a_DT discussion_NN of_IN the_DT history_NN of_IN commercial_JJ and_CC academic_JJ efforts_NNS to_TO automate_VB patent_NN classifications_NNS ._.
It_PRP also_RB suggests_VBZ new_JJ approaches_NNS -LRB-_-LRB- adding_VBG additional_JJ structured_JJ language_NN to_TO the_DT text_NN -RRB-_-RRB- that_WDT -LRB-_-LRB- it_PRP asserts_VBZ -RRB-_-RRB- lead_NN to_TO statistically_RB meaningful_JJ improvements_NNS ._.
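The tagger writes each token as `word_TAG`. A word may itself contain an underscore, while the Penn Treebank tags used here do not, so a reader can split each token on its last underscore. A small Python sketch (the helper name `parse_tagged` is hypothetical):

```python
from typing import List, Tuple


def parse_tagged(text: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Split whitespace-separated word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST separator, so "-LRB-_-LRB-" -> ("-LRB-", "-LRB-").
        word, found, tag = token.rpartition(sep)
        if not found or not word:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


tagged = parse_tagged("that_WDT -LRB-_-LRB- it_PRP asserts_VBZ -RRB-_-RRB- lead_NN")
print(tagged)
```

Applied to the full output above, this yields 43 pairs (18 for the first sentence and 25 for the second). Having the pairs in a list also makes comparisons easy; for instance, the tagger labels "lead" as NN, whereas the parser's tree has (VBP lead).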


8. NLTK taggers
Here is the part-of-speech tagger output from NLTK and from the Stanford tagger (called through the NLTK API), both with default settings, followed by the differences between the two taggers.
NLTK tagger: (default settings)
   0. This : DT = Determiner
   1. article : NN = Noun, singular or mass
   2. contains : VBZ = Verb, 3rd person singular present
   3. a : DT = Determiner
   4. discussion : NN = Noun, singular or mass
   5. of : IN = Preposition or subordinating conjunction
   6. the : DT = Determiner
   7. history : NN = Noun, singular or mass
   8. of : IN = Preposition or subordinating conjunction
   9. commercial : JJ = Adjective
  10. and : CC = Coordinating conjunction
  11. academic : JJ = Adjective
  12. efforts : NNS = Noun, plural
  13. to : TO = to
  14. automate : VB = Verb, base form
  15. patent : NN = Noun, singular or mass
  16. classifications. : NNP = Proper noun, singular
  17. It : NNP = Proper noun, singular
  18. also : RB = Adverb
  19. suggests : VBZ = Verb, 3rd person singular present
  20. new : JJ = Adjective
  21. approaches : NNS = Noun, plural
  22. ( : VBP = Verb, non-3rd person singular present
  23. adding : VBG = Verb, gerund or present participle
  24. additional : JJ = Adjective
  25. structured : JJ = Adjective
  26. language : NN = Noun, singular or mass
  27. to : TO = to
  28. the : DT = Determiner
  29. text : NN = Noun, singular or mass
  30. ) : : = Colon or ellipsis
  31. that : IN = Preposition or subordinating conjunction
  32. ( : CD = Cardinal number
  33. it : PRP = Personal pronoun
  34. asserts : VBZ = Verb, 3rd person singular present
  35. ) : : = Colon or ellipsis
  36. lead : NN = Noun, singular or mass
  37. to : TO = to
  38. statistically : RB = Adverb
  39. meaningful : JJ = Adjective
  40. improvements : NNS = Noun, plural
  41. . : . = Terminator

Stanford tagger: (default settings)
   0. This : DT = Determiner
   1. article : NN = Noun, singular or mass
   2. contains : VBZ = Verb, 3rd person singular present
   3. a : DT = Determiner
   4. discussion : NN = Noun, singular or mass
   5. of : IN = Preposition or subordinating conjunction
   6. the : DT = Determiner
   7. history : NN = Noun, singular or mass
   8. of : IN = Preposition or subordinating conjunction
   9. commercial : JJ = Adjective
  10. and : CC = Coordinating conjunction
  11. academic : JJ = Adjective
  12. efforts : NNS = Noun, plural
  13. to : TO = to
  14. automate : VB = Verb, base form
  15. patent : JJ = Adjective
  16. classifications. : NN = Noun, singular or mass
  17. It : PRP = Personal pronoun
  18. also : RB = Adverb
  19. suggests : VBZ = Verb, 3rd person singular present
  20. new : JJ = Adjective
  21. approaches : NNS = Noun, plural
  22. ( : VBP = Verb, non-3rd person singular present
  23. adding : VBG = Verb, gerund or present participle
  24. additional : JJ = Adjective
  25. structured : JJ = Adjective
  26. language : NN = Noun, singular or mass
  27. to : TO = to
  28. the : DT = Determiner
  29. text : NN = Noun, singular or mass
  30. ) : NN = Noun, singular or mass
  31. that : WDT = Wh-determiner
  32. ( : VBZ = Verb, 3rd person singular present
  33. it : PRP = Personal pronoun
  34. asserts : VBZ = Verb, 3rd person singular present
  35. ) : JJ = Adjective
  36. lead : NN = Noun, singular or mass
  37. to : TO = to
  38. statistically : RB = Adverb
  39. meaningful : JJ = Adjective
  40. improvements : NNS = Noun, plural
  41. . : . = Terminator

Differences:
  15. patent : NN = Noun, singular or mass
  15. patent : JJ = Adjective
  16. classifications. : NNP = Proper noun, singular
  16. classifications. : NN = Noun, singular or mass
  17. It : NNP = Proper noun, singular
  17. It : PRP = Personal pronoun
  30. ) : : = Colon or ellipsis
  30. ) : NN = Noun, singular or mass
  31. that : IN = Preposition or subordinating conjunction
  31. that : WDT = Wh-determiner
  32. ( : CD = Cardinal number
  32. ( : VBZ = Verb, 3rd person singular present
  35. ) : : = Colon or ellipsis
  35. ) : JJ = Adjective
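The Differences list pairs up entries with the same index whose tags disagree (NLTK first, Stanford second). A minimal Python sketch of that comparison, assuming both taggers were given the same token list; the helper name `tag_differences` and its `start` offset are hypothetical, and the sample data are tokens 13 to 18 from the listings above:

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def tag_differences(tags_a: Tagged, tags_b: Tagged, start: int = 0):
    """Return (index, word, tag_a, tag_b) for each aligned token whose tags differ."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggings must cover the same tokens")
    diffs = []
    for i, ((word_a, tag_a), (word_b, tag_b)) in enumerate(zip(tags_a, tags_b), start=start):
        if word_a != word_b:
            raise ValueError(f"token mismatch at {i}: {word_a!r} vs {word_b!r}")
        if tag_a != tag_b:
            diffs.append((i, word_a, tag_a, tag_b))
    return diffs


nltk_tags = [("to", "TO"), ("automate", "VB"), ("patent", "NN"),
             ("classifications.", "NNP"), ("It", "NNP"), ("also", "RB")]
stanford_tags = [("to", "TO"), ("automate", "VB"), ("patent", "JJ"),
                 ("classifications.", "NN"), ("It", "PRP"), ("also", "RB")]

# start=13 makes the reported indices match the numbering used in the listings.
for index, word, a, b in tag_differences(nltk_tags, stanford_tags, start=13):
    print(f"{index}. {word} : {a} (NLTK) vs {b} (Stanford)")
```

Raising an error on a token mismatch keeps the comparison honest: if the two tools tokenized the text differently, an index-by-index comparison would silently pair unrelated words.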

