Foundational Papers in NLP - Chen and Manning 2014
Contributions
This paper demonstrates that greedy transition-based dependency parsing improves substantially when a feedforward neural network learns compact distributed representations in place of manually specified, high-dimensional sparse feature templates. This addresses pervasive issues of poor generalization, computational expense, and incomplete feature coverage.
Transition-Based Parsing Algorithms
Prominent (circa 2014) transition-based parsing algorithms construct syntax trees incrementally by manipulating intermediate parser states, each comprising a stack of partially processed words and a buffer of upcoming words. At each step, a classifier selects an action that modifies this state, adding a dependency arc or moving a word between the buffer and the stack, until a complete parse is built. A popular instantiation is the arc-standard system, with SHIFT, LEFT-ARC, and RIGHT-ARC transitions; a minimal sketch follows.
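Here is a minimal Python sketch of the arc-standard system described above, under the usual conventions (word indices, with 0 for the ROOT token); the names Config, apply_transition, and is_terminal are ours, not the paper's.

```python
# Minimal sketch of the arc-standard transition system (illustrative).

class Config:
    def __init__(self, words):
        self.stack = [0]                              # word indices; 0 is ROOT
        self.buffer = list(range(1, len(words) + 1))  # upcoming word indices
        self.arcs = []                                # (head, label, dependent)

def apply_transition(c, action, label=None):
    if action == "SHIFT":           # move next buffer word onto the stack
        c.stack.append(c.buffer.pop(0))
    elif action == "LEFT-ARC":      # second-top word becomes dependent of top
        dep = c.stack.pop(-2)
        c.arcs.append((c.stack[-1], label, dep))
    elif action == "RIGHT-ARC":     # top word becomes dependent of second-top
        dep = c.stack.pop()
        c.arcs.append((c.stack[-1], label, dep))

def is_terminal(c):
    return not c.buffer and len(c.stack) == 1   # only ROOT remains
```

LEFT-ARC and RIGHT-ARC each come in one variant per dependency label, so a parser with N labels chooses among 2N + 1 transitions at every step.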
Limitations of Feature Engineering
However, reliance on manually enumerated conjunctions of indicator features fundamentally hinders these approaches:
1) Extensive templated combinations of words, tags, labels, and syntactic relations produce very sparse statistics and risk substantial overfitting. Lexicalized features such as word-pair conjunctions are empirically vital for accuracy, yet their weights remain poorly estimated even with regularization (see the sketch after this list).
2) The predominant computational expenditure goes to generating these massive sets of sparse indicator combinations rather than to performing parsing actions; profiling shows feature computation can account for roughly 95% or more of total run time, a major bottleneck.
3) Many informative higher-order interactions among words, parts of speech, and syntactic relations are difficult for linguists to codify by hand, and conjunctions of more than three elements are typically excluded for tractability.
4) Statistically, models built on such sparse indicator features are unlikely to generalize well beyond the feature combinations observed in training.
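To make the contrast concrete, here is a toy illustration (our own, not the paper's code) of how indicator-feature templates multiply dimensions while a dense lookup keeps the input small:

```python
# Toy contrast: indicator-feature templates versus a dense embedding lookup.
import numpy as np

def sparse_features(s1_word, s2_word, s1_tag, s2_tag):
    # Each template yields one binary indicator; with dozens of templates
    # over a large vocabulary, the feature space reaches millions of
    # dimensions, almost all of them rarely observed.
    return {
        f"s1w={s1_word}",
        f"s1w={s1_word}|s2w={s2_word}",               # word-pair conjunction
        f"s1w={s1_word}|s1t={s1_tag}|s2t={s2_tag}",   # three-way conjunction
    }

EMB_DIM = 50
embeddings = {}   # token -> learned 50-dim vector (randomly initialized here)

def dense_features(tokens):
    # The same information enters the model as a short concatenation of
    # learned vectors; interactions are left to the network, not to templates.
    return np.concatenate([
        embeddings.setdefault(t, 0.01 * np.random.randn(EMB_DIM))
        for t in tokens
    ])
```

With dozens of templates over a six-figure vocabulary, the sparse representation reaches millions of mostly unseen dimensions, while the dense input stays at a few thousand values regardless of vocabulary size.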
Proposed Neural Architecture
Seeking to overcome these issues, the proposed parser learns dense embeddings for words, POS tags, and arc labels directly from data. A simple feedforward network with a single hidden layer then models useful higher-order feature interactions within this continuous representation space. A novel cube activation function, g(x) = x^3, captures products of up to three input elements automatically, without manual specification of conjunctions.
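A compact numpy sketch of this scoring network follows; the layer sizes (18 word, 18 POS, and 12 label slots; 50-dimensional embeddings; 200 hidden units) come from the paper, while the toy vocabulary size, label count, and initialization are our assumptions.

```python
# Sketch of the paper's scoring network: concatenated embeddings feed a
# single hidden layer with the cube activation g(x) = x^3, followed by a
# softmax over candidate transitions.
import numpy as np

N_FEATS, EMB_DIM, HIDDEN = 48, 50, 200   # 18 word + 18 POS + 12 label slots
N_TRANSITIONS = 2 * 40 + 1               # LEFT/RIGHT-ARC per label + SHIFT (toy count)

rng = np.random.default_rng(0)
E  = rng.normal(0, 0.01, (10000, EMB_DIM))   # embedding matrix (toy vocab)
W1 = rng.normal(0, 0.01, (HIDDEN, N_FEATS * EMB_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.01, (N_TRANSITIONS, HIDDEN))

def score_transitions(feature_ids):
    """feature_ids: 48 indices into E for the words, tags, and labels
    around the stack and buffer of the current configuration."""
    x = E[feature_ids].reshape(-1)   # concatenate the 48 embeddings
    h = (W1 @ x + b1) ** 3           # cube activation: three-way interactions
    logits = W2 @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()               # softmax over candidate transitions
```

Cubing the hidden pre-activation expands into sums of products of up to three input elements, exactly the kind of three-way conjunction that sparse templates had to enumerate by hand.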
Training Methodology
Training derives gold transition sequences from treebank annotations using an oracle algorithm, then optimizes a cross-entropy objective over the network's transition predictions, refining the embedding geometry on in-domain corpora. Pre-trained word vectors initialize the word embeddings.
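Below is a hedged sketch of such an oracle for the arc-standard system, building on the Config sketch above (the paper uses a "shortest stack" oracle that prefers LEFT-ARC over SHIFT; gold_head is our name for the annotated head map):

```python
def oracle_transition(c, gold_head):
    """Return the gold arc-standard action for configuration c.
    gold_head maps each word index (1..n) to its annotated head; 0 is ROOT."""
    if len(c.stack) >= 2:
        s1, s2 = c.stack[-1], c.stack[-2]   # top and second-top of the stack
        if gold_head.get(s2) == s1:         # second-top depends on top
            return "LEFT-ARC"
        if gold_head.get(s1) == s2 and not any(
            gold_head[j] == s1 for j in c.buffer   # s1 has all its dependents
        ):
            return "RIGHT-ARC"
    return "SHIFT"
```

Running the oracle over each training tree yields (configuration, gold transition) pairs, and the cross-entropy loss is minimized over these pairs.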
Inference Approach
At inference time the parser handles novel sentences greedily: at each step the network scores the valid candidate transitions, and the highest-scoring one is applied to the parser state. A pre-computation optimization caches hidden-layer contributions for frequent feature-position/token pairs, so run time is dominated by dense matrix operations rather than combinatorial feature extraction.
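Tying the sketches together, a greedy decoding loop might look like the following; extract_feature_ids, TRANSITIONS, and is_legal are hypothetical helpers standing in for the feature extractor, the transition inventory, and the precondition checks.

```python
import numpy as np

def greedy_parse(words):
    c = Config(words)
    while not is_terminal(c):
        feats = extract_feature_ids(c)      # hypothetical: 48 word/tag/label ids
        probs = score_transitions(feats)    # forward pass from the sketch above
        for t in np.argsort(-probs):        # try transitions from best to worst
            action, label = TRANSITIONS[t]  # hypothetical (action, label) inventory
            if is_legal(c, action):         # hypothetical precondition check
                apply_transition(c, action, label)
                break
    return c.arcs                           # (head, label, dependent) triples
```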
Experimental Results
Quantitative evaluations on English and Chinese treebanks, under both Stanford and CoNLL dependency representations, show consistent accuracy and efficiency gains over strong transition-based and graph-based baselines such as MaltParser and MSTParser, achieved by acquiring generalizable features directly from data. The parser processes on the order of 1,000 sentences per second, demonstrating practical viability.
Analysis
Inspection of the learned representations and model weights reveals that the network automatically extracts and weights beneficial higher-order feature combinations that would be impractical to specify by hand. Visualizations such as t-SNE projections of the POS-tag and arc-label embeddings confirm that linguistically similar symbols cluster together in the learned geometry.
Conclusion
In conclusion, this work demonstrates that transition-based parsing improves substantially, in both accuracy and speed, when manual feature engineering is replaced with compact distributed representations learned by a neural network.
Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 740–750.