Conference
Conference Ranking
ACL : Annual Meeting of the Association for Computational Linguistics
EACL : European Chapter of the ACL
NAACL : North American Chapter of the ACL
CoNLL : Conference on Computational Natural Language Learning
COLING : International Conference on Computational Linguistics
IWPT : Biennial International Conference on Parsing Technologies
GWC : Global WordNet Conference
IJCNLP : International Joint Conference on NLP
COLIPS : Chinese and Oriental Languages Information Processing Society
ICON : International Conference on NLP
AAAI : American Association for Artificial Intelligence National Conference
IJCAI : International Joint Conference on Artificial Intelligence
Journal
Computational Linguistics
Natural Language Engineering
Computer Speech and Language
Part of Speech tagging
Stanford POS tagger : Log-linear tagger in Java.
MBT : Memory-based tagger built on TiMBL.
TreeTagger : A language-independent, decision-tree-based tagger.
TnT : A Statistical Part-of-Speech Tagger
SVMTool : POS Tagger based on SVMs (uses SVMlight).
MXPOST : Adwait Ratnaparkhi's Maximum Entropy based POS tagger (Java).
YamCha : SVM-based NP-chunker, also usable for POS tagging, NER, etc. (C/C++)
Parsing
MST Parser : A non-projective dependency parser (Java, Python)
Malt Parser : A system for data-driven dependency parsing (Java).
Stanford Statistical Parser : Highly optimized PCFG and lexicalized dependency parsers (Java).
TAG
Non-Directional
PCFG Parser
Corpus
LDC : Linguistic Data Consortium.
ELRA : European Language Resources Association.
CLR : Consortium for Lexical Research.
Leipzig Corpora Collection : Sentence collections in a MySQL database for 17 mainly European languages.
BNC : British National Corpus, a 100 million word corpus of British English.
ICE : International Corpus of English
MICASE : Michigan Corpus of Academic Spoken English, 1.7 million words from 1997-2001.
Penn-Helsinki Parsed Corpus of Middle English :
American National Corpus
Lancaster Corpus of Mandarin Chinese (LCMC) : A freely available balanced corpus.
EMILLE/CIIL : Monolingual written corpus data for 14 South Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu).
OPUS : An open source parallel corpus, aligned, in many languages, based on free Linux etc.
NEGRA Corpus : Saarland University Syntactically Annotated Corpus of German Newspaper Texts.
Russian National Corpus : 150 million words, 5 million words POS-tagged, some in a dependency treebank.
TreeBank
Penn TreeBank
Penn Chinese TreeBank
BNC
TUT TreeBank
IIIT-H TreeBank
Other Tools
CMU-Cambridge Statistical Language Modeling toolkit :
SRI Language Modeling toolkit :
NLTK : Natural Language Toolkit
GATE : General Architecture for Text Engineering
Emdros : A text database engine for linguistic analysis and research
Sabdanjali :
NLP Research Labs in India
IIIT-H :
IITK :
IITKgp :
Jadavpur University :
Anna University :
Microsoft Research Lab :
JNU :
C-DAC :
CIIL :
LDCIL :