NLP: Giving Up Before Even Getting Started
Feb 17, 2018 · 5 min · nlp
- Online courses
- Libraries and open source
- Active blogs
- Books
- Miscellaneous
- DIY projects and data sets
- NLP on social media
Source: a list of NLP learning resources compiled by Melanie Tosik (Twitter: @meltomene).
Online courses
- Stanford CS224d: Deep Learning for Natural Language Processing [more advanced machine learning algorithms, deep learning, and neural network architectures for NLP]
- Coursera: Introduction to Natural Language Processing [NLP course from the University of Michigan]
Libraries and open source
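- spaCy (website, blog) [Python; emerging open-source library with great usage examples, API documentation, and demo applications]
- Natural Language Toolkit (NLTK) (website, book) [Python; practical introduction to programming for NLP, mainly used for teaching]
- Stanford CoreNLP (website) [Java; high-quality natural language analysis toolkit]
- AllenNLP (website) [Python; NLP research library built on PyTorch]
- fastText (website) [C++; efficient tool for text classification and representation learning]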
Active blogs
- natural language processing blog (Hal Daumé III)
- Language Log (Mark Liberman)
Books
- Speech and Language Processing (Jurafsky and Martin) [classic NLP textbook covering all the basics of NLP; 3rd edition forthcoming]
- Foundations of Statistical Natural Language Processing (Manning and Schütze) [more advanced statistical NLP methods]
- Introduction to Information Retrieval (Manning, Raghavan and Schütze) [excellent reference on ranking and search]
- Neural Network Methods in Natural Language Processing (Goldberg) [in-depth introduction to neural network methods for NLP; a corresponding primer is also available]
- Linguistic Fundamentals for Natural Language Processing (Bender) [morphology and syntax for more successful NLP]
- Deep Learning (Goodfellow, Bengio and Courville) [good introduction to deep learning]
Miscellaneous
- Deep Learning for NLP resources [overview of state-of-the-art deep learning resources, organized by topic]
- Last Words: Computational Linguistics and Deep Learning - A look at the importance of Natural Language Processing. (Manning) [article]
- Natural Language Understanding with Distributed Representation (Cho) [stand-alone lecture notes on ML/NN approaches to NLU]
- Bayesian Inference with Tears (Knight) [tutorial workbook]
- Association for Computational Linguistics (ACL) [journal anthology]
- Natural Language Understanding and Computational Semantics (Bowman) [open-source course syllabus and complete slides]
- fast.ai [“Making neural nets uncool again”]
DIY projects and data sets
Nicolas Iderhoff has created a public, extensive list of NLP data sets. Beyond these, here are some recommended projects:
- Implement a part-of-speech (POS) tagger based on a hidden Markov model (HMM)
- Implement the CYK algorithm for parsing context-free grammars
- Implement semantic similarity between two given words in a collection of text, e.g. pointwise mutual information (PMI)
- Implement a Naive Bayes classifier to filter spam
- Implement a spell checker based on edit distances between words
- Implement a Markov chain text generator
- Implement a topic model using latent Dirichlet allocation (LDA)
- Use word2vec to generate word embeddings from a large text corpus, e.g. Wikipedia
- Use k-means to cluster tf-idf vectors of text, e.g. news articles
- Implement a named-entity recognizer (NER) (also called a name tagger), e.g. following the CoNLL-2003 shared task
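As one illustration of the projects above, the edit-distance spell checker can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a reference implementation: the function names `edit_distance` and `correct`, the `max_distance` cutoff of 2, and the alphabetical tie-breaking are all hypothetical choices made for illustration, and it uses plain Levenshtein distance with no word-frequency model.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    # prev[j] is the distance between the previous prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or `word` unchanged if no
    candidate lies within `max_distance` edits (an assumed cutoff)."""
    vocab = sorted(set(vocabulary))  # sorted so that ties break alphabetically
    if word in vocab:
        return word
    best = min(vocab, key=lambda v: edit_distance(word, v), default=word)
    return best if edit_distance(word, best) <= max_distance else word
```

For example, `correct("langauge", ["language", "linguistics", "parser"])` returns `"language"`, because the two words differ by two substitutions. A more realistic checker would also weight candidates by how frequent they are in a corpus, which this sketch deliberately leaves out.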
NLP on social media
- Twitter: #nlproc, list of NLPers (by Jason Baldridge)
- Reddit: /r/LanguageTechnology
- Medium: NLP