NLP: Giving Up Before Even Getting Started

Feb 17, 2018 · 5 min · nlp

Online courses
Libraries and open source
Active blogs
Books
Miscellaneous
DIY projects and data sets
NLP on social media

Source: a list of NLP learning resources compiled by Melanie Tosik (Twitter: @meltomene)

Online courses

Dan Jurafsky & Chris Manning: Natural Language Processing
Stanford CS224d: Deep Learning for Natural Language Processing [more advanced machine learning algorithms, deep learning, and neural network architectures for NLP]
Coursera: Introduction to Natural Language Processing [the University of Michigan's NLP course]

Libraries and open source

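spaCy (website, blog) [Python; an emerging open-source library with slick usage examples, API documentation, and demo apps]
Natural Language Toolkit (NLTK) (website, book) [Python; a practical introduction to programming for NLP, mainly used for teaching]
Stanford CoreNLP (website) [a high-quality natural language analysis toolkit written in Java]
AllenNLP (website) [Python; an NLP research library built on PyTorch]
fastText (website) [C++; an efficient tool for text classification and representation learning]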

Active blogs

language processing blog (Hal Daumé III)
Language Log (Mark Liberman)
Google Research blog
Explosion AI blog
Hugging Face
Sebastian Ruder’s blog

Books

Speech and Language Processing (Jurafsky and Martin) [the classic NLP textbook, covering all the NLP fundamentals; a 3rd edition is forthcoming]
Foundations of Statistical Natural Language Processing (Manning and Schütze) [more advanced statistical NLP methods]
Introduction to Information Retrieval (Manning, Raghavan and Schütze) [an excellent reference on ranking/search]
Neural Network Methods in Natural Language Processing (Goldberg) [an in-depth introduction to neural network approaches to NLP, plus the accompanying primer]
Linguistic Fundamentals for Natural Language Processing (Bender) [morphology and syntax for more successful NLP]
Deep Learning (Goodfellow, Courville and Bengio) [a good introduction to deep learning]

Miscellaneous

How to build a word2vec model in TensorFlow [tutorial]
Deep Learning for NLP resources [an overview of top deep learning resources for NLP, organized by topic]
Last Words: Computational Linguistics and Deep Learning - A look at the importance of Natural Language Processing (Manning) [article]
Natural Language Understanding with Distributed Representation (Cho) [self-contained lecture notes on ML/NN approaches to NLU]
Bayesian Inference with Tears (Knight) [tutorial workbook]
Association for Computational Linguistics (ACL) [journal anthology]
Quora: How do I learn Natural Language Processing?
Natural Language Understanding and Computational Semantics (Bowman) [open-source course syllabus with full slides]
fast.ai [“Making neural nets uncool again”]

DIY projects and data sets

Nicolas Iderhoff has compiled a public, comprehensive list of NLP datasets. Beyond those, here are some recommended projects:

Implement a part-of-speech (POS) tagger based on a hidden Markov model (HMM)
Implement the CYK algorithm for parsing context-free grammars
Implement semantic similarity between two given words in a collection of text, e.g. pointwise mutual information (PMI)
Implement a Naive Bayes classifier to filter spam
Implement a spell checker based on edit distances between words
Implement a Markov chain text generator
Implement a topic model using latent Dirichlet allocation (LDA)
Use word2vec to generate word embeddings from a large text corpus, e.g. Wikipedia
Use k-means to cluster tf-idf vectors of text, e.g. news articles
Implement a named-entity recognizer (NER) (also called a name tagger), e.g. following the CoNLL-2003 shared task
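
For the HMM tagger project, the decoding step can be sketched with the Viterbi algorithm. This is a minimal sketch rather than a full project: the tag set and the `start_p`, `trans_p` and `emit_p` tables are hand-picked toy values (in the real project they would be estimated from counts in a tagged corpus), and the `1e-12` floor for unseen events is an arbitrary stand-in for proper smoothing.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence for `words` under a first-order HMM.

    start_p[t] = P(t at position 0), trans_p[a][b] = P(b | a), emit_p[t][w] = P(w | t).
    Scores are kept in log space to avoid underflow; zero or missing
    probabilities are replaced by `floor` (a crude assumption, not real smoothing).
    """
    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log score of best path ending in tag t at position i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][s][0] + lp(trans_p[s].get(t, 0)), s) for s in tags)
            column[t] = (score + lp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Toy model (hypothetical numbers, for illustration only).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.7, "NOUN": 0.2, "DET": 0.1},
    "VERB": {"DET": 0.6, "NOUN": 0.4},
}
emit_p = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
    "VERB": {"barks": 0.6, "sees": 0.4},
}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))  # ['DET', 'NOUN', 'VERB']
```

Note that "barks" is ambiguous (NOUN or VERB) in this toy model; the transition probabilities are what push it to VERB after a noun.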
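
The CYK project can start from a recognizer like the one below. It is a sketch under two assumptions: the grammar is already in Chomsky normal form, and it is a tiny hypothetical grammar (`BINARY`, `LEXICAL`) invented for illustration. It only answers whether the sentence is derivable from `S`; building parse trees would additionally need back-pointers in each table cell.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {  # (B, C) -> {A : A -> B C}
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {  # word -> {A : A -> word}
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "chased": {"V"},
}

def cyk(words, binary=BINARY, lexical=LEXICAL, start="S"):
    """Return True if `words` can be derived from `start` under the CNF grammar."""
    n = len(words)
    # table[i][j] = set of nonterminals that derive words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length - 1              # span end
            for k in range(i, j):           # split into words[i..k] + words[k+1..j]
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return n > 0 and start in table[0][n - 1]

print(cyk("the dog chased a cat".split()))  # True
print(cyk("dog the chased".split()))        # False
```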
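
For the semantic-similarity project, PMI compares how often two words occur together with how often they would co-occur by chance: PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ). The sketch below assumes one particular estimate, treating each sentence as one observation and counting a co-occurrence when both words appear in the same sentence; fixed-size word windows are an equally common choice. The four-sentence corpus is made up.

```python
import math

def pmi(sentences, x, y):
    """PMI of words x and y, using sentence-level co-occurrence counts.

    P(w) = share of sentences containing w; P(x, y) = share containing both.
    Returns None when the pair never co-occurs (PMI would be -infinity).
    """
    docs = [set(s.lower().split()) for s in sentences]
    n = len(docs)
    c_x = sum(x in d for d in docs)
    c_y = sum(y in d for d in docs)
    c_xy = sum(x in d and y in d for d in docs)
    if c_xy == 0:
        return None
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "a cat likes milk",
    "the dog likes bones",
]
print(pmi(corpus, "cat", "milk"))  # 1.0
print(pmi(corpus, "cat", "dog"))   # None
```

A positive score means the words co-occur more often than independence would predict; on a real corpus you would also want to drop very rare words, since PMI is known to overrate them.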
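
A minimal sketch of the spam-filter project, assuming a multinomial naive Bayes model with add-one (Laplace) smoothing, lowercase whitespace tokenization, and four made-up training messages. Words never seen during training are simply skipped at prediction time, which is one design choice among several.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        # `labels` is assumed to be a list parallel to `docs`.
        self.classes = set(labels)
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def score(c):
            # log P(c) + sum over words of log P(w | c), smoothed over the vocabulary
            denom = self.totals[c] + len(self.vocab)
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in doc.lower().split()
                if w in self.vocab  # skip words never seen in training
            )
        return max(self.classes, key=score)

spam_filter = NaiveBayes().fit(
    ["win money now", "cheap money offer", "meeting at noon", "lunch at noon tomorrow"],
    ["spam", "spam", "ham", "ham"],
)
print(spam_filter.predict("win cheap money"))  # spam
print(spam_filter.predict("lunch at noon"))    # ham
```

Summing log probabilities instead of multiplying raw probabilities keeps long messages from underflowing to zero.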
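
For the spell-checker project, here is a sketch that uses Levenshtein distance as the edit distance. That is one common choice, not the only one: it counts a swap of two adjacent letters ("teh" to "the") as two edits. The vocabulary is a small hypothetical set, and `correct` simply scans all of it, which is fine for a toy but slow for a real dictionary.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]

def correct(word, vocabulary):
    """Return `word` if known, else the closest vocabulary word (ties broken alphabetically)."""
    if word in vocabulary:
        return word
    return min(sorted(vocabulary), key=lambda v: edit_distance(word, v))

print(edit_distance("kitten", "sitting"))                                 # 3
print(correct("speling", {"spelling", "spilling", "selling", "smelling"}))  # spelling
```

Keeping only the previous row of the dynamic-programming table brings memory down to O(len(b)) while the running time stays O(len(a) * len(b)).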
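
A minimal word-level Markov chain text generator, as a sketch of that project. It assumes whitespace tokenization and a chain of configurable `order` (the number of previous words that form a state). Followers are stored as a list with repeats, so `random.choice` samples each next word in proportion to how often it was seen. The corpus is a made-up one-liner, and the output varies with the seed.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each `order`-word state to the list of words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=10, seed=None):
    """Random walk over the chain from a random start state.

    Stops early if it reaches a state with no recorded successors.
    Assumes `chain` is non-empty.
    """
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran"
chain = build_chain(corpus)
print(generate(chain, length=8, seed=0))
```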

NLP on social media

Twitter: #nlproc, list of NLPers (by Jason Baldridge)
Reddit: /r/LanguageTechnology
Medium: NLP
