NLTK (1) - Installing NLTK and NLTK Data
Installing NLTK http://www.nltk.org/install.html
At the time of writing (NLTK 3.2.4), NLTK supports Python 2.7, 3.4, and 3.5.
- Install NLTK: conda install nltk
- Install numpy: conda install numpy
(envalicia) root@localhost:~/vikander# conda install nltk
Fetching package metadata .........
Solving package specifications: .
Package plan for installation in environment /root/anaconda/envs/envalicia:
The following NEW packages will be INSTALLED:
    nltk:     3.2.4-py35_0
    requests: 2.14.2-py35_0
    six:      1.10.0-py35_0
Proceed ([y]/n)? y
nltk-3.2.4-py3 100% |################################| Time: 0:00:00 21.63 MB/s
(envalicia) root@localhost:~/vikander# conda install numpy
Fetching package metadata .........
Solving package specifications: .
Package plan for installation in environment /root/anaconda/envs/envalicia:
The following NEW packages will be INSTALLED:
    mkl:   2017.0.3-0
    numpy: 1.13.1-py35_0
Proceed ([y]/n)? y
mkl-2017.0.3-0 100% |################################| Time: 0:00:05 23.87 MB/s
numpy-1.13.1-p 100% |################################| Time: 0:00:00 22.52 MB/s
(envalicia) root@localhost:~/vikander# python
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> exit()
(envalicia) root@localhost:~/vikander# pip freeze | grep nltk
nltk==3.2.4
(envalicia) root@localhost:~/vikander# pip freeze | grep numpy
numpy==1.13.1
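The import check and the two pip freeze lookups in the session above can also be done from a single Python script. The following is only a minimal sketch: the helper names installed_version and report are made up for this example, and it assumes each package exposes a __version__ attribute (nltk and numpy do, but not every module does).

```python
import importlib


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Not every module exposes __version__, so fall back to a placeholder.
    return str(getattr(module, "__version__", "unknown"))


def report(module_names):
    """Build one line per module, in a format similar to pip freeze output."""
    lines = []
    for name in module_names:
        version = installed_version(name)
        if version is None:
            lines.append("{}: not installed".format(name))
        else:
            lines.append("{}=={}".format(name, version))
    return lines


if __name__ == "__main__":
    # Hypothetical usage: check the two packages installed in this post.
    for line in report(["nltk", "numpy"]):
        print(line)
```

Run inside the same conda environment, this should print one line each for nltk and numpy. A "not installed" line usually means the script ran under a different interpreter than the one conda installed into.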
Installing NLTK Data http://www.nltk.org/data.html
The data can be installed from the Python interpreter, and you can choose which packages to download. Here I selected the full set of packages (all).
(envalicia) root@localhost:~/vikander# python
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d
Download which package (l=list; x=cancel)?
  Identifier> l
Packages:
  [ ] abc................. Australian Broadcasting Commission 2006
  [ ] alpino.............. Alpino Dutch Treebank
  [ ] averaged_perceptron_tagger Averaged Perceptron Tagger
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] basque_grammars..... Grammars for Basque
  [ ] biocreative_ppi..... BioCreAtIvE (Critical Assessment of Information Extraction Systems in Biology)
  [ ] bllip_wsj_no_aux.... BLLIP Parser: WSJ Model
  [ ] book_grammars....... Grammars from NLTK Book
  [ ] brown............... Brown Corpus
  [ ] brown_tei........... Brown Corpus (TEI XML Version)
  [ ] cess_cat............ CESS-CAT Treebank
  [ ] cess_esp............ CESS-ESP Treebank
  [ ] chat80.............. Chat-80 Data Files
  [ ] city_database....... City Database
  [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
  [ ] comparative_sentences Comparative Sentence Dataset
  [ ] comtrans............ ComTrans Corpus Sample
  [ ] conll2000........... CONLL 2000 Chunking Corpus
  [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
  [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan and Basque Subset)
  [ ] crubadan............ Crubadan Corpus
  [ ] dependency_treebank. Dependency Parsed Treebank
  [ ] dolch............... Dolch Word List
  [ ] europarl_raw........ Sample European Parliament Proceedings Parallel Corpus
  [ ] floresta............ Portuguese Treebank
  [ ] framenet_v15........ FrameNet 1.5
  [ ] framenet_v17........ FrameNet 1.7
  [ ] gazetteers.......... Gazeteer Lists
  [ ] genesis............. Genesis Corpus
  [ ] gutenberg........... Project Gutenberg Selections
  [ ] hmm_treebank_pos_tagger Treebank Part of Speech Tagger (HMM)
  [ ] ieer................ NIST IE-ER DATA SAMPLE
  [ ] inaugural........... C-Span Inaugural Address Corpus
  [ ] indian.............. Indian Language POS-Tagged Corpus
  [ ] jeita............... JEITA Public Morphologically Tagged Corpus (in ChaSen format)
  [ ] kimmo............... PC-KIMMO Data Files
  [ ] knbc................ KNB Corpus (Annotated blog corpus)
  [ ] large_grammars...... Large context-free and feature-based grammars for parser comparison
  [ ] lin_thesaurus....... Lins Dependency Thesaurus
  [ ] mac_morpho.......... MAC-MORPHO: Brazilian Portuguese news text with part-of-speech tags
  [ ] machado............. Machado de Assis -- Obra Completa
  [ ] masc_tagged......... MASC Tagged Corpus
  [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
  [ ] maxent_treebank_pos_tagger Treebank Part of Speech Tagger (Maximum entropy)
  [ ] moses_sample........ Moses Sample Models
  [ ] movie_reviews....... Sentiment Polarity Dataset Version 2.0
  [ ] mte_teip5........... MULTEXT-East 1984 annotated corpus 4.0
  [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al. 2015) subset of the Paraphrase Database.
  [ ] names............... Names Corpus, Version 1.3 (1994-03-29)
  [ ] nombank.1.0......... NomBank Corpus 1.0
  [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
  [ ] nps_chat............ NPS Chat
  [ ] omw................. Open Multilingual Wordnet
  [ ] opinion_lexicon..... Opinion Lexicon
  [ ] panlex_swadesh...... PanLex Swadesh Corpora
  [ ] paradigms........... Paradigm Corpus
  [ ] pe08................ Cross-Framework and Cross-Domain Parser Evaluation Shared Task
  [ ] perluniprops........ perluniprops: Index of Unicode Version 7.0.0 character properties in Perl
  [ ] pil................. The Patient Information Leaflet (PIL) Corpus
  [ ] pl196x.............. Polish language of the XX century sixties
  [ ] porter_test......... Porter Stemmer Test Files
  [ ] ppattach............ Prepositional Phrase Attachment Corpus
  [ ] problem_reports..... Problem Report Corpus
  [ ] product_reviews_1... Product Reviews (5 Products)
  [ ] product_reviews_2... Product Reviews (9 Products)
  [ ] propbank............ Proposition Bank Corpus 1.0
  [ ] pros_cons........... Pros and Cons
  [ ] ptb................. Penn Treebank
  [ ] punkt............... Punkt Tokenizer Models
  [ ] qc.................. Experimental Data for Question Classification
  [ ] reuters............. The Reuters-21578 benchmark corpus, ApteMod version
  [ ] rslp................ RSLP Stemmer (Removedor de Sufixos da Lingua Portuguesa)
Hit Enter to continue:
  [ ] rte................. PASCAL RTE Challenges 1, 2, and 3
  [ ] sample_grammars..... Sample Grammars
  [ ] semcor.............. SemCor 3.0
  [ ] senseval............ SENSEVAL 2 Corpus: Sense Tagged Text
  [ ] sentence_polarity... Sentence Polarity Dataset v1.0
  [ ] sentiwordnet........ SentiWordNet
  [ ] shakespeare......... Shakespeare XML Corpus Sample
  [ ] sinica_treebank..... Sinica Treebank Corpus Sample
  [ ] smultron............ SMULTRON Corpus Sample
  [ ] snowball_data....... Snowball Data
  [ ] spanish_grammars.... Grammars for Spanish
  [ ] state_union......... C-Span State of the Union Address Corpus
  [ ] stopwords........... Stopwords Corpus
  [ ] subjectivity........ Subjectivity Dataset v1.0
  [ ] swadesh............. Swadesh Wordlists
  [ ] switchboard......... Switchboard Corpus Sample
  [ ] tagsets............. Help on Tagsets
  [ ] timit............... TIMIT Corpus Sample
  [ ] toolbox............. Toolbox Sample Files
  [ ] treebank............ Penn Treebank Sample
  [ ] twitter_samples..... Twitter Samples
  [ ] udhr2............... Universal Declaration of Human Rights Corpus (Unicode Version)
  [ ] udhr................ Universal Declaration of Human Rights Corpus
  [ ] unicode_samples..... Unicode Samples
  [ ] universal_tagset.... Mappings to the Universal Part-of-Speech Tagset
  [ ] universal_treebanks_v20 Universal Treebanks Version 2.0
  [ ] vader_lexicon....... VADER Sentiment Lexicon
  [ ] verbnet............. VerbNet Lexicon, Version 2.1
  [ ] webtext............. Web Text Corpus
  [ ] wmt15_eval.......... Evaluation data from WMT15
  [ ] word2vec_sample..... Word2Vec Sample
  [ ] wordnet............. WordNet
  [ ] wordnet_ic.......... WordNet-InfoContent
  [ ] words............... Word Lists
  [ ] ycoe................ York-Toronto-Helsinki Parsed Corpus of Old English Prose
Collections:
  [ ] all-corpora......... All the corpora
  [ ] all-nltk............ All packages available on nltk_data gh-pages branch
  [ ] all................. All packages
  [ ] book................ Everything used in the NLTK Book
  [ ] popular............. Popular packages
  [ ] third-party......... Third-party data packages
([*] marks installed packages)
Download which package (l=list; x=cancel)?
  Identifier> all
    Downloading collection 'all'
       |
       | Downloading package abc to /root/nltk_data...
       |   Unzipping corpora/abc.zip.
       | Downloading package alpino to /root/nltk_data...
       |   Unzipping corpora/alpino.zip.
       | Downloading package biocreative_ppi to /root/nltk_data...
       |   Unzipping corpora/biocreative_ppi.zip.
       | Downloading package brown to /root/nltk_data...
       |   Unzipping corpora/brown.zip.
       | Downloading package brown_tei to /root/nltk_data...
       |   Unzipping corpora/brown_tei.zip.
       | Downloading package cess_cat to /root/nltk_data...
       |   Unzipping corpora/cess_cat.zip.
       | Downloading package cess_esp to /root/nltk_data...
       |   Unzipping corpora/cess_esp.zip.
       | Downloading package chat80 to /root/nltk_data...
       |   Unzipping corpora/chat80.zip.
       | Downloading package city_database to /root/nltk_data...
       |   Unzipping corpora/city_database.zip.
       | Downloading package cmudict to /root/nltk_data...
       |   Unzipping corpora/cmudict.zip.
       | Downloading package comparative_sentences to
       |     /root/nltk_data...
       |   Unzipping corpora/comparative_sentences.zip.
       | Downloading package comtrans to /root/nltk_data...
       | Downloading package conll2000 to /root/nltk_data...
       |   Unzipping corpora/conll2000.zip.
       | Downloading package conll2002 to /root/nltk_data...
       |   Unzipping corpora/conll2002.zip.
       | Downloading package conll2007 to /root/nltk_data...
       | Downloading package crubadan to /root/nltk_data...
       |   Unzipping corpora/crubadan.zip.
       | Downloading package dependency_treebank to /root/nltk_data...
       |   Unzipping corpora/dependency_treebank.zip.
       | Downloading package dolch to /root/nltk_data...
       |   Unzipping corpora/dolch.zip.
       | Downloading package europarl_raw to /root/nltk_data...
       |   Unzipping corpora/europarl_raw.zip.
       | Downloading package floresta to /root/nltk_data...
       |   Unzipping corpora/floresta.zip.
       | Downloading package framenet_v15 to /root/nltk_data...
       |   Unzipping corpora/framenet_v15.zip.
       | Downloading package framenet_v17 to /root/nltk_data...
       |   Unzipping corpora/framenet_v17.zip.
       | Downloading package gazetteers to /root/nltk_data...
       |   Unzipping corpora/gazetteers.zip.
       | Downloading package genesis to /root/nltk_data...
       |   Unzipping corpora/genesis.zip.
       | Downloading package gutenberg to /root/nltk_data...
       |   Unzipping corpora/gutenberg.zip.
       | Downloading package ieer to /root/nltk_data...
       |   Unzipping corpora/ieer.zip.
       | Downloading package inaugural to /root/nltk_data...
       |   Unzipping corpora/inaugural.zip.
       | Downloading package indian to /root/nltk_data...
       |   Unzipping corpora/indian.zip.
       | Downloading package jeita to /root/nltk_data...
       | Downloading package kimmo to /root/nltk_data...
       |   Unzipping corpora/kimmo.zip.
       | Downloading package knbc to /root/nltk_data...
       | Downloading package lin_thesaurus to /root/nltk_data...
       |   Unzipping corpora/lin_thesaurus.zip.
       | Downloading package mac_morpho to /root/nltk_data...
       |   Unzipping corpora/mac_morpho.zip.
       | Downloading package machado to /root/nltk_data...
       | Downloading package masc_tagged to /root/nltk_data...
       | Downloading package moses_sample to /root/nltk_data...
       |   Unzipping models/moses_sample.zip.
       | Downloading package movie_reviews to /root/nltk_data...
       |   Unzipping corpora/movie_reviews.zip.
       | Downloading package names to /root/nltk_data...
       |   Unzipping corpora/names.zip.
       | Downloading package nombank.1.0 to /root/nltk_data...
       | Downloading package nps_chat to /root/nltk_data...
       |   Unzipping corpora/nps_chat.zip.
       | Downloading package omw to /root/nltk_data...
       |   Unzipping corpora/omw.zip.
       | Downloading package opinion_lexicon to /root/nltk_data...
       |   Unzipping corpora/opinion_lexicon.zip.
       | Downloading package paradigms to /root/nltk_data...
       |   Unzipping corpora/paradigms.zip.
       | Downloading package pil to /root/nltk_data...
       |   Unzipping corpora/pil.zip.
       | Downloading package pl196x to /root/nltk_data...
       |   Unzipping corpora/pl196x.zip.
       | Downloading package ppattach to /root/nltk_data...
       |   Unzipping corpora/ppattach.zip.
       | Downloading package problem_reports to /root/nltk_data...
       |   Unzipping corpora/problem_reports.zip.
       | Downloading package propbank to /root/nltk_data...
       | Downloading package ptb to /root/nltk_data...
       |   Unzipping corpora/ptb.zip.
       | Downloading package product_reviews_1 to /root/nltk_data...
       |   Unzipping corpora/product_reviews_1.zip.
       | Downloading package product_reviews_2 to /root/nltk_data...
       |   Unzipping corpora/product_reviews_2.zip.
       | Downloading package pros_cons to /root/nltk_data...
       |   Unzipping corpora/pros_cons.zip.
       | Downloading package qc to /root/nltk_data...
       |   Unzipping corpora/qc.zip.
       | Downloading package reuters to /root/nltk_data...
       | Downloading package rte to /root/nltk_data...
       |   Unzipping corpora/rte.zip.
       | Downloading package semcor to /root/nltk_data...
       | Downloading package senseval to /root/nltk_data...
       |   Unzipping corpora/senseval.zip.
       | Downloading package sentiwordnet to /root/nltk_data...
       |   Unzipping corpora/sentiwordnet.zip.
       | Downloading package sentence_polarity to /root/nltk_data...
       |   Unzipping corpora/sentence_polarity.zip.
       | Downloading package shakespeare to /root/nltk_data...
       |   Unzipping corpora/shakespeare.zip.
       | Downloading package sinica_treebank to /root/nltk_data...
       |   Unzipping corpora/sinica_treebank.zip.
       | Downloading package smultron to /root/nltk_data...
       |   Unzipping corpora/smultron.zip.
       | Downloading package state_union to /root/nltk_data...
       |   Unzipping corpora/state_union.zip.
       | Downloading package stopwords to /root/nltk_data...
       |   Unzipping corpora/stopwords.zip.
       | Downloading package subjectivity to /root/nltk_data...
       |   Unzipping corpora/subjectivity.zip.
       | Downloading package swadesh to /root/nltk_data...
       |   Unzipping corpora/swadesh.zip.
       | Downloading package switchboard to /root/nltk_data...
       |   Unzipping corpora/switchboard.zip.
       | Downloading package timit to /root/nltk_data...
       |   Unzipping corpora/timit.zip.
       | Downloading package toolbox to /root/nltk_data...
       |   Unzipping corpora/toolbox.zip.
       | Downloading package treebank to /root/nltk_data...
       |   Unzipping corpora/treebank.zip.
       | Downloading package twitter_samples to /root/nltk_data...
       |   Unzipping corpora/twitter_samples.zip.
       | Downloading package udhr to /root/nltk_data...
       |   Unzipping corpora/udhr.zip.
       | Downloading package udhr2 to /root/nltk_data...
       |   Unzipping corpora/udhr2.zip.
       | Downloading package unicode_samples to /root/nltk_data...
       |   Unzipping corpora/unicode_samples.zip.
       | Downloading package universal_treebanks_v20 to
       |     /root/nltk_data...
       | Downloading package verbnet to /root/nltk_data...
       |   Unzipping corpora/verbnet.zip.
       | Downloading package webtext to /root/nltk_data...
       |   Unzipping corpora/webtext.zip.
       | Downloading package wordnet to /root/nltk_data...
       |   Unzipping corpora/wordnet.zip.
       | Downloading package wordnet_ic to /root/nltk_data...
       |   Unzipping corpora/wordnet_ic.zip.
       | Downloading package words to /root/nltk_data...
       |   Unzipping corpora/words.zip.
       | Downloading package ycoe to /root/nltk_data...
       |   Unzipping corpora/ycoe.zip.
       | Downloading package rslp to /root/nltk_data...
       |   Unzipping stemmers/rslp.zip.
       | Downloading package hmm_treebank_pos_tagger to
       |     /root/nltk_data...
       |   Unzipping taggers/hmm_treebank_pos_tagger.zip.
       | Downloading package maxent_treebank_pos_tagger to
       |     /root/nltk_data...
       |   Unzipping taggers/maxent_treebank_pos_tagger.zip.
       | Downloading package universal_tagset to /root/nltk_data...
       |   Unzipping taggers/universal_tagset.zip.
       | Downloading package maxent_ne_chunker to /root/nltk_data...
       |   Unzipping chunkers/maxent_ne_chunker.zip.
       | Downloading package punkt to /root/nltk_data...
       |   Unzipping tokenizers/punkt.zip.
       | Downloading package book_grammars to /root/nltk_data...
       |   Unzipping grammars/book_grammars.zip.
       | Downloading package sample_grammars to /root/nltk_data...
       |   Unzipping grammars/sample_grammars.zip.
       | Downloading package spanish_grammars to /root/nltk_data...
       |   Unzipping grammars/spanish_grammars.zip.
       | Downloading package basque_grammars to /root/nltk_data...
       |   Unzipping grammars/basque_grammars.zip.
       | Downloading package large_grammars to /root/nltk_data...
       |   Unzipping grammars/large_grammars.zip.
       | Downloading package tagsets to /root/nltk_data...
       |   Unzipping help/tagsets.zip.
       | Downloading package snowball_data to /root/nltk_data...
       | Downloading package bllip_wsj_no_aux to /root/nltk_data...
       |   Unzipping models/bllip_wsj_no_aux.zip.
       | Downloading package word2vec_sample to /root/nltk_data...
       |   Unzipping models/word2vec_sample.zip.
       | Downloading package panlex_swadesh to /root/nltk_data...
       | Downloading package mte_teip5 to /root/nltk_data...
       |   Unzipping corpora/mte_teip5.zip.
       | Downloading package averaged_perceptron_tagger to
       |     /root/nltk_data...
       |   Unzipping taggers/averaged_perceptron_tagger.zip.
       | Downloading package perluniprops to /root/nltk_data...
       |   Unzipping misc/perluniprops.zip.
       | Downloading package nonbreaking_prefixes to
       |     /root/nltk_data...
       |   Unzipping corpora/nonbreaking_prefixes.zip.
       | Downloading package vader_lexicon to /root/nltk_data...
       | Downloading package porter_test to /root/nltk_data...
       |   Unzipping stemmers/porter_test.zip.
       | Downloading package wmt15_eval to /root/nltk_data...
       |   Unzipping models/wmt15_eval.zip.
       | Downloading package mwa_ppdb to /root/nltk_data...
       |   Unzipping misc/mwa_ppdb.zip.
       | Done downloading collection all
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> q
True
>>>
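The interactive downloader session above fetches everything. If only a few packages are needed, the same download can be scripted by passing a package identifier to nltk.download(). The sketch below is one possible approach, not the method used in this post: ensure_nltk_data is a hypothetical helper name, and its find and download parameters exist only so the logic can be tried without network access. By default it falls back to nltk.data.find and nltk.download.

```python
def ensure_nltk_data(package, resource_path, find=None, download=None):
    """Download `package` only if `resource_path` is not already installed.

    `find` and `download` are optional stand-ins (an assumption of this
    sketch); when omitted, the real NLTK functions are used.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be exercised without NLTK.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find(resource_path)
        return True
    except LookupError:
        # nltk.download returns True on success and False on failure.
        return bool(download(package))
```

For example, ensure_nltk_data("punkt", "tokenizers/punkt") or ensure_nltk_data("stopwords", "corpora/stopwords") would fetch a single package. The resource paths follow the directories shown in the unzip lines of the log above.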