How To Set Up Your Dockerfile To Use NLTK Packages

In this tutorial I will show you how to easily make NLTK packages available inside your Docker container. One example use case is deploying a Docker container to Google Cloud.

If your application uses NLTK packages such as 'stopwords' or 'punkt', you need to download their datasets somewhere on your system before NLTK can use them. If NLTK can't find a package, your code fails with an error such as:

2023-03-26 01:21:51.383 PDT Resource stopwords not found.
2023-03-26 01:21:51.383 PDT Please use the NLTK Downloader to obtain the resource:
2023-03-26 01:21:51.383 PDT >>> import nltk
2023-03-26 01:21:51.383 PDT nltk.download('stopwords')
2023-03-26 01:21:51.383 PDT
2023-03-26 01:21:51.383 PDT For more information see: https://www.nltk.org/data.html
2023-03-26 01:21:51.383 PDT Attempted to load corpora/stopwords
2023-03-26 01:21:51.383 PDT Searched in:
2023-03-26 01:21:51.383 PDT '/nltk_data/ ADD . '

In other words, NLTK cannot find the 'stopwords' dataset in any of the directories it searched. The simplest solution is to download the stopwords package (and any other packages you need) while building your Docker image, so the data is already in place when the container starts.
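To make the failure concrete, here is a simplified Python model of how NLTK builds its list of data directories on Linux, assuming the usual behaviour of `nltk.data.path`: entries from the `NLTK_DATA` environment variable (split on `os.pathsep`) come first, followed by standard folders such as `/usr/share/nltk_data`. The helpers `build_search_path` and `find_resource` are hypothetical and are not NLTK's own code; real NLTK also checks locations (for example `~/nltk_data`) that this sketch leaves out.

```python
import os
import tempfile

# Hypothetical, simplified model of NLTK's data search path on Linux.
# Assumption: NLTK_DATA entries come first, then a few standard folders.
STANDARD_DIRS = [
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def build_search_path(env):
    """Return the directories to search, in order, for a given environment."""
    from_env = [d for d in env.get("NLTK_DATA", "").split(os.pathsep) if d]
    return from_env + STANDARD_DIRS


def find_resource(resource, search_path):
    """Return the first directory holding `resource` (e.g. 'corpora/stopwords')
    as a folder or a .zip archive, or None if it is missing everywhere."""
    for base in search_path:
        candidate = os.path.join(base, resource)
        if os.path.isdir(candidate) or os.path.isfile(candidate + ".zip"):
            return base
    return None


if __name__ == "__main__":
    # The NLTK_DATA value seen in the log becomes one oddly named directory.
    print(repr(build_search_path({"NLTK_DATA": "/nltk_data/ ADD . "})[0]))
    # prints '/nltk_data/ ADD . '

    # Simulate a data folder that does contain corpora/stopwords.
    with tempfile.TemporaryDirectory() as data_dir:
        os.makedirs(os.path.join(data_dir, "corpora", "stopwords"))
        path = build_search_path({"NLTK_DATA": data_dir})
        print(find_resource("corpora/stopwords", path) == data_dir)  # prints True
```

Feeding in the NLTK_DATA value from the log reproduces the odd '/nltk_data/ ADD . ' entry shown under "Searched in", which is why a folder that actually holds the data is needed at build time.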

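To download the packages at build time, add the following lines to your Dockerfile, adjusting the package names to match what your application uses. They download each package into /usr/share/nltk_data, one of the directories NLTK searches by default on Linux.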

# for our nltk data folder (also one of NLTK's default search paths)
ENV NLTK_DATA=/usr/share/nltk_data
# Install dependencies (requirements.txt must include nltk)
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
# download punkt
RUN python3 -m nltk.downloader -d $NLTK_DATA punkt
# download stopwords
RUN python3 -m nltk.downloader -d $NLTK_DATA stopwords
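Optionally, the build can fail fast if a download did not land where expected. The sketch below is a hypothetical check_nltk_data.py script (not part of NLTK) that uses only the standard library; it assumes the two packages end up at `tokenizers/punkt` and `corpora/stopwords` under `/usr/share/nltk_data`, either unzipped or as `.zip` archives.

```python
import os

# Resource paths for the two packages downloaded in the Dockerfile above.
# Hypothetical list: extend it if you download more NLTK packages.
REQUIRED = ["tokenizers/punkt", "corpora/stopwords"]


def missing_resources(data_dir, required=REQUIRED):
    """Return the resources that exist in data_dir neither as a folder nor as a .zip."""
    missing = []
    for resource in required:
        path = os.path.join(data_dir, resource)
        if not (os.path.isdir(path) or os.path.isfile(path + ".zip")):
            missing.append(resource)
    return missing


def check(data_dir="/usr/share/nltk_data"):
    """Stop the build with a non-zero exit code if any resource is missing."""
    missing = missing_resources(data_dir)
    if missing:
        # SystemExit with a string message prints it to stderr and exits with status 1.
        raise SystemExit(f"Missing NLTK data in {data_dir}: {', '.join(missing)}")
    print(f"All NLTK data found in {data_dir}")
```

With the script saved next to the Dockerfile, it could be wired in after the download steps with `COPY check_nltk_data.py .` followed by `RUN python3 -c "import check_nltk_data; check_nltk_data.check()"`, so a missing package breaks the image build instead of surfacing later at runtime.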
