Installing Apache Spark on Ubuntu 18.04 (Install Apache Spark on Ubuntu 19.04/18.04 & Debian 10/9/8)


This post walks through installing Apache Spark on Ubuntu 19.04/18.04 and Debian 10/9/8.

Before installing, refresh the package index (sudo apt update) and then upgrade the installed system packages.

(AnnaM) founder@hilbert:~$ sudo apt -y upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done


Step 1: Install Java

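Apache Spark requires Java. Check which Java version is installed with java -version; if none is installed, install one as shown below. For more on installing Java on Ubuntu, see the following tutorial. Note that Spark 2.4.x is documented to run on Java 8, while default-jdk on Ubuntu 18.04 installs OpenJDK 11 (see the java -version output below). Spark 2.4.4 still starts on Java 11 in this walkthrough, but it prints the "illegal reflective access" warnings visible in Step 5.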

How To Install Java with `apt` on Ubuntu 18.04 | DigitalOcean: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04

(AnnaM) founder@hilbert:~$ sudo apt install default-jdk

Verify that Java was installed correctly:

(AnnaM) founder@hilbert:~$ java -version
openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3)
OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3, mixed mode, sharing)
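Because Spark 2.4.x is documented against Java 8, a setup script may want to check the major Java version before going further. The sketch below assumes bash and the usual first line of java -version output (openjdk version "11.0.4" ...); the function name java_major_from_output is hypothetical. It handles both the old "1.8.0_x" numbering and the newer "11.0.x" style.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `java -version` output on stdin and print the
# major version. Handles the legacy scheme ("1.8.0_222" -> 8) as well as
# the newer one ("11.0.4" -> 11).
java_major_from_output() {
  local ver
  # Take the first quoted token on the first line,
  # e.g. openjdk version "11.0.4" 2019-07-16  ->  11.0.4
  ver=$(head -n 1 | sed -E 's/^[^"]*"([^"]+)".*$/\1/')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # 1.8.0_222 -> 8
    *)   printf '%s\n' "${ver%%[.+-]*}" ;;      # 11.0.4 -> 11, 17 -> 17
  esac
}

# Usage (java -version writes to stderr, hence 2>&1):
#   major=$(java -version 2>&1 | java_major_from_output)
#   [ "$major" = "8" ] || echo "warning: Spark 2.4.x expects Java 8, found Java $major" >&2
```

If you want Java 8 on Ubuntu 18.04, it is packaged as openjdk-8-jdk, and sudo update-alternatives --config java switches the default java binary.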

Step 2: Download Apache Spark

At the time of writing, the latest release is 2.4.4. Download the package pre-built for Hadoop 2.7 as follows:

(AnnaM) founder@hilbert:~/annam$ curl -O https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  219M  100  219M    0     0  12.7M      0  0:00:17  0:00:17 --:--:-- 14.7M
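Apache mirrors only carry current releases; older ones such as 2.4.4 are moved to archive.apache.org, so the mirror URL above may stop working. A minimal sketch, assuming the archive keeps the same dist/spark/spark-VERSION/ layout; the function name spark_archive_url is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the archive.apache.org URL of a Spark binary
# release. archive.apache.org keeps past releases, whereas the regular
# mirrors only carry the current ones.
spark_archive_url() {
  local spark_ver="$1" hadoop_ver="$2"
  local name="spark-${spark_ver}-bin-hadoop${hadoop_ver}.tgz"
  printf 'https://archive.apache.org/dist/spark/spark-%s/%s\n' "$spark_ver" "$name"
}

# Usage:
#   curl -fLO "$(spark_archive_url 2.4.4 2.7)"
#   curl -fLO "$(spark_archive_url 2.4.4 2.7).sha512"   # published checksum
#   sha512sum spark-2.4.4-bin-hadoop2.7.tgz            # compare with the .sha512 file
```

The .sha512 files of older releases may not be in the format sha512sum -c expects, so comparing the two hashes by eye is the safer check.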

Extract the downloaded Spark archive:

(AnnaM) founder@hilbert:~/annam$ tar xvf spark-2.4.4-bin-hadoop2.7.tgz
spark-2.4.4-bin-hadoop2.7/

Move the resulting Spark directory to /opt/spark:

(AnnaM) founder@hilbert:~/annam$ sudo mv spark-2.4.4-bin-hadoop2.7/ /opt/spark
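The extract-then-move steps can also be done in one go with GNU tar's --strip-components option, which drops the archive's top-level spark-2.4.4-bin-hadoop2.7/ directory. A sketch, assuming GNU tar; the function name extract_spark is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a Spark tarball directly into DEST, dropping
# the archive's top-level "spark-<version>-bin-hadoop<version>/" directory,
# which replaces the separate tar + mv steps.
extract_spark() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Usage for /opt/spark (needs root; afterwards hand the tree back to your
# user so start-master.sh can create /opt/spark/logs as that user):
#   sudo mkdir -p /opt/spark
#   sudo tar -xzf spark-2.4.4-bin-hadoop2.7.tgz -C /opt/spark --strip-components=1
#   sudo chown -R "$USER": /opt/spark
```

The chown matters because the master and worker scripts write their logs under $SPARK_HOME/logs as the user who starts them.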

Next, set up the Spark environment: open ~/.bashrc and add the following lines.

(AnnaM) founder@hilbert:~/annam$ vim ~/.bashrc

export SPARK_HOME=/opt/spark 
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Apply the changes to the current shell:

(AnnaM) founder@hilbert:~/annam$ source ~/.bashrc
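Editing ~/.bashrc by hand is fine once, but re-running a setup script that blindly appends the lines would duplicate them. A sketch of an idempotent version, assuming bash and GNU grep; the function name add_spark_env is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the two Spark lines to an rc file (default
# ~/.bashrc) only when they are not already there, so re-running the setup
# does not pile up duplicate PATH entries.
add_spark_env() {
  local rc="${1:-$HOME/.bashrc}" line
  # Single quotes keep $PATH and $SPARK_HOME unexpanded; they are expanded
  # later, each time the rc file is sourced.
  for line in 'export SPARK_HOME=/opt/spark' \
              'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin'; do
    grep -qxF -- "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
  done
}

# Usage:
#   add_spark_env && source ~/.bashrc
#   echo "$SPARK_HOME"        # should print /opt/spark
#   command -v spark-shell    # should print /opt/spark/bin/spark-shell
```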

Step 3: Start a standalone master server

Start a standalone master server with the start-master.sh command:

(AnnaM) founder@hilbert:~/annam$ start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-founder-org.apache.spark.deploy.master.Master-1-hilbert.out

The master's web UI listens on TCP port 8080 (workers connect to the master itself on port 7077):

(AnnaM) founder@hilbert:~/annam$ ss -tunelp | grep 8080
tcp   LISTEN  0       1                           *:8080                *:*      users:(("java",pid=5029,fd=246)) uid:1001 ino:35961 sk:d v6only:0 <->
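start-master.sh returns before the master has finished starting, so a script that starts a worker right afterwards may want to wait for the ports to open. A sketch relying on bash's /dev/tcp redirection (enabled in Ubuntu's and Debian's bash); the function name wait_for_port is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections, giving
# up after TIMEOUT attempts spaced one second apart. It relies on bash's
# /dev/tcp redirection, so no nc or curl is needed.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" i
  for ((i = 0; i < timeout; i++)); do
    # The subshell opens (and immediately drops) a connection on fd 3.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage right after start-master.sh:
#   wait_for_port localhost 8080 && echo "master web UI is up"
#   wait_for_port localhost 7077 && echo "master port is up"
```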

Open http://<server-address>:8080 in a browser to view the master's web UI.

The Spark URL, shown at the top of the web UI, is spark://**********:7077.

Step 4: Starting Spark Worker Process

The start-slave.sh command starts a Spark worker process and registers it with the master at the given Spark URL:

(AnnaM) founder@hilbert:~/annam$ start-slave.sh spark://hilbert.asia-east1-b.c.alert-almanac-220207.internal:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-founder-org.apache.spark.deploy.worker.Worker-1-hilbert.out
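Instead of copying the master URL from the web UI, it can be rebuilt from the same defaults start-master.sh uses: unless SPARK_MASTER_HOST or SPARK_MASTER_PORT are set, the master binds to the output of hostname -f on port 7077, which matches the GCE hostname in the command above. A sketch under that assumption; the function name spark_master_url is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the standalone master URL. Unless
# SPARK_MASTER_HOST / SPARK_MASTER_PORT are set, start-master.sh binds the
# master to the fully qualified hostname (hostname -f) on port 7077.
spark_master_url() {
  # ${var:-...} is expanded lazily, so hostname -f only runs when needed.
  local host="${1:-${SPARK_MASTER_HOST:-$(hostname -f)}}"
  local port="${2:-${SPARK_MASTER_PORT:-7077}}"
  printf 'spark://%s:%s\n' "$host" "$port"
}

# Usage:
#   start-slave.sh "$(spark_master_url)"
```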

If the script is not on your $PATH, search for it:

(AnnaM) founder@hilbert:~/annam$ locate start-slave.sh

You can also run the script by its absolute path, for example /opt/spark/sbin/start-slave.sh.
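locate only finds files recorded in its database, which may be missing or stale right after installation. A sketch that checks $PATH first and then the sbin and bin directories of the install, assuming the /opt/spark layout used in this post; the function name spark_script is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full path of a Spark script, looking on
# $PATH first and then in $SPARK_HOME/sbin and $SPARK_HOME/bin
# ($SPARK_HOME defaults to /opt/spark here). Unlike locate, it does not
# depend on an up-to-date locate database.
spark_script() {
  local name="$1" home="${SPARK_HOME:-/opt/spark}" dir
  if command -v "$name" >/dev/null 2>&1; then
    command -v "$name"
    return 0
  fi
  for dir in "$home/sbin" "$home/bin"; do
    if [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Usage:
#   "$(spark_script start-slave.sh)" spark://<master-host>:7077
```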

Step 5: Using Spark shell

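Open the Spark shell with the spark-shell command. Started without options, as below, it runs in local mode (master = local[*] in the output); to attach it to the standalone cluster started above, pass --master spark://<master-host>:7077.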

(AnnaM) founder@hilbert:~/annam$ /opt/spark/bin/spark-shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/11/01 09:17:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hilbert.asia-east1-b.c.alert-almanac-220207.internal:4040
Spark context available as 'sc' (master = local[*], app id = local-1572599861434).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.4)
Type in expressions to have them evaluated.
Type :help for more information.

scala> println("Hello Spark World")
Hello Spark World
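The println above only shows that the REPL starts. A non-interactive check that actually runs a Spark job can use the SparkPi example shipped in the binary distribution under examples/jars/, submitted with spark-submit; SparkPi prints a "Pi is roughly ..." line. A sketch under those assumptions; the function name spark_pi_smoke_test is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive smoke test that submits the SparkPi
# example bundled with the binary distribution and checks for its
# "Pi is roughly ..." result line. The master defaults to local mode; pass
# the standalone URL to run it on the cluster from Steps 3 and 4.
spark_pi_smoke_test() {
  local master="${1:-local[*]}" jar
  # The binary distribution ships its examples under examples/jars/.
  jar=$(ls "${SPARK_HOME:-/opt/spark}"/examples/jars/spark-examples_*.jar 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "spark-examples jar not found" >&2
    return 1
  fi
  spark-submit --class org.apache.spark.examples.SparkPi --master "$master" "$jar" 10 2>/dev/null |
    grep -q '^Pi is roughly'
}

# Usage:
#   spark_pi_smoke_test && echo OK
#   spark_pi_smoke_test "spark://$(hostname -f):7077" && echo OK
```

Run against the standalone URL, the job should also appear under the completed applications in the master web UI.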

If you prefer Python, use pyspark instead:

(AnnaM) founder@hilbert:~/annam$ /opt/spark/bin/pyspark
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/11/01 09:21:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.7.4 (default, Aug 13 2019 20:35:49)
SparkSession available as 'spark'.
>>>
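The Python side can be checked without the interactive shell too: write a tiny PySpark job to a temporary file and hand it to spark-submit. spark.range(100).count() should return 100 on a working installation. A sketch, assuming GNU mktemp and that spark-submit is on $PATH; the function name pyspark_smoke_test is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a tiny PySpark job written to a temporary file
# and check its output. spark.range(100).count() should return 100 on any
# working installation.
pyspark_smoke_test() {
  local master="${1:-local[*]}" script rc
  script=$(mktemp --suffix=.py)
  cat > "$script" <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-smoke-test").getOrCreate()
print("count =", spark.range(100).count())
spark.stop()
EOF
  # Spark logs go to stderr; only the job's print() reaches stdout.
  spark-submit --master "$master" "$script" 2>/dev/null | grep -qx 'count = 100'
  rc=$?
  rm -f "$script"
  return "$rc"
}

# Usage (PYSPARK_PYTHON selects the interpreter, e.g. the Anaconda python3
# shown in the pyspark banner above):
#   PYSPARK_PYTHON=python3 pyspark_smoke_test && echo OK
```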

Stop the Spark worker and then the master with the following commands:

$ $SPARK_HOME/sbin/stop-slave.sh
$ $SPARK_HOME/sbin/stop-master.sh
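For a single-node setup like this one, the two stop commands can be wrapped so that both always run (worker first, then master) and a failure of either is reported. A sketch, assuming the /opt/spark default used in this post; the function name stop_spark_standalone is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop the single-node standalone cluster in reverse
# start order (worker first, then master). SPARK_HOME defaults to /opt/spark.
# Both scripts always run; the return status is non-zero if either failed.
stop_spark_standalone() {
  local sbin="${SPARK_HOME:-/opt/spark}/sbin" rc=0
  "$sbin/stop-slave.sh" || rc=1
  "$sbin/stop-master.sh" || rc=1
  return "$rc"
}

# Usage:
#   stop_spark_standalone
```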

Original source: https://computingforgeeks.com/how-to-install-apache-spark-on-ubuntu-debian/


License: CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives)


갈루아의 반서재
