
How to Build a Containerized Big Data Analysis Platform on Linux


As data volumes grow rapidly, big data analysis has become a primary tool for enterprises and organizations in areas such as real-time decision making, marketing, and user behavior analysis. Meeting these needs requires an efficient, scalable big data analysis platform. In this article, we walk through how to use container technology to build a containerized big data analysis platform on Linux.

1. Overview of Containerization Technology

Containerization is a technique that packages an application together with its dependencies into a self-contained container, giving the application fast deployment, portability, and isolation. Because the container decouples the application from the underlying operating system, the application behaves the same way in different environments.

Docker is currently one of the most popular containerization technologies. Built on the container features of the Linux kernel, it provides easy-to-use command-line tools and a graphical interface that help developers and system administrators build and manage containers across different Linux distributions.

2. Building the Containerized Big Data Analysis Platform

Install Docker

First, we need to install Docker on the Linux system. On Ubuntu/Debian it can be installed with the following commands (note that the docker-ce package comes from Docker's own package repository, which has to be configured before this install step will succeed):

sudo apt-get update
sudo apt-get install docker-ce
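The docker-ce package is served from Docker's own APT repository rather than Ubuntu's default archives, so the repository must be registered first. A sketch of the usual setup, following the layout documented by Docker (adjust the distribution name for your system), looks like:

```
# Prerequisites and Docker's GPG signing key
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Register the repository, then refresh the package index
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
```

Alternatively, the docker.io package from Ubuntu's default repositories can be installed without any extra setup.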


Build the base image

Next, we build a base image containing the software and dependencies needed for big data analysis. We can use a Dockerfile to define how the image is built.

Here is an example Dockerfile:

FROM ubuntu:18.04

# Install the required software and dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    openjdk-8-jdk \
    wget

# Install Hadoop (download from the Apache release archive;
# the original closer.cgi mirror-selector URL returns an HTML page, not the tarball)
RUN wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz && \
    tar xvf hadoop-3.1.2.tar.gz && \
    mv hadoop-3.1.2 /usr/local/hadoop && \
    rm -rf hadoop-3.1.2.tar.gz

# Install Spark
RUN wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz && \
    tar xvf spark-2.4.4-bin-hadoop2.7.tgz && \
    mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark && \
    rm -rf spark-2.4.4-bin-hadoop2.7.tgz

# Set environment variables
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_HOME=/usr/local/hadoop
ENV SPARK_HOME=/usr/local/spark
ENV PATH=$PATH:$HADOOP_HOME/bin:$SPARK_HOME/bin


We can then build the base image with the docker build command:

docker build -t bigdata-base .


Create the container

Next, we can create a container to run the big data analysis platform:

docker run -it --name bigdata -p 8888:8888 -v /path/to/data:/data bigdata-base


The command above creates a container named bigdata, publishes container port 8888 on the host, and mounts the host directory /path/to/data at /data inside the container. This lets us conveniently access data on the host from within the container.
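Since docker run above opens an interactive session, it can be useful to open a second shell in the same container, for example to inspect the mounted /data directory while a job is running. docker exec does this ("bigdata" is the container name chosen above):

```
# Open an extra interactive shell in the running "bigdata" container
docker exec -it bigdata bash
```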

ÔËÐдóÊý¾ÝÆÊÎöʹÃü

Now we can run big data analysis jobs inside the container, for example through Spark's interactive shell. (The example below uses the Scala shell; the same job could also be written with Python's PySpark library.)

First, start the Spark shell inside the container:

spark-shell


Then we can run a simple Word Count analysis with the following example code:

val input = sc.textFile("/data/input.txt")
val counts = input.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("/data/output")


This code splits the text in the input file /data/input.txt into words, counts how many times each word appears, and finally saves the result to the /data/output directory.
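As a quick sanity check outside Spark, the same split-and-count logic can be sketched with standard Unix tools. This is purely illustrative and uses inline sample text instead of /data/input.txt:

```shell
# Split words onto separate lines, then count duplicates.
# printf stands in for the real input file.
printf 'hello big data\nbig data platform\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

Each output line shows a count followed by the word (here "big" and "data" appear twice, "hello" and "platform" once).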

View results and export data

Once the analysis has finished, we can view the result with the following command:

cat /data/output/part-00000


If we need to export the result to the host, we can use the following command:

docker cp bigdata:/data/output/part-00000 /path/to/output.txt


Õ⽫°ÑÈÝÆ÷ÖеÄÎļþ/data/output/part-00000¸´ÖƵ½Ö÷»úµÄ/path/to/output.txtÎļþÖС£

3. Summary

This article has shown how to use container technology to build a big data analysis platform on Linux. By using Docker to build and manage containers, we can deploy a big data analysis environment quickly and reliably. Running analysis jobs inside the container lets us easily analyze and process data and export the results to the host. We hope this article helps you build your own containerized big data analysis platform.

ÒÔÉϾÍÊÇÔõÑùÔÚLinuxÉϹ¹½¨ÈÝÆ÷»¯µÄ´óÊý¾ÝÆÊÎöƽ̨£¿µÄÏêϸÄÚÈÝ £¬¸ü¶àÇë¹Ø×¢±¾ÍøÄÚÆäËüÏà¹ØÎÄÕ£¡

ÃâÔð˵Ã÷£ºÒÔÉÏչʾÄÚÈÝȪԴÓÚÏàÖúýÌå¡¢ÆóÒµ»ú¹¹¡¢ÍøÓÑÌṩ»òÍøÂçÍøÂçÕûÀí £¬°æȨÕùÒéÓë±¾Õ¾ÎÞ¹Ø £¬ÎÄÕÂÉæ¼°¿´·¨Óë¿´·¨²»´ú±í×ðÁú¿­Ê±¹ÙÍøµÇ¼ÂËÓÍ»úÍø¹Ù·½Ì¬¶È £¬Çë¶ÁÕß½ö×ö²Î¿¼¡£±¾ÎĽӴýתÔØ £¬×ªÔØÇë˵Ã÷À´ÓÉ¡£ÈôÄúÒÔΪ±¾ÎÄÇÖÕ¼ÁËÄúµÄ°æȨÐÅÏ¢ £¬»òÄú·¢Ã÷¸ÃÄÚÈÝÓÐÈκÎÉæ¼°ÓÐÎ¥¹«µÂ¡¢Ã°·¸Ö´·¨µÈÎ¥·¨ÐÅÏ¢ £¬ÇëÄúÁ¬Ã¦ÁªÏµ×ðÁú¿­Ê±¹ÙÍøµÇ¼ʵʱÐÞÕý»òɾ³ý¡£

Ïà¹ØÐÂÎÅ

ÁªÏµ×ðÁú¿­Ê±¹ÙÍøµÇ¼

18523999891

¿É΢ÐÅÔÚÏß×Éѯ

ÊÂÇéʱ¼ä£ºÖÜÒ»ÖÁÖÜÎå £¬9:30-18:30 £¬½ÚãåÈÕÐÝÏ¢

QR code
ÍøÕ¾µØͼ