2023 Spark Experiment Report

Let's count words in the README.txt file. First we filter README.txt for the lines that contain the word "The":

scala> var theCount = readmeFile.filter(line => line.contains("The"))
theCount: org.apache.spark.rdd.RDD[String] = FilteredRDD[3] at filter at <console>:14

Counting this RDD shows there are 4 such lines in total. We can cross-check from the shell:

[spark@S1PA222 hadoop-2.6.0]$ ls
... LICENSE.txt  NOTICE.txt  README.txt ...
[spark@S1PA222 hadoop-2.6.0]$ grep The README.txt | wc
      4      37     269
[spark@S1PA222 hadoop-2.6.0]$

So wc also tells us there are 4 lines containing "The". Next, let's implement the classic Hadoop wordcount. First run the following command against readmeFile:

scala> val wordCount = readmeFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCount: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:14
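Note that the count call itself is not visible in the source above, and readmeFile is created from HDFS in the setup section later in this report. A minimal hedged sketch of the two counting calls, assuming those RDDs:

scala> theCount.count()    // lines containing "The"; should report 4, matching grep | wc above
scala> wordCount.count()   // number of distinct words, i.e. one (word, count) pair per word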
Then we submit and execute the collect job:

scala> wordCount.collect

15/01/06 15:08:42 INFO spark.SparkContext: Starting job: collect at <console>:17
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Registering RDD 5 (map at <console>:14)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Got job 2 (collect at <console>:17) with 2 output partitions (allowLocal=false)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Final stage: Stage 3 (collect at <console>:17)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 2)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Missing parents: List(Stage 2)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[5] at map at <console>:14), which has no missing parents
15/01/06 15:08:42 INFO storage.MemoryStore: ensureFreeSpace(3560) called with curMem=199448, maxMem=277842493
15/01/06 15:08:42 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.5 KB, free 264.8 MB)
15/01/06 15:08:42 INFO storage.MemoryStore: ensureFreeSpace(2525) called with curMem=203008, maxMem=277842493
15/01/06 15:08:42 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.5 KB, free 264.8 MB)
15/01/06 15:08:42 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:49231 (size: 2.5 KB, free: 264.9 MB)
15/01/06 15:08:42 INFO storage.BlockManagerMaster: Updated info of block broadcast_3_piece0
15/01/06 15:08:42 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2 (MappedRDD[5] at map at <console>:14)
15/01/06 15:08:42 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
15/01/06 15:08:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, ANY, 1286 bytes)
15/01/06 15:08:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, ANY, 1286 bytes)
15/01/06 15:08:42 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 4)
15/01/06 15:08:42 INFO executor.Executor: Running task 1.0 in stage 2.0 (TID 5)
15/01/06 15:08:42 INFO rdd.HadoopRDD: Input split: hdfs://S1PA11:9000/tmp/README.txt:683+683
15/01/06 15:08:42 INFO rdd.HadoopRDD: Input split: hdfs://S1PA11:9000/tmp/README.txt:0+683
15/01/06 15:08:42 INFO executor.Executor: Finished task 1.0 in stage 2.0 (TID 5). 1896 bytes result sent to driver
15/01/06 15:08:42 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 4). 1896 bytes result sent to driver
15/01/06 15:08:42 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 192 ms on localhost (1/2)
15/01/06 15:08:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 195 ms on localhost (2/2)
15/01/06 15:08:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Stage 2 (map at <console>:14) finished in 0.195 s
15/01/06 15:08:42 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/01/06 15:08:42 INFO scheduler.DAGScheduler: running: Set()
15/01/06 15:08:42 INFO scheduler.DAGScheduler: waiting: Set(Stage 3)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: failed: Set()
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Missing parents for Stage 3: List()
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Submitting Stage 3 (ShuffledRDD[6] at reduceByKey at <console>:14), which is now runnable
15/01/06 15:08:42 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=205533, maxMem=277842493
15/01/06 15:08:42 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.1 KB, free 264.8 MB)
15/01/06 15:08:42 INFO storage.MemoryStore: ensureFreeSpace(1545) called with curMem=207645, maxMem=277842493
...
15/01/06 15:08:42 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 7)
15/01/06 15:08:42 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/01/06 15:08:42 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/01/06 15:08:42 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
15/01/06 15:08:42 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
15/01/06 15:08:42 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 6). 2173 bytes result sent to driver
15/01/06 15:08:42 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 7). 2586 bytes result sent to driver
15/01/06 15:08:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 110 ms on localhost (1/2)
15/01/06 15:08:42 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 116 ms on localhost (2/2)
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Stage 3 (collect at <console>:17) finished in 0.1 s
15/01/06 15:08:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
15/01/06 15:08:42 INFO scheduler.DAGScheduler: Job 2 finished: collect at <console>:17, took 0.357415 s
res2: Array[(String, Int)] = Array((under,1), (country,1), (this,3), (distribution,2), (is,1), (Technology,1), (Jetty,1), (currently,1), (check,1), (permitted.,1), (have,1), (Security,1), (U.S.,1), (with,1), (BIS,1), (This,1), (mortbay.org.,1), (ECCN,1), (using,2), (security,1), (Department,1), (export,1), (reside,1), (any,1), (algorithms.,1), (from,1), (details,1), (re-export,2), (has,1), (SSL,1), (Industry,1), (Administration,1), (provides,1), (http://hadoop.apache.org/core/,1), (country's,1), (Unrestricted,1), (740.13,1), (policies,1), (country,,1), (concerning,1), (uses,1), (Apache,1), (information,2), (possession,,2), (our,2), (as,1), (,18), (Bureau,1), (wiki,,1), (please,2), (form,1), (information.,1), (ENC,1), (Export,2), (included,1), (asymmetric,1), (Commodity,1), (For,1), ...
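To see the most frequent words rather than the raw array, one could sort by count before collecting; a minimal hedged sketch against the wordCount RDD above (this call is not part of the original session):

scala> wordCount.map(p => (p._2, p._1)).sortByKey(false).take(10)   // swap so the count is the key, sort descending, keep the top 10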
Let's look at the execution results in the WEB UI.

Details for Stage 0: total task time across all tasks 0.2 s; input 1366.0 B. Summary metrics for the 2 completed tasks: duration 79 ms, GC time 0 ms, input 683.0 B per task. Aggregated metrics by executor: driver at localhost:49231, task time 0.3 s, 2 total tasks, 0 failed, 2 succeeded, input 1366.0 B. Both tasks finished with status SUCCESS at locality level ANY on driver/localhost.

Details for Stage 1: total task time across all tasks 41 ms; input 1366.0 B. Task durations 19 to 22 ms, GC time 0 ms, input 683.0 B per task; both tasks SUCCESS at locality level ANY.

Details for Stage 2: total task time across all tasks 0.4 s; input 1366.0 B; shuffle write 2.3 KB. Task durations about 0.2 s, GC time 0 ms, input 683.0 B per task, shuffle writes 1153.0 B and 1210.0 B; both tasks SUCCESS.

Details for Stage 3: total task time across all tasks 0.2 s. Task durations about 0.1 s, GC time 25 ms; both tasks SUCCESS at locality level PROCESS_LOCAL on driver/localhost.

Executors page: 1 executor (the driver at localhost:49231); memory 0.0 B used of 265.0 MB total, disk 0.0 B used, 8 completed tasks, shuffle write 2.3 KB.
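The stage breakdown above (a map stage feeding a shuffle for the reduceByKey) can also be inspected from the shell through the RDD lineage; a hedged sketch, with the output shape shown as illustration rather than verbatim:

scala> println(wordCount.toDebugString)
// illustrative shape of the lineage, matching the RDD names in the logs above:
// (2) ShuffledRDD[6] at reduceByKey at <console>:14
//  +-(2) MappedRDD[5] at map at <console>:14
//     |  FlatMappedRDD[4] at flatMap at <console>:14
//     |  hdfs://S1PA11:9000/tmp/README.txt HadoopRDD[...] at textFile ...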
Part 2: A word-count example using the spark API (WordCount)

Step 1: Create a SparkContext:

val sc = new SparkContext(args(0), "WordCount", System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))

Step 2: Read the input data. For a plain text file:

val textFile = sc.textFile(args(1))

For a SequenceFile (where conf is a Hadoop JobConf):

val inputFormatClass = classOf[SequenceFileInputFormat[Text, Text]]
var hadoopRdd = sc.hadoopRDD(conf, inputFormatClass, classOf[Text], classOf[Text])

Step 3: Split each value into words, map each word to a (word, 1) pair, and reduce by key:

val result = hadoopRdd.flatMap { case (key, value) => value.toString.split("\\s+") }.map(word => (word, 1)).reduceByKey(_ + _)

Save the resulting RDD data set to HDFS. You can use the RDD's saveAsTextFile function to save the data set under an HDFS directory; it defaults to the TextOutputFormat provided by Hadoop, printing each record in the form "(key,value)". You can also use the saveAsSequenceFile function to save the data in SequenceFile format, and so on:

result.saveAsSequenceFile(args(2))

Of course, when we write a Spark program we generally need to include the following two imports (plus the Hadoop Text and SequenceFileInputFormat imports when reading SequenceFiles):

import org.apache.spark._
import org.apache.spark.SparkContext._

Note that when specifying input and output files you need to give the hdfs URI. For example, if the input directory is hdfs://hadoop-test/tmp/input and the output directory is hdfs://hadoop-test/tmp/output, then "hdfs://hadoop-test" is determined by the fs.default.name parameter in the Hadoop configuration file core-site.xml; substitute your own configuration accordingly.
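Assembled from the three steps above, a minimal self-contained sketch of the full program might look like this (the object name is an illustrative assumption, and it reads plain text with sc.textFile rather than a SequenceFile):

import org.apache.spark._
import org.apache.spark.SparkContext._

// Hypothetical standalone WordCount assembled from steps 1-3 above.
object WordCount {
  def main(args: Array[String]) {
    // args(0) = master URL, args(1) = input path, args(2) = output path
    val sc = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val result = sc.textFile(args(1))   // one element per line of input
      .flatMap(_.split("\\s+"))         // split each line into words
      .map(word => (word, 1))           // pair each word with a count of 1
      .reduceByKey(_ + _)               // sum the counts per word
    result.saveAsTextFile(args(2))      // one "(word,count)" record per line
    sc.stop()
  }
}

It could then be packaged into a jar and launched with spark-submit, e.g. bin/spark-submit --class WordCount wordcount.jar spark://S1PA11:7077 <input> <output> (the jar name and arguments here are hypothetical).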
Spark Experiment Report

Part 1: Environment Setup

1. Download scala
Download the scala-2.11.4 release; the download address is:

2. Unpack and install
Unpack the archive, then install it:
mv scala-2.11.4 ~/opt/

3. Edit the ~/.bash_profile file and add the SCALA_HOME environment variable configuration:
export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:${SCALA_HOME}/bin
Make it take effect immediately:
source ~/.bash_profile

4. Verify scala:
scala -version
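The export line for SCALA_HOME itself is not legible in the source; given the install location in step 2, it would presumably be:

export SCALA_HOME=/home/spark/opt/scala-2.11.4   # assumption: matches the mv target above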
5. Copy the profile to the slave machine:
scp ~/.bash_profile <slave>:~/.bash_profile

6. Download spark:
wget

7. Configure spark on the master host
Unpack the downloaded archive into ~/opt/, i.e. ~/opt/spark-1.2.0-bin-hadoop2.4, and configure the SPARK_HOME environment variable:
#set java env
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin
After finishing the configuration, run source to make it take effect, then enter the spark conf directory:
[spark@S1PA11 spark-1.2.0-bin-hadoop2.4]$ ls
bin  conf  data  ec2  examples  lib  LICENSE  logs  NOTICE  python  README.md  RELEASE  sbin  work
[spark@S1PA11 spark-1.2.0-bin-hadoop2.4]$ cd conf/
[spark@S1PA11 conf]$ ls
fairscheduler.xml.template  metrics.properties.template  slaves.template  spark-env.sh ...

First: modify the slaves file, adding the two worker nodes S1PA11 and S1PA222:
[spark@S1PA11 conf]$ vi slaves
S1PA11
S1PA222

Second: configure spark-env.sh.
First copy spark-env.sh.template to spark-env.sh, then add at the very bottom (vi spark-env.sh):
export SPARK_WORKER_MEMORY=2g
Here HADOOP_CONF_DIR is the Hadoop configuration file directory, SPARK_MASTER_IP is the master host's IP address, and SPARK_WORKER_MEMORY is the maximum memory a worker may use.

After finishing the configuration, copy the spark directory to the slave machine:
scp -r ~/opt/spark-1.2.0-bin-hadoop2.4 <slave>:~/opt/
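Only the SPARK_WORKER_MEMORY line is legible in the source; based on the three variables described above, the full spark-env.sh additions would presumably look like this (the HADOOP_CONF_DIR path is an assumed placeholder; the master IP is taken from the Master UI shown below):

export HADOOP_CONF_DIR=/home/spark/opt/hadoop-2.6.0/etc/hadoop   # assumption: Hadoop config directory
export SPARK_MASTER_IP=10.58.44.47                               # master host IP, per the Master UI below
export SPARK_WORKER_MEMORY=2g                                    # maximum memory per worker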
8. Start the spark distributed cluster and check it
[spark@S1PA11 sbin]$ ./start-all.sh
Check the master:
[spark@S1PA11 sbin]$ jps
31233 ResourceManager
27201 Jps
30498 NameNode
30733 SecondaryNameNode
5648 Worker
5399 Master
15888 JobHistoryServer
If HDFS is not running, start it first. Check the slave node:
[spark@S1PA222 scala]$ jps
20352 Bootstrap
30737 NodeManager
7219 Jps
30482 DataNode
29500 Bootstrap
757 Worker

9. Check the cluster status in the web page
Go into spark's web management page; the Spark Master page shows:
Spark Master at spark://10.58.44.47:7077
URL: spark://10.58.44.47:7077
Workers: 2
Cores: 28 Total, 0 Used
Memory: 4.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
The Workers table lists S1PA222 and S1PA11, both in state ALIVE with 2.0 GB of memory each (0.0 B used), and there are no running or completed applications yet. We see two worker nodes because the master and the slave both act as spark workers.

Now we enter spark's bin directory and start the spark-shell console:

[spark@S1PA11 bin]$ ./spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/06 14:17:14 INFO spark.SecurityManager: Changing view acls to: spark
15/01/06 14:17:14 INFO spark.SecurityManager: Changing modify acls to: spark
15/01/06 14:17:14 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
15/01/06 14:17:14 INFO spark.HttpServer: Starting HTTP Server
15/01/06 14:17:14 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54487
15/01/06 14:17:14 INFO util.Utils: Successfully started service 'HTTP class server' on port 54487.
Welcome to Spark version 1.2.0
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_37)
Type in expressions to have them evaluated.
Type :help for more information.
15/01/06 14:17:18 INFO spark.SecurityManager: Changing view acls to: spark
15/01/06 14:17:18 INFO spark.SecurityManager: Changing modify acls to: spark
15/01/06 14:17:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
15/01/06 14:17:19 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/06 14:17:19 INFO Remoting: Starting remoting
15/01/06 14:17:19 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://sparkDriver@S1PA11:8327]
15/01/06 14:17:19 INFO util.Utils: Successfully started service 'sparkDriver' on port 8327.
15/01/06 14:17:19 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/06 14:17:19 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/06 14:17:19 INFO storage.MemoryStore: MemoryStore started with capacity 265.0 MB
15/01/06 14:17:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/06 14:17:19 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-f9915646-f382-419f-86ff-0e6e989f89b3
15/01/06 14:17:19 INFO spark.HttpServer: Starting HTTP Server
15/01/06 14:17:19 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/06 14:17:19 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:22259
15/01/06 14:17:19 INFO util.Utils: Successfully started service 'HTTP file server' on port 22259.
15/01/06 14:17:20 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/06 14:17:20 INFO netty.NettyBlockTransferService: Server created on 57401
15/01/06 14:17:20 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/01/06 14:17:20 INFO storage.BlockManagerMasterActor: Registering block manager localhost:57401 with 265.0 MB RAM, BlockManagerId(<driver>, localhost, 57401)
15/01/06 14:17:20 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/06 14:17:20 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala>
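Once the shell is up, sc is already bound to a SparkContext; a quick hedged sanity check (not part of the original session):

scala> sc.master    // should print the configured master URL
scala> sc.appName   // should print "Spark shell"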
We can also view the application WEB UI page at http://master:4040 (the Spark shell application UI):

The Jobs page (Total Duration 3.4 min, Scheduling Mode FIFO) shows 0 active, 0 completed and 0 failed jobs. The Stages page (Total Duration 4.7 min, Scheduling Mode FIFO) likewise shows 0 active, 0 completed and 0 failed stages. The Environment page lists the runtime information: Java Home /home/spark/opt/java/jdk1.6.0_37/jre, Java Version 1.6.0_37 (Sun Microsystems Inc.), Scala Version 2.10.4; and the Spark properties: spark.app.id local-..., spark.app.name Spark shell, spark.driver.host S1PA11, spark.driver.port 8327, spark.executor.id driver, spark.fileserver.uri http://10.58.44.47:22259, spark.jars (empty), spark.master local[*], spark.repl.class.uri http://10.58.44.47:54487, spark.scheduler.mode FIFO. The Executors page shows 1 executor (the driver at localhost:57401): memory 0.0 B used of 265.0 MB total, disk 0.0 B used, 0 tasks so far.

The spark cluster environment has been set up successfully.

10. Run a spark-shell test
Earlier we uploaded a README.txt file to the /tmp directory; now we use spark to read the README.txt file in hdfs under /tmp. In the HDFS browser the file shows as:
Permission -rw-r--r--  Owner spark  Group supergroup  (2014)
Get the file from hdfs:
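The source cuts off at this point; consistent with the input splits in the job logs earlier (hdfs://S1PA11:9000/tmp/README.txt), the next commands would presumably be:

scala> val readmeFile = sc.textFile("hdfs://S1PA11:9000/tmp/README.txt")   // load README.txt from HDFS
scala> readmeFile.count()                                                  // number of lines in the file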