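Description

To help readers learn how to use, deploy, and maintain Apache Spark, some of the creators of this open source cluster-computing framework wrote this comprehensive guide, Spark: The Definitive Guide (reprint edition, in English).
With an emphasis on the improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break Spark topics down into distinct sections, each with its own goals.
You will explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and deployment scenarios for MLlib, Spark's scalable machine learning library.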

Table of Contents

Preface

Part I. Gentle Overview of Big Data and Spark
1. What Is Apache Spark?
Apache Spark's Philosophy
Context: The Big Data Problem
History of Spark
The Present and Future of Spark
Running Spark
Downloading Spark Locally
Launching Spark's Interactive Consoles
Running Spark in the Cloud
Data Used in This Book
2. A Gentle Introduction to Spark
Spark's Basic Architecture
Spark Applications
Spark's Language APIs
Spark's APIs
Starting Spark
The SparkSession
DataFrames
Partitions
Transformations
Lazy Evaluation
Actions
Spark UI
An End-to-End Example
DataFrames and SQL
Conclusion
3. A Tour of Spark's Toolset
Running Production Applications
Datasets: Type-Safe Structured APIs
Structured Streaming
Machine Learning and Advanced Analytics
Lower-Level APIs
SparkR
Spark's Ecosystem and Packages
Conclusion

Part II. Structured APIs: DataFrames, SQL, and Datasets
4. Structured API Overview
DataFrames and Datasets
Schemas
Overview of Structured Spark Types
DataFrames Versus Datasets
Columns
Rows
Spark Types
Overview of Structured API Execution
Logical Planning
Physical Planning
Execution
Conclusion
5. Basic Structured Operations
Schemas
Columns and Expressions
Columns
Expressions
Records and Rows
Creating Rows
DataFrame Transformations
Creating DataFrames
select and selectExpr
Converting to Spark Types (Literals)
Adding Columns
……
6. Working with Different Types of Data
7. Aggregations
8. Joins
9. Data Sources
10. Spark SQL
11. Datasets

Part III. Low-Level APIs
12. Resilient Distributed Datasets (RDDs)
13. Advanced RDDs
14. Distributed Shared Variables

Part IV. Production Applications
15. How Spark Runs on a Cluster
16. Developing Spark Applications
17. Deploying Spark
18. Monitoring and Debugging
19. Performance Tuning

Part V. Streaming
20. Stream Processing Fundamentals
21. Structured Streaming Basics
22. Event-Time and Stateful Processing
23. Structured Streaming in Production

Part VI. Advanced Analytics and Machine Learning
24. Advanced Analytics and Machine Learning Overview
25. Preprocessing and Feature Engineering
26. Classification
27. Regression
28. Recommendation
29. Unsupervised Learning
30. Graph Analytics
31. Deep Learning

Part VII. Ecosystem
32. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)
33. Ecosystem and Community

Index
