Spark DataFrame data preprocessing: selecting multiple columns

Published: 2021-10-14 16:37:24

Suppose we have the following DataFrame:


scala> data.show
+--------+------------------+--------------------+--------+------------------+--------------------+-------+------------------+-------------------+------+
|      R1|                R2|                  R3|      G1|                G2|                  G3|     B1|                B2|                 B3|labels|
+--------+------------------+--------------------+--------+------------------+--------------------+-------+------------------+-------------------+------+
|148.6041|3.6259017071619577| -3.2171012251502593|138.6381|4.1254973506233155|  -4.095970307240578|64.3694| 10.48593074743487|-10.562315615907579|   1.0|
|163.6788| 3.923369796488728|  2.4431026078492315|145.5487|2.8350005837741903| -0.9631161606689479|54.4581| 3.478971743202293|-0.9333664634383216|   1.0|
|153.9485| 2.208766114825198| -1.2030967082146709| 147.081|1.8033965176854478| -0.6694519301995274|71.9576|3.1154778509885124| -2.416785959921902|   1.0|
|150.3755| 2.015167424806187|  0.8613925688614028|151.3985|1.5140336026654098| -0.8663388920312477|64.3118|2.6989221478212366|-1.3438560574921259|   1.0|
| 150.738|1.9029335248505135|-0.34252632614162226|150.9738|1.6580451019197278| -0.5527702540920649|64.6246| 3.098043711763925| -1.217963558229269|   1.0|
|150.1358|1.9201454007444332| -0.9324655012241138| 145.269|  1.28157676321007|  0.4697941123271102|81.3181| 2.143761271690484|-1.4854558962599924|   1.0|
|150.0713|1.8608643986061963|   0.790641536351872|146.1766|1.2962300876001915| -0.2269171090034302|73.5234|2.9670275428448587| 2.1784078424352917|   1.0|
| 157.623|2.1803832231972433| -0.6314379399746247|137.6215|1.5737972391639274|  0.9360563719906544|70.0067|3.7130115957265737| 3.5667669954514722|   1.0|
|157.7101| 2.269373920269641|  0.7962686717748036|137.5022| 1.490367458045163|  0.8446912968715053|69.7587| 3.609553200882347| 3.6955376437971523|   1.0|
|169.3828|2.7948281092045715| -1.3436316428867088| 146.971| 1.968593152482249|  1.0536623799971396|74.9072| 4.838676281794433| 3.2474732045080197|   1.0|
|178.6782| 2.132098675014831|  0.4757533493932849|153.3402|1.8165527682949374|  0.4693388041061535|74.7462|3.3670143391438057| 1.3843454896413077|   1.0|
|143.7789|2.4834682985695635| -1.3442208539246348|136.4161|1.9944324480914364|  0.7998009820366255|69.3224|3.0158014258236565|-1.3897639737179137|   1.0|
|158.6412|2.5608714454263413| -0.5713484713968339|139.8721|1.4719855943588578| -0.6617804781381068|67.8988| 3.038874554831114| -2.294231978314215|   1.0|
|155.9917| 2.500606148516795|  1.5021009147939788|134.7518|1.7422390077139243|   1.045329627983475| 60.818|  3.73583672020071| 1.9783042025513178|   1.0|
|159.1847| 2.063634151200256|  -1.670647593889981|144.9121|1.1840496568978853|-0.29130766135912434|82.3176|2.7956269851323152|-1.7818624457252779|   1.0|
|176.6245| 2.268237145891055|   1.028819825968006|169.4207|1.4478644653419739|-0.49119770109502614|71.3175|3.0532759046637103|  -1.58969698499603|   1.0|
|144.7855|1.9818399910184477| -0.8668439163395646|149.2691|1.5353452999244177| -0.6661614310902355|64.1416| 3.239590937140058| 1.3198351337159953|   1.0|
|140.2185|1.8932400138387104|  0.6880430535537285|148.7436|1.3497625865314238|-0.48540483333204276|75.1313| 3.067516961648297| 1.7129996746141525|   1.0|
|   143.5|1.7881834357805688|  1.1822384017567829|144.0105| 1.290654775685582|  0.6703439722227044|73.6078|3.0502096911523973| 0.2579077304813425|   1.0|
|169.2189| 2.032629526008121| -1.2039682278653374|158.9448| 1.492163851592713|  -0.578811113248486|53.9639|3.4846802995396864|-2.1036795958850676|   1.0|
+--------+------------------+--------------------+--------+------------------+--------------------+-------+------------------+-------------------+------+

1. Select a single column, for example R1


scala> data.select("R1").show
+--------+
|      R1|
+--------+
|148.6041|
|163.6788|
|153.9485|
|150.3755|
| 150.738|
|150.1358|
|150.0713|

2. Select multiple columns, for example R1, R2, G1


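val arr = Array("R1", "R2", "G1")
The column names can be held in an Array, ArrayBuffer, or Seq. Because select(col: String, cols: String*) takes the first name separately and the rest as varargs, pass arr.head followed by arr.tail: _*.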
scala> data.select(arr.head,arr.tail:_*).show
+--------+------------------+--------+
|      R1|                R2|      G1|
+--------+------------------+--------+
|148.6041|3.6259017071619577|138.6381|
|163.6788| 3.923369796488728|145.5487|
|153.9485| 2.208766114825198| 147.081|
|150.3755| 2.015167424806187|151.3985|
| 150.738|1.9029335248505135|150.9738|
|150.1358|1.9201454007444332| 145.269|
|150.0713|1.8608643986061963|146.1766|
| 157.623|2.1803832231972433|137.6215|
|157.7101| 2.269373920269641|137.5022|
|169.3828|2.7948281092045715| 146.971|
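
The same selection can also be written with Column objects instead of names, which avoids the head/tail split. The following is only a sketch: the two-row stand-in DataFrame, the local SparkSession settings, and the names `names`, `byName`, and `byColumn` are assumptions made for illustration and are not part of the session above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-columns-sketch")
  .getOrCreate()
import spark.implicits._

// Assumption: a small stand-in for `data`, using the first two rows of four of its columns.
val data = Seq(
  (148.6041, 3.6259017071619577, 138.6381, 1.0),
  (163.6788, 3.923369796488728, 145.5487, 1.0)
).toDF("R1", "R2", "G1", "labels")

// A Seq is used here; an Array or ArrayBuffer of names works the same way.
val names = Seq("R1", "R2", "G1")

// Name-based overload: select(col: String, cols: String*)
val byName = data.select(names.head, names.tail: _*)

// Column-based overload: select(cols: Column*), so the whole list can be expanded at once
val byColumn = data.select(names.map(col): _*)

byName.show()
byColumn.show()
```

Both calls should return the same three columns. The Column-based form is convenient when the list of names is built dynamically, since it needs no special handling of the first element.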
