The PySpark show method prints a preview of a DataFrame to the console. It accepts a few options that control the output: how many rows are printed, whether long values are truncated, and whether rows are laid out vertically. In this article, we will learn how to use each of them.
from datetime import datetime
from pyspark.sql import Row, SparkSession
# Create (or reuse) a Spark session.
spark = SparkSession.builder.getOrCreate()
# Build a small example DataFrame; desc is deliberately longer than 20 characters.
df = spark.createDataFrame([
    Row(amount=20000, month='jan', date=datetime(2000, 1, 1, 12, 0), desc="a very very very long description"),
    Row(amount=40000, month='feb', date=datetime(2000, 2, 1, 12, 0), desc="a very very very long description"),
    Row(amount=50000, month='mar', date=datetime(2000, 3, 1, 12, 0), desc="a very very very long description"),
])
df.show()
+------+-----+-------------------+--------------------+
|amount|month|               date|                desc|
+------+-----+-------------------+--------------------+
| 20000|  jan|2000-01-01 12:00:00|a very very very ...|
| 40000|  feb|2000-02-01 12:00:00|a very very very ...|
| 50000|  mar|2000-03-01 12:00:00|a very very very ...|
+------+-----+-------------------+--------------------+
By default, show prints only the first 20 rows. We can specify the number of rows to display using the n named parameter.
df.show(n=2)
+------+-----+-------------------+--------------------+
|amount|month|               date|                desc|
+------+-----+-------------------+--------------------+
| 20000|  jan|2000-01-01 12:00:00|a very very very ...|
| 40000|  feb|2000-02-01 12:00:00|a very very very ...|
+------+-----+-------------------+--------------------+
only showing top 2 rows
Notice above that the desc column has been cut off. By default, PySpark truncates string values longer than 20 characters. We can change that using the truncate named parameter.
df.show(truncate = False)
+------+-----+-------------------+---------------------------------+
|amount|month|date               |desc                             |
+------+-----+-------------------+---------------------------------+
|20000 |jan  |2000-01-01 12:00:00|a very very very long description|
|40000 |feb  |2000-02-01 12:00:00|a very very very long description|
|50000 |mar  |2000-03-01 12:00:00|a very very very long description|
+------+-----+-------------------+---------------------------------+
We can also specify the length to truncate to by passing a number to the truncate named parameter. Passing 20 reproduces the default output.
df.show(truncate = 20)
+------+-----+-------------------+--------------------+
|amount|month|               date|                desc|
+------+-----+-------------------+--------------------+
| 20000|  jan|2000-01-01 12:00:00|a very very very ...|
| 40000|  feb|2000-02-01 12:00:00|a very very very ...|
| 50000|  mar|2000-03-01 12:00:00|a very very very ...|
+------+-----+-------------------+--------------------+
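The truncation rule behind these outputs can be sketched in plain Python. This is an illustration of the observed behavior, not Spark's own code, and truncate_cell is a hypothetical helper name. A value longer than the limit is cut so that, together with a trailing "...", it is exactly truncate characters wide. The sketch assumes that limits below 4 cut without the ellipsis and that 0 (what truncate=False amounts to) disables truncation.

```python
def truncate_cell(text: str, truncate: int = 20) -> str:
    """Mimic how show shortens one cell (hypothetical helper, not a Spark API)."""
    if truncate > 0 and len(text) > truncate:
        if truncate < 4:
            # Assumption: too narrow for an ellipsis, so just cut the string.
            return text[:truncate]
        # Reserve three characters for the "..." marker.
        return text[:truncate - 3] + "..."
    # Short values, or truncate <= 0, are returned unchanged.
    return text


desc = "a very very very long description"
print(truncate_cell(desc))      # a very very very ...
print(truncate_cell(desc, 30))  # a very very very long descr...
print(truncate_cell(desc, 0))   # a very very very long description
```

This explains why the default output keeps only 17 characters of the description: the remaining 3 of the 20 are taken up by the ellipsis.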
The final option is to display the DataFrame vertically, one field per line, using the vertical parameter.
df.show(vertical = True)
-RECORD 0----------------------
 amount | 20000
 month  | jan
 date   | 2000-01-01 12:00:00
 desc   | a very very very ...
-RECORD 1----------------------
 amount | 40000
 month  | feb
 date   | 2000-02-01 12:00:00
 desc   | a very very very ...
-RECORD 2----------------------
 amount | 50000
 month  | mar
 date   | 2000-03-01 12:00:00
 desc   | a very very very ...
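To make the vertical layout concrete, here is a rough pure-Python approximation of how such a record block could be assembled. It is a sketch based on the output above, not Spark's implementation: render_vertical is a hypothetical helper, the minimum column width of 3 is an assumption, and it handles only plain ASCII strings that have already been truncated.

```python
def render_vertical(records):
    """Lay out already-formatted string cells one field per line (hypothetical helper)."""
    min_width = 3  # assumed minimum width for both columns
    name_width = max([min_width] + [len(name) for rec in records for name in rec])
    data_width = max([min_width] + [len(cell) for rec in records for cell in rec.values()])
    lines = []
    for i, rec in enumerate(records):
        # The record header is padded with dashes to the full line width;
        # the extra 5 covers the leading space, " | ", and the trailing space.
        lines.append(f"-RECORD {i}".ljust(name_width + data_width + 5, "-"))
        for name, cell in rec.items():
            lines.append(f" {name.ljust(name_width)} | {cell.ljust(data_width)} ")
    return "\n".join(lines)


records = [
    {"amount": "20000", "month": "jan",
     "date": "2000-01-01 12:00:00", "desc": "a very very very ..."},
]
print(render_vertical(records))
```

Each field name is padded to the longest name and each value to the longest value, which is why the separator column lines up across all records.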
You can also combine all of these parameters.
df.show(n = 2, truncate = 30, vertical = True)
-RECORD 0--------------------------------
 amount | 20000
 month  | jan
 date   | 2000-01-01 12:00:00
 desc   | a very very very long descr...
-RECORD 1--------------------------------
 amount | 40000
 month  | feb
 date   | 2000-02-01 12:00:00
 desc   | a very very very long descr...
only showing top 2 rows