Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Joins with another DataFrame, using the given join expression.
Syntax
join(other: "DataFrame", on: Optional[Union[str, List[str], Column, List[Column]]] = None, how: Optional[str] = None)
Parameters
| Parameter | Type | Description |
|---|---|---|
other |
DataFrame | Right side of the join. |
on |
str, list or Column, optional | a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join. |
how |
str, optional | default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti. |
Returns
DataFrame: Joined DataFrame.
Examples
import pyspark.sql.functions as sf
from pyspark.sql import Row
df = spark.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)])
df2 = spark.createDataFrame([Row(name="Tom", height=80), Row(name="Bob", height=85)])
df.join(df2, "name").show()
# +----+---+------+
# |name|age|height|
# +----+---+------+
# | Bob| 5| 85|
# +----+---+------+
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name))
joined.show()
# +-----+----+----+------+
# | name| age|name|height|
# +-----+----+----+------+
# | Bob| 5| Bob| 85|
# |Alice| 2|NULL| NULL|
# | NULL|NULL| Tom| 80|
# +-----+----+----+------+
df.alias("a").join(
df.alias("b"), sf.col("a.name") == sf.col("b.name"), "outer"
).sort(sf.desc("a.name")).select("a.name", "b.age").show()
# +-----+---+
# | name|age|
# +-----+---+
# | Bob| 5|
# |Alice| 2|
# +-----+---+