Calculates the approximate quantiles of numerical columns of a DataFrame.
Syntax
```python
approxQuantile(col: Union[str, List[str], Tuple[str]], probabilities: Union[List[float], Tuple[float]], relativeError: float)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | str, tuple or list | A single column name, or a list or tuple of names for multiple columns. |
| probabilities | list or tuple of floats | A list of quantile probabilities. Each number must be a float in the range [0, 1]. For example, 0.0 is the minimum, 0.5 is the median, and 1.0 is the maximum. |
| relativeError | float | The relative target precision to achieve (>= 0). If set to zero, the exact quantiles are computed, which can be very expensive. Values greater than 1 are accepted but give the same result as 1. |
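The PySpark docstring for `approxQuantile` states the guarantee behind `relativeError`: for a DataFrame with N rows, a probability `p` and an error `err`, the returned value `x` satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The sketch below only evaluates that window in plain Python. `rank_window` is a hypothetical helper, not part of PySpark, and treating ranks as 1-based and clamping them to 1..N are assumptions made for readability. It also shows why values above 1 behave like 1: once `err >= 1`, the window already spans every row.

```python
import math
from typing import Tuple


def rank_window(p: float, err: float, n: int) -> Tuple[int, int]:
    """Range of ranks an approximate p-quantile may have (hypothetical helper).

    Evaluates floor((p - err) * n) <= rank <= ceil((p + err) * n), then clamps
    the result to 1..n on the assumption that ranks are 1-based.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if err < 0:
        raise ValueError("err must be >= 0")
    if n < 1:
        raise ValueError("n must be >= 1")
    lo = math.floor((p - err) * n)
    hi = math.ceil((p + err) * n)
    # Clamp into the ranks that actually exist.
    lo = min(max(lo, 1), n)
    hi = max(min(hi, n), 1)
    return lo, hi


# Median of 1,000 rows with err = 0.05: ranks 450 through 550 are acceptable.
print(rank_window(0.5, 0.05, 1000))  # (450, 550)
# With err >= 1 the window covers every row, so larger values change nothing.
print(rank_window(0.5, 1.0, 1000))   # (1, 1000)
print(rank_window(0.5, 5.0, 1000))   # (1, 1000)
```

A tighter `err` narrows the window, which is why `relativeError=0` forces an exact (and potentially expensive) computation.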
Returns
list: The approximate quantiles at the given probabilities. If col is a string, the output is a list of floats. If col is a list or tuple of strings, the output is a list of lists of floats, with one inner list per column in the same order as col.
Notes
Null values in numerical columns are ignored before calculation. For a column that contains only null values, an empty list is returned.
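To make the return shape and the null handling concrete without a Spark cluster, here is a small pure-Python sketch. `exact_quantiles` and `_column_quantiles` are hypothetical helpers, not part of PySpark; they work on an in-memory dict of columns and compute exact quantiles, the analogue of `relativeError=0`. The element chosen for each probability follows a nearest-rank rule (1-based rank ceil(p * n), at least 1), which is an assumption: Spark's own selection may differ on some inputs, so treat the values as illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple, Union


def _column_quantiles(values: Sequence[Optional[float]],
                      probabilities: Sequence[float]) -> List[float]:
    # Nulls (None) are dropped first, mirroring the documented behavior.
    present = sorted(float(v) for v in values if v is not None)
    if not present:
        # A column with only nulls yields an empty list.
        return []
    n = len(present)
    result = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} is outside [0, 1]")
        # Nearest-rank rule (an assumption): 1-based rank ceil(p * n), at least 1.
        rank = max(1, math.ceil(p * n))
        result.append(present[rank - 1])
    return result


def exact_quantiles(table: Dict[str, Sequence[Optional[float]]],
                    col: Union[str, List[str], Tuple[str, ...]],
                    probabilities: Sequence[float]
                    ) -> Union[List[float], List[List[float]]]:
    """Hypothetical in-memory analogue of approxQuantile with relativeError=0."""
    if isinstance(col, str):
        # A single column name gives a flat list of floats.
        return _column_quantiles(table[col], probabilities)
    # A list or tuple of names gives one list of floats per column, in order.
    return [_column_quantiles(table[name], probabilities) for name in col]


table = {
    "col1": [1, 2, 3, 4, 5],
    "col2": [10, 20, 30, 40, 50],
    "with_nulls": [None, 4, None, 2, 6],
    "all_null": [None, None],
}
print(exact_quantiles(table, "col1", [0.0, 0.5, 1.0]))            # [1.0, 3.0, 5.0]
print(exact_quantiles(table, ["col1", "col2"], [0.0, 0.5, 1.0]))  # [[1.0, 3.0, 5.0], [10.0, 30.0, 50.0]]
print(exact_quantiles(table, "with_nulls", [0.5]))                # [4.0]
print(exact_quantiles(table, "all_null", [0.5]))                  # []
```

Dispatching on `isinstance(col, str)` is what makes a single name return a flat list while a list or tuple of names returns a nested one.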
Examples
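```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Single column: the result is a flat list of floats.
data = [(1,), (2,), (3,), (4,), (5,)]
df = spark.createDataFrame(data, ["values"])
quantiles = df.approxQuantile("values", [0.0, 0.5, 1.0], 0.05)
quantiles
# [1.0, 3.0, 5.0]

# Multiple columns: the result holds one list of floats per column.
data = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
df = spark.createDataFrame(data, ["col1", "col2"])
quantiles = df.approxQuantile(["col1", "col2"], [0.0, 0.5, 1.0], 0.05)
quantiles
# [[1.0, 3.0, 5.0], [10.0, 30.0, 50.0]]
```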