Skip to main content

Aggregate Functions

Aggregate functions operate on a set of values to compute a single result.

General

avg

Returns the average of numeric values in the specified column.

avg(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

Aliases

  • mean

bit_and

Computes the bitwise AND of all non-null input values.

bit_and(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bit_or

Computes the bitwise OR of all non-null input values.

bit_or(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bit_xor

Computes the bitwise exclusive OR of all non-null input values.

bit_xor(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bool_and

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bool_or

Returns true if any non-null input value is true, otherwise false.

bool_or(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

count

Returns the number of rows in the specified column.

Count includes null values in the total count. To exclude null values from the total count, include <column> IS NOT NULL in the WHERE clause.

count(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

max

Returns the maximum value in the specified column.

max(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

mean

Alias of avg.

median

Returns the median value in the specified column.

median(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

min

Returns the minimum value in the specified column.

min(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

sum

Returns the sum of all values in the specified column.

sum(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

array_agg

Returns an array created from the expression elements. If ordering requirement is given, elements are inserted in the order of required ordering.

array_agg(expression [ORDER BY expression])

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

first_value

Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

last_value

Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

last_value(expression [ORDER BY expression])

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

Statistical

corr

Returns the coefficient of correlation between two numeric values.

corr(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

covar

Returns the covariance of a set of number pairs.

covar(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

covar_pop

Returns the population covariance of a set of number pairs.

covar_pop(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

covar_samp

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

stddev

Returns the standard deviation of a set of numbers.

stddev(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

stddev_pop

Returns the population standard deviation of a set of numbers.

stddev_pop(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

stddev_samp

Returns the sample standard deviation of a set of numbers.

stddev_samp(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

var

Returns the statistical variance of a set of numbers.

var(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

var_pop

Returns the statistical population variance of a set of numbers.

var_pop(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

var_samp

Returns the statistical sample variance of a set of numbers.

var_samp(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_slope

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.

regr_slope(expression1, expression2)

Arguments

  • expression_y: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_avgx

Computes the average of the independent variable (input) expression_x for the non-null paired data points.

regr_avgx(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_avgy

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.

regr_avgy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_count

Counts the number of non-null paired data points.

regr_count(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_intercept

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.

regr_intercept(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_r2

Computes the square of the correlation coefficient between the independent and dependent variables.

regr_r2(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_sxx

Computes the sum of squares of the independent variable.

regr_sxx(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_syy

Computes the sum of squares of the dependent variable.

regr_syy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_sxy

Computes the sum of products of paired data points.

regr_sxy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

Approximate

approx_distinct

Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.

approx_distinct(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

approx_median

Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(x, 0.5).

approx_median(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

approx_percentile_cont

Returns the approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont(expression, percentile, centroids)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

  • centroids: Number of centroids to use in the t-digest algorithm. Default is 100.

    If there are this number or fewer unique values, you can expect an exact result. A higher number of centroids results in a more accurate approximation, but requires more memory to compute.

approx_percentile_cont_with_weight

Returns the weighted approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont_with_weight(expression, weight, percentile)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
  • weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.
  • percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).
note

Large portions of this page is copied from the Apache Datafusion documentation on January 26th 2024 - where there have been customisations to match Hypi's deployment this has been noted. Apache Datafusion and the Apache name are the property of the Apache Foundation and licensed under the Apache V2 license .