SQL presto - cross join unnest with ordinality flattening
Publish date: 2024-04-21
Why is my query not flattening the data as expected?
I am querying a table where columns contain arrays. My goal is to unnest the items in the arrays to find unique items and turn those into rows.
SELECT table1.tag_names, table1.tag_ids, rank_position FROM table1 CROSS JOIN UNNEST (tag_ids, tag_names) WITH ORDINALITY as T (tag_ids, tag_names, rank_position) ORDER BY tag_ids Results:
| tag_names | tag_ids | rank_position |
|---|---|---|
| ["red", "blue", "green"] | [111, 222, 333] | 1 |
| ["red", "blue", "yellow"] | [111, 222, 444] | 4 |
Desired Results:
| tag_names | tag_ids | rank_position |
|---|---|---|
| "red" | 111 | 1 |
| "blue" | 222 | 2 |
| "green" | 333 | 3 |
| "yellow" | 444 | 4 |
What am I missing?
Updated to use alias, new results below:
| tag_names | tag_ids | rank_position |
|---|---|---|
| "red" | 111 | 1 |
| "red" | 111 | 10 |
| "red" | 111 | 3 |
| "red" | 111 | 12 |
| "yellow" | 444 | 4 |
1 Answer
You should use alias introduced for the flattened data in the CROSS JOIN UNNEST in the select:
-- sample data WITH dataset (tag_names, tag_ids) AS ( VALUES (array['red', 'blue', 'green'], array[111, 222, 444]) ) -- query select T.tag_names, T.tag_ids, rank_position from dataset CROSS JOIN UNNEST (tag_ids, tag_names) WITH ORDINALITY as T (tag_ids, tag_names, rank_position) ORDER BY tag_ids Output:
| tag_names | tag_ids | rank_position |
|---|---|---|
| red | 111 | 1 |
| blue | 222 | 2 |
| green | 444 | 3 |
UPD
ORDINALITY does not work across multiple rows, one way to achieve desired result is to flatten the arrays, then use group by and row_number:
-- sample data WITH dataset (tag_names, tag_ids) AS ( VALUES (array['red', 'blue', 'green'], array[111, 222, 333]), (array['red', 'blue', 'yellow'], array[111, 222, 444]) ) -- query select *, row_number() over (order by tag_ids) rank_position from ( select T.tag_names, T.tag_ids from dataset CROSS JOIN UNNEST (tag_ids, tag_names) AS T (tag_ids, tag_names) GROUP BY T.tag_names, T.tag_ids ) ORDER BY tag_ids Output:
| tag_names | tag_ids | rank_position |
|---|---|---|
| red | 111 | 1 |
| blue | 222 | 2 |
| green | 333 | 3 |
| yellow | 444 | 4 |
ncG1vNJzZmirpJawrLvVnqmfpJ%2Bse6S7zGiorp2jqbawutJobmpwZ2uAcoWOrKilZaCnsrTAzmaaq6ejqHqru8inZK6mnprAtXnWoquhZZ%2Bnsaq6wKWgrbFdm7miwNOepaKmlw%3D%3D