Recently, I have been asked to form a table, the schema is very simple:
pid: STRING (NULLABLE)
rec_list1: STRING (NULLABLE)
rec_list2: STRING (NULLABLE)
It’s a simple mapping from pid to its recommending list. Note that they are all strings. However, My boss wants me to convert to something like this:
pid: STRING (NULLABLE)
left: STRUCT
pid: NUMBERIC (NULLABLE)
cat_id: NUMBERIC (NULLABLE)
cat_name: STRING (NULLABLE)
right: STRUCT
pid: NUMBERIC (NULLABLE)
cat_id: NUMBERIC (NULLABLE)
cat_name: STRING (NULLABLE)
It seems to be easy but took me some time because I am green on advanced usage of sql. Therefore, I am introducing UNNEST, ARRAY, STRUCT, etc to me.
UNNESTSome documentation from google that I can find.
UNNESTwill breakARRAYinto rows and it’s a powerful breaker. Sometimes there are array of structs, you can useUNNESTto break some and join; Other usage ofUNNESTI saw before is to using it to create additional info and joining the additional info into table to filter accordingly.UNNESTcan be used together asCROSS JOINARRAYAgain some documentation from google that I can find.
ARRAYdoesn’t support fancyJOINinside of it. Something like this below is forbidden by googleselect array( select as struct pid, cat from unnest(some_array) left join some_table using some_column order by some_other_column )LEFT JOINis prohibited inside array, google will error you do an efficient join outside. HoweverINNER JOIN/JOINwill work.STRUCTIt’s like an object that you can put a lot of things inside, but remember it’s hard to do
ORDER BY,GROUPall related operations touching the field inside. I would recommend prepare the data before actually forming theSTRUCT.
Additionally, use Common Table Expression (CTE) wisely. As I mentioned earlier, avoid doing sorting and grouping involves STRUCT. My trick would be join tables using sub CTE before hand and form structs and only do simple sort and group when necessary.