I think Treasure Data should...

Support UDFs using SQL or Python

It would be nice to support something like plpythonu, not only SQL-based UDFs.
Of course, I wouldn't mind if it didn't support file I/O or network access.

e.g. https://aws.amazon.com/jp/blogs/aws/user-defined-functions-for-amazon-redshift/
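
To make the request concrete, here is a rough sketch of the kind of pure function a plpythonu-style UDF might wrap. It is only an illustration: the name `normalize_phone`, the `default_country_code` parameter, and the normalization rules are all hypothetical, and nothing here reflects an actual Treasure Data or Redshift API. It uses only the standard library and does no file I/O or network access.

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Hypothetical scalar UDF body: one value in, one value out.

    Uses no file I/O and no network access, only the standard library.
    """
    if raw is None:
        return None  # treat SQL NULL as NULL in, NULL out
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    if not digits:
        return None  # nothing usable in the input
    if raw.strip().startswith("+"):
        return "+" + digits  # caller already supplied a country code
    if len(digits) == 10:
        # Assumption: a bare 10-digit number belongs to the default country.
        return "+" + default_country_code + digits
    return None  # ambiguous input; leave it for manual review
```

If something like this could be registered as a function, a query could call it per row instead of exporting the column and cleaning it up outside the database.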

7 votes

    Y.Kentaro (@yoshi_ken) shared this idea

    2 comments

      • Rick Otten commented:

        I should add to my comment that network access would be useful when we are transforming data via third-party APIs (such as geocoding services). So while UDFs without network access would still get us a step closer to where we need to be, the ability to call other services from a function would be very useful. If we were going to use a plpython interface, we might also want access to standard libraries such as the Google Phone Numbers library (which standardizes/normalizes phone number fields).

      • Rick Otten commented:

        I agree. Sometimes we have to aggregate or transform (or both) data in ways that are not available with the current function set and not easily done with Presto/Hive SQL. At this time we have to pull back every row and process it ourselves. It would be great if we could do that in the 'database' instead. It would save us significant overhead for some of our processing, especially now that we are in the hundreds of millions of rows.
