Saturday, June 6, 2015

Oozie Hive2 step from Hue's Oozie workflow editor


Here is a warm recommendation to move to the Hive2 step when running Hive from Oozie, and a way to do that even when using Hue's workflow editor.



A developer wanted to schedule a complex Hive query (a join over lateral views over a JSON SerDe table) on our CDH 5.3.3 cluster with the Oozie Hive step. The query worked when run from the regular Hive client. On Oozie, the step failed with a log4j permissions exception before it even started the query's MapReduce job. Not a very indicative error.


After two days of debugging (other queries had worked), we decided to try the Hive2 step (it had happened before that the basic Hive step didn't work while the Hive2 step did).
I remember that when we upgraded to CDH 5.2, Cloudera recommended migrating to the Hive2 step (the link), but didn't say anything about specific functionality that wouldn't work with the regular Hive step.

We wrote a test workflow XML with a Hive2 step, ran it with the Oozie CLI, and it worked! That was great, and it made sense, because the Hive2 step acts just like a regular Hive client that connects to the Hive server. That's the natural thing to do, and I don't understand exactly how the basic Hive step works.
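
For reference, a test workflow like the one described above can be sketched in a few lines of Python that generate the XML. This is only an illustration, not our actual workflow: the function name, the node names, the host in the JDBC URL, the script name and the parameter are all made up, and the schema versions (workflow 0.5, hive2-action 0.1) are assumed to match what a CDH 5 cluster ships with.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions; check what your Oozie version supports.
WORKFLOW_NS = "uri:oozie:workflow:0.5"
HIVE2_NS = "uri:oozie:hive2-action:0.1"


def build_hive2_workflow(name, jdbc_url, script, params=None):
    """Return a minimal Oozie workflow XML string with a single hive2 step.

    Hypothetical helper for illustration only.
    """
    root = ET.Element("workflow-app", {"xmlns": WORKFLOW_NS, "name": name})
    ET.SubElement(root, "start", {"to": "hive2-node"})

    action = ET.SubElement(root, "action", {"name": "hive2-node"})
    # This <hive2> element is the step block itself.
    hive2 = ET.SubElement(action, "hive2", {"xmlns": HIVE2_NS})
    # ${jobTracker} and ${nameNode} are resolved by Oozie from job.properties.
    ET.SubElement(hive2, "job-tracker").text = "${jobTracker}"
    ET.SubElement(hive2, "name-node").text = "${nameNode}"
    ET.SubElement(hive2, "jdbc-url").text = jdbc_url
    ET.SubElement(hive2, "script").text = script
    for key, value in (params or {}).items():
        ET.SubElement(hive2, "param").text = "%s=%s" % (key, value)
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Hive2 step failed"
    ET.SubElement(root, "end", {"name": "end"})

    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host, script and parameter.
    print(build_hive2_workflow(
        "hive2-test",
        "jdbc:hive2://hiveserver.example.com:10000/default",
        "query.hql",
        {"day": "2015-06-06"},
    ))
```

Save the output as workflow.xml next to the query script on HDFS and submit it with the Oozie CLI (`oozie job -config job.properties -run`).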

Unfortunately, Hue's Oozie workflow editor doesn't support the Hive2 step. That's probably why many people aren't familiar with it.

That's too bad, because we didn't want to force the developer to write and maintain the Oozie workflow XML without a convenient GUI or API (who wants to edit XML files?). (There is an API that someone from my organization built, pyoozie, but it supports only the FS, Pig and Hive steps.)

The last resort was using a generic step in Hue. We copied the hive2 step block from the XML into the generic step's text box in Hue's workflow editor, and it worked! Victory :)

So remember to prefer the Hive2 step, via the generic step, over the classic Oozie Hive step, which is full of bugs and doesn't work right (it doesn't go through the Hive server). In addition, you can try pyoozie, which makes it easier to create Oozie workflows from code.








