(A standalone bee)
I didn't need HiveServer, MapReduce, or a Hadoop cluster. So how do you do that?
Here are the steps:
- Install the Hive Metastore repository - an instance of one of the databases that the Hive Metastore works with (MySQL, PostgreSQL, MS SQL Server, Oracle... check the documentation)
- Install Java
- Download vanilla Hadoop from http://hadoop.apache.org/releases.html and unpack it on the Hive Metastore instance (let's say you unpacked it to /apps/hadoop-2.6.2)
- Set environment variables:
- export HADOOP_PREFIX=/apps/hadoop-2.6.2
- export HADOOP_USER_CLASSPATH_FIRST=true
- Download Hive from http://www.apache.org/dyn/closer.cgi/hive/ and unpack it on your instance
- Create a schema (user) for the Hive user and build the Hive schema in the Hive Metastore repository DB using the Hive scripts (a sample script for MySQL):
- /apps/apache-hive-1.2.1-bin/scripts/metastore/upgrade/mysql/hive-schema-1.2.0.mysql.sql
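For MySQL, for example, creating the schema owner and loading the Hive schema might look like this (a sketch only; the database name, user, and password are placeholders):

```sh
# create the metastore database and its owner (names/passwords are examples)
mysql -u root -p -e "CREATE DATABASE hive; CREATE USER 'hive'@'%' IDENTIFIED BY 'hivepass'; GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%'; FLUSH PRIVILEGES;"
# load the Hive schema into the new database
mysql -u hive -p hive < /apps/apache-hive-1.2.1-bin/scripts/metastore/upgrade/mysql/hive-schema-1.2.0.mysql.sql
```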
- Configure hive-site.xml with the right values for the following (a sample snippet follows this list):
- ConnectionUrl (jdbc:mysql://localhost:3666/hive for example)
- ConnectionDriverName
- ConnectionUserName (the created database user)
- ConnectionPassword (the created database user password)
- hive.metastore.warehouse.dir - set it to a local path (file:///home/presto/ for example)
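Put together, a minimal hive-site.xml (under the Hive conf directory) for a MySQL-backed metastore might look like this (host, port, user, password, and warehouse path are examples only):

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3666/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///home/presto/</value>
  </property>
</configuration>
```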
- Copy the JAR required for the JDBC connection to the metastore repository into the Hive classpath (ojdbc6 for Oracle, mysql-connector-java for MySQL, and so on). For example:
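For MySQL this could be (the exact connector version and filename will vary):

```sh
# drop the JDBC driver into Hive's lib directory so it is on the metastore classpath
cp mysql-connector-java-5.1.38-bin.jar /apps/apache-hive-1.2.1-bin/lib/
```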
- Start the Hive Metastore: /apps/apache-hive-1.2.1-bin/bin/hive --service metastore
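To keep it running after you log out and to capture its output, you can run it in the background; by default the metastore listens for Thrift connections on port 9083:

```sh
nohup /apps/apache-hive-1.2.1-bin/bin/hive --service metastore > metastore.log 2>&1 &
```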
- For accessing S3:
- copy these jars to the classpath:
- aws-java-sdk-1.6.6.jar (http://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk/1.6.6)
- hadoop-aws-2.6.0.jar (http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar)
- you can specify these parameters in the Hadoop core-site.xml (a sample snippet follows this list):
- fs.s3.awsAccessKeyId
- fs.s3.awsSecretAccessKey
- fs.s3n.impl (set to org.apache.hadoop.fs.s3native.NativeS3FileSystem)
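Put together, the relevant core-site.xml fragment might look like this (the keys are placeholders; note that if you use s3n:// URLs, the key properties are the fs.s3n.* variants instead of fs.s3.*):

```xml
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
```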
- for secured access to S3, use an s3a:// connection in your URL,
- add fs.s3a.connection.ssl.enabled to $HADOOP_PREFIX/etc/hadoop/core-site.xml
- you also need to set these parameters for S3 access in the Hadoop core-site.xml file (see the snippet after this list):
- fs.s3a.secret.key
- fs.s3a.access.key
- Unfortunately, there is currently no support for temporary S3 credentials.
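A corresponding core-site.xml fragment for s3a access might look like this (the keys are placeholders):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>true</value>
</property>
```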
Finally, when running Presto, we will use the Thrift address and port of the Hive Metastore service.
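For example, a Presto Hive catalog file such as etc/catalog/hive.properties could point at the standalone metastore like this (the host is an example, and the connector name depends on your Presto version; hive-hadoop2 is the common one):

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore-host:9083
```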
That's it. No need for additional Hadoop libraries or settings.
Good luck!