Go Data Infras

Big Data and related stuff. Things that took me hours and days to implement, and that would hopefully take you less.

Tuesday, May 22, 2018

Spark dataframe json schema misinferring - String typed column instead of struct

All you wanted was to load some complex JSON files into a DataFrame and use SQL with the [lateral view explode] function to parse the JSON....

Saturday, January 27, 2018

SolrCloud - suggestions for leaders rebalancing

This is a simple, useful bash script that helps rebalance the cluster leaders when there is more than one leader of the same coll...

Friday, December 29, 2017

Apache Zeppelin as a Web Querying Interface for Cassandra

Hi! One important thing you miss when you start using Cassandra is a decent web tool for executing CQL queries and browsing t...

Wednesday, December 20, 2017

Cassandra as a Docker

We dockerize our in-house microservices, as well as the third-party tools we use. That includes Apache Solr, ZooKeeper, Redis and many more. T...

Apache Solr Cloud Sanity - 3 simple sanity tests

Having a clear picture of your services' status is a fundamental requirement. Otherwise you are blind to the system. I am going to pres...

Thursday, May 5, 2016

Spark ClickStream with Talend

Ofer Habushi, a friend of mine who works as a senior consultant for Talend (an open source data integration software company), wa...

Wednesday, March 16, 2016

Playing with DBpedia (Semantic Web Wikipedia)

Web 3.0, the Semantic Web, DBpedia and IoT are all buzzwords that deal with the computer's ability to understand the data it has. ...

About Me

moscovig