6:00 - 6:15: Welcome by Ashish Thusoo
6:20 - 6:40: Spark demonstration
6:40 - 6:55: Q&A for Qubole
7:00 - 7:45: Pinball and Pinalytics talks
7:45 - 8:00: Q&A for Pinterest
8:00 - 9:00: Reception & Networking
Pinterest does not have on-site bike parking for guests. Please plan accordingly and bring a lock so you can secure your bicycle on the street.
Recognizing the value of Spark in Big Data analytics, Qubole aims to deliver the power of Spark to both technical and business Hadoop users. Qubole is offering Spark as a Service to help organizations run Spark on AWS. As part of this service, we have integrated Spark into our Qubole Data Service (QDS) platform, allowing users to provision and launch Spark clusters and start running queries in minutes. Spark as a Service makes it easy to process and query data stored in Hive, HDFS, HBase and Amazon S3. Any number of data sources can be accessed and their data easily combined with Spark. For example, various SQL and NoSQL sources and data sinks can be accessed from one interface, and their data can be combined and loaded into any of them (the latter is currently in development). QDS’ query editor and visual query builder give developers and data scientists an easy way to access and use Spark to process data. Spark as a Service also offers lower cost and ease of use. It reduces the cloud-compute cost of running Spark on AWS through self-managed auto-scaling, which scales capacity up and down as needed without manual reconfiguration of resources. In this talk, Ashish Thusoo will introduce Qubole’s approach and philosophy for Big Data, and Praveen Seluka will present Qubole’s new Spark-as-a-Service offering and demonstrate its primary use cases through a live demo of QDS.
Pinball - A Hadoop Workflow Management Platform
Almost every data-driven company depends on a workflow management system. After experimenting with a few open source workflow managers, we found none of them flexible enough to accommodate the ever-changing landscape of data processing solutions at Pinterest. With that in mind, we took up the challenge of building Pinball, a highly customizable workflow manager that caters to a variety of data processing use cases. We believe that the same flexibility that led to Pinball’s adoption by every engineering team at Pinterest makes it useful outside of our company. In this talk, Mao Ye will present Pinball internals and details about how we use it.
Pinalytics - Scalable Data Analytics Engine
At Pinterest, we’re always considering how we can deliver data in a meaningful way to the rest of the company. To help employees analyze information quickly and understand metrics more efficiently, we built Pinalytics, our own customizable platform for big data analytics. Pinalytics is a scalable analytics engine that comes with a simple interface for creating analytics reports, supports persisting and automatically updating reports, and is powered by a high-performance backend system. In this talk, Chunyan Wang will cover the architecture, metrics computation and some of the cool UI features.