
Engineers from Pinterest & Qubole will share the latest on the technologies the two companies have been building to enable data-driven products and insights at Pinterest, including a demo of Qubole's Spark-as-a-Service offering. If you're curious to learn how Pinterest uses Hadoop to store and process tens of billions of pins and petabytes of data every day, you won't want to miss this firsthand look!
"The addition of Apache Spark to QDS makes the platform even more valuable to Pinterest. Qubole empowers us to use the latest Big Data tools at petabyte scale without needing to invest in building out, maintaining and updating our own infrastructure. As a result, we can focus on extracting value from our data using the best technologies for the job, and on driving the business forward."
Krishna Gade
Pinterest Engineering Manager
Speakers

Mao Ye
Data Engineer, Pinterest

Ashish Thusoo
CEO, Qubole

Chunyan Wang
Data Scientist, Pinterest

Praveen Seluka
Engineer, Qubole
Schedule
6:00 - 6:15: Welcome by Ashish Thusoo
6:20 - 6:40: Spark demonstration
6:40 - 6:55: Q&A for Qubole
7:00 - 7:45: Pinball and Pinalytics talks
7:45 - 8:00: Q&A for Pinterest
8:00 - 9:00: Reception & Networking
Pinterest does not have on-site bike parking for guests. Please plan accordingly and bring a lock to secure your bicycle on the street.
Spark-as-a-Service
Recognizing the value of Spark in Big Data analytics, Qubole aims to deliver the power of Spark to both technical and business Hadoop users. Qubole is offering Spark as a Service to help organizations run Spark on AWS. With this service, we have integrated Spark into our Qubole Data Service (QDS) platform, allowing users to launch and provision Spark clusters and start running queries within minutes. Spark as a Service makes it easy to process and query data stored in Hive, HDFS, HBase and Amazon S3. Any number of data sources can be accessed and their data easily combined with Spark. For example, various SQL and NoSQL stores and other data sinks can be accessed from one interface, and their data can be combined and loaded into any of them (loading is still in development). QDS’ query editor and visual query builder give developers and data scientists an easy way to use Spark to process data. Spark as a Service also lowers cost and is easy to use: it reduces the cloud-compute cost of running Spark on AWS through self-managed auto-scaling, which scales capacity up and down as needed without manual reconfiguration of resources. In this talk, Ashish Thusoo will introduce Qubole's approach to and philosophy on Big Data, and Praveen Seluka will present Qubole's new Spark-as-a-Service offering and demonstrate its primary use cases through a live demo of QDS.
Pinball - A Hadoop Workflow Management Platform
Almost every data-driven company depends on a workflow management system. After experimenting with a few open-source workflow managers, we found none of them flexible enough to accommodate the ever-changing landscape of data processing solutions at Pinterest. With that in mind, we took on the challenge of building Pinball, a highly customizable workflow manager that caters to a variety of data processing use cases. We believe that the same flexibility that led every engineering team at Pinterest to adopt Pinball will be useful outside our company as well. In this talk, Mao Ye will present Pinball's internals and details about how we use it.
Pinalytics - Scalable Data Analytics Engine
At Pinterest, we’re always considering how we can deliver data in a meaningful way to the rest of the company. To help employees analyze information quickly and understand metrics more easily, we built Pinalytics, our own customizable platform for big data analytics. Pinalytics is a scalable analytics engine that comes with a simple interface for creating analytics reports, supports persisting and automatically updating reports, and is powered by a high-performance backend system. In this talk, Chunyan Wang will cover the architecture, metrics computation and some of the cool UI features.