Seven Tutorial

File uploaded by Andrew Flint Advocate on Jun 21, 2017
Last modified by Andrew Flint Advocate on Dec 4, 2017
Version 5

Here is a bundle of seven tutorials that introduce the key concepts of using notebooks in Analytics Workbench (AW). Several have been updated and extended to incorporate new features of AW 2.0.



After you download and unzip the file, use the Import button to upload each of the .json files into Analytics Workbench. For more help installing sample notebooks, please read How to Use Sample Notebooks from the Community.
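If you would rather unzip the bundle from a script than by hand, the following is a minimal Python sketch that extracts the archive and lists the .json notebook files to import. The archive name "seven_tutorials.zip" and the destination folder are placeholders, not the actual names of the attachment; the Import step itself still happens through the Import button in Analytics Workbench.

```python
import zipfile
from pathlib import Path


def extract_notebooks(archive_path, dest_dir):
    """Unzip the bundle into dest_dir and return the .json files found there, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as bundle:
        bundle.extractall(dest)
    # Search recursively in case the archive contains a top-level folder.
    return sorted(dest.rglob("*.json"))


if __name__ == "__main__":
    # Placeholder file name: substitute the name of the .zip you downloaded.
    archive = Path("seven_tutorials.zip")
    if archive.exists():
        for notebook in extract_notebooks(archive, "seven_tutorials"):
            print(notebook)
    else:
        print(f"{archive} not found; download the bundle first.")
```

Each path printed is a file to upload with the Import button.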


As of this version, the zip contains:

  1. Tutorial #1: Your First Notebook — a very simple introduction to using notebooks, including extensive pointers to key resources
  2. Tutorial #2: Basic Data Access — demonstrates how to access data already loaded in AW using Python, R, SQL and Scala
  3. Tutorial #3: SQL Queries and Visualizations — a Scala and SQL notebook that first retrieves a file from S3, and then makes it available for a series of interactive SQL queries and visualizations
  4. Tutorial #4: Construct Flight_Delays Dataset — a Python and SQL notebook that retrieves (and samples) Flight Delay data from an AWS S3 bucket, stores it in AW as a dataset, and queries the results
  5. Tutorial #5: Machine Learning on HELOC Data — a fairly thorough example of machine learning techniques on the credit risk modeling dataset called HELOC, almost entirely in Python
  6. Tutorial #6: ML & SQL on AW Datasets — a shorter introduction to Random Forest modeling in Spark ML, using the Flight_Delays dataset, plus some SQL-based data manipulations and queries
  7. Tutorial #7: Extend the Environment — demonstrates how to add user-defined libraries to the AW notebook environment, using the embedded JAR file as an example


Tutorial notebooks #2 and #5 rely on a HELOC dataset that is also included in this .zip archive; the remaining notebooks draw their data from the internet. After you have downloaded and unzipped the .zip archive attached below, find the file "HELOC_with_scores_trees.csv.bz2" (a bzip2-compressed CSV file), and upload it as-is under Data > New Dataset. (This step is necessary only if you wish to run notebooks #2 and #5.)
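If you want to confirm that the download is intact before uploading, here is a small Python sketch that peeks at the first few rows of a bzip2-compressed CSV without decompressing it on disk. It assumes the file is UTF-8 text with a header row, which is an assumption and not something stated above. The file is only read, so you can still upload it as-is.

```python
import bz2
import csv


def preview_bz2_csv(path, n_rows=5):
    """Return (header, rows): the first line and up to n_rows following rows."""
    # bz2.open in text mode decompresses on the fly; encoding is an assumption.
    with bz2.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    try:
        header, rows = preview_bz2_csv("HELOC_with_scores_trees.csv.bz2")
    except FileNotFoundError:
        print("File not found; unzip the bundle into this folder first.")
    else:
        print(header)
        for row in rows:
            print(row)
```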


In tutorial #7, we use a sample JAR file ("customTransform-2.2.0-SNAPSHOT.jar"), which is also included in this .zip bundle.


As always, if you have samples of your own that you'd like to share, by all means, share them!