Andrew Flint

Your Zeppelin notebook now has a Stop button

Blog Post created by Andrew Flint Advocate on Dec 11, 2017

In Analytics Workbench 1.0.1, we added a custom button to the Zeppelin toolbar that will help you in three ways:

  1. End your notebook session
  2. Restart your notebook session
  3. Test your notebook for common errors

 

Before we jump into how it will help, let me just show you where the button is and what it does. It's at the right side of the toolbar, and looks like a recycle icon:

 

It's a new and obvious way to restart the back-end processes connected to your notebook. Previously, the only way to do this was to restart or rebind individual interpreters (which is still useful when you need that finer control).

 

When clicked, the button shares a little more detail on what it's about to do, and asks you to confirm:

 

 

So when and where is this useful? I regularly use it in three different situations:

 

  1. Ending your notebook session.
    Suppose you've been working all day on your data science project, and you want to give everything a rest, so you can start fresh again in the morning (or maybe next week). It's a great idea to terminate your notebook session, and release all the processes and memory on the server and Spark cluster. It's a neighborly thing to do, for sure.

  2. Restarting your notebook session.
    If ever you find yourself in a weird state (maybe some interpreters are throwing ConnectException errors, or you've tangled yourself up in knowing which variables you have or haven't set), this simple button will basically reset "everything". Your Python or Scala or R sessions are terminated, all their variables are flushed away, but your code is still here to cleanly recreate them in a fresh new session.

  3. Testing your notebook for common errors.
    Interactive notebooks are great tools, but they are susceptible to a subtle kind of coding error that will haunt you, or someone else, later. A short story might help:

    Suppose you make a new notebook, and create a data frame using everyone's favorite name, df. As you code deeper and deeper into the data problem, you realize you're going to need df1, df2, df3, and so on. So you go back and rename that first one df1, and try to remember to correct it everywhere, but you miss just one or two places. Because you've already run the paragraph that created it, that original df object is still around, and any code left expecting it will keep running, without any overt errors. But if you ran the notebook from a blank slate, there would be no such df variable, leading potentially to all kinds of hard-to-debug behaviors or errors.

    You can imagine how later, perhaps much later, this problem can have you pulling out your hair in search of a cause. ("Well, it ran fine yesterday, and I didn't change anything!") A good practice is to test your notebook by stopping the session and then running all the code again from top to bottom, to make sure it's still self-consistent. It's always a good idea to do this as a preflight check before sharing your notebook.
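To make the df story concrete, here is a minimal Python sketch. It is not how Zeppelin or Analytics Workbench actually runs paragraphs: the paragraph strings, the `run_paragraphs` helper, and the plain dict standing in for an interpreter session are all hypothetical, chosen only to show how a stale variable hides a missed rename until the session starts fresh.

```python
# Each string stands in for one notebook paragraph (hypothetical contents).
paragraph_1_old = "df = [1, 2, 3]"    # original version of the first paragraph
paragraph_1_new = "df1 = [1, 2, 3]"   # the same paragraph after renaming df to df1
paragraph_2 = "total = sum(df)"       # a missed rename: still refers to df


def run_paragraphs(paragraphs, session):
    """Run paragraphs in order; `session` is a dict holding the session's variables."""
    for source in paragraphs:
        exec(source, session)
    return session


# Long-lived session: the old paragraph already ran before the rename,
# so df is still around when the edited notebook is run again.
live_session = run_paragraphs([paragraph_1_old], {})
run_paragraphs([paragraph_1_new, paragraph_2], live_session)
stale_run_total = live_session["total"]  # succeeds, silently using the stale df

# Fresh session, as after stopping the notebook and running top to bottom.
try:
    run_paragraphs([paragraph_1_new, paragraph_2], {})
    fresh_run_error = None
except NameError as err:
    fresh_run_error = err  # df was never defined in this session

print("live session total:", stale_run_total)
print("fresh session error:", fresh_run_error)
```

The live session finishes without complaint, while the fresh one fails with a NameError on df. That failure is exactly what stopping the session and rerunning everything is meant to surface before anyone else hits it.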

A few more words on the side-effects of the button

In all three situations, the button behaves the same: it shuts down any running jobs from your notebook, and it relinquishes all the server-side processes and Spark cluster sessions running on behalf of your notebook. The next time you ask a paragraph to run, fresh new processes will spring back to life, on demand.

 

Rest assured, the button will not delete or reset any textual or graphical output in your notebook, and it won't delete any datasets that you've written back to Analytics Workbench or other persistent storage areas, like S3.

 

It's an unglamorous but useful little button, and I hope you find it helpful. Happy notebooking!
