Troubleshooting Guide for Notebook Users

Document created by Andrew Flint (Advocate) on Feb 10, 2017. Last modified by Andrew Flint (Advocate) on Dec 11, 2017.
Version 23

As you code up brilliant solutions to data science problems in your notebooks, you might encounter some gotchas, error messages and other forms of turbulence. This guide might help you find a quick resolution to some commonly observed problems. Please feel free to add to it!

Title | Primary Symptom | Meaning / Likely Explanation | Resolution | Language(s)
Notebook locked message

While editing a notebook, a pop-up dialog saying "Notebook is currently locked" interrupts your editing and returns frequently.

 

This has been observed in AW 1.0.1 and we are actively troubleshooting the issue. The system is (perhaps incorrectly) detecting that two different users are attempting to edit a notebook simultaneously, and then trying to prevent conflicts. This may be caused by a lost authorization token, even with just a single user.

The simplest ways to fix this are:

  1. Exit the notebook editor
  2. Copy (duplicate) the notebook
  3. Open and resume work on the new copy
  4. Optional: consider deleting the original

OR

  1. Log out of the AW application
  2. Clear your browser cache
  3. Log back in to AW
  4. Resume editing the notebook.

N/A

java.net.ConnectException

Error message "java.net.ConnectException: Connection refused (Connection refused)", returned almost immediately on execution of a paragraph

Periodically, the interpreter serving a notebook's paragraph may become unresponsive. Most likely, the interpreter needs to be restarted.

Also old: Previously, we believed this was caused by a dead interpreter that needed a simple restart, but more recently that remedy has not worked. Instead, we have had to restart the entire Docker container for Zeppelin.

Old diagnosis: The interpreter needed for the paragraph is dead. (Sometimes other languages will still succeed, but the one you're attempting to run is busted.)

Please see "Your Zeppelin notebook now has a Stop button" to reset/restart your notebook session.

(In versions older than 1.0.1, the best technique is described in "How to Rebind Your Notebook's Interpreters".)

 

Notify support@fico.com, and tell them precisely which server is behaving this way and which interpreter(s) are showing the symptom. Their best remedy right now is to run "docker restart zeppelin" on the server in question.

Observed with Scala, Pyspark, Spark R, SQL
Spark interpreters broken

An attempt to run a paragraph leaves it in the "PENDING" state indefinitely; it never actually runs.

After a long time, the state eventually moves to ABORT.

The interpreter needed for the paragraph may be dead, or may be otherwise occupied. (Sometimes other languages will still succeed, but the one you're attempting to run is busted.)

This likely needs a DevOps server person to diagnose and resolve. There is not much you can do except notify them, wait, and perhaps try another language (just as a troubleshooting step).

Observed with Scala, Pyspark, Spark R, SQL
Notebook gone stale

Oddly enough, you can no longer run any paragraph in your notebook. The play button doesn't do anything anymore. Shift-Enter also does nothing.

You've been writing and running paragraphs happily for a while, maybe stepped away from your computer (or not), maybe took a quick break (or not), and now every paragraph stubbornly refuses to run. This upsets you a little because you've just made some really delicate edits to the code in your current paragraph and are dying to know if you got it right.

(This is a known, annoying and devious defect. We have yet to faithfully recreate, investigate and resolve it.)

First: be sure to copy the code portion of your current paragraph note to the clipboard. (You're going to reload the page, which may cause you to lose any unsaved edits. For safety, select all (Ctrl+A) and copy (Ctrl+C) your current paragraph.)

 

Now, reload the whole notebook page, once or twice. If you indeed lost some code, thankfully it's on your clipboard. Paste it in and try running again. In my experience, this always works.

All languages and interpreters
%sh times out at 60 sec

Your bash (%sh) paragraph stops with a SIGTERM error and an ExitValue of 143.

To protect against too-long-running jobs on the master node, the bash interpreter is deliberately configured to terminate jobs at 60000 milliseconds (1 minute). Any job that takes longer than one minute will fail with this error. (You may observe that your job actually runs quite a bit longer than a minute, and that you get some positive side effects from your job, but any commands that would start after 60 seconds will not execute.)

First, try to resist the temptation to run long-running shell jobs, since they run on the master node, cannot be scheduled or managed, and can hurt the performance of the master node. If you must do so, try to break up your long %sh job into smaller chunks, across several paragraphs.

Observed with only the Bash interpreter.
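The ExitValue of 143 follows the usual shell convention of reporting 128 plus the signal number, and SIGTERM is signal 15. Here is a minimal Python sketch of that arithmetic, assuming a POSIX system with a `sleep` command. The child process is only a stand-in for a long-running %sh job, not the interpreter's actual timeout mechanism.

```python
import signal
import subprocess

# Start a long-running child process, then terminate it the way a timeout would.
proc = subprocess.Popen(["sleep", "30"])
proc.send_signal(signal.SIGTERM)
proc.wait(timeout=10)

# Python reports death-by-signal as a negative return code (-15 for SIGTERM).
# Shells report the same event as 128 + the signal number.
shell_exit_value = 128 + abs(proc.returncode)

print("Python return code:", proc.returncode)
print("Shell-style exit value:", shell_exit_value)
```

An exit value of 143 therefore points to the job being terminated from outside, rather than to a failure in the commands themselves.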
JAR using older Spark

Your attempt to reference a Java or Scala class or method from an imported JAR fails with java.lang.NoSuchMethodError:

 

Most commonly, this could simply be caused by a typo in your notebook code (i.e., you typed the wrong class name or method name).

 

If that is not the cause, it's possible that this is due to version incompatibilities with Spark or Scala between the development environment that built the JAR and the corresponding libraries in Analytics Workbench. For example, we have seen this when JARs built and compiled with Spark 1.6 were uploaded to AW, which itself was running on Spark 2.1.

First, review the library names and versions included in Analytics Workbench, by running the paragraphs in Tutorial 1: Your First Notebook. Check the versions of Spark, Scala and Java. If needed and if possible, rebuild the JAR using matching versions, and upload the updated JAR.

Observed with Scala
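One way to picture the version check is to compare the major and minor version of each library the JAR was built against with the version the platform runs. The sketch below is a rough heuristic, and real compatibility rules are more nuanced. The helper names and version strings are hypothetical, apart from the Spark 1.6 versus 2.1 pair mentioned above.

```python
def major_minor(version):
    """Return (major, minor) from a dotted version string such as '2.1.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def mismatches(built_with, runtime):
    """List libraries whose major.minor version differs between build and runtime."""
    return sorted(
        name
        for name, version in built_with.items()
        if name in runtime and major_minor(version) != major_minor(runtime[name])
    )


# Hypothetical values: substitute what your build tool and the tutorial notebook report.
jar_built_with = {"spark": "1.6.0", "scala": "2.10.5"}
aw_runtime = {"spark": "2.1.0", "scala": "2.11.8"}

print(mismatches(jar_built_with, aw_runtime))
```

Any library named in the result is a candidate for rebuilding the JAR against the platform's version.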
Cannot export notebook

Attempt to Export your notebook fails in the Chrome browser with this error:

("download Failed - Network error")

In our experience, this occurs when the content of the notebook is "too large" to export, often due to large table outputs (such as query results with 1,000 rows and many, many columns). Clearing selected or all output will usually resolve the problem immediately, and enable a successful export. (If you seek a more targeted approach, see Resolution.)

 

Despite the message, this appears to have little or nothing to do with network or connectivity errors. (Other browsers may report different symptoms.)

Reduce the size of the notebook by one of these techniques, then re-attempt the export:

  1. Clear all notebook output (a sledgehammer)
  2. Clear the output of selected cells with large output (especially query results)
  3. Limit the output of large cells such as queries. In a %sql cell, trim down the number of columns (in select statement) or the number of rows (e.g., limit 20).
Observed with Scala, Pyspark, Spark R, SQL
Tabular output too large to display

Tabular output appears blank, such as:

Immediately after running, the Table display appears to be blank, but after reloading the Notebook (refresh, or leave page and return), the results will display properly. This is a known defect in Zeppelin 0.7.0, corrected in 0.7.1, and is typically observed only for fairly wide data frames (e.g., more than 60 columns). See https://issues.apache.org/jira/browse/ZEPPELIN-2084 for more details.

For now, simply reload your notebook page to see the data. A future update of Analytics Workbench will correct this issue.

 

You may also avoid the issue by controlling the number of rows and columns returned. We observe that the default display of 1000 rows can expose this defect for datasets with 40 or more columns.

Observed with Scala and Pyspark.
Interpreter X not found

Attempt to run a cell immediately produces error message "paragraph_N*'s Interpreter X not found", while Interpreter Binding prompt has surfaced.

The interpreter exists (and is properly spelled), but is not active for the current notebook. Often, this is because all of the interpreter binding settings are unsaved (and thus inactive) for the notebook. One precursor to this symptom is when the notebook prompts you to review and save the Interpreter Binding (a lengthy pop-down).

For often unexplained reasons, this prompt comes up because the notebook has lost its settings. If you attempt to run a paragraph without first saving the binding settings, or if your needed interpreter is deactivated, you will see this symptom.

Save the Interpreter Binding settings (and possibly re-activate a specific interpreter that has been shut off), and run the cell again.

Observed with all languages and interpreters
Whitespace before prefix

Error message "<console>:26: error: not found: value %"

You probably have whitespace (tab, space, newline, etc.) before the ever-critical "%" indicator. As a result, Zeppelin does not find the correct language and defaults back to its first-class Scala interpreter to run your code, so you're seeing Scala react to a symbol it doesn't recognize: "%". The whitespace likely snuck in through a copy-paste from some other location, like an email message, which might introduce other syntax problems, like "smart" (curly) quotes.

Remove any whitespace before the "%", and then run again. Your interpreter should bind correctly, and you'll be off to the races. (Or at least past this problem.)

All languages and interpreters
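If you paste code from email often, a small pre-flight check can catch both problems described for the whitespace-before-prefix symptom. This is a minimal sketch in plain Python; the function names are hypothetical and nothing here is part of Zeppelin or AW.

```python
# Curly single and double quotes that often arrive via copy-paste.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def check_paragraph(text):
    """Return a list of problems likely to confuse interpreter selection."""
    problems = []
    # The prefix is present, but something whitespace-like precedes it.
    if text.lstrip().startswith("%") and not text.startswith("%"):
        problems.append("whitespace before the % prefix")
    if any(ch in SMART_QUOTES for ch in text):
        problems.append("smart (curly) quotes in the code")
    return problems


def fix_leading_whitespace(text):
    """Strip leading whitespace so the % prefix is the first character."""
    return text.lstrip()


pasted = "  %pyspark\nprint(\u201chello\u201d)"
print(check_paragraph(pasted))
print(repr(fix_leading_whitespace(pasted)))
```

The check only reports smart quotes rather than replacing them, since a curly quote inside a string literal may be intentional.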
Prefix not found

Error message "Prefix not found.", returned almost immediately on execution of a paragraph

You attempted to use an interpreter that is not supported, or you simply misspelled the name of the interpreter you really want. (e.g., "%bogus" or "%pyyspark")

Check and fix the spelling of your interpreter name, or recognize that the one you want isn't available.

All languages and interpreters
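For the misspelled-prefix case, a fuzzy match against the interpreter names you expect can suggest the intended one. This is a minimal sketch using Python's standard `difflib`. The list of interpreter names is an assumption, so check your own notebook's interpreter bindings, and the helper name is hypothetical.

```python
import difflib

# Assumed list of interpreter names; adjust to match your notebook's bindings.
KNOWN_INTERPRETERS = ["spark", "pyspark", "sql", "sh"]


def suggest_interpreter(prefix):
    """Return the closest known interpreter name, or None if nothing is close."""
    name = prefix.lstrip("%")
    if name in KNOWN_INTERPRETERS:
        return name
    matches = difflib.get_close_matches(name, KNOWN_INTERPRETERS, n=1)
    return matches[0] if matches else None


print(suggest_interpreter("%pyyspark"))
print(suggest_interpreter("%bogus"))
```

A result of None corresponds to the second explanation above: the interpreter you want is probably not available.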
First time use of matplotlib

Running a %pyspark paragraph, you see "UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment."

This message is harmless, and means that the matplotlib libraries are being used for the first time. This occurs only on the first use of the AW application after instantiation or restart.

Run the paragraph again, and the warning message will not appear.

Python (pyspark)
Python interpreter unavailable

An attempt to execute a %pyspark paragraph takes unusually long, and eventually returns with the error message "pyspark is not responding".

For any of a number of potential reasons, the Python interpreter has become unavailable. (This has been observed once internally on our #47 beta testing box.)

The only known remedy is to alert the server maintenance team to have them investigate and resolve.

Python (pyspark)
Data read failure due to session timeout

Error message "ValueError: Expecting value: line 1 column 1 (char 0)" when attempting to use aw.data.read() in Pyspark paragraph.

 

This specific symptom should be obsolete with the release of AW 1.0.1. Instead, users will now see far more informative error messages.

After some time using a notebook, the user's application session may time out from the FICO Analytic Cloud. While most of the AW application continues to behave as expected, an attempt to aw.data.read() a data frame will throw a longer exception.

 

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-___.py", line 341, in <module>
    exec(code)
  File "<stdin>", line 8, in <module>
  File "/usr/lib/zeppelin/python-scripts/aw/data/__init__.py", line 77, in read
    filepath = _get_dataset_url(name)
  File "/usr/lib/zeppelin/python-scripts/aw/data/__init__.py", line 34, in _get_dataset_url
    if 'url' in response.json():
  File "/usr/local/lib/python3.4/site-packages/requests/models.py", line 886, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)

 

The root cause is that the user is no longer authenticated to access that data, although the error message is cryptic and may differ from this example.

In a separate tab of the same browser window, log in again to https://www.ficoanalyticcloud.com/login-init. Once logged in, reload the AW application, and return to the notebook to run the paragraph again.

 

The AW product team is working to address 2 defects in a future release: to reduce the likelihood of timeout, and to improve the error message in the event of a timeout.

Observed with Pyspark
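The cryptic message in the session-timeout traceback above is what Python's JSON decoder reports when the body it receives is not JSON at all, for example an HTML login page. The sketch below reproduces the message with the standard `json` module and shows one way a wrapper could surface a clearer error. The login page text and the function name are hypothetical, and this is not the aw.data implementation.

```python
import json

# Hypothetical stand-in for the body returned after a session timeout:
# an HTML login page instead of the expected JSON document.
login_page = "<!DOCTYPE html><html><body>Please log in</body></html>"


def parse_dataset_response(body):
    """Parse a response body as JSON, replacing the cryptic decode error with a clearer one."""
    try:
        return json.loads(body)
    except ValueError as err:
        raise RuntimeError(
            "Response was not JSON; the session may have timed out. "
            "Log in again and retry."
        ) from err


# Reproduce the cryptic message from the traceback.
try:
    json.loads(login_page)
except ValueError as err:
    cryptic_message = str(err)

# The wrapper surfaces a more actionable message instead.
try:
    parse_dataset_response(login_page)
except RuntimeError as err:
    clear_message = str(err)

print(cryptic_message)
print(clear_message)
```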
matplotlib not properly initialized

Error message "_tkinter.TclError: no display name and no $DISPLAY environment variable"

This should be obsolete in AW now that we have upgraded to Zeppelin 0.7.0.

 

You've attempted to use plotting routines from matplotlib, but due to Zeppelin 0.6.0's somewhat goofy implementation, your failure (?) to properly initialize the libraries causes this error.

It's complicated. You need to alter the way you import matplotlib, use the Agg backend, and you probably also need to restart the Pyspark interpreter. See this document (notebook): http://10.105.5.58/zeppelin/#/notebook/2CAZN9M6B

Python (pyspark)
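The usual pattern for plotting without a display is to select the Agg backend before pyplot is imported. This is a minimal sketch, assuming matplotlib is installed. It is not taken from the linked notebook, and under Zeppelin 0.6.0 you may still need to restart the Pyspark interpreter first.

```python
import io

import matplotlib

# Select the non-interactive Agg backend before pyplot is imported,
# so matplotlib never tries to open a display.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

# Render to an in-memory PNG instead of calling plt.show().
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)

print(matplotlib.get_backend(), len(png_bytes))
```

How the rendered image is then displayed in the notebook depends on the Zeppelin version, so that step is left out of this sketch.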