Troubleshooting Guide for Notebook Users

Document created by Andrew Flint on Feb 10, 2017. Last modified by Andrew Flint on Jul 20, 2017. Version 19.

As you code up brilliant solutions to data science problems in your notebooks, you might encounter some gotchas, error messages and other forms of turbulence. This guide might help you find quick resolution to some commonly observed problems. Please feel free to add to it!

 

 

Each entry below gives the problem title, its primary symptom, the meaning or likely explanation, the resolution, and the language(s) with which it has been observed.
Data read failure due to session timeout

Symptom: Error message "ValueError: Expecting value: line 1 column 1 (char 0)" when attempting to use aw.data.read() in a Pyspark paragraph.

After some time using a notebook, the user's application session may time out from the FICO Analytic Cloud. While most of the AW application continues to behave as expected, an attempt to aw.data.read() a data frame will throw a longer exception:

 

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-___.py", line 341, in <module>
    exec(code)
  File "<stdin>", line 8, in <module>
  File "/usr/lib/zeppelin/python-scripts/aw/data/__init__.py", line 77, in read
    filepath = _get_dataset_url(name)
  File "/usr/lib/zeppelin/python-scripts/aw/data/__init__.py", line 34, in _get_dataset_url
    if 'url' in response.json():
  File "/usr/local/lib/python3.4/site-packages/requests/models.py", line 886, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)

 

The root cause is that the user is no longer authenticated to access that data, although the error message is cryptic and may differ from this example.

Resolution: In a separate tab of the same browser window, log in again at https://www.ficoanalyticcloud.com/login-init. Once logged in, reload the AW application, then return to the notebook and run the paragraph again.

 

The AW product team is working to address two defects in a future release: to reduce the likelihood of timeout, and to improve the error message in the event of a timeout.
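The failure mode can be reproduced outside AW: once the session has expired, the data endpoint returns a non-JSON body (likely a login page or an empty response), and requests' response.json() fails exactly as in the traceback above. A minimal sketch — the login_page body here is a hypothetical stand-in, not the actual server response:

```python
import json

# Hypothetical stand-in for the non-JSON body the endpoint returns after
# the session expires (e.g., an HTML login page).
login_page = "<html><body>Please sign in</body></html>"

try:
    # This is what requests' response.json() does internally.
    json.loads(login_page)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    message = str(exc)

print(message)  # → Expecting value: line 1 column 1 (char 0)
```

The "(char 0)" simply means the very first character of the body was not valid JSON, which is why the message looks unrelated to authentication.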

Observed with Pyspark
java.net.ConnectException

Symptom: Error message "java.net.ConnectException: Connection refused (Connection refused)", returned almost immediately on execution of a paragraph.

Periodically, the interpreter serving a notebook's paragraph may become unresponsive. Most likely, the interpreter needs to be restarted, which the user may achieve by "rebinding" (or restarting) the interpreters.

Older notes: We previously believed the interpreter needed for the paragraph was simply dead and needed a restart (sometimes other languages still succeed while the one you're attempting to run is broken). More recently, that remedy has not worked, and we have had to restart the entire Docker container for Zeppelin.

Resolution: Please see How to Rebind Your Notebook's Interpreters.

 

If rebinding does not help, notify support@fico.com, and tell them precisely which server is behaving this way and which interpreter(s) show the symptom. Their best remedy right now is to run "docker restart zeppelin" on the server in question.

Observed with Scala, Pyspark, Spark R, SQL
Spark interpreters broken

Symptom: An attempt to run a paragraph puts it into a "PENDING" state indefinitely; it never actually runs.

After a long time, the state eventually moves to ABORT.

The interpreter needed for the paragraph may be dead, or may be otherwise occupied. (Sometimes other languages will still succeed, but the one you're attempting to run is busted.)

Resolution: This likely needs a DevOps server person to diagnose and resolve. There is not much you can do except notify them, wait, and maybe try another language (just as a troubleshooting step).

Observed with Scala, Pyspark, Spark R, SQL
Notebook gone stale

Symptom: Oddly enough, you can no longer run any paragraph in your notebook. The play button no longer does anything, and Shift+Enter also does nothing.

You've been writing and running paragraphs happily for a while, maybe stepped away from your computer (or not), maybe took a quick break (or not), and now any paragraph stubbornly refuses to run. This is upsetting because you've just made some delicate edits to the code in your current paragraph and are dying to know whether you got them right.

 

(This is a known, annoying, and devious defect; we have yet to faithfully recreate, investigate, and resolve it.)

Resolution: First, be sure to copy the code portion of your current paragraph to the clipboard. (You are about to reload the page, which may cause you to lose any unsaved edits. For safety, select all (Ctrl+A) and copy (Ctrl+C) in your current paragraph.)

 

Now reload the whole notebook page, once or twice. If you did lose some code, it is thankfully on your clipboard: paste it in and try running again. In our experience, this always works.

Observed with all languages and interpreters
%sh times out at 60 sec

Symptom: Your bash (%sh) paragraph stops with a SIGTERM error and an ExitValue of 143.

To protect against too-long-running jobs on the master node, the bash interpreter is deliberately configured to terminate jobs at 60,000 milliseconds (1 minute). Any job that takes longer than one minute will fail with this error. (You may observe that your job actually runs quite a bit longer than a minute, and that you get some positive side effects from it, but any command that attempts to start after the 60-second mark will not execute.)

Resolution: First, try to resist the temptation to run long-running shell jobs at all, since they run on the master node, cannot be scheduled or managed, and can hurt the performance of the master node. If you must, break your long %sh job into smaller chunks, across several paragraphs.

Observed with only the Bash interpreter.
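The ExitValue of 143 is not arbitrary: shells report a signal-killed process as 128 plus the signal number, and SIGTERM is signal 15. The arithmetic can be checked outside Zeppelin with a short sketch (simulating the interpreter's kill by terminating a long-running job ourselves):

```python
import signal
import subprocess

# Start a long-running job and deliver SIGTERM to it, as Zeppelin's bash
# interpreter does when the 60-second limit is reached.
proc = subprocess.Popen(["sleep", "60"])
proc.terminate()  # sends SIGTERM (signal 15)
proc.wait()

# Python reports death-by-signal as a negative return code; a shell
# reports 128 + signal number, which is where ExitValue 143 comes from.
assert proc.returncode == -signal.SIGTERM
shell_exit_value = 128 + signal.SIGTERM
print(shell_exit_value)  # → 143
```

So an ExitValue of 143 is simply the shell's way of saying "terminated by SIGTERM", consistent with the deliberate timeout described above.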
Cannot export notebook

Symptom: An attempt to Export your notebook fails in the Chrome browser with this error:

("download Failed - Network error")

In our experience, this occurs when the content of the notebook is "too large" to export, often due to large table outputs (such as query results with 1,000 rows and many, many columns). Clearing selected or all output will usually resolve the problem immediately and enable a successful export. (For a more targeted approach, see the resolution below.)

 

Despite the message, this appears to have little or nothing to do with network or connectivity errors. (Other browsers may report different symptoms.)

Resolution: Reduce the size of the notebook by one of these techniques, and re-attempt the export:

  1. Clear all notebook output (a sledgehammer)
  2. Clear the output of selected cells with large output (especially query results)
  3. Limit the output of large cells such as queries. In a %sql cell, trim down the number of columns (in the select statement) or the number of rows (e.g., limit 20).
Observed with Scala, Pyspark, Spark R, SQL
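As a rough, Zeppelin-independent illustration of why trimming helps (the row and column counts below are made up for the sketch, not taken from AW): the serialized size of a tabular result shrinks dramatically once rows and columns are limited.

```python
import json

# Hypothetical "query result": 1,000 rows by 200 columns, as a list of dicts.
full = [{f"col{c}": r * c for c in range(200)} for r in range(1000)]

# The same result trimmed the way the resolution suggests:
# fewer columns (select) and fewer rows (limit 20).
trimmed = [{k: row[k] for k in list(row)[:10]} for row in full[:20]]

full_size = len(json.dumps(full))
trimmed_size = len(json.dumps(trimmed))
print(trimmed_size < full_size // 100)  # the trimmed payload is far smaller
```

The notebook export carries this output payload along, which is why clearing or limiting it is what unblocks the export.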
Tabular output too large to display

Symptom: Tabular output appears blank.

Immediately after running, the Table display appears to be blank, but after reloading the Notebook (refresh, or leave page and return), the results will display properly. This is a known defect in Zeppelin 0.7.0, corrected in 0.7.1, and is typically observed only for fairly wide data frames (e.g., more than 60 columns). See https://issues.apache.org/jira/browse/ZEPPELIN-2084 for more details.

For now, simply reload your notebook page to see the data. A future update of Analytics Workbench will correct this issue.

 

You may also avoid the issue by controlling the number of rows and columns returned. We observe that the default display of 1000 rows can expose this defect for datasets with 40 or more columns.

Observed with Scala and Pyspark.
Interpreter X not found

Symptom: An attempt to run a cell immediately produces the error message "paragraph_N*'s Interpreter X not found", while the Interpreter Binding prompt has surfaced.

The interpreter exists (and is properly spelled) but is not active for the current notebook. Often this is because all of the interpreter binding settings are unsaved (and thus inactive) for the notebook. One precursor to this symptom is the notebook prompting you to review and save the Interpreter Binding (a lengthy pop-down prompt).

For often unexplained reasons, this prompt comes up because the notebook has lost its settings. If you attempt to run a paragraph without first saving the binding settings, or if your needed interpreter is deactivated, you will see this symptom.

Resolution: Save the Interpreter Binding settings (and possibly re-activate a specific interpreter that has been shut off), then run the cell again.

Observed with all languages and interpreters
Whitespace before prefix

Symptom: Error message "<console>:26: error: not found: value %"

You probably have whitespace (tab, space, newline, etc.) before the ever-critical "%" indicator. As a result, Zeppelin does not find the correct language and falls back to its first-class Scala interpreter to run your code, so you are seeing Scala react to a symbol it does not recognize: "%". The whitespace likely snuck in through a copy-paste from some other location, such as an email message, which might carry other syntax problems too, like "smart" (curly) quotes.

Resolution: Remove any whitespace before the "%", and run again. Your interpreter should bind correctly, and you'll be off to the races (or at least past this problem).

Observed with all languages and interpreters
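The rule is that the "%" must be the paragraph's very first character. A hypothetical helper (not part of Zeppelin) that mirrors the check makes the failure mode concrete:

```python
def leading_whitespace_before_prefix(paragraph: str) -> bool:
    """Return True if an interpreter prefix exists in the paragraph but is
    preceded by whitespace, which makes Zeppelin miss it (hypothetical helper)."""
    return (not paragraph.startswith("%")) and paragraph.lstrip().startswith("%")

print(leading_whitespace_before_prefix("  %pyspark\nprint(1)"))  # → True
print(leading_whitespace_before_prefix("%pyspark\nprint(1)"))    # → False
```

Anything that fails this check is handed to the default (Scala) interpreter, which is why the error message comes from <console> rather than from your intended language.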
Prefix not found

Symptom: Error message "Prefix not found.", returned almost immediately on execution of a paragraph.

You attempted to use an interpreter that is not supported, or you simply misspelled the name of the interpreter you really want (e.g., "%bogus" or "%pyyspark").

Resolution: Check and fix the spelling of your interpreter name, or recognize that the one you want isn't available.

Observed with all languages and interpreters
First-time use of matplotlib

Symptom: Running a %pyspark paragraph, you see "UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment."

This message is harmless; it means the matplotlib libraries are being used for the first time. It occurs only on the first use of the AW application after instantiation or restart.

Resolution: Run the paragraph again, and the warning message will not appear.

Observed with Python (pyspark)
Python interpreter unavailable

Symptom: An attempt to execute a %pyspark paragraph takes unusually long, and eventually returns the error message "pyspark is not responding".

For any of a number of potential reasons, the Python interpreter has become unavailable. (This has been observed once internally on our #47 beta testing box.)

Resolution: The only known remedy is to alert the server maintenance team to have them investigate and resolve.

Observed with Python (pyspark)
matplotlib not properly initialized

Symptom: Error message "_tkinter.TclError: no display name and no $DISPLAY environment variable"

This should be obsolete in AW now that we have upgraded to Zeppelin 0.7.0.

 

You've attempted to use plotting routines from matplotlib, but due to Zeppelin 0.6.0's somewhat goofy implementation, failing to initialize the libraries properly causes this error.

Resolution: It's complicated. You need to alter the way you import matplotlib, use the Agg backend, and you probably also need to restart the Pyspark interpreter. See this document (notebook): http://10.105.5.58/zeppelin/#/notebook/2CAZN9M6B

Observed with Python (pyspark)
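The Zeppelin 0.6-era workaround generally amounts to selecting a non-interactive backend before pyplot is ever imported, so no X display is needed. A minimal sketch of that pattern (assuming matplotlib is installed; the output path is arbitrary):

```python
# Select the display-less Agg backend *before* pyplot is imported;
# importing pyplot first can lock in a Tk backend that needs $DISPLAY.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Render to a file instead of a screen; no $DISPLAY is required.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
fig.savefig("/tmp/example_plot.png")
```

The import order is the crux: once an interactive backend has been chosen, switching is unreliable, which is why a Pyspark interpreter restart is often also needed.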