As you code up brilliant solutions to data science problems in your notebooks, you might encounter some gotchas, error messages and other forms of turbulence. This guide might help you find a quick resolution to some commonly observed problems. Please feel free to add to it!
|Title||Primary Symptom||Meaning / Likely Explanation||Resolution||Language(s)|
|Notebook locked message|
While editing a notebook, a pop-up dialog saying "Notebook is currently locked" interrupts your editing and returns frequently.
|This has been observed in AW 1.0.1 and we are actively troubleshooting the issue. The system is (perhaps incorrectly) detecting that two different users are attempting to edit a notebook simultaneously, and then trying to prevent conflicts. This may be caused by a lost authorization token, even with just a single user.|
The simplest mechanisms to fix this are:
|java.net.ConnectException||Error message "java.net.ConnectException: Connection refused (Connection refused)", returned almost immediately on execution of a paragraph|
Periodically, the interpreter serving a notebook's paragraph may become unresponsive. Most likely, the interpreter needs to be restarted, which you can do yourself (see Resolution).
Older diagnosis: We previously believed this was caused by a dead interpreter that needed a simple restart, but more recently that remedy has not worked. Instead, we have had to restart the entire Docker container for Zeppelin.
Old diagnosis: The interpreter needed for the paragraph is dead. (Sometimes other languages will still succeed, but the one you're attempting to run is busted.)
Please see Your Zeppelin notebook now has a Stop button to reset/restart your notebook session.
(In versions older than 1.0.1, the best technique is: How to Rebind Your Notebook's Interpreters.)
Notify email@example.com, and tell them precisely which server is behaving this way and which interpreter(s) are showing the symptom. Their best remedy right now is to run "docker restart zeppelin" on the server in question.
|Observed with Scala, Pyspark, Spark R, SQL|
|SQL with large LIMIT breaks query||Error message "Job aborted due to stage failure: ... Reason: Container killed by YARN for exceeding memory limits" after a couple of minutes of runtime, on a query over large data that includes a large LIMIT clause.||Using a large value (e.g., 7 million or 10 million) in the LIMIT clause of a SQL query can cause an out-of-memory condition in YARN, generating this error message. This is most likely related to the long-standing Spark defect SPARK-9879: OOM in LIMIT clause with large number.|
First, confirm that the code runs as expected with a smaller LIMIT value, such as two million (2000000). If that succeeds, try removing the LIMIT altogether, rather than attempting a much larger, but still limiting, value.
For example, in our original testing, the query failed with a LIMIT of around 8 million records, but succeeded in processing all 74 million records once the LIMIT was simply removed.
|Observed with SQL interpreter|
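For the LIMIT issue above, the two-step check can be sketched as a small helper that builds the same query with and without a LIMIT clause. This is only an illustration in plain Python: the helper name `build_query` and the table name `transactions` are made up for the example, and you would paste the resulting SQL into your own %sql paragraph or Spark call.

```python
def build_query(base_sql, limit=None):
    """Return base_sql unchanged, or with a LIMIT clause appended when limit is given."""
    if limit is None:
        return base_sql
    return f"{base_sql} LIMIT {int(limit)}"


# Hypothetical table name, for illustration only.
BASE = "SELECT * FROM transactions"

# Step 1: confirm the logic works on a bounded run.
smoke_test_sql = build_query(BASE, limit=2000000)

# Step 2: drop the LIMIT entirely instead of raising it to 8 million or more.
full_run_sql = build_query(BASE)

print(smoke_test_sql)
print(full_run_sql)
```

Keeping the LIMIT as an optional argument makes it easy to switch between the bounded test and the full run without editing the query text by hand.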
|Spark interpreters broken|
An attempt to run a paragraph puts it in a state of "PENDING" indefinitely; it never actually runs.
After a long time, the state eventually moves to ABORT.
The interpreter needed for the paragraph may be dead, or may be otherwise occupied. (Sometimes other languages will still succeed, but the one you're attempting to run is busted.)
|Likely needs a DevOps server person to diagnose and resolve. Not much you can do except notify, wait, maybe try another language (just as a troubleshooting step).||Observed with Scala, Pyspark, Spark R, SQL|
|Notebook gone stale||Oddly enough, you can no longer run any paragraph in your notebook. The play button doesn't do anything anymore. Shift-Enter also does nothing.|
You've been writing and running paragraphs happily for a while, maybe stepped away from your computer (or not), maybe took a quick break (or not), and now, any paragraph is stubbornly refusing to run. This upsets you a little because you've just made some really delicate edits to the code in your current paragraph and are dying to know if you got it right.
(This is a known, annoying, and devious defect. We have yet to faithfully recreate, investigate, and resolve it.)
First: be sure to copy the code portion of your current paragraph note to the clipboard. (You're going to reload the page, which may cause you to lose any unsaved edits. For safety, select all (Ctrl+A) and copy (Ctrl+C) your current paragraph.)
Now, reload the whole notebook page, once or twice. If you indeed lost some code, thankfully it's on your clipboard. Paste it in and try running again. In my experience, this always works.
|All languages and interpreters|
|%sh times out at 60 sec|
Your bash (%sh) paragraph stops with a SIGTERM error, and ExitValue of 143:
|To protect against too-long-running jobs on the master node, the bash interpreter is deliberately configured to terminate jobs at 60000 milliseconds (1 minute). Any job that takes longer than one minute will fail with this error. (You may observe that your job actually runs quite a bit longer than a minute, that you get some positive side effects from your job, but that any commands attempted to start after 60 seconds will not execute.)||First, try to resist the temptation to run long-running shell jobs, since they run on the master node, cannot be scheduled or managed, and can hurt the performance of the master node. If you must do so, try to break up your long %sh job into smaller chunks, across several paragraphs.||Observed with only the Bash interpreter.|
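If the ExitValue of 143 in the %sh timeout row looks arbitrary, it follows the common shell convention that a process killed by a signal reports 128 plus the signal number. The sketch below decodes such a value using only the Python standard library; the function name `describe_exit_value` is ours, chosen for illustration.

```python
import signal


def describe_exit_value(exit_value):
    """Explain a shell exit value, decoding the 128 + signal-number convention."""
    if exit_value > 128:
        signum = exit_value - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    if exit_value == 0:
        return "success"
    return f"exited with status {exit_value}"


print(describe_exit_value(143))
```

Since 143 - 128 = 15, and signal 15 is SIGTERM, the value confirms that the job was terminated from outside (here, by the interpreter's time limit) rather than failing on its own.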
|JAR using older Spark|
Your attempt to reference a Java or Scala class or method from an imported JAR fails with java.lang.NoSuchMethodError:
Most commonly, this could simply be caused by a typo in your notebook code (i.e., you typed the wrong class name or method name).
If that is not the cause, it's possible that this is due to version incompatibilities with Spark or Scala, between the development environment that built the JAR, and the Analytics Workbench versions of corresponding libraries. For example, we have seen this when JARs built and compiled with Spark 1.6 were uploaded to AW, which itself was running on Spark 2.1.
|First, review the library names and versions included in Analytics Workbench, by running the paragraphs in Tutorial 1: Your First Notebook. Check the versions of Spark, Scala and Java. If needed and if possible, rebuild the JAR using matching versions, and upload the updated JAR.||Observed with Scala|
|Cannot export notebook|
Attempt to Export your notebook fails in the Chrome browser with this error:
("download Failed - Network error")
In our experience, this occurs when the content of the notebook is "too large" to export, often due to large table outputs (such as query results with 1,000 rows and many, many columns). Clearing selected or all output will usually resolve the problem immediately, and enable a successful export. (If you seek a more targeted approach, see Resolution.)
Despite the message, this appears to have little or nothing to do with network or connectivity errors. (Other browsers may report different symptoms.)
Reduce the size of the notebook by one of these techniques, and then re-attempt the export:
|Observed with Scala, Pyspark, Spark R, SQL|
|Tabular output too large to display|
Tabular output appears blank, such as:
|Immediately after running, the Table display appears to be blank, but after reloading the Notebook (refresh, or leave page and return), the results will display properly. This is a known defect in Zeppelin 0.7.0, corrected in 0.7.1, and is typically observed only for fairly wide data frames (e.g., more than 60 columns). See https://issues.apache.org/jira/browse/ZEPPELIN-2084 for more details.|
For now, simply reload your notebook page to see the data. A future update of Analytics Workbench will correct this issue.
You may also avoid the issue by controlling the number of rows and columns returned. We observe that the default display of 1000 rows can expose this defect for datasets with 40 or more columns.
|Observed with Scala and Pyspark.|
|Interpreter X not found|
Attempt to run a cell immediately produces error message "paragraph_N*'s Interpreter X not found", while Interpreter Binding prompt has surfaced.
The interpreter exists (and is properly spelled), but is not active for the current notebook. Often, this is because all of the interpreter binding settings are unsaved (and thus inactive) for the notebook. One precursor to this symptom is when the notebook prompts you to review and save the Interpreter Binding (a lengthy pop-down that looks like pic below).
For often unexplained reasons, this prompt comes up because the notebook has lost its settings. If you attempt to run a paragraph without first saving the binding settings, or if your needed interpreter is deactivated, you will see this symptom.
|Save the Interpreter Binding settings (and possibly re-activate a specific interpreter that has been shut off), and run the cell again.||Observed with all languages and interpreters|
|Whitespace before prefix||Error message "<console>:26: error: not found: value %"||You probably have whitespace (tab, space, newline, etc.) before the ever-critical "%" indicator. As a result, Zeppelin is not finding the correct language, and is defaulting back to its first-class Scala interpreter to run your code. So you're seeing Scala react to a symbol it doesn't recognize: "%". The whitespace likely snuck in there due to a copy-paste from some other location, like an email message, which might also introduce other syntax problems, like "smart" (curly) quotes.||Remove any whitespace before the "%", and then run again. Your interpreter should bind correctly, and you'll be off to the races. (Or at least past this problem.)||All languages and interpreters|
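For the whitespace-before-prefix problem above, the cleanup you would do by hand can be sketched in a few lines of Python. This is an illustration only, not part of Zeppelin or AW: the helper name `clean_pasted_paragraph` is made up, and it simply strips leading whitespace and swaps curly quotes for straight ones in text you are about to paste into a paragraph.

```python
def clean_pasted_paragraph(text):
    """Strip leading whitespace and replace curly quotes with straight ones."""
    replacements = {
        "\u201c": '"', "\u201d": '"',  # curly double quotes
        "\u2018": "'", "\u2019": "'",  # curly single quotes
    }
    cleaned = text.lstrip()  # removes leading spaces, tabs and newlines
    for curly, straight in replacements.items():
        cleaned = cleaned.replace(curly, straight)
    return cleaned


# A paragraph as it might arrive from an email: leading newline and spaces, curly quotes.
pasted = "\n  %pyspark\nprint(\u201chello\u201d)"
print(clean_pasted_paragraph(pasted))
```

Note that this replaces curly quotes everywhere, including inside string literals where you may have wanted them, so review the result before running it.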
|Prefix not found||Error message "Prefix not found.", returned almost immediately on execution of a paragraph||You attempted to use an interpreter that is not supported, or you simply misspelled the name of the interpreter you really want. (e.g., "%bogus" or "%pyyspark")||Check and fix the spelling of your interpreter name, or recognize that the one you want isn't available.||All languages and interpreters|
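When "Prefix not found" is caused by a misspelling, comparing what you typed against the interpreter names you expect can point to the intended one. The sketch below uses Python's standard `difflib` module; the list of interpreter names is an assumption for illustration, so check the Interpreter Binding settings of your own notebook for the real list.

```python
import difflib

# Assumed interpreter names, for illustration; your notebook's bindings may differ.
KNOWN_INTERPRETERS = ["spark", "pyspark", "sql", "sh", "r"]


def suggest_interpreter(prefix, known=KNOWN_INTERPRETERS):
    """Return the known interpreter name closest to prefix, or None if nothing is close."""
    name = prefix.lstrip("%").strip()
    if name in known:
        return name
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest_interpreter("%pyyspark"))
print(suggest_interpreter("%bogus"))
```

A result of None suggests the interpreter you want is not in the list at all, rather than being misspelled.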
|First time use of matplotlib||Running a %pyspark paragraph, you see "UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment."||This message is harmless, and means that the matplotlib libraries are being used for the first time. This occurs only on the first time use of the AW application after instantiation or restart.||Run the paragraph again, and the warning message will not appear.||Python (pyspark)|
|Python interpreter unavailable||An attempt to execute a %pyspark paragraph takes unusually long, and eventually returns with the error message "pyspark is not responding".||For any of a number of potential reasons, the Python interpreter has become unavailable. (This has been observed once internally on our #47 beta testing box.)||The only known remedy is to alert the server maintenance team to have them investigate and resolve.||Python (pyspark)|
|Data read failure due to session timeout|
Error message "ValueError: Expecting value: line 1 column 1 (char 0)" when attempting to use aw.data.read() in Pyspark paragraph.
This specific symptom should be obsolete with the release of AW 1.0.1. Instead, users will now see far more informative error messages.
After some time using a notebook, the user's application session may time out from the FICO Analytic Cloud, and while most of the AW application continues to behave as expected, an attempt to aw.data.read() a data frame will throw a longer exception.
Traceback (most recent call last):
The root cause is that the user is no longer authenticated to access that data, although the error message is cryptic and may differ from this example.
In a separate tab of the same browser window, log in again to https://www.ficoanalyticcloud.com/login-init. Once logged in, reload the AW application, and return to the notebook to run the paragraph again.
The AW product team is working to address 2 defects in a future release: to reduce the likelihood of timeout, and to improve the error message in the event of a timeout.
|Observed with Pyspark|
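The wording of the ValueError in the session-timeout row is the message Python's standard `json` module produces when asked to parse text that is not JSON at all. We have not confirmed how aw.data.read() handles responses internally, so treat the following as a sketch under the assumption that it expects JSON and instead receives something like an HTML login page after the session expires. The response text here is a made-up stand-in.

```python
import json

# Assumption: a stand-in for what an expired session might return in place of JSON.
response_body = "<html><body>Please log in</body></html>"

try:
    json.loads(response_body)
    message = None
except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
    message = str(err)

print(message)
```

The parser gives up at the very first character ("line 1 column 1 (char 0)"), which is consistent with the whole response being something other than data, and is why logging in again fixes it.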
|matplotlib not properly initialized||Error message "_tkinter.TclError: no display name and no $DISPLAY environment variable"|
This should be obsolete in AW now that we have upgraded to Zeppelin 0.7.0.
You've attempted to use plotting routines from matplotlib, but due to Zeppelin 0.6.0's somewhat goofy implementation, your failure (?) to properly initialize the libraries causes this error.
|It's complicated. You need to alter the way you attempt to import matplotlib, use the Agg backend, and you probably also need a restart of the Pyspark interpreter. See this document (notebook): http://10.105.5.58/zeppelin/#/notebook/2CAZN9M6B||Python (pyspark)|