java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca) #529

JakeBickUKGWA · 2022-03-29T10:29:52Z

Describe the bug
I am working through the AUT walkthrough at https://aut.docs.archivesunleashed.org/docs/toolkit-walkthrough, using the installation instructions at https://github.com/archivesunleashed/docker-aut#build-and-run. I am running it on an Ubuntu-based EC2 instance.

I can run the first step OK and get the count of domains in your sample material (though it does print a few errors at the beginning).

But when I try the second step, which extracts text, it just gives me this error message:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca)
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:90)
at org.apache.spark.sql.catalyst.expressions.Literal$.$anonfun$create$2(literals.scala:152)
at scala.util.Failure.getOrElse(Try.scala:222)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:152)
at org.apache.spark.sql.functions$.typedLit(functions.scala:131)
at org.apache.spark.sql.functions$.lit(functions.scala:114)
... 59 elided

I've copied the full terminal content to the attached text file.

To Reproduce
Steps to reproduce the behavior:

Start AUT (I use sudo docker run --rm -it -v "/home/jbickford/Desktop/AUTdata:/data" aut)
In paste mode, run:

import io.archivesunleashed._
import io.archivesunleashed.udfs._

RecordLoader.loadArchives("/aut-resources/Sample-Data/*.gz", sc)
  .all()
  .keepValidPagesDF()
  .groupBy(extractDomain($"url").alias("domain"))
  .count()
  .sort($"count".desc)
  .show(10, false)

This prints some errors but, as expected, generates a table of the top domains in the sample collection.
Again in paste mode, run:

import io.archivesunleashed._
import io.archivesunleashed.udfs._

val domains = Set("liberal.ca")

RecordLoader.loadArchives("/aut-resources/Sample-Data/*.gz", sc)
  .webpages()
  .select($"crawl_date", extractDomain($"url").alias("domain"), $"url", $"content")
  .filter(hasDomains($"domain", lit(domains)))
  .write.csv("/data/liberal-party-text")

This generates the java.lang.RuntimeException error mentioned above.

Expected behavior
As I understand it, AUT should generate a folder called liberal-party-text containing the text extracted from the sample data.

Screenshots
Attached

Environment information

AUT version: I'm afraid I'm not sure, it's the version in the docker image in the walkthrough
OS: Ubuntu 20.04.4 LTS (in an EC2 instance)
Java version: OpenJDK 64-Bit Server VM, Java 11.0.14.1
Apache Spark version: 3.11
Apache Spark w/aut: sorry, I'm also unsure about this, I'm guessing it's determined by the docker image, but if not let me know
Apache Spark command used to run AUT: sudo docker run --rm -it -v "/home/jbickford/Desktop/AUTdata:/data" aut

AUTissue.txt

ruebot · 2022-03-29T13:40:10Z

@JakeBickUKGWA sorry about that, it was a documentation issue: I forgot to update the type used for the variable. It should be an Array, not a Set. The documentation has been updated: https://aut.docs.archivesunleashed.org/docs/toolkit-walkthrough#extracting-some-text
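
For anyone landing here from the same stack trace, the following is a minimal sketch of the corrected step rather than a copy of the updated documentation. It assumes it is pasted into the spark-shell session started by the Docker image, where `sc` and the `$"..."` column syntax are already available. The names `asSet` and `asArray` are only illustrative, and the `Try` checks are there to show the difference between the two types without aborting the paste.

```scala
// Assumes a spark-shell session with AUT on the classpath (as in the Docker image),
// so `sc` and spark.implicits._ are already in scope.
import scala.util.Try
import org.apache.spark.sql.functions.lit
import io.archivesunleashed._
import io.archivesunleashed.udfs._

// lit() on a Scala Set is what raised the "Unsupported literal type" error above.
// Wrapping it in Try lets the rest of the paste continue.
val asSet = Try(lit(Set("liberal.ca")))
println(s"lit(Set(...)) succeeded: ${asSet.isSuccess}")

// The fix from this thread: hold the domains in an Array instead of a Set.
val domains = Array("liberal.ca")
val asArray = Try(lit(domains))
println(s"lit(Array(...)) succeeded: ${asArray.isSuccess}")

// Same pipeline as in the report, now filtering with the Array-backed literal.
RecordLoader.loadArchives("/aut-resources/Sample-Data/*.gz", sc)
  .webpages()
  .select($"crawl_date", extractDomain($"url").alias("domain"), $"url", $"content")
  .filter(hasDomains($"domain", lit(domains)))
  .write.csv("/data/liberal-party-text")
```

Only the type of `domains` differs from the failing snippet. Note that Spark's CSV writer refuses by default to write into a directory that already exists, so if an earlier attempt left /data/liberal-party-text behind, remove it before re-running.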

ruebot added a commit to archivesunleashed/aut-docs that referenced this issue Mar 29, 2022

Wrong type used in examples s/Set/Array/.

646a52b

- Address archivesunleashed/aut#529

ruebot closed this as completed Mar 29, 2022
