Error to fit the Pipeline #1235

ronit450 · 2024-01-25T09:07:29Z


I have built a pipeline that takes Sindhi poetry and its label, generates the embeddings, and finally trains the ClassifierDL annotator from Spark NLP for PySpark. The code is:
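import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, EmbeddingsFinisher
from sparknlp.annotator import (
    Tokenizer,
    Normalizer,
    StopWordsCleaner,
    WordEmbeddingsModel,
    SentenceEmbeddings,
    ClassifierDLApproach,
)

# Start (or reuse) a Spark session with Spark NLP; pretrained() needs an active session
spark = sparknlp.start()

# Wrap each couplet from the "Couplet" column in a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("Couplet") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Clean tokens using the Normalizer's default cleanup patterns
normalizer = Normalizer() \
    .setInputCols(["token"]) \
    .setOutputCol("normalized")

# Remove stop words, ignoring case (with no setStopWords call, the default list is English)
stopwords_cleaner = StopWordsCleaner() \
    .setInputCols(["normalized"]) \
    .setOutputCol("cleanTokens") \
    .setCaseSensitive(False)

# Pretrained Sindhi word2vec embeddings. WordEmbeddingsModel.pretrained() loads a
# published model; the WordEmbeddings approach is for custom embedding files.
word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d", "sd") \
    .setInputCols(["document", "cleanTokens"]) \
    .setOutputCol("embeddings")

# Average the word vectors of each document into a single sentence embedding
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

# Converts annotations to vectors; defined here but not added to the pipeline below
embeddings_finisher = EmbeddingsFinisher() \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCols(["finished_sentence_embeddings"]) \
    .setOutputAsVector(True) \
    .setCleanAnnotations(False)

# Train a ClassifierDL model that predicts the "Poet" label from sentence embeddings
classifierdl = ClassifierDLApproach() \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class") \
    .setEnableOutputLogs(True) \
    .setLabelColumn("Poet") \
    .setMaxEpochs(20) \
    .setBatchSize(32)

sindhi_pip = Pipeline(stages=[
    documentAssembler, tokenizer, normalizer, stopwords_cleaner,
    word_embeddings, sentence_embeddings, classifierdl,
])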
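For reference, the `AVERAGE` pooling strategy set on `SentenceEmbeddings` combines the word vectors of a document into one vector by taking their element-wise mean. The following is a minimal pure-Python sketch of that idea only, assuming each token vector is a plain list of floats. The helper name `average_pool` and the toy 3-dimensional vectors are hypothetical (the `w2v_cc_300d` vectors are 300-dimensional), and the error on an empty token list is this sketch's own choice, not a claim about how Spark NLP handles that case.

```python
from typing import List, Sequence


def average_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized token vectors (hypothetical helper)."""
    if not token_vectors:
        # This sketch refuses to pool an empty token list.
        raise ValueError("cannot average an empty list of token vectors")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    # Sum each dimension across all tokens, then divide by the token count.
    sums = [0.0] * dim
    for vec in token_vectors:
        for i, x in enumerate(vec):
            sums[i] += x
    return [s / len(token_vectors) for s in sums]


if __name__ == "__main__":
    # Toy vectors standing in for three cleaned tokens of one couplet.
    tokens = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0], [2.0, 2.0, 1.0]]
    print(average_pool(tokens))  # [2.0, 2.0, 1.0]
```

Each output dimension is the mean of that dimension across tokens, so a couplet of any length maps to one fixed-size vector, which is the shape of input `ClassifierDLApproach` consumes from the `sentence_embeddings` column.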

and the error is:

Replies: 0 comments