Getting Error reading from alluxio://<host>:19998/<path_to_parquet_file> at position xxxxxxx in presto SQL #16833
Comments
Hi @harsh9898, thanks for reporting the issue. The Alluxio URI looks odd to me: alluxio://:19998/<path_to_parquet_file>.
Hi @jja725, thank you for your quick reply. Please see my response below. Sure, here is the correct URI; sorry for the confusion.
NOTE: Once I removed that file from Alluxio storage using --alluxioOnly, the error went away. I am still getting the same error. It happened again, and I am pasting the full error trace here: com.facebook.presto.spi.PrestoException: Error reading from alluxio://<host_name>:19998/<path_to_parquet_file> at position xxxxxxxxxx
@harsh9898 did this happen once, or does it happen regularly? Our engineer looked at the issue and found that it may not be easy to reproduce. If it happens again, can we be involved earlier? e.g. before
It's happening almost every day for multiple files, so it is a regular error for us. I'm afraid I won't be able to show it, as this is happening in production, but I will see what else I can provide apart from the full error log trace. If it happens again tomorrow, I can provide a more detailed log for that error. Actually, the query is
User info: For example, we created a Hive table called 'sampletable'. This 'sampletable' reads its data from a single Alluxio path/directory, and that path contains multiple files.
Workflow: data in Azure ABFS -> distributedLoad the whole dataset to Alluxio -> run Presto on the whole dataset -> queries fail randomly
From user:
May be related to #16597
From user:
Likely caused by #16597
The user is willing to try out the changes in #16597 once they are released
From @tcrain From @harsh9898
Alluxio Version:
2.7.1
Describe the bug
Alluxio version: 2.7.1
Presto version: 0.268
Presto JDBC version: 0.268
Presto coordinator: 1, workers: 4
Single Alluxio master
Data Source: data loaded from Azure ABFS to Alluxio via distributedLoad
UFS: Azure ABFS
alluxio.user.file.metadata.sync.interval=300s
The UFS has not been updated for a week, but we still get the error.
File format: Parquet
Tools used to query Presto: DBeaver, Presto CLI
All the data files are cached in Alluxio, and Presto reads from Alluxio through Hive metadata.
When I run the query with DBeaver, it gives the following error:
com.facebook.presto.spi.PrestoException: Error reading from alluxio://:19998/<path_to_parquet_file> at position xxxxxxx
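For triage, it can help to pull the file URI and byte offset out of the exception message and to check whether the URI carries a host, since the message above shows `alluxio://:19998/...`. The following is a minimal Python sketch, assuming every message follows the `Error reading from <uri> at position <n>` shape quoted in this report; the function name and the sample path and offset are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Pattern assumed from the single message shape quoted in this report.
_READ_ERROR = re.compile(
    r"Error reading from (?P<uri>\S+) at position (?P<pos>\d+)"
)


def parse_read_error(message):
    """Return (uri, position, has_host), or None if the message does not match."""
    match = _READ_ERROR.search(message)
    if match is None:
        return None
    uri = match.group("uri")
    # urlsplit reports hostname as None when the authority is only ":19998".
    has_host = urlsplit(uri).hostname is not None
    return uri, int(match.group("pos")), has_host


if __name__ == "__main__":
    # Hypothetical sample: real paths and offsets are redacted in the report.
    sample = (
        "com.facebook.presto.spi.PrestoException: Error reading from "
        "alluxio://:19998/warehouse/part-0.parquet at position 1234"
    )
    print(parse_read_error(sample))
```

Collecting these tuples over several days would show whether the same files or offsets keep failing.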
To Reproduce
The error is random. It occurs for some of the files; if we remove an affected file from Alluxio using the --alluxioOnly option and re-run the query, the error does not occur.
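The workaround described here can be scripted once a failing file is known. The sketch below is an assumption-laden illustration: it assumes the removal was done with `alluxio fs rm --alluxioOnly <path>` (the report names only the flag), and the helper name and sample URI are hypothetical. It only builds the command so it can be reviewed before anything is run against production.

```python
import shlex
from urllib.parse import urlsplit


def eviction_command(uri):
    """Build the argv that removes one file from Alluxio only, keeping the UFS copy."""
    path = urlsplit(uri).path
    # Refuse empty or root paths so a bad URI can never target the whole namespace.
    if not path or path == "/":
        raise ValueError(f"refusing to build a command for {uri!r}")
    return ["alluxio", "fs", "rm", "--alluxioOnly", path]


if __name__ == "__main__":
    # Hypothetical URI: the real host and path are redacted in the report.
    argv = eviction_command("alluxio://master:19998/warehouse/part-0.parquet")
    # Print the command instead of executing it.
    print(shlex.join(argv))
```

Returning an argument list rather than a shell string avoids quoting problems with unusual file names if the command is later passed to `subprocess.run`.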
Expected behavior
I expect the query to return the desired results without this error.
Urgency
This is a critical error: I get it almost daily, when Alluxio is restarted automatically.
Are you planning to fix it
Not sure how to do it.
Additional context
Important Notes:
Some of the Parquet files are large.
The error occurs at random while reading the Parquet files stored in Alluxio; it fails on some of the files on a daily basis.