Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(services/gdrive): List shows modified timestamp gdrive #5226

Merged
merged 2 commits into from
Nov 21, 2024

Conversation

erickguan
Copy link
Contributor

@erickguan erickguan commented Oct 22, 2024

Which issue does this PR close?

Part of #4746.

Are there any user-facing changes?

None.

@erickguan erickguan force-pushed the modified-timestamp-gdrive branch 3 times, most recently from 77cf29b to 0fde283 Compare October 22, 2024 18:48
@erickguan erickguan marked this pull request as ready for review October 22, 2024 19:14
@erickguan erickguan requested a review from Xuanwo as a code owner October 22, 2024 19:14
@erickguan
Copy link
Contributor Author

erickguan commented Oct 22, 2024

I did some experimentation (patch) with async support for listing operations, which helps reduce runtime as expected. However, extending OpList introduces a "breaking" change. Since OpenDAL has a compatibility package, I’m happy to coordinate with you when you're preparing for a breaking release, to minimize the amount of work on both sides. There's no urgency regarding async support, though.

A few other observations from running the behavior tests (not related to the PR):

  1. The behavior test fails on my local machine but passes in CI. I’m investigating the cause casually.
  2. The UUID random generation in behavior tests creates multiple levels of UUIDs. While this results in an extremely low chance of collision in the CI for OpenDAL's test accounts, I’d like to reduce the number of levels when running against my environment too.

// Return self at the first page.
if ctx.token.is_empty() && !ctx.done {
let path = build_rel_path(&self.core.root, &self.path);
let e = oio::Entry::new(&path, Metadata::new(EntryMode::DIR));
let mut metadata = Metadata::new(EntryMode::DIR);
if stat_file_metadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for not making the Metakey behavior clearer. (I'm working on this.)

Metakey represents a best effort hint and is not processed server-side. The service merely needs to supply the most comprehensive metadata available during listing.

The Operator will call stat as required, for example:

fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
// Returns `None` if we have errored.
if self.errored {
return Poll::Ready(None);
}
// Trying to pull more tasks if there are more space.
if self.tasks.has_remaining() {
// Building future if we have a lister available.
if let Some(mut lister) = self.lister.take() {
let fut = async move {
let res = lister.next_dyn().await;
(lister, res)
};
self.fut = Some(Box::pin(fut));
}
if let Some(fut) = self.fut.as_mut() {
if let Poll::Ready((lister, entry)) = fut.as_mut().poll(cx) {
self.lister = Some(lister);
self.fut = None;
match entry {
Ok(Some(oe)) => {
let (path, metadata) = oe.into_entry().into_parts();
if metadata.contains_metakey(self.required_metakey) {
self.tasks
.push_back(StatTask::Known(Some((path, metadata))));
} else {
let acc = self.acc.clone();
let fut = async move {
let res = acc.stat(&path, OpStat::default()).await;
(path, res.map(|rp| rp.into_metadata()))
};
self.tasks.push_back(StatTask::Stating(Box::pin(fut)));
}
}
Ok(None) => {
self.lister = None;
}
Err(err) => {
self.errored = true;
return Poll::Ready(Some(Err(err)));
}
}
}
}
}
// Try to poll tasks
if let Some((path, rp)) = ready!(self.tasks.poll_next_unpin(cx)) {
let metadata = rp?;
return Poll::Ready(Some(Ok(Entry::new(path, metadata))));
}
if self.lister.is_some() || self.fut.is_some() {
Poll::Pending
} else {
Poll::Ready(None)
}
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I get a weird Metadata instance problem, that Gdrivebackend::stat returns metadata with a timestamp but Lister::poll_next gets a result of Metadata without the timestamp.

I will debug it a bit.

@erickguan
Copy link
Contributor Author

erickguan commented Oct 27, 2024

@Xuanwo Thanks for the pointer. I've figured out how the PageList and Google Drive backend work together, but I’m currently stuck on a failing test with MinIO in the complete layer. Am I on the right track in continuing this PR?

From what I understand, the `CompleteLayer`` depends on the behavior of different services. Since you mentioned that work on metakey is underway, I’m trying not to alter OpenDAL’s behavior. Unfortunately, adding the new stat behavior in this layer breaks the MinIO service.

@erickguan erickguan requested a review from Xuanwo October 27, 2024 19:14
@@ -174,7 +174,12 @@ impl<A: Access> CompleteAccessor<A> {
}

if path == "/" {
return Ok(RpStat::new(Metadata::new(EntryMode::DIR)));
let meta = if capability.stat {
self.inner.stat(path, args).await?.into_metadata()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, it's intended that we don't send stat for /. The service should always make sure / exists.

let normalized_path = build_rel_path(root, &path);

let mut meta = Metadata::new(file_type);
// Resetting the metakey is necessary when mode is `EntryMode::DIR`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying cause might be here:

// If mode is dir, we should always mark it as complete.
if mode.is_dir() {
metakey |= Metakey::Complete
}

It seems reasonable to remove it.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 6, 2024

Also related to #5287

@erickguan
Copy link
Contributor Author

Thanks for tentative checking in with metakey!

@erickguan
Copy link
Contributor Author

After a closer look, can I pause this PR until your metakey work?

@Xuanwo
Copy link
Member

Xuanwo commented Nov 7, 2024

After a closer look, can I pause this PR until your metakey work?

Yes, after the metakey is refactored, our work here will be much cleaner.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 14, 2024

Yes, after the metakey is refactored, our work here will be much cleaner.

Hi, you can continue your work now.

@erickguan
Copy link
Contributor Author

Thanks for the ping! I'll pick it up and ping you when it's ready.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 14, 2024

Thanks for the ping! I'll pick it up and ping you when it's ready.

Thank you for your patience. Looking forward to your PR.

@erickguan erickguan force-pushed the modified-timestamp-gdrive branch 3 times, most recently from f22ee2b to 604916d Compare November 19, 2024 16:20
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @erickguan, this PR looks mostly good to me now!

@@ -177,8 +177,19 @@ impl<A: Access> CompleteAccessor<A> {
return Ok(RpStat::new(Metadata::new(EntryMode::DIR)));
}

// Forward to inner if create_dir is supported.
if path.ends_with('/') && capability.create_dir {
if path.ends_with('/')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, let's keep this file unchanged. The create_dir capability check is related to special handling for object storage. I will separate this part to make it clearer.

Copy link
Contributor Author

@erickguan erickguan Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the problem. The Google Drive service supports create_dir, so OpenDAL will remove the metadata from the Google Drive service without this change.

        // Forward to inner if create_dir is supported.
        if path.ends_with('/') && capability.create_dir {
            let meta = self.inner.stat(path, args).await?.into_metadata();

            if meta.is_file() {
                return Err(Error::new(
                    ErrorKind::NotFound,
                    "stat expected a directory, but found a file",
                ));
            }

            return Ok(RpStat::new(Metadata::new(EntryMode::DIR)));
        }

I read the RFC 3243, and I am skeptical of the change in this PR. I am not familiar with the context so I have a few ideas in mind:

  1. Add a comment explaining the hack to clean up later.
  2. Maybe extend the complete layer to respond meta without creating a new one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Let's return the metadata we fetched from storage directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Happy to defer this problem to you!

@erickguan erickguan force-pushed the modified-timestamp-gdrive branch from 222ddd5 to 44823a2 Compare November 20, 2024 18:50
@erickguan erickguan requested a review from Xuanwo November 20, 2024 19:07
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @erickguan for this PR and your patience!

@Xuanwo Xuanwo merged commit e5a4104 into apache:main Nov 21, 2024
239 checks passed
@erickguan erickguan deleted the modified-timestamp-gdrive branch November 21, 2024 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants