Can’t download the original MS-Celeb-1M dataset? #1

Open
LiuJoffrey opened this issue Jun 7, 2019 · 6 comments
Labels
good first issue Good for newcomers

Comments

@LiuJoffrey

Hi, I would like to download the original dataset, but I can’t find a download link on the website. Could you please provide the original dataset files or a download link?

Thank you so much

@ruochunjin
Owner

Unfortunately, we lost the original image data over the last two years because we were careless about preserving it. The cleaned file list here is all we have now.
We did not know that Microsoft Research had taken down the original data until we saw these issues.

@jjsjunior

Hi Joffrey,
Did you manage to find a download link for MS-Celeb-1M?
Thanks in advance

@youthM

youthM commented Oct 14, 2019

Hi, did you find a download link for MS-Celeb-1M?

@ha1990-12

https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech&hit=1&filelist=1

@ruochunjin ruochunjin added the good first issue Good for newcomers label Nov 22, 2019
@ruochunjin ruochunjin pinned this issue Nov 22, 2019
@ibarrond

ibarrond commented Apr 5, 2021

Hi, how do you process the TSV file from this torrent? I'm not sure what each column contains or how to process it.
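A quick way to see what each column holds is to print the index, length, and first few characters of every field in the first row or two. This is a minimal sketch, assuming the file is tab-separated with no CSV quoting (the extraction script later in this thread reads it the same way); `describe_rows` is a hypothetical helper name, not part of any published tooling.

```python
import csv
from typing import Iterable, List

# Base64-encoded image fields can exceed csv's default field size limit (131072 characters).
csv.field_size_limit(2**31 - 1)


def describe_rows(lines: Iterable[str], max_rows: int = 1, preview: int = 40) -> List[List[str]]:
    """Hypothetical helper: summarize every field of the first `max_rows` rows.

    Each summary gives the column index, the field length, and a short prefix,
    which is usually enough to tell IDs, URLs, and base64 blobs apart.
    Assumes tab-separated fields without CSV quoting.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    summaries: List[List[str]] = []
    for row_no, row in enumerate(reader):
        if row_no >= max_rows:
            break
        summaries.append(
            [f"col {idx}: len={len(field)} start={field[:preview]!r}" for idx, field in enumerate(row)]
        )
    return summaries
```

For example, `describe_rows(open(path, newline=''))` on the downloaded TSV returns one list of summaries for the first row; very long fields starting with `/9j/` are a good hint that a column holds base64-encoded JPEG data.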

@ruochunjin ruochunjin unpinned this issue Apr 6, 2021
@ketan-b

ketan-b commented May 20, 2021

This should extract the images from the .tsv file:

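import argparse
import base64
import csv
import os
# import magic  # Detect image type from buffer contents (disabled, all are jpg)

parser = argparse.ArgumentParser()
parser.add_argument('--croppedTSV', type=str, required=True)
parser.add_argument('--outputDir', type=str, default='raw')
args = parser.parse_args()

# The base64 image column can exceed csv's default field size limit (131072 characters).
csv.field_size_limit(2**31 - 1)

# newline='' is what the csv module expects; QUOTE_NONE stops stray quote
# characters (e.g. in URLs) from being interpreted as CSV quoting.
with open(args.croppedTSV, 'r', newline='') as tsvF:
    reader = csv.reader(tsvF, delimiter='\t', quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader, start=1):
        # Column 0: Freebase MID, column 1: image search rank, column 4: face ID,
        # last column: base64-encoded face image.
        MID, imgSearchRank, faceID, data = row[0], row[1], row[4], base64.b64decode(row[-1])

        saveDir = os.path.join(args.outputDir, MID)
        savePath = os.path.join(saveDir, "{}-{}.jpg".format(imgSearchRank, faceID))

        # assert(magic.from_buffer(data) == 'JPEG image data, JFIF standard 1.01')

        os.makedirs(saveDir, exist_ok=True)
        with open(savePath, 'wb') as f:
            f.write(data)

        if i % 1000 == 0:
            print("Extracted {} images.".format(i))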
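The commented-out `magic` check in the script above can be replaced by a dependency-free test on the JPEG start-of-image bytes (every JPEG file begins with `FF D8 FF`). This is a minimal sketch, assuming the TSV keeps the column positions the script relies on (0, 1, 4, and the last column); `FaceRecord`, `parse_row`, and `output_path` are hypothetical names chosen for illustration.

```python
import base64
import os
from typing import NamedTuple, Sequence

# JPEG files start with the SOI marker FF D8 followed by another FF marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"


class FaceRecord(NamedTuple):
    mid: str          # column 0, used as the per-identity folder name
    search_rank: str  # column 1
    face_id: str      # column 4
    jpeg: bytes       # last column, base64-decoded


def parse_row(row: Sequence[str]) -> FaceRecord:
    """Hypothetical helper: decode one row using the script's column indices."""
    data = base64.b64decode(row[-1])
    if not data.startswith(JPEG_MAGIC):
        raise ValueError(f"row for {row[0]!r} does not decode to JPEG data")
    return FaceRecord(mid=row[0], search_rank=row[1], face_id=row[4], jpeg=data)


def output_path(out_dir: str, rec: FaceRecord) -> str:
    """Mirror the script's layout: <out_dir>/<MID>/<rank>-<faceID>.jpg."""
    return os.path.join(out_dir, rec.mid, f"{rec.search_rank}-{rec.face_id}.jpg")
```

Raising on a bad row, rather than silently writing it, makes a column-layout mismatch show up on the first row instead of after millions of unusable files.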


7 participants