Hi,
I used the -C option to compare two EIGENSTRAT databases and found a few duplicates.
So I ran the following to remove the duplicates from one of the databases:
eigenstrat_database_tools.py -g v54.1_1240K_public_Olalde2019.geno -s v54.1_1240K_public_Olalde2019.snp -i v54.1_1240K_public_Olalde2019.ind -o v54.1_1240K_public_Olalde2019_no_duplicates -L Olalde2019_duplicates.txt -R
and got the following error:
Traceback (most recent call last):
File "/home/psonis/software/EigenStratDatabaseTools/eigenstrat_database_tools.py", line 86, in
validate_eigenstrat(args.genoFn, args.snpFn, args.indFn)
File "/home/psonis/software/EigenStratDatabaseTools/eigenstrat_database_tools.py", line 21, in validate_eigenstrat
dimsGeno = [file_len(genof), file_width(genof)]
File "/home/psonis/software/EigenStratDatabaseTools/eigenstrat_database_tools.py", line 8, in file_len
for i, l in enumerate(f):
File "/home/psonis/miniconda3/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 90: invalid start byte
Any thoughts on how to resolve this?
Nikos
I just figured out that the .geno files in the Reich dataset are in PACKEDANCESTRYMAP (binary) format, whereas your tool needs unpacked EIGENSTRAT, so I converted the file with convertf. I think you should either warn the user that a file with the .geno extension may not be EIGENSTRAT, or allow your tool to accept binary files.
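For the first option, here is a rough sketch of a check that could run before validate_eigenstrat. It assumes that a text EIGENSTRAT .geno file contains only the characters 0, 1, 2 and 9 plus line breaks, so anything else near the start of the file (such as the 0xff byte in the traceback above) suggests a packed file. The function names and the probe size are hypothetical and are not part of eigenstrat_database_tools.py.

```python
# Bytes allowed in a text EIGENSTRAT genotype file (assumption: 0/1/2/9 and line breaks only).
ALLOWED_GENO_BYTES = frozenset(b"0129\r\n")


def is_text_geno_bytes(head: bytes) -> bool:
    """Return True if a non-empty chunk contains only text-EIGENSTRAT characters."""
    return len(head) > 0 and all(byte in ALLOWED_GENO_BYTES for byte in head)


def looks_like_text_eigenstrat(geno_path: str, probe_bytes: int = 4096) -> bool:
    """Read the start of the file in binary mode and test it.

    Opening in binary mode avoids the UnicodeDecodeError that a packed
    file triggers when it is read as UTF-8 text.
    """
    with open(geno_path, "rb") as handle:
        return is_text_geno_bytes(handle.read(probe_bytes))
```

If looks_like_text_eigenstrat returns False, the tool could stop with a message suggesting that the file may be PACKEDANCESTRYMAP and should first be converted with convertf, instead of failing with a decode error.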