support GBK codepage #5

Closed

zhanleewo

Code page CP936 (also called MS936) has another alias: GBK.

@joachimmetz
Member

joachimmetz commented Mar 19, 2019

@zhanleewo thanks for the changes

However, there appears to be some debate about whether GBK is the same as CP936:
https://en.wikipedia.org/wiki/GBK_(character_encoding)

I'll have a more detailed look. In which context do you want to translate "gbk" to "cp936"?

@codecov

codecov bot commented Mar 19, 2019

Codecov Report

Merging #5 into master will decrease coverage by 0.77%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master       #5      +/-   ##
==========================================
- Coverage   96.56%   95.79%   -0.78%     
==========================================
  Files           3        3              
  Lines         495      499       +4     
==========================================
  Hits          478      478              
- Misses         17       21       +4

| Impacted Files | Coverage Δ |
|---|---|
| libclocale/libclocale_codepage.c | 98.91% <0%> (-1.09%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 62f7f77...81ccee5.

@zhanleewo
Author

zhanleewo commented Mar 19, 2019

Yes.
I am in China. I used VMware to create a VM, and then used your libvmdk to parse the VMDK file I created. It failed with an error showing that the encoding is GBK.

So I think libclocale should support this code page.

@joachimmetz
Member

In that case, putting this check in libvmdk might be safer, given the different interpretations of GBK. Also see: libyal/libvmdk#10

Do you know which tool created the VMDK?

@zhanleewo
Author

VMware Workstation for Windows, Chinese edition.

@zhanleewo
Author

You did many great things. If you need any additional information to address this issue, I can do my best to help you fix it.

@joachimmetz
Member

assuming gbk equals cp936 for vmware workstation for windows, chinese edition

Pending changes for libvmdk

diff --git a/libvmdk/libvmdk_descriptor_file.c b/libvmdk/libvmdk_descriptor_file.c
index 8a27e7b..d18cab5 100644
--- a/libvmdk/libvmdk_descriptor_file.c
+++ b/libvmdk/libvmdk_descriptor_file.c
@@ -966,12 +966,19 @@ int libvmdk_descriptor_file_read_header(
                                         value );
                                }
 #endif
-                               if( ( value_length == 5 )
-                                && ( value[ 0 ] == 'U' )
-                                && ( value[ 1 ] == 'T' )
-                                && ( value[ 2 ] == 'F' )
-                                && ( value[ 3 ] == '-' )
-                                && ( value[ 4 ] == '8' ) )
+                               if( ( value_length == 3 )
+                                && ( value[ 0 ] == 'G' )
+                                && ( value[ 1 ] == 'B' )
+                                && ( value[ 2 ] == 'K' ) )
+                               {
+                                       descriptor_file->encoding = LIBUNA_CODEPAGE_WINDOWS_936;
+                               }
+                               else if( ( value_length == 5 )
+                                     && ( value[ 0 ] == 'U' )
+                                     && ( value[ 1 ] == 'T' )
+                                     && ( value[ 2 ] == 'F' )
+                                     && ( value[ 3 ] == '-' )
+                                     && ( value[ 4 ] == '8' ) )
                                {
                                        descriptor_file->encoding = 0;

> If you need any additional information to address this issue, I can do my best to help you fix it.

Ideally I would like to know what GBK actually refers to for VMDK, but CP936 seems to be relatively safe for now. To get full closure on this, I'd first need to know which character values help distinguish between the different variants of GBK.
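
As a side note, the character-by-character checks in the pending change can also be read as a small lookup table. The following is only a sketch under stated assumptions, not the code that went into libvmdk: the `sketch_` names and constants are hypothetical, the value 936 is an assumed stand-in for `LIBUNA_CODEPAGE_WINDOWS_936`, and 0 mirrors the value the diff assigns for UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch. The diff above uses
 * LIBUNA_CODEPAGE_WINDOWS_936 for GBK and sets the encoding to 0 for UTF-8;
 * the numeric value 936 is an assumption made here.
 */
#define SKETCH_CODEPAGE_UTF8        0
#define SKETCH_CODEPAGE_WINDOWS_936 936
#define SKETCH_CODEPAGE_UNSUPPORTED ( -1 )

/* One row per encoding name accepted in the descriptor file */
typedef struct sketch_encoding_entry
{
	const char *name;
	int codepage;
}
sketch_encoding_entry_t;

static const sketch_encoding_entry_t sketch_encodings[] = {
	{ "GBK",   SKETCH_CODEPAGE_WINDOWS_936 },
	{ "UTF-8", SKETCH_CODEPAGE_UTF8 },
};

/* Maps an encoding value (not necessarily NUL-terminated) to a codepage.
 * The comparison is case-sensitive, like the checks in the diff.
 * Returns SKETCH_CODEPAGE_UNSUPPORTED if the name is not in the table.
 */
int sketch_encoding_to_codepage(
     const char *value,
     size_t value_length )
{
	size_t number_of_entries = sizeof( sketch_encodings ) / sizeof( sketch_encodings[ 0 ] );
	size_t entry_index       = 0;

	if( value == NULL )
	{
		return( SKETCH_CODEPAGE_UNSUPPORTED );
	}
	for( entry_index = 0; entry_index < number_of_entries; entry_index++ )
	{
		const char *name = sketch_encodings[ entry_index ].name;

		if( ( strlen( name ) == value_length )
		 && ( memcmp( value, name, value_length ) == 0 ) )
		{
			return( sketch_encodings[ entry_index ].codepage );
		}
	}
	return( SKETCH_CODEPAGE_UNSUPPORTED );
}
```

With a table like this, a further alias would be one more row rather than another `else if` chain, although whether any other name should map to CP936 is exactly the open question here.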

@joachimmetz
Member

Changes for libvmdk libyal/libvmdk@a3f6142

@zhanleewo
Author

Thank you.

@joachimmetz
Member

@zhanleewo thanks. Based on that link, one thing that could be useful is to create a VMDK with a euro sign that can be shared. That should give more resolution on which encoding is actually used.
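
To illustrate why a euro sign would help: as far as the Wikipedia pages linked in this thread describe it, Windows code page 936 maps the single byte 0x80 to the euro sign, while GB 18030 encodes it as the two bytes 0xA2 0xE3. A rough sketch of a check for those two byte patterns could look like the following. The function and enum names are hypothetical and not part of libvmdk or libuna, and the lead-byte range 0x81 to 0xFE is the usual GBK double-byte layout, taken here as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch, not part of any libyal API */
enum sketch_euro_hint
{
	SKETCH_EURO_NONE         = 0, /* no euro sign candidate found */
	SKETCH_EURO_SINGLE_0X80  = 1, /* lone 0x80: Windows code page 936 style */
	SKETCH_EURO_DOUBLE_A2E3  = 2  /* 0xA2 0xE3: GB 18030 style */
};

/* Scans GBK-like data and reports the first euro sign candidate.
 * Double-byte characters are skipped as a pair, so that a 0x80 or
 * 0xA2 0xE3 that is part of another character is not misread.
 */
enum sketch_euro_hint sketch_find_euro_hint(
                       const uint8_t *data,
                       size_t data_size )
{
	size_t offset = 0;

	if( data == NULL )
	{
		return( SKETCH_EURO_NONE );
	}
	while( offset < data_size )
	{
		uint8_t byte_value = data[ offset ];

		/* 0x80 is not a lead byte, so here it stands on its own */
		if( byte_value == 0x80 )
		{
			return( SKETCH_EURO_SINGLE_0X80 );
		}
		/* Assumed lead byte range of a double-byte character */
		if( ( byte_value >= 0x81 )
		 && ( byte_value <= 0xfe )
		 && ( ( offset + 1 ) < data_size ) )
		{
			if( ( byte_value == 0xa2 )
			 && ( data[ offset + 1 ] == 0xe3 ) )
			{
				return( SKETCH_EURO_DOUBLE_A2E3 );
			}
			offset += 2;

			continue;
		}
		/* ASCII byte or a truncated lead byte at the end of the data */
		offset += 1;
	}
	return( SKETCH_EURO_NONE );
}
```

Running something like this over the relevant bytes of a shared VMDK would only give a hint, since a file without a euro sign yields no information either way.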

@joachimmetz
Member

I'll close this issue in favor of libyal/libvmdk#10

@zhanleewo
Author

https://en.m.wikipedia.org/wiki/Code_page_936_(Microsoft_Windows)

I think this is helpful. The differences between GBK and CP936 are listed there.

@joachimmetz
Member

thx
