Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes case sensitivity when parsing date and time #1168

Merged
merged 12 commits into from
Aug 26, 2020

Conversation

Arzaghi
Copy link
Contributor

@Arzaghi Arzaghi commented Aug 9, 2020

Fixes #1126.

@Arzaghi Arzaghi changed the title Fix Issue #1126 Fixes Issue #1126 Aug 9, 2020
@Arzaghi Arzaghi changed the title Fixes Issue #1126 Fixes case sensitivity when parsing months Aug 9, 2020
@Arzaghi Arzaghi marked this pull request as ready for review August 9, 2020 01:21
@Arzaghi Arzaghi requested a review from a team as a code owner August 9, 2020 01:21
Copy link
Contributor

@AlexGuteniev AlexGuteniev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stl/inc/xlocale Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Aug 9, 2020
@fsb4000
Copy link
Contributor

fsb4000 commented Aug 9, 2020

I think it will be good if you add to testcases that one(with single 'D'): #1126 (comment)

I didn't dig why the existing tests are failing. But I found one regression after your patch:

before

C:\Dev\STL\playground>chcp 65001
Active code page: 65001

C:\Dev\STL\playground>type main.cpp
#include <iostream>
#include <sstream>
#include <iomanip>

void test_normal_russian()
{
    std::tm t = {};
    std::istringstream ss("2020-Февраль-5");
    std::locale l("Russian");
    ss.imbue(l);
    ss >> std::get_time(&t, "%Y-%b-%d");

    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}


int main()
{
    test_normal_russian();
}

C:\Dev\STL\playground>cl /EHsc /W4 /WX /source-charset:utf-8 /execution-charset:windows-1251 main.cpp
Оптимизирующий компилятор Microsoft (R) C/C++ версии 19.28.29115 для x64
(C) Корпорация Майкрософт (Microsoft Corporation).  Все права защищены.

main.cpp
Microsoft (R) Incremental Linker Version 14.28.29115.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj

C:\Dev\STL\playground>main.exe
02/05/20 00:00:00

after

...
C:\Dev\STL\playground>cl /EHsc /W4 /WX /source-charset:utf-8 /execution-charset:windows-1251 main.cpp
Оптимизирующий компилятор Microsoft (R) C/C++ версии 19.28.29115 для x64
(C) Корпорация Майкрософт (Microsoft Corporation).  Все права защищены.

main.cpp
Microsoft (R) Incremental Linker Version 14.28.29115.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj

C:\Dev\STL\playground>main.exe
Parse failed

Here a list of russian full daynames:

понедельник (monday)
вторник
среда
четверг
пятница
суббота
воскресенье (sunday)

short day names:

Пн (monday)
Вт
Ср
Чт
Пт
Сб
Вс (sunday)

and full month names:

Январь (January)
Февраль
Март
Апрель
Май
Июнь
Июль
Август
Сентябрь
Октябрь
Ноябрь
Декабрь (December)

and short month names:

янв (January)
фев
мар
апр
май
июн
июл
авг
сен
окт
ноя
дек (December)

some examples for testcases with mixed case:

ЯНВ
МАР
СЕН
Ноя
дЕк
ИЮНЬ
июнь
СБ
сб
Четверг
чеТверг
ЧЕТВЕРГ

Sorry, I don't know how to create unit tests for STL

But I think STL need some testcase with locale.

@Arzaghi Arzaghi changed the title Fixes case sensitivity when parsing months Fixes case sensitivity when parsing date and time Aug 12, 2020
stl/inc/xlocale Outdated Show resolved Hide resolved
@Arzaghi
Copy link
Contributor Author

Arzaghi commented Aug 12, 2020

Any clue why did the tests fail?

I tested with these cases and everything seemed fine.

#include <iostream>
#include <sstream>
#include <iomanip>

void test_english1() {
    std::tm t = {};
    std::istringstream ss("2018-dEceMBeR-18");
    ss >> std::get_time(&t, "%Y-%b-%d");
    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}

void test_english2() {
    std::tm t = {};
    std::istringstream ss("2020-fEb-31 06:59:34pM");
    ss >> std::get_time(&t, "%Y-%b-%d %H:%M:%S%p");
    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}

void test_english3() {
    std::tm t = {};
    std::istringstream ss("2020-TestaUg-06 11:12:34aM");
    ss >> std::get_time(&t, "%Y-Test%b-%d %H:%M:%S%p");
    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}

void test_russian()
{       
    std::tm t = {};            
    std::wstringstream ss;
    ss.imbue(std::locale("ru_RU.UTF-8"));
    ss << L"2020-Ноя-5";

    ss >> std::get_time(&t, L"%Y-%b-%d");

    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}

void test_german() {
    std::tm t = {};
    std::wstringstream ss;
    ss.imbue(std::locale("de_DE.utf-8"));
    ss << L"2015-fEBrUAr-27 08:09:10pM";

    ss >> std::get_time(&t, L"%Y-%b-%d");

    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}

void test_chinese() {
    std::tm t = {};
    std::wstringstream ss;
    ss.imbue(std::locale("chinese"));
    ss << L"2019-五月-13 08:09:10pM";

    ss >> std::get_time(&t, L"%Y-%b-%d");

    if (ss.fail()) {
        std::cout << "Parse failed\n";
    }
    else {
        std::cout << std::put_time(&t, "%c") << '\n';
    }
}

int main() {
    test_english1();    
    test_english2();
    test_english3();
    test_russian();
    test_german();
    test_chinese();
    std::cin.get();
    return 0;
}

@AlexGuteniev
Copy link
Contributor

Click details on check failure, see this in test log:

1: Failed Tests (6):
1:   tr1 :: tests/codecvt:08
1:   tr1 :: tests/codecvt:27
1:   tr1 :: tests/codecvt:26
1:   tr1 :: tests/codecvt:17
1:   tr1 :: tests/codecvt:25
1:   tr1 :: tests/codecvt:16

Follow these steps, try to see the same failure locally:
https://github.com/microsoft/STL#running-a-subset-of-the-tests


Or see further in test log:

2020-08-12T16:09:43.7267456Z 1: Command: "C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.28.29115\bin\HostX86\x86\cl.EXE" "/FeD:\build\x86\tests\tr1\tests\codecvt\08\codecvt.exe" "/FoD:\build\x86\tests\tr1\tests\codecvt\08\codecvt.obj" "/ID:\build\x86\out\inc" "/IC:/agent/_work/1/s/tests/tr1/include" "/IC:/agent/_work/1/s/tests/std/include" "/nologo" "/Od" "/W4" "/w14061" "/w14242" "/w14265" "/w14582" "/w14583" "/w14587" "/w14588" "/w14749" "/w14841" "/w14842" "/w15038" "/w15214" "/w15215" "/w15216" "/w15217" "/sdl" "/WX" "/Zc:strictStrings" "/D_ENABLE_STL_INTERNAL_CHECK" "/bigobj" "/FIforce_include.hpp" "/D_ENFORCE_FACET_SPECIALIZATIONS=1" "/D_CRT_SECURE_NO_WARNINGS" "/EHsc" "/MDd" "/D_ITERATOR_DEBUG_LEVEL=2" "/std:c++17" "/permissive-" "C:\agent\_work\1\s\tests\tr1\tests\codecvt\test.cpp" "/link" "/LIBPATH:D:\build\x86\out\lib\i386" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.28.29115\lib\x86" "/MANIFEST:EMBED"
2020-08-12T16:09:43.7269580Z 1: Exit Code: 2
2020-08-12T16:09:43.7269912Z 1: Standard Output:
2020-08-12T16:09:43.7270212Z 1: --
2020-08-12T16:09:43.7270477Z 1: test.cpp
2020-08-12T16:09:43.7270935Z 1: D:\build\x86\out\inc\xlocale(672): error C3861: '_Maklocchr': identifier not found
2020-08-12T16:09:43.7271787Z 1: D:\build\x86\out\inc\xlocale(672): note: '_Maklocchr': function was not declared in the template definition context and can be found only via argument-dependent lookup in the instantiation context
2020-08-12T16:09:43.7272831Z 1: D:\build\x86\out\inc\xlocale(669): note: while compiling class template member function 'const char *std::ctype<_Elem>::do_widen(const char *,const char *,_Elem *) const'
2020-08-12T16:09:43.7273484Z 1:         with
2020-08-12T16:09:43.7273792Z 1:         [
2020-08-12T16:09:43.7274096Z 1:             _Elem=char_type
2020-08-12T16:09:43.7274447Z 1:         ]
2020-08-12T16:09:43.7274990Z 1: D:\build\x86\out\inc\istream(139): note: see reference to class template instantiation 'std::ctype<_Elem>' being compiled
2020-08-12T16:09:43.7275543Z 1:         with
2020-08-12T16:09:43.7275847Z 1:         [
2020-08-12T16:09:43.7276155Z 1:             _Elem=char_type
2020-08-12T16:09:43.7276485Z 1:         ]
2020-08-12T16:09:43.7277131Z 1: D:\build\x86\out\inc\istream(116): note: while compiling class template member function 'bool std::basic_istream<char_type,std::char_traits<char_type>>::_Ipfx(bool)'
2020-08-12T16:09:43.7278255Z 1: D:\build\x86\out\inc\istream(103): note: see reference to function template instantiation 'bool std::basic_istream<char_type,std::char_traits<char_type>>::_Ipfx(bool)' being compiled
2020-08-12T16:09:43.7279357Z 1: C:\agent\_work\1\s\tests\tr1\tests\codecvt\test.cpp(32): note: see reference to class template instantiation 'std::basic_istream<char_type,std::char_traits<char_type>>' being compiled
2020-08-12T16:09:43.7280737Z 1: C:\agent\_work\1\s\tests\tr1\tests\codecvt\test.cpp(38): note: see reference to function template instantiation 'void test_write<std::codecvt_utf8<unsigned int,1114111,(std::codecvt_mode)0>>(Cvt *,int)' being compiled
2020-08-12T16:09:43.7281535Z 1:         with
2020-08-12T16:09:43.7282313Z 1:         [
2020-08-12T16:09:43.7290708Z 1:             Cvt=std::codecvt_utf8<unsigned int,1114111,(std::codecvt_mode)0>
2020-08-12T16:09:43.7291566Z 1:         ]
2020-08-12T16:09:43.7292076Z 1: --

@AlexGuteniev
Copy link
Contributor

_Maklocchr is now below the class that you moved up

@AlexGuteniev
Copy link
Contributor

Could you please add tests to std test suite?

@Arzaghi
Copy link
Contributor Author

Arzaghi commented Aug 14, 2020

After I fixed the Issues mentioned in Issue #1126 I've just added some tests to cover the scenarios mentioned on the thread.
However, in order to add tests to cover locale dates, I was forced to change the test.cpp encoding to UTF-8-BOM and put some Russian, German, and Chines characters in this file.
Now code format validation fails on the check.
Any solution?

@AlexGuteniev
Copy link
Contributor

AlexGuteniev commented Aug 14, 2020

I would encode those strings as hex literals with comments about content in English, like "\xCF87" // Greek Chi

But probably maintainers should answer

@Arzaghi
Copy link
Contributor Author

Arzaghi commented Aug 14, 2020

I would encode those strings as hex literals with comments about content in English, like "xCF87" // Greek Chi

But probably maintainers should answer

Tricky solution :) but then those strings won't be readable anymore

@AlexGuteniev
Copy link
Contributor

Chinese aren't readable to me already. I guess a comment like "\x5432\x2332" /* short monday in Russian */ would suffice.

@AlexGuteniev
Copy link
Contributor

Or skip Russian and Chinesse, with German it is just a few Umlaut characters

@AlexGuteniev
Copy link
Contributor

Examples from @BillyONeal

path p1(L"\u00E9"sv);

const u8string_view slowPattern{u8"Major Game \U0001F3C8 Sunday"};

stl/inc/xloctime Outdated Show resolved Hide resolved
Copy link
Contributor

@mnatsuhara mnatsuhara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these improvements!

stl/inc/xlocale Outdated Show resolved Hide resolved
stl/inc/xlocale Outdated Show resolved Hide resolved
stl/inc/xloctime Outdated Show resolved Hide resolved
stl/inc/xloctime Outdated Show resolved Hide resolved
tests/std/tests/Dev11_0836436_get_time/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/Dev11_0836436_get_time/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/Dev11_0836436_get_time/test.cpp Outdated Show resolved Hide resolved
@StephanTLavavej
Copy link
Member

@Arzaghi FYI, I slightly edited your PR description. GitHub supports syntax to link PRs to issues, such that the linked issues show up in the sidebar and will be automatically closed when the PR is merged. However, the syntax is very specific - "Fix for Issue #1126" is treated as an ordinary mention of an issue, while "Fixes #1126." activates the "this PR will close this issue" linking.

@Arzaghi
Copy link
Contributor Author

Arzaghi commented Aug 22, 2020

@Arzaghi FYI, I slightly edited your PR description. GitHub supports syntax to link PRs to issues, such that the linked issues show up in the sidebar and will be automatically closed when the PR is merged. However, the syntax is very specific - "Fix for Issue #1126" is treated as an ordinary mention of an issue, while "Fixes #1126." activates the "this PR will close this issue" linking.

Thank you very much. I didn't know the point.

Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I pushed a merge with master, a comment cleanup (we generally prefer comments to be either sentence fragments or complete sentences with proper punctuation, not a mix), and the code movement that @mnatsuhara requested (no other changes) - with _Getloctxt at the end of the file, the diff is much simpler, and we don't need to worry about the placement of class template specializations. I verified locally that this passes tests.

@StephanTLavavej StephanTLavavej self-assigned this Aug 26, 2020
@StephanTLavavej StephanTLavavej merged commit ea1aaf7 into microsoft:master Aug 26, 2020
@StephanTLavavej
Copy link
Member

Thanks for this significant runtime correctness fix, and the extensive testing! 😸 📆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

<iomanip>: Case sensitivity when parsing months
7 participants