Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeoIP2 trys to write characters outside of ISO-8859 #127

Closed
manuelm opened this issue Apr 18, 2019 · 2 comments · Fixed by #131
Closed

GeoIP2 trys to write characters outside of ISO-8859 #127

manuelm opened this issue Apr 18, 2019 · 2 comments · Fixed by #131

Comments

@manuelm
Copy link
Contributor

manuelm commented Apr 18, 2019

This is related to #69 but targets GeoIP2 modules. Mainly GeoIP2::City. GeoIP2 names are UTF-8 encoded. Therefor they can contain characters outside of the ISO-8859 range. Since AWStats data files are stored in ISO the GeoIP output has to be escaped. As in #69 I suggest html escaping them, either using HTML::Entities or a stripped down version which only escapes high bytes:

sub html_escape {
  my $s = shift;
  $s =~ s/([^\x00-\x7F])/sprintf "&#x%X;", ord($1)/ge;
  return $s;
}

Regarding geoip2_city.pm the function call should be added in SectionWriteHistory_geoip2_city so that XMLEncodeForHisto($_) becomes XMLEncodeForHisto(html_escape($_)) (or XMLEncodeForHisto(encode_entities_numeric($_))).

For geoip2.pm this isn't strictly necessary as I'm not aware of any international countries names that contain characters outside the ISO range.

195.38.12.0 is an ipaddress for testing. GeoIP2::City tries to store PL_Zamość_Lublin. With escaping this gets converted to PL_Zamość_Lublin.

@manuelm
Copy link
Contributor Author

manuelm commented Apr 18, 2019

Forgot about loading: Of course we also need to unescape the characters on loading/updating the data file. So e.g. in SectionReadHistory_geoip2_city we add decode_entities($field[0]); after @field=split(/\s+/,($xmlold?XMLDecodeFromHisto($_):$_));.

@manuelm
Copy link
Contributor Author

manuelm commented Apr 19, 2019

Here is a better approach without the need of decoding and HTML::Entities as extra dependency. We simply store the escaped name in the hash:

--- b/wwwroot/cgi-bin/plugins/geoip2_city.pm.orig 2019-04-19 12:36:02.026552366 +0200
+++ a/wwwroot/cgi-bin/plugins/geoip2_city.pm  2019-04-19 12:35:52.906632209 +0200
@@ -29,6 +29,11 @@
 #use strict;
 no strict "refs";
 
+sub html_escape {
+ my $s = shift;
+ $s =~ s/([^\x00-\x7F])/sprintf "&#x%X;", ord($1)/ge;
+ return $s;
+}
 
 
 #-----------------------------------------------------------------------------
@@ -369,7 +374,7 @@
             my $countrycity=$record->country()->iso_code().'_'.$record->city()->name();
             $countrycity=~s/ /%20/g;
             if ($record->most_specific_subdivision()->name()) { $countrycity.='_'.$record->most_specific_subdivision()->name(); }
-            $_city_h{lc($countrycity)}++;
+            $_city_h{html_escape(lc($countrycity))}++;
         } else {
             $_city_h{'unknown'}++;
         }
@@ -409,7 +414,7 @@
             my $countrycity=($record->country()->iso_code()).'_'.$city;
             $countrycity=~s/ /%20/g;
             if ($record->most_specific_subdivision()->name()) { $countrycity.='_'.$record->most_specific_subdivision()->name(); }
-            $_city_h{lc($countrycity)}++;
+            $_city_h{html_escape(lc($countrycity))}++;
         } else {
             $_city_h{'unknown'}++;
         }

manuelm added a commit to manuelm/awstats that referenced this issue Apr 26, 2019
* Correctly convert dns names to ip4 and ip6 address using getaddrinfo
  (fixes eldy#120, eldy#121, obsoletes eldy#115)
* Only lookup if the IP is of type public (fixes eldy#122)
  and catch further lookup errors (obsoletes eldy#123)
* Store and display the GeoIP City output HTML escaped (fixes eldy#127)
* Code to perform and cache the actual lookup is consolidated
* General code improvement and readability
@eldy eldy closed this as completed in #131 Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant