
Add API to set line number, fixes #1657 #1658

Closed
wants to merge 7 commits into from


fulldecent
Contributor

This adds a way to set XML node line numbers.

Why would you want to do that? For constructed nodes. One implementation that needs this is here: rubys/nokogumbo#53 (comment)

Sorry, this may be a useless contribution; I don't actually know Ruby, I'm just pretending.

This fixes issue #1657.

@flavorjones
Member

@fulldecent Please add some tests for this behavior.

@fulldecent
Contributor Author

I have added some tests, but they are not passing. Requesting assistance; I'm in over my head.

@flavorjones
Member

I'm happy to work with anybody on the nokogumbo project who can help me understand what the desired behavior is. But this PR performs illegal memory accesses (valgrind reports them as failures in the test suite), so the PR is not acceptable even if the tests were passing and the JRuby API were similarly enhanced.

@stevecheckoway
Contributor

I haven't had a chance to look closely, but I think my comment gives a pretty good hint.

  • Text nodes store the line number directly, either in the line field or the psvi field.
  • Element, comment, or pi nodes store the line number directly in line if it is less than 65535 and not at all if it is 65535 or greater. (Where libxml2 gets the line number in the latter case depends on the type of node.)
  • Other nodes never store the line number.

In nokogumbo, the function setting the line numbers only deals with element, text, CDATA, and comment nodes. I have a comment about it being okay to set small line numbers on CDATA nodes.

The first thing I would try is adding conditionals so that the line number is only set on text, element, comment, or PI nodes. It'd also be good to double-check that I didn't misunderstand what was going on with the psvi field and that its use hasn't changed between versions.
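A minimal Ruby sketch of the storage rules above and the suggested type guard. Everything in it is hypothetical: `ModelNode` is a plain `Struct` standing in for libxml2's `xmlNode`, keeping only the `type`, `line`, and `psvi` fields discussed, and `model_set_line` / `model_get_line` are illustrative names, not Nokogiri or libxml2 API. The model assumes that a line number too large for the 16-bit `line` field is recorded by storing 65535 there (with the full value kept in `psvi` for text nodes only), and that any other node type is left untouched. The valgrind trace below is consistent with why the guard matters: the invalid 2-byte write in `set_line` lands just past a 112-byte block allocated by `xmlAddElementDecl`, i.e. an element declaration, which is not one of the four node types.

```ruby
# Hypothetical model of libxml2's line-number storage; NOT Nokogiri or libxml2 API.
# ModelNode keeps only the xmlNode fields discussed above.
ModelNode = Struct.new(:type, :line, :psvi)

# `line` is an unsigned short in libxml2, so 65535 is the largest value it holds;
# this model treats it as the "too big to store here" marker (an assumption).
MODEL_LINE_MAX = 65_535

# Node types that have a usable `line` field, per the list above.
MODEL_LINE_TYPES = %i[element text comment pi].freeze

# Guarded setter: returns false (and writes nothing) for any other node type,
# e.g. an element declaration, instead of writing a field it does not have.
def model_set_line(node, number)
  raise ArgumentError, "line number must be non-negative" if number.negative?
  return false unless MODEL_LINE_TYPES.include?(node.type)

  if number < MODEL_LINE_MAX
    node.line = number
  else
    node.line = MODEL_LINE_MAX
    # Only text nodes keep the full value (in psvi); the others lose it.
    node.psvi = number if node.type == :text
  end
  true
end

# Simplified reader. Real libxml2 also falls back to neighbouring nodes when an
# element, comment, or PI holds 65535; that fallback is omitted here.
def model_get_line(node)
  return nil unless MODEL_LINE_TYPES.include?(node.type)
  return node.psvi if node.line == MODEL_LINE_MAX && node.type == :text && node.psvi

  node.line
end
```

Returning `false` for unsupported node types keeps the sketch close to the "only set the line number if..." conditional; raising instead (the draft tests in this PR expect `NoMethodError` for declaration, DTD, and document nodes) would be an equally reasonable design.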

@fulldecent
Contributor Author

@stevecheckoway @flavorjones @gjtorikian Sorry, team, I don't have much technical expertise to add to this PR. My intention was just to get the right people into this room and to start the PR with what we already have.

May I please ask:

  1. @gjtorikian, @jeremy: what do you need from this PR to get your job done (Nokogiri::HTML5 does not support line numbers rubys/nokogumbo#53, HTML5 parsing and error checking gjtorikian/html-proofer#362)?
  2. @flavorjones: what would you need to accept this PR?
  3. Everyone: who can help actually do this?

If we can't get this done, I'd be happy to learn that; I can then close this PR and go downstream to close the dependent items. But it would be much better if we /can/ get this done.

@flavorjones
Member

In order to consider this PR, I would need to see:

  1. the tests pass
  2. valgrind run on the test suite with no illegal memory access

and a nice-to-have would be:

  • an end-to-end demonstration of how nokogumbo will call this API, to ensure that it works as expected in that context.

Details on the illegal memory access:

$ TESTOPTS="-n '/test_set_line/'" bundle exec rake test:valgrind
...
==15700== Memcheck, a memory error detector
==15700== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15700== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==15700== Command: /home/flavorjones/.rvm/rubies/ruby-2.4.1/bin/ruby -w -Ilib:bin:test:. -e require\ "rubygems";\ require\ "minitest/autorun";\ require\ "test/test_memory_leak.rb";\ require\ "test/css/test_tokenizer.rb";\ require\ "test/css/test_parser.rb";\ require\ "test/css/test_nthiness.rb";\ require\ "test/css/test_xpath_visitor.rb";\ require\ "test/test_encoding_handler.rb";\ require\ "test/test_soap4r_sax.rb";\ require\ "test/test_xslt_transforms.rb";\ require\ "test/xslt/test_exception_handling.rb";\ require\ "test/xslt/test_custom_functions.rb";\ require\ "test/test_convert_xpath.rb";\ require\ "test/namespaces/test_additional_namespaces_in_builder_doc.rb";\ require\ "test/namespaces/test_namespaces_aliased_default.rb";\ require\ "test/namespaces/test_namespaces_in_cloned_doc.rb";\ require\ "test/namespaces/test_namespaces_in_parsed_doc.rb";\ require\ "test/namespaces/test_namespaces_in_created_doc.rb";\ require\ "test/namespaces/test_namespaces_in_builder_doc.rb";\ require\ "test/namespaces/test_namespaces_preservation.rb";\ require\ "test/html/test_builder.rb";\ require\ "test/html/test_node.rb";\ require\ "test/html/test_document_encoding.rb";\ require\ "test/html/test_node_encoding.rb";\ require\ "test/html/test_named_characters.rb";\ require\ "test/html/test_element_description.rb";\ require\ "test/html/test_document.rb";\ require\ "test/html/sax/test_parser_context.rb";\ require\ "test/html/sax/test_parser.rb";\ require\ "test/html/sax/test_push_parser.rb";\ require\ "test/html/test_document_fragment.rb";\ require\ "test/test_css_cache.rb";\ require\ "test/decorators/test_slop.rb";\ require\ "test/test_nokogiri.rb";\ require\ "test/xml/test_builder.rb";\ require\ "test/xml/test_c14n.rb";\ require\ "test/xml/test_node.rb";\ require\ "test/xml/test_attr.rb";\ require\ "test/xml/node/test_subclass.rb";\ require\ "test/xml/node/test_save_options.rb";\ require\ "test/xml/test_node_inheritance.rb";\ require\ "test/xml/test_namespace.rb";\ require\ 
"test/xml/test_dtd_encoding.rb";\ require\ "test/xml/test_reader_encoding.rb";\ require\ "test/xml/test_processing_instruction.rb";\ require\ "test/xml/test_document_encoding.rb";\ require\ "test/xml/test_element_content.rb";\ require\ "test/xml/test_node_encoding.rb";\ require\ "test/xml/test_xinclude.rb";\ require\ "test/xml/test_element_decl.rb";\ require\ "test/xml/test_cdata.rb";\ require\ "test/xml/test_text.rb";\ require\ "test/xml/test_parse_options.rb";\ require\ "test/xml/test_node_reparenting.rb";\ require\ "test/xml/test_reader.rb";\ require\ "test/xml/test_comment.rb";\ require\ "test/xml/test_schema.rb";\ require\ "test/xml/test_document.rb";\ require\ "test/xml/sax/test_parser_context.rb";\ require\ "test/xml/sax/test_parser.rb";\ require\ "test/xml/sax/test_push_parser.rb";\ require\ "test/xml/test_entity_reference.rb";\ require\ "test/xml/test_dtd.rb";\ require\ "test/xml/test_relax_ng.rb";\ require\ "test/xml/test_entity_decl.rb";\ require\ "test/xml/test_xpath.rb";\ require\ "test/xml/test_node_attributes.rb";\ require\ "test/xml/test_node_set.rb";\ require\ "test/xml/test_syntax_error.rb";\ require\ "test/xml/test_attribute_decl.rb";\ require\ "test/xml/test_document_fragment.rb";\ require\ "test/xml/test_unparented_node.rb" -- -n /test_set_line/
==15700== 
/home/flavorjones/code/oss/nokogiri/test/helper.rb:25: version info: {"warnings"=>[], "nokogiri"=>"1.8.0", "ruby"=>{"version"=>"2.4.1", "platform"=>"x86_64-linux", "description"=>"ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]", "engine"=>"ruby"}, "libxml"=>{"binding"=>"extension", "source"=>"packaged", "libxml2_path"=>"/home/flavorjones/code/oss/nokogiri/ports/x86_64-pc-linux-gnu/libxml2/2.9.4", "libxslt_path"=>"/home/flavorjones/code/oss/nokogiri/ports/x86_64-pc-linux-gnu/libxslt/1.1.29", "libxml2_patches"=>["0001-Fix-comparison-with-root-node-in-xmlXPathCmpNodes.patch", "0002-Fix-XPointer-paths-beginning-with-range-to.patch", "0003-Disallow-namespace-nodes-in-XPointer-ranges.patch"], "libxslt_patches"=>["0001-Fix-heap-overread-in-xsltFormatNumberConversion.patch", "0002-Check-for-integer-overflow-in-xsltAddTextString.patch"], "compiled"=>"2.9.4", "loaded"=>"2.9.4"}}
==15700== Invalid write of size 2
==15700==    at 0xA67D130: set_line (xml_node.c:1269)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503E397: vm_exec_core (insns.def:1066)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50457EC: invoke_block_from_c_splattable (vm.c:1032)
==15700==    by 0x50457EC: vm_yield (vm.c:1074)
==15700==    by 0x50457EC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50457EC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50457EC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E6B1DC: rb_ary_each (array.c:1824)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503DF59: vm_exec_core (insns.def:967)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50457EC: invoke_block_from_c_splattable (vm.c:1032)
==15700==    by 0x50457EC: vm_yield (vm.c:1074)
==15700==    by 0x50457EC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50457EC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50457EC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E70B4C: rb_ary_collect (array.c:2734)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503DF59: vm_exec_core (insns.def:967)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50443FF: invoke_block_from_c_unsplattable (vm.c:1101)
==15700==    by 0x5044602: vm_invoke_proc (vm.c:1126)
==15700==    by 0x4F732F1: rb_proc_call (proc.c:845)
==15700==    by 0x4EE6A95: exec_end_procs_chain (eval_jump.c:108)
==15700==    by 0x4EE6A95: rb_exec_end_proc (eval_jump.c:125)
==15700==    by 0x4EE6BE2: ruby_finalize_0 (eval.c:122)
==15700==    by 0x4EE6EEF: ruby_cleanup (eval.c:179)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700==  Address 0x6eeb920 is 0 bytes after a block of size 112 alloc'd
==15700==    at 0x4C2DB4F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15700==    by 0x4EFE123: objspace_xmalloc0 (gc.c:7827)
==15700==    by 0xA6EFBE6: xmlAddElementDecl (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA7CA86F: xmlSAX2ElementDecl (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6B6D6E: xmlParseElementDecl (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6B7BE1: xmlParseMarkupDecl (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6BAEFB: xmlParseInternalSubset (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6C2A2F: xmlParseDocument (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6CC835: xmlDoRead (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6CC9CC: xmlReadMemory (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA67AC54: read_memory (xml_document.c:291)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503E397: vm_exec_core (insns.def:1066)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50457EC: invoke_block_from_c_splattable (vm.c:1032)
==15700==    by 0x50457EC: vm_yield (vm.c:1074)
==15700==    by 0x50457EC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50457EC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50457EC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E6B1DC: rb_ary_each (array.c:1824)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503DF59: vm_exec_core (insns.def:967)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50457EC: invoke_block_from_c_splattable (vm.c:1032)
==15700==    by 0x50457EC: vm_yield (vm.c:1074)
==15700==    by 0x50457EC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50457EC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50457EC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E70B4C: rb_ary_collect (array.c:2734)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503DF59: vm_exec_core (insns.def:967)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50443FF: invoke_block_from_c_unsplattable (vm.c:1101)
==15700==    by 0x5044602: vm_invoke_proc (vm.c:1126)
==15700==    by 0x4F732F1: rb_proc_call (proc.c:845)
==15700==    by 0x4EE6A95: exec_end_procs_chain (eval_jump.c:108)
==15700==    by 0x4EE6A95: rb_exec_end_proc (eval_jump.c:125)
==15700==    by 0x4EE6BE2: ruby_finalize_0 (eval.c:122)
==15700==    by 0x4EE6EEF: ruby_cleanup (eval.c:179)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700== 
Run options: -n /test_set_line/ --seed 6031

# Running:

FFFFFF
==15700== Invalid free() / delete / delete[] / realloc()
==15700==    at 0x4C2ED7B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15700==    by 0x4EFBD1C: objspace_xfree (gc.c:7904)
==15700==    by 0x4EFBD1C: ruby_sized_xfree (gc.c:7997)
==15700==    by 0x4EFBD1C: ruby_xfree (gc.c:8004)
==15700==    by 0xA6F063C: xmlFreeAttribute (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6DF348: xmlHashFree (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6F0E4A: xmlFreeAttributeTable (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6CF65F: xmlFreeDtd (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6CF92C: xmlFreeDoc (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0x4EF28B0: run_final (gc.c:2752)
==15700==    by 0x4EF28B0: finalize_list (gc.c:2768)
==15700==    by 0x4F0040E: rb_objspace_call_finalizer (gc.c:2915)
==15700==    by 0x4F0040E: rb_gc_call_finalizer_at_exit (gc.c:2840)
==15700==    by 0x4EE701D: ruby_finalize_1 (eval.c:131)
==15700==    by 0x4EE701D: ruby_cleanup (eval.c:221)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700==  Address 0xc330055 is in a --- mapped file /home/flavorjones/.rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/x86_64-linux/enc/utf_32be.so segment
==15700== 
==15700== Invalid free() / delete / delete[] / realloc()
==15700==    at 0x4C2ED7B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15700==    by 0x4EFBD1C: objspace_xfree (gc.c:7904)
==15700==    by 0x4EFBD1C: ruby_sized_xfree (gc.c:7997)
==15700==    by 0x4EFBD1C: ruby_xfree (gc.c:8004)
==15700==    by 0xA6CFA38: xmlFreeDoc (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0x4EF28B0: run_final (gc.c:2752)
==15700==    by 0x4EF28B0: finalize_list (gc.c:2768)
==15700==    by 0x4F0040E: rb_objspace_call_finalizer (gc.c:2915)
==15700==    by 0x4F0040E: rb_gc_call_finalizer_at_exit (gc.c:2840)
==15700==    by 0x4EE701D: ruby_finalize_1 (eval.c:131)
==15700==    by 0x4EE701D: ruby_cleanup (eval.c:221)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700==  Address 0x55 is not stack'd, malloc'd or (recently) free'd
==15700== 
==15700== Invalid free() / delete / delete[] / realloc()
==15700==    at 0x4C2ED7B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15700==    by 0x4EFBD1C: objspace_xfree (gc.c:7904)
==15700==    by 0x4EFBD1C: ruby_sized_xfree (gc.c:7997)
==15700==    by 0x4EFBD1C: ruby_xfree (gc.c:8004)
==15700==    by 0xA6CF5C2: xmlFreeDtd (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0xA6CF92C: xmlFreeDoc (in /home/flavorjones/code/oss/nokogiri/lib/nokogiri/nokogiri.so)
==15700==    by 0x4EF28B0: run_final (gc.c:2752)
==15700==    by 0x4EF28B0: finalize_list (gc.c:2768)
==15700==    by 0x4F0040E: rb_objspace_call_finalizer (gc.c:2915)
==15700==    by 0x4F0040E: rb_gc_call_finalizer_at_exit (gc.c:2840)
==15700==    by 0x4EE701D: ruby_finalize_1 (eval.c:131)
==15700==    by 0x4EE701D: ruby_cleanup (eval.c:221)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700==  Address 0x8c70055 is 21 bytes after a block of size 80 free'd
==15700==    at 0x4C2ED7B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15700==    by 0x4FB4AF1: onig_region_free (regexec.c:346)
==15700==    by 0x4F996AA: rb_reg_search0 (re.c:1549)
==15700==    by 0x4F996AA: rb_reg_search (re.c:1588)
==15700==    by 0x4F996FC: rb_reg_eqq (re.c:3113)
==15700==    by 0x5047EA4: vm_call0_cfunc_with_frame (vm_eval.c:132)
==15700==    by 0x5047EA4: vm_call0_cfunc (vm_eval.c:149)
==15700==    by 0x5047EA4: vm_call0_body.constprop.141 (vm_eval.c:181)
==15700==    by 0x50486A8: vm_call0 (vm_eval.c:62)
==15700==    by 0x50486A8: rb_call0 (vm_eval.c:343)
==15700==    by 0x5048D03: rb_call (vm_eval.c:629)
==15700==    by 0x5048D03: rb_funcall (vm_eval.c:841)
==15700==    by 0x4ED481D: grep_i (enum.c:70)
==15700==    by 0x50458DC: vm_yield_with_cfunc (vm_insnhelper.c:2460)
==15700==    by 0x50458DC: invoke_block_from_c_splattable (vm.c:1037)
==15700==    by 0x50458DC: vm_yield (vm.c:1074)
==15700==    by 0x50458DC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50458DC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50458DC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E6B1DC: rb_ary_each (array.c:1824)
==15700==    by 0x5047EA4: vm_call0_cfunc_with_frame (vm_eval.c:132)
==15700==    by 0x5047EA4: vm_call0_cfunc (vm_eval.c:149)
==15700==    by 0x5047EA4: vm_call0_body.constprop.141 (vm_eval.c:181)
==15700==    by 0x50486A8: vm_call0 (vm_eval.c:62)
==15700==    by 0x50486A8: rb_call0 (vm_eval.c:343)
==15700==    by 0x5037856: rb_iterate0 (vm_eval.c:1173)
==15700==    by 0x5037A1A: rb_block_call (vm_eval.c:1236)
==15700==    by 0x4ECE9E9: enum_grep (enum.c:112)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x503E397: vm_exec_core (insns.def:1066)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50457EC: invoke_block_from_c_splattable (vm.c:1032)
==15700==    by 0x50457EC: vm_yield (vm.c:1074)
==15700==    by 0x50457EC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50457EC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50457EC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E70B4C: rb_ary_collect (array.c:2734)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503DF59: vm_exec_core (insns.def:967)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50443FF: invoke_block_from_c_unsplattable (vm.c:1101)
==15700==    by 0x5044602: vm_invoke_proc (vm.c:1126)
==15700==    by 0x4F732F1: rb_proc_call (proc.c:845)
==15700==    by 0x4EE6A95: exec_end_procs_chain (eval_jump.c:108)
==15700==    by 0x4EE6A95: rb_exec_end_proc (eval_jump.c:125)
==15700==    by 0x4EE6BE2: ruby_finalize_0 (eval.c:122)
==15700==    by 0x4EE6EEF: ruby_cleanup (eval.c:179)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700==  Block was alloc'd at
==15700==    at 0x4C2DB4F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15700==    by 0x4FB49A2: onig_region_resize (regexec.c:260)
==15700==    by 0x4FB4DE0: onig_region_resize_clear (regexec.c:298)
==15700==    by 0x4FB4DE0: onig_search_gpos (regexec.c:4168)
==15700==    by 0x4FB5885: onig_search (regexec.c:4145)
==15700==    by 0x4F992A3: rb_reg_search0 (re.c:1531)
==15700==    by 0x4F992A3: rb_reg_search (re.c:1588)
==15700==    by 0x4F996FC: rb_reg_eqq (re.c:3113)
==15700==    by 0x5047EA4: vm_call0_cfunc_with_frame (vm_eval.c:132)
==15700==    by 0x5047EA4: vm_call0_cfunc (vm_eval.c:149)
==15700==    by 0x5047EA4: vm_call0_body.constprop.141 (vm_eval.c:181)
==15700==    by 0x50486A8: vm_call0 (vm_eval.c:62)
==15700==    by 0x50486A8: rb_call0 (vm_eval.c:343)
==15700==    by 0x5048D03: rb_call (vm_eval.c:629)
==15700==    by 0x5048D03: rb_funcall (vm_eval.c:841)
==15700==    by 0x4ED481D: grep_i (enum.c:70)
==15700==    by 0x50458DC: vm_yield_with_cfunc (vm_insnhelper.c:2460)
==15700==    by 0x50458DC: invoke_block_from_c_splattable (vm.c:1037)
==15700==    by 0x50458DC: vm_yield (vm.c:1074)
==15700==    by 0x50458DC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50458DC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50458DC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E6B1DC: rb_ary_each (array.c:1824)
==15700==    by 0x5047EA4: vm_call0_cfunc_with_frame (vm_eval.c:132)
==15700==    by 0x5047EA4: vm_call0_cfunc (vm_eval.c:149)
==15700==    by 0x5047EA4: vm_call0_body.constprop.141 (vm_eval.c:181)
==15700==    by 0x50486A8: vm_call0 (vm_eval.c:62)
==15700==    by 0x50486A8: rb_call0 (vm_eval.c:343)
==15700==    by 0x5037856: rb_iterate0 (vm_eval.c:1173)
==15700==    by 0x5037A1A: rb_block_call (vm_eval.c:1236)
==15700==    by 0x4ECE9E9: enum_grep (enum.c:112)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x503E397: vm_exec_core (insns.def:1066)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50457EC: invoke_block_from_c_splattable (vm.c:1032)
==15700==    by 0x50457EC: vm_yield (vm.c:1074)
==15700==    by 0x50457EC: rb_yield_0 (vm_eval.c:1010)
==15700==    by 0x50457EC: rb_yield_1 (vm_eval.c:1016)
==15700==    by 0x50457EC: rb_yield (vm_eval.c:1026)
==15700==    by 0x4E70B4C: rb_ary_collect (array.c:2734)
==15700==    by 0x50356F9: vm_call_cfunc_with_frame (vm_insnhelper.c:1752)
==15700==    by 0x50356F9: vm_call_cfunc (vm_insnhelper.c:1847)
==15700==    by 0x5044EE2: vm_call_method (vm_insnhelper.c:2292)
==15700==    by 0x503DF59: vm_exec_core (insns.def:967)
==15700==    by 0x5043480: vm_exec (vm.c:1727)
==15700==    by 0x5044239: invoke_block (vm.c:969)
==15700==    by 0x5044239: invoke_iseq_block_from_c (vm.c:1014)
==15700==    by 0x50443FF: invoke_block_from_c_unsplattable (vm.c:1101)
==15700==    by 0x5044602: vm_invoke_proc (vm.c:1126)
==15700==    by 0x4F732F1: rb_proc_call (proc.c:845)
==15700==    by 0x4EE6A95: exec_end_procs_chain (eval_jump.c:108)
==15700==    by 0x4EE6A95: rb_exec_end_proc (eval_jump.c:125)
==15700==    by 0x4EE6BE2: ruby_finalize_0 (eval.c:122)
==15700==    by 0x4EE6EEF: ruby_cleanup (eval.c:179)
==15700==    by 0x4EE71B4: ruby_run_node (eval.c:300)
==15700==    by 0x40087A: main (main.c:36)
==15700== 


Finished in 1.304488s, 4.5995 runs/s, 5.3661 assertions/s.

  1) Failure:
Nokogiri::XML::TestElementDecl#test_set_line [/home/flavorjones/code/oss/nokogiri/test/xml/test_element_decl.rb:44]:
NoMethodError expected but nothing was raised.


  2) Failure:
Nokogiri::XML::TestNode#test_set_line [/home/flavorjones/code/oss/nokogiri/test/xml/test_node.rb:1109]:
Expected: 42
  Actual: 85


  3) Failure:
Nokogiri::XML::TestEntityDecl#test_set_line [/home/flavorjones/code/oss/nokogiri/test/xml/test_entity_decl.rb:115]:
NoMethodError expected but nothing was raised.


  4) Failure:
Nokogiri::XML::TestAttributeDecl#test_set_line [/home/flavorjones/code/oss/nokogiri/test/xml/test_attribute_decl.rb:67]:
NoMethodError expected but nothing was raised.


  5) Failure:
Nokogiri::XML::TestDocument#test_set_line [/home/flavorjones/code/oss/nokogiri/test/xml/test_document.rb:314]:
NoMethodError expected but nothing was raised.


  6) Failure:
Nokogiri::XML::TestDTD#test_set_line [/home/flavorjones/code/oss/nokogiri/test/xml/test_dtd.rb:151]:
NoMethodError expected but nothing was raised.

6 runs, 7 assertions, 6 failures, 0 errors, 0 skips
==15700== 
==15700== HEAP SUMMARY:
==15700==     in use at exit: 15,822,930 bytes in 99,154 blocks
==15700==   total heap usage: 480,372 allocs, 381,221 frees, 85,922,879 bytes allocated
==15700== 
==15700== LEAK SUMMARY:
==15700==    definitely lost: 2,167,997 bytes in 18,728 blocks
==15700==    indirectly lost: 3,925,452 bytes in 34,307 blocks
==15700==      possibly lost: 6,844,703 bytes in 44,357 blocks
==15700==    still reachable: 2,884,778 bytes in 1,762 blocks
==15700==         suppressed: 0 bytes in 0 blocks
==15700== Rerun with --leak-check=full to see details of leaked memory
==15700== 
==15700== For counts of detected and suppressed errors, rerun with: -v
==15700== ERROR SUMMARY: 5 errors from 4 contexts (suppressed: 0 from 0)
rake aborted!
Command failed with status (42): [ulimit -s unlimited && valgrind --num-call...]
/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/hoe-debugging-1.4.2/lib/hoe/debugging.rb:64:in `hoe_debugging_run_valgrind'
/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/hoe-debugging-1.4.2/lib/hoe/debugging.rb:88:in `block in define_debugging_tasks'
/home/flavorjones/.rvm/gems/ruby-2.4.1/gems/rake-12.1.0/exe/rake:27:in `<top (required)>'
/home/flavorjones/.rvm/gems/ruby-2.4.1/bin/ruby_executable_hooks:15:in `eval'
/home/flavorjones/.rvm/gems/ruby-2.4.1/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => test:valgrind
(See full trace by running task with --trace)

@flavorjones
Member

flavorjones commented Nov 15, 2017

I'd also like to see more meaningful tests than what's currently present for some classes, e.g.:

      def test_set_line
        assert_raise NoMethodError do
          @entity_decl.line = 42
        end
      end

and while I understand that the existing legacy tests for #line are Not Great™, this is new code and so I'd like to ensure the tests demonstrate the behavior under test.

@flavorjones
Member

Honestly, I'm happy to write the C implementation, and even explore the Java implementation, for this feature. But I need tests to tell me what nokogumbo expects the behavior to be.

@fulldecent
Contributor Author

Hello. Will the Nokogiri team please clarify whether a contribution here is welcome and would be accepted? If so, what type of solution would be acceptable?

@fulldecent
Contributor Author

I believe this might be fixed upstream. Has anybody here been working on that? If so, could you please cross-reference it?

@stevecheckoway
Contributor

@fulldecent What do you mean by upstream here? I just checked the latest version of libxml and there's no API to set line numbers.

@fulldecent
Contributor Author

Sorry about that. Here is my reference:

rubys/nokogumbo#53 (comment)

stevecheckoway/nokogumbo@e884196

But I see this is incorrect.

@flavorjones
Member

Sorry if what I said earlier on this issue wasn't clear; I would absolutely accept a PR introducing an API to set the line number in libxml2!

My expectations are simply that it comes with unit tests and passes the existing Nokogiri test suite, which notably runs valgrind's memcheck on all unit tests (see notes above).

Because the consumer of the API will primarily be Nokogumbo, I do not expect a JRuby/Xerces/NekoHTML implementation to be submitted. I'm fine with this API call being libxml2-only and with the tests being skipped when Nokogiri.jruby? is true.

Hope that response makes sense.
