README

Demonstrates how to handle XML in shell scripts.

Install

# install xmllint package
sudo apt install libxml2-utils

Troubleshooting

xmllint --debug ./danceloop_00009.svg 

Shell

Manually find the correct paths

xmllint --shell ./danceloop_00009.svg 

> help
> setns x=http://www.w3.org/2000/svg
> xpath /x:svg/x:g
> xpath /x:svg/x:g/x:path/@d

Extraction

# this fails because svg is namespaced
xmllint --xpath '//svg/g/path/@d' ./danceloop_00009.svg

# loosens the path selection to avoid having to specify namespace everywhere.  
xmllint --debug --xpath "//*[local-name()='path']" ./danceloop_00009.svg

# extract the attribute
xmllint --debug --xpath "string(//*[local-name()='path']/@d)" ./danceloop_00009.svg
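The namespace behaviour can be reproduced on a tiny file. The sketch below is only an illustration: it writes a made-up sample.svg into a temporary directory and wraps the local-name() query in a helper called svg_path_d. Neither the file nor the helper is part of the original scripts.

```shell
# Work in a throwaway directory so nothing in the repo is touched.
_tmp=$(mktemp -d)

# Hypothetical minimal SVG; note the default namespace on the root element.
cat > "${_tmp}/sample.svg" <<'EOF'
<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg">
  <g>
    <path d="M 0 0 L 10 10"/>
  </g>
</svg>
EOF

# Print the d attribute of the first path element, ignoring namespaces.
svg_path_d() {
    xmllint --xpath "string(//*[local-name()='path']/@d)" "$1"
}

# The un-namespaced query matches nothing, so xmllint exits non-zero.
if ! xmllint --xpath '//svg/g/path/@d' "${_tmp}/sample.svg" >/dev/null 2>&1; then
    echo "plain //svg/g/path matched nothing"
fi

# The namespace-agnostic query finds the attribute.
svg_path_d "${_tmp}/sample.svg"
```

Wrapping the query in string() returns only the first match as plain text, which is convenient for command substitution but hides any further path elements.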

Convert into JSON

Refer to the jq examples

cat <<- EOF > "./frames.json"
{
    "frames": [
    ]
}
EOF
_filename="./danceloop_00009.svg"
_no_extension="${_filename%.*}"
_frame_number=$(echo "${_no_extension}" | grep --color=never -o -E '[0-9]+')
xmllint --debug --xpath "string(//*[local-name()='path']/@d)" ./danceloop_00009.svg > ./path.txt
jq --rawfile path ./path.txt --arg filename "${_no_extension}" --arg number "${_frame_number}" '.frames += [ {"name":$filename, "path":$path, "number":$number | tonumber }]' "./frames.json"
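The jq call above prints the updated JSON to stdout and handles a single file. One possible extension, sketched under assumptions: frame_number and add_frame are hypothetical helpers, the frame number is taken to be the last run of digits in the file name, and frames.json is rewritten through a temporary file.

```shell
# Last run of digits in the file name, e.g. ./danceloop_00009.svg -> 00009.
frame_number() {
    _base=$(basename "$1")
    printf '%s\n' "${_base%.*}" | grep -o -E '[0-9]+' | tail -n 1
}

# Append one frame (name, path data, number) to the JSON file given as $2.
add_frame() {
    _svg="$1"
    _json="$2"
    _d=$(xmllint --xpath "string(//*[local-name()='path']/@d)" "${_svg}") || return 1
    jq --arg name "${_svg%.*}" --arg path "${_d}" \
       --arg number "$(frame_number "${_svg}")" \
       '.frames += [ {"name": $name, "path": $path, "number": ($number | tonumber)} ]' \
       "${_json}" > "${_json}.tmp" && mv "${_json}.tmp" "${_json}"
}

# Start from an empty list, then add every matching frame in the current directory.
printf '{"frames": []}\n' > ./frames.json
for _f in ./danceloop_*.svg; do
    [ -e "${_f}" ] || continue   # the glob stays literal when nothing matches
    add_frame "${_f}" ./frames.json
done
```

Passing the path data with --arg instead of --rawfile avoids the intermediate path.txt, because command substitution already strips the trailing newline.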

Process NMAP

Ref: 11_nmap_scanning

# scan network (save xml file)
nmap -p 22 -oX ./net.xml -vvv 192.168.1.0/24   

# show hosts that are up
xmllint --xpath '//host/status[@state="up"]/../address/@addr' ./net.xml
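xmllint prints attribute nodes as addr="..." pairs, and nmap records MAC addresses in address elements alongside the IP. The sketch below returns bare IPv4 addresses, one per line. hosts_up is a hypothetical helper, and the sample file is a hand-written stand-in for real nmap -oX output.

```shell
_tmp=$(mktemp -d)

# Hand-written stand-in for nmap -oX output (only the elements used here).
cat > "${_tmp}/net.xml" <<'EOF'
<nmaprun>
  <host>
    <status state="up"/>
    <address addr="192.168.1.1" addrtype="ipv4"/>
    <address addr="AA:BB:CC:DD:EE:FF" addrtype="mac"/>
  </host>
  <host>
    <status state="down"/>
    <address addr="192.168.1.2" addrtype="ipv4"/>
  </host>
  <host>
    <status state="up"/>
    <address addr="192.168.1.3" addrtype="ipv4"/>
  </host>
</nmaprun>
EOF

# List IPv4 addresses of hosts whose status is "up", one per line.
hosts_up() {
    xmllint --xpath '//host[status/@state="up"]/address[@addrtype="ipv4"]/@addr' "$1" \
        | grep -o -E 'addr="[^"]*"' \
        | cut -d '"' -f 2
}

hosts_up "${_tmp}/net.xml"
```

Putting the status test in a predicate on host avoids the /../ step back up the tree, and the grep and cut stage makes the result independent of how xmllint separates the matched attributes.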

RSS Feed

mkdir -p ./out

# get rss feed
curl -s -o ./out/rss.xml https://latenightlinux.com/feed/mp3

# get first url
FEED_URL=$(xmllint --xpath 'string(//rss/channel/item[1]/enclosure/@url)' --format --pretty 2 ./out/rss.xml)

# get the file
curl -s -L -o "./out/$(basename "$FEED_URL")" "$FEED_URL"
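An enclosure URL may carry a query string, which basename would keep in the file name. A sketch that can be tried offline: the feed below is invented (example.com URLs), and enclosure_url and url_filename are hypothetical helpers, not part of the original script.

```shell
_tmp=$(mktemp -d)

# Invented two-item feed used only for this illustration.
cat > "${_tmp}/rss.xml" <<'EOF'
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/audio/episode-2.mp3?source=feed" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/episode-1.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>
EOF

# URL of the enclosure in the Nth item of the feed file ($1 = file, $2 = N).
enclosure_url() {
    xmllint --xpath "string(//rss/channel/item[$2]/enclosure/@url)" "$1"
}

# File name portion of a URL, without any query string.
url_filename() {
    basename "${1%%\?*}"
}

_url=$(enclosure_url "${_tmp}/rss.xml" 1)
echo "would download ${_url} to ./out/$(url_filename "${_url}")"
```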

S3 bucket listing

Process S3 bucket listing.

# acquire the bucket listing
curl -s "https://mybucket.s3.eu-west-1.amazonaws.com/?prefix=myprefix&max-keys=3&marker=key/path/file" | xmllint --format - > ./s3listing.xml

# process manually in xmllint shell
xmllint --shell ./s3listing.xml

> setns x=http://s3.amazonaws.com/doc/2006-03-01/
> xpath /x:ListBucketResult/x:Contents/x:Key/text()

# xmllint
xmllint --xpath '//*[local-name()="ListBucketResult"]/*[local-name()="Contents"]/*[local-name()="Key"]/text()' --format ./s3listing.xml   

# easier to write it in python.
./extract_xml.py ./s3listing.xml   

# using yq that ignores namespaces
yq -oy '.ListBucketResult.Contents[].Key' ./s3listing.xml
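How xmllint separates several text() results can depend on the libxml2 version, so a script that needs one key per line may prefer to count the matches and fetch them individually. A minimal sketch, assuming a hand-written listing in place of a real bucket response; s3_keys is a hypothetical helper.

```shell
_tmp=$(mktemp -d)

# Hand-written stand-in for an S3 ListBucketResult response.
cat > "${_tmp}/s3listing.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>mybucket</Name>
  <Contents><Key>myprefix/a.txt</Key><Size>1</Size></Contents>
  <Contents><Key>myprefix/b.txt</Key><Size>2</Size></Contents>
</ListBucketResult>
EOF

# Print every Key in the listing, one per line, ignoring the namespace.
s3_keys() {
    _keys="//*[local-name()='Contents']/*[local-name()='Key']"
    _count=$(xmllint --xpath "count(${_keys})" "$1") || return 1
    _i=1
    while [ "${_i}" -le "${_count}" ]; do
        # string() on the i-th match yields exactly one key followed by a newline.
        xmllint --xpath "string((${_keys})[${_i}])" "$1"
        _i=$((_i + 1))
    done
}

s3_keys "${_tmp}/s3listing.xml"
```

This parses the file once per key, which is fine for small listings; for large ones the Python or yq route above is likely the better fit.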

Resources

  • Some example XML tooling here
  • Explanation of the namespacing issues here
  • More namespacing here
  • Extract XML Elements Using xmllint here
  • xmllint in Linux here
  • xpath cheatsheet here
  • What is RSS? here
  • YQ: Working with XML here