How to handle a join when join field is embedded in line? #1445

Answered by kubu4
kubu4 asked this question in Q&A

The input file examples above differ slightly from the files used in the solution below, but they are very similar (the biggest difference is the number of columns in each file). Here's the awk solution I ended up using:

awk \
-v FS='[;[:space:]]+' \
'NR==FNR \
{array[$1]=$0; next} \
($1 in array) \
{print $2"\t"array[$1]}' \
"File02.txt" "File01.txt" \
> "${joined_output}"

And here's the code explanation:

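  • awk -v FS='[;[:space:]]+': Sets the Field Separator (FS) variable to a regular expression that matches one or more semicolons and/or whitespace characters, so a ; attached to a UniProt accession is not kept as part of the field. This allows the lookup on $1 to match properly.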

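  • NR==FNR: True only while awk is reading the first input file, where the overall record number (NR) still equals the per-file record number (FNR). Restricts the next block (designated by {}) to work only on the first input file.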

  • {array[$1]=$0; next}: Adds the entire line ($0) of the first file to t…
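
To make the pattern above concrete, here is a minimal, self-contained sketch of the same two-file join. The file contents, accessions, and gene names are invented purely for illustration (they are not the real input files), and the files are written to a temporary directory so nothing in the working directory is touched.

```shell
# Hypothetical sample data, written to a temporary directory.
workdir="$(mktemp -d)"

# First file read by awk (plays the role of File02.txt): accession followed by ';'.
printf '%s\n' 'P12345; kinase' 'Q67890; transporter' > "${workdir}/File02.txt"

# Second file read by awk (plays the role of File01.txt): accession, then an ID.
printf '%s\n' 'P12345 geneA' 'A00001 geneB' 'Q67890 geneC' > "${workdir}/File01.txt"

joined_output="${workdir}/joined.tsv"

awk -v FS='[;[:space:]]+' '
  # First file only: store the whole line, keyed on field 1.
  NR == FNR { array[$1] = $0; next }
  # Second file: if field 1 was seen in the first file,
  # print field 2, a tab, and the stored line from the first file.
  $1 in array { print $2 "\t" array[$1] }
' "${workdir}/File02.txt" "${workdir}/File01.txt" > "${joined_output}"

cat "${joined_output}"
```

With this sample data, only the two accessions present in both files are printed (A00001 is skipped because it never appears in the first file). Note that the two input files are passed as separate arguments; quoting them together as one string would make awk look for a single file whose name contains a space.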

kubu4 Apr 7, 2022
Maintainer Author


kubu4
Apr 21, 2022
Maintainer Author

Answer selected by kubu4