XKCD's StackSort Implemented In A Vim Regex

2016-03-17 - By Robert Elder

Pre-requisite: XKCD's StackSort (See alt text of image)

Use This Vim Command

.s/\(.*\)/\=system('a='."https:\/\/api.stackexchange.com\/2.2\/".'; q=`curl -s -G --data-urlencode "q='.submatch(1).'" --compressed "'."${a}search\/advanced?order=desc&sort=relevance&site=stackoverflow".'" | python -c "'."exec(\\\"import sys \\nimport json\\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\\\")".'"`; curl -s --compressed "'."${a}questions\/$q\/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody".'" | python -c "'."exec(\\\"import sys\\nimport json\\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['body']).encode('utf8')\\\")".'"')/

In A File With Only This Line

sort a list

And The Result Is

<p>It is not possible to sort a dict, only to get a representation of a dict that is sorted. Dicts are inherently orderless, but other types, such as lists and tuples, are not. So you need a sorted representation, which will be a list—probably a list of tuples.</p>

<p>For instance,</p>

<pre><code>import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1))
</code></pre>

<p><code>sorted_x</code> will be a list of tuples sorted by the second element in each tuple. <code>dict(sorted_x) == x</code>.</p>

<p>And for those wishing to sort on keys instead of values:</p>

<pre><code>import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(0))
</code></pre>

Which is the first answer to the first question if you search for 'sort a list' on Stack Overflow (which was answered by Devin Jeanpierre).

The above command can be used as a Vim Ex command (type escape then colon and paste the command). This was tested on Ubuntu 14.04. It requires that python and curl be installed. This was tested to work with curl 7.35.0, Python 2.7.6, VIM - Vi IMproved 7.4, json.__version__ = '2.0.9', GNU bash, version 4.3.11.

Of course, this command works not only for stack sort but any pseudocode. For example, if you run this same regex on the line:

merge sort in Haskell

the result is

<p>Try this version:</p>

<pre><code>mergesort :: [String] -&gt; [String]
mergesort = mergesort' . map wrap

mergesort' :: [[String]] -&gt; [String]
mergesort' [] = []
mergesort' [xs] = xs
mergesort' xss = mergesort' (merge_pairs xss)

merge_pairs :: [[String]] -&gt; [[String]]
merge_pairs [] = []
merge_pairs [xs] = [xs]
merge_pairs (xs:ys:xss) = merge xs ys : merge_pairs xss

merge :: [String] -&gt; [String] -&gt; [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
 = if x &gt; y
        then y : merge (x:xs)  ys
        else x : merge  xs    (y:ys)

wrap :: String -&gt; [String]
wrap x = [x]
</code></pre>

<ol>
<li>Bad idea is splitting list first. Instead of it just make list of one member lists. Haskell is lazy, it will be done in right time.</li>
<li>Then merge pairs of lists until you have only one list.</li>
</ol>

<p><strong>Edit</strong>: Someone who down-vote this answer: above merge sort implementation is same algorithm as used in ghc Data.List.sort except with cmp function removed. Well ghc authors are may be wrong :-/</p>

This is the first answer you'll find if you search for 'merge sort in Haskell' on Stack Overflow (which was answered by Hynek -Pichi- Vychodil).

Now you too can harness the power of copying the first answer on Stack Overflow without even reading the question, and do so without leaving the comfort of your editor.

Explanation

To be fair, most of the work here is not being done by Vim and is instead being outsourced to a shell command. This in itself is an EXTREMELY powerful concept, because it lets you apply arbitrary external shell commands against a backreference (which can be matched from any Vim regex you can think of):

%s/\(.*\)/\=system('echo -n "'.submatch(1).'" | wc -c | head -c -1')/

The above example illustrates the overall approach used by the StackSort regex at the top. This command will take every line in the file and replace it with the output of piping it into 'wc -c'. This will output the number of characters in the given line. The 'head -c -1' cuts off the newline that wc outputs so the output looks nicer.

The hardest part of understanding the command at the top, is seeing through all the quotes and concatenation, so let's look at every string that is getting concatenated together to make up what gets passed to the 'system' command in Vim. The strings are concatenated with the '.' character:

'a='.
"https:\/\/api.stackexchange.com\/2.2\/".
'; q=`curl -s -G --data-urlencode "q='.
submatch(1).
'" --compressed "'.
"${a}search\/advanced?order=desc&sort=relevance&site=stackoverflow".
'" | python -c "'.
"exec(\\\"import sys \\nimport json\\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\\\")".
'"`; curl -s --compressed "'.
"${a}questions\/$q\/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody".
'" | python -c "'.
"exec(\\\"import sys\\nimport json\\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['body']).encode('utf8')\\\")".
'"'

This is not much easier to read, but at least we can start working on figuring out what is actually being executed by the shell. The 'submatch(1)' and '.' characters are the only things here that are relevant to Vim. The 'submatch(1)' represents the first backreference from the first part of our regex.

a=
https://api.stackexchange.com/2.2/
; q=`curl -s -G --data-urlencode "q=
<whatever the match was>
" --compressed "
${a}search/advanced?order=desc&sort=relevance&site=stackoverflow
" | python -c "
exec(\"import sys \nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\")
"`; curl -s --compressed "
${a}questions/$q/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody
" | python -c "
exec(\"import sys\nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['body']).encode('utf8')\")
"

The above lines are the literal string information that would be concatenated together from the '.' characters in the Vim 'system' function call. Put it together and you get:

a=https://api.stackexchange.com/2.2/; q=`curl -s -G --data-urlencode "q=<whatever the match was>" --compressed "${a}search/advanced?order=desc&sort=relevance&site=stackoverflow" | python -c "exec(\"import sys \nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\")"`; curl -s --compressed "${a}questions/$q/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody" | python -c "exec(\"import sys\nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['body']).encode('utf8')\")"

The above is just a simple shell program which can be re-written as:

#  Create a variable "a"; with this URL to avoid repetition
a=https://api.stackexchange.com/2.2/;

q=`curl -s -G --data-urlencode "q=<whatever the match was>" --compressed "${a}search/advanced?order=desc&sort=relevance&site=stackoverflow" | python -c "exec(\"import sys \nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\")"`;

curl -s --compressed "${a}questions/$q/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody" | python -c "exec(\"import sys\nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['body']).encode('utf8')\")"

Let's look at each line above in even more detail:

a=https://api.stackexchange.com/2.2/;

The above line just creates a shell variable 'a' and assigns the value "https://api.stackexchange.com/2.2/" to it. The next line is of this general form:

q=`...`

This is another variable assignment, but the '...' stuff inside the backticks (the ` characters), gets evaluated as shell code and that result is assigned to the variable 'q'. The stuff that gets evaluated is:

curl -s -G --data-urlencode "q=<whatever the match was>" --compressed "${a}search/advanced?order=desc&sort=relevance&site=stackoverflow" | python -c "exec(\"import sys \nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\")"

Let's review each part of this command:

#  Use the program 'curl' used for doing HTTP requests on the command line
curl

#  Use the silent flag to avoid outputting progress and other output we don't want
-s

#  This option causes the query data we specified using --data-urlencode to be added to the request as an HTTP GET parameter, instead of as POST.
-G

#  This url encodes the query data for us, so that special characters like (?, &, etc.) don't cause problems when making the request
--data-urlencode "q=<whatever the match was>"

#  The response data from StackOverflow seems to be compressed, so I added this option to explicitly decompress it
--compressed

#  This is the URL we're sending the HTTP request to.  Note that the ${a} is referencing the 'a' variable in the first line.
"${a}search/advanced?order=desc&sort=relevance&site=stackoverflow"

#  This is our friendly old pipe symbol that will take the standard output of the previous command, and pass it as standard input to the next command
|

#  Start python, and run the following literal python program specified with the '-c' flag:
python -c "exec(\"import sys \nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])\")"

Let's re-write the small python program so it is a bit easier to read:

import sys
import json
print(json.loads(''.join(sys.stdin.readlines()))['items'][0]['question_id'])

The last line just reads in lines from standard input, concatenates them together, json decodes them, and prints the 'question_id' member of the first array member in the 'items' object. This mirrors the structure of the json object returned from the prior curl request to the Stack Overflow API:

{
   "items":[
      {
         ...
         "question_id": 1234,
         ...
      },
      ...
   ],
   ...
}

Now that we have the question id stored in the variable 'q', we can use the same technique to obtain the body of the first answer:

curl -s --compressed "${a}questions/$q/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody" | python -c "exec(\"import sys\nimport json\nprint(json.loads(''.join(sys.stdin.readlines()))['items'][0]['body']).encode('utf8')\")"

This effectively the same as the first call to the API, but the curl command is slightly simpler because the integer id returned from the previous call should not need url escaping. There is also an extra call in the Python program to encode the output as UTF-8 before displaying it.

But Does It Sort The List?

If you're an astute observer, you'll point out that this obtains the code from Stack Overflow, but it does not run samples until your list is sorted. At this point, I'll get a bit more hand wavy by saying that since you can execute any shell command you want from vim you should be able use the '|' symbol to append a command to save the file, and then add another external command to run your favourite build system:

.s/search/replace/|w|!make

To check that the list is sorted, you could throw a while loop in there, and pipe your resulting list into ...uh... 'sort' to check that it is properly sorted.

You may also want to use different questions or answers, and one technique you could use to achieve this easily, is to indicate the index of the question/answer on the line of pseudocode, while adding an appropriate submatch in your regex:

.s/\([0-9]*\)\([0-9]\+\)/...q=submatch(1)..."['items'][".submatch(2)."]['question_id']".../

sort a list 3

You could then use the 'submatch(...)' again to control the index of the JSON object that is printed in the small Python programs.

Finally, there is the problem of having HTML embedded in the code you're trying to run. Just throw it into grep or sed. It'll be fine.

Conclusion

The StackSort example was an interesting toy example, but I would like to re-emphasize how useful this technique can be for performing arbitrary operations on subexpressions that can be regex matched. This lets you take advantage of all the different built-in shell commands for doing operations on text, but you could even go as far as writing your own string processing utility for an ad-hoc job, and simply calling it from vim against your regex:

%s/\(.*\)/\=system('echo -n "'.submatch(1).'" | ./my-custom-string-transformer')/g

Using A Piece Of Paper As A Display Terminal - ed Vs. vim Published 2020-10-05	$1.00 CAD Terminal Block Mining Simulation Game	Use Vim Inside A Unix Pipe Like Sed Or AWK Published 2016-04-05	Why 'sudo vim' Could Hurt Your Productivity Published 2016-08-30
Can You Use 'ed' As A Drop-in Replacement For vim, grep & sed? Published 2020-10-15	Undefined Behaviour With Grep -E Published 2020-10-01	The Regular Expression Visualizer, Simulator & Cross-Compiler Tool Published 2020-07-09	A Surprisingly Common Mistake Involving Wildcards & The Find Command Published 2020-01-21

Why Bother Subscribing?

Free Software/Engineering Content. I publish all of my educational content publicly for free so everybody can make use of it. Why bother signing up for a paid 'course', when you can just sign up for this email list?
Read about cool new products that I'm building. How do I make money? Glad you asked! You'll get some emails with examples of things that I sell. You might even get some business ideas of your own :)
People actually like this email list. I know that sounds crazy, because who actually subscribes to email lists these days, right? Well, some do, and if you end up not liking it, I give you permission to unsubscribe and mark it as spam.