crvs' dumb XML blog


So, about KD-trees

A couple of days ago I updated my KD-tree library, adding a query for K-nearest neighbours. This had been raised as an issue in the repository a while back, and since noticing the issue and adding a development branch for the feature I had been conducting low-intensity warfare with a bug which meant that while most of the K-nearest neighbours returned by the query were correct, some were not (so, a partial success, which is the worst kind of success).

The problem was actually pretty simple, as these things usually tend to go, but it still went unnoticed for a while before I was able to actually find it.

KD-trees derive their efficiency from placing a set of points in a tree structure in which the nearest-neighbour query can stop searching down a particular branch once it notices that no improvement on the best distance estimate can be attained further down that branch. This is done by guaranteeing that at every node at depth d of the tree, the d-th coordinate of all the points in the node's left branch is smaller than that of the node, and conversely larger for the points in its right branch. At query time this is exploited by checking whether the distance along the d-th axis alone already exceeds the current best.

Turning this into a K-nearest-neighbours query simply means that during the query we now keep track of a set of k answers and, rather than cutting off when the best distance cannot be improved, we cut off when the k-th best distance cannot be improved. The catch, and the source of my bug, is that this alone still stops the search too soon: if a promising branch is encountered early in the search, we may leave it before the answer buffer even contains k candidates. So before cutting a branch off we must also make sure the buffer has already been filled with k neighbours.
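The fixed cutoff is easiest to see in code. Below is a minimal toy sketch of my own (not the library's actual code) of a KD-tree with a K-nearest-neighbours query; the crucial part is the pruning test near the end, which only skips the far branch once the buffer holds k points AND the splitting plane is farther away than the k-th best distance:

```python
import heapq

def build(points, depth=0):
    # Recursively build a KD-tree: split on coordinate (depth mod dimensions),
    # median point at the node, smaller coordinates left, larger ones right.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid],
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def knn(node, query, k, depth=0, best=None):
    # `best` is a max-heap (distances negated) of the k closest points so far.
    if best is None:
        best = []
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node["point"], query))
    if len(best) < k:
        heapq.heappush(best, (-d2, node["point"]))
    elif d2 < -best[0][0]:
        heapq.heapreplace(best, (-d2, node["point"]))
    axis = depth % len(query)
    diff = query[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn(near, query, k, depth + 1, best)
    # The fix: descend the far branch unless the buffer already holds
    # k points AND the splitting plane is farther than the k-th best.
    if len(best) < k or diff ** 2 < -best[0][0]:
        knn(far, query, k, depth + 1, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
result = sorted(p for _, p in knn(tree, (9, 2), 3))
print(result)
```

With the `len(best) < k` clause removed, a branch visited early can be abandoned before the buffer is full, which is exactly the "mostly correct, some not" behaviour described above.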

This wasn't exactly a revelation, but it felt interesting enough to just drop a post. So the repository now has a version 2 without ever having had a version 1, because I never expected to have versions in the first place; but changing the API (even though it was only an extension) seems like a good-enough reason to introduce a version number (also partly because there were some pretty dramatic refactorings).


This is a test rss post

Today I am testing whether I can serve RSS using xml + xslt, at the behest of what I believe to be the totality of the audience of this blog (hi Carl!). The answer to this is, of course, "it depends", namely on what we mean by using XML + XSLT.

So it is with some sadness that, for the sake of the convenience of having a single blog file, this blog will stop being rendered directly via XSL in your browser; from now on it will be processed into HTML, which will then be served directly. Fewer moving parts for your browser, more work for me (but not much more, because let's face it, I don't post that often). As a further advantage, I no longer break the blog if I save it mid-edit with a poorly closed tag.

The way it will be done is fairly straightforward, simply using xmlstarlet:

$ xml tr render_html.xsl blog_data.xml > index.html
$ xml tr render_rss.xsl blog_data.xml > rss.xml
                

To catch those unaware, the blog.xml file can be replaced by an HTML file that simply redirects here.
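Such a redirect page can be a few lines of static HTML; this is a hypothetical sketch (the target name assumes the generated index.html from the commands above):

```
<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="refresh" content="0; url=index.html">
    <title>This blog has moved</title>
  </head>
  <body>
    <p>If you are not redirected automatically,
       follow <a href="index.html">this link</a>.</p>
  </body>
</html>
```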


DRUGS

Last week was not a warm week, last week was a hot week. We're talking upwards of 30 degrees Celsius (that's 86 F) in Stockholm which is nothing to scoff at, it's warmest-day-of-the-year type temperatures even before July, so it's serious stuff!

This warmth, coupled with allergy season being well under way as new waves of pollen are spewed out into the air, led me to be, let's say, not-at-my-top-performance for a while. Thus, to combat this chemical onslaught I was forced to resort to DRUGS... my drug of choice for this is a pretty mild anti-allergy pill called ebastine, which is good at both curing my not-so-debilitating allergy and not making me a useless zombie (which I hear is what the more hardcore anti-histamine pills tend to do). However, after three days of spending the whole day groggy and still reeling under the effects of my non-debilitating allergic condition, which would put me in a fell mood by the end of the day, I decided to bite the bullet and have a second pill-of-the-day. Upon inspecting the pill box, however, I noticed just how wrong I was... I had managed to confuse the allergy pills with sleeping pills... Fortunately, though, it was only melatonin, so it just made it very easy to sleep rather than knocking me out completely.


Speaking of laziness...

Well, a few months (1 year?) ago I came across emmet, an editor extension largely geared to people who edit a lot of HTML by hand (which has generally not been the case for me) and which, by extension, can be used for XML if you are careful not to do the things that will break your HTML files (like leaving your tags randomly open).

That has made HTML much, much more convenient to write, so I highly recommend it (if you're using VIM you can find it on github, courtesy of mattn).

Another thing worth asking about is all these timestamps (you know, the ones down at the bottom of each post). Those can be annoying to type (believe me...). Yes, that means they are typed; this is a one-file blog, not a wordpress installation, after all...

So how do I actually insert them? Easy! Just define the following abbreviations in my .vimrc:

:iabbrev thedate <c-r>=strftime("%Y-%m-%d")<cr>
:iabbrev thetime <c-r>=strftime("%H:%M %z")<cr>

And that gives me all the timestamps I may ever need or desire.


Autocomplete my life

Finally decided to make my script-life a bit more convenient with some completions. Namely, for work I often want to run a bunch of different examples of some tool I am working on, and it is a pain to always remember what I need to put on the command line in order to run each example. This has largely been solved by shell scripting, right? Just stick it in a file called "example_1.sh" or something and run that. However, this means that in order to find which example I actually want to run I need to be very, very careful with how I name my files, and eventually that is not going to be the case (know thyself, amirite?).

The solution is obvious, really: just have a configuration file in JSON, use a shell script to mine that file with jq, and you're good to go! For this, your file should essentially be a list of configurations, each with a name field that specifies the name and maybe a comment field, so that your script can list the names and comments to help you decide which configuration to run.
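The mining step itself is only a few lines. Here is a sketch in Python with the stdlib json module standing in for jq, and with a made-up configuration list (the post doesn't show an actual runconfig.json; only the name and comment fields come from the text above):

```python
import json

# A hypothetical runconfig.json: a list of configurations, each with a
# "name" field and maybe a "comment" field.
config_text = """
[
  {"name": "small", "comment": "quick smoke test"},
  {"name": "full",  "comment": "run on the whole dataset"}
]
"""

def list_configs(text):
    # The same data the completion script below mines with `jq '.[].name'`,
    # plus the comment, formatted one configuration per line.
    configs = json.loads(text)
    return ["{:10s} {}".format(c["name"], c.get("comment", ""))
            for c in configs]

for line in list_configs(config_text):
    print(line)
```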

But what if I'm really lazy?

Well you just copy paste this thing and adapt it to your needs:

_complete_runner () {
    if [ ! -f "./runconfig.json" ] || [ ! -f "./runner" ]; then
        return 0
    fi
    latest="${COMP_WORDS[$COMP_CWORD]}"
    prev="${COMP_WORDS[$COMP_CWORD - 1]}"
    words=""
    case "${prev}" in
        ./runner)
            # this is the list of flags that the script can take
            words="-c -l -h -d -p -n"
            ;;
        -c)
            # this gets the name of every configuration in runconfig with `jq`
            # and uses `tr` to make it a space separated list
            words=$(jq '.[].name' -r "./runconfig.json" | tr '\n' ' ')
            ;;
        *)
            ;;
    esac
    COMPREPLY=($(compgen -W "$words" -- "$latest"))
    return 0
}

# this associates the function `_complete_runner` to the command `./runner`
complete -F _complete_runner ./runner

Remember that you can get more information on all this by running "help complete" and "help compgen", but crucially what this does is create a function _complete_runner which provides a list of completions for the current word. Once it is sourced into your environment you can finally stop accidentally mistyping the configuration name over and over again.

A more comprehensive explanation is available in this blogpost (archived here).


XML'ing the blog

A blog, being a website, needs to live somewhere on the internet (preferably a server), and that can often be problematic for many reasons, not the least of which being how the hell one goes about serving it without (the server) getting hacked in the meantime.

Enter XML, the eXtensible Markup Language. HTML, as you might be familiar with, is (at least in its XHTML incarnation) a dialect of XML, which means that one can seamlessly include HTML markup inside an XML file... Where in the XML file? Well, wherever you want; just steer clear of special characters and of named entities beyond &amp; &lt; &gt; (i.e. the predefined ones), because the others are not defined in the XML standard, and you're good to go.

To see how this blog is structured, for example, all you need to do is check the source code: it consists literally of a single XML file, together with an XSL stylesheet for structuring the content of the XML file and a CSS file for styling. With those components in place and properly linked to each other (i.e. the XML headed by the <?xml-stylesheet ?> declaration and the stylesheet rendering valid HTML out of the blog content), you're good to go.
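Concretely, the top of such a file might look like the sketch below (the element names are made up for illustration, not this blog's actual schema; only the stylesheet name is taken from the xmlstarlet commands earlier):

```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="render_html.xsl"?>
<blog>
  <post>
    <title>First post</title>
    <content>Hello!</content>
  </post>
</blog>
```

The browser (or xmlstarlet) fetches render_html.xsl, applies it to the document, and renders the resulting HTML.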


Scripting a vim session

To run a whole "vim session" on a file (or bunch of files), simply do:

$ ex file1 ... filen < scripted_session.ex

This command loads each file into its own buffer, and to alter them you simply write your commands into scripted_session.ex, a file containing a bunch of vim commands, taking care to give it something like the structure below:

:set hidden          " let vim edit multiple buffers before saving changes
:bufdo %s/foo/bar/g  " do the action in all the buffers
:bufdo ...           " each action needs to be done in all buffers
...
:wa
:q

This loads every buffer at the beginning of the session and applies each command to all buffers in turn, rather than all commands at once, so it may make somewhat inefficient use of memory, but generally it works.


Second post

Just like the first post I like to check that this is indeed working :).

And, as you can see this is one hundred percent usable with formatting and everything.


First post

This is my first post using my new xml rendering method!