Thoughts on simplified serialization formats

Lately I have been using two rather simplified serialization formats for a number of projects I am working on: JSON and YAML. With a fair amount of experience using both, I can now tell you where I think each format has its place. You probably don't care to read a long-winded post of whys and why-nots, so let's cut straight to the meat and potatoes.

JSON

  • Very rigorous standard
  • Compact notation does not differ from "pretty printed" syntactically
  • Supports string, number, boolean, null, list, and map
  • Easily identifiable and understandable
  • Almost no exceptions to fundamental rules of the language
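
The point about compact and "pretty printed" notation can be checked with a few lines of Python. This is a minimal sketch using only the standard json module; the sample data is made up for illustration.

```python
import json

# Hypothetical sample data, invented for this example.
data = {"name": "example", "items": [1, 2, 3], "nested": {"ok": True}}

# Compact form: no whitespace between tokens.
compact = json.dumps(data, separators=(",", ":"))

# "Pretty printed" form: same tokens, with newlines and indentation added.
pretty = json.dumps(data, indent=2)

# Both texts parse back to the same value; only insignificant
# whitespace differs between them.
same = json.loads(compact) == json.loads(pretty)
```

Whitespace between tokens carries no meaning in JSON, which is why a parser needs no special mode for either form.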

YAML

  • 3 distinct versions in active use
  • Supports compact inline and "pretty" notations, with differences
  • Supports primitive data types as well as some advanced types, like objects and pointers (tags, anchors, and aliases in YAML terms)
  • Very appealing to the human eye

On differences between YAML parsers

YAML, with its multiple and fundamentally different spec versions, is not always predictable unless the software you are using can certify that it fully implements a specific version of the spec. There are a multitude of YAML parsers out there: some fully implement the spec, others lack certain features, and some are written by people who have never read the spec to begin with. The general statement you will see on the project page of any YAML parser will be something to the tune of "Mostly implements spec version X".

On JSON portability

I've said this many times in the past, but it's worth saying again: JSON is JSON is JSON. Almost all (if not all) JSON parsers will be able to read data that another parser emitted. I can serialize something in Ruby, load it in PHP using one parser, dump it using another parser to a library that gets read by Python, which serves it to an AJAX client that natively understands it, and at ALL steps data types are preserved, with no question or interpretation about why or how.
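
As a small illustration of that claim, here is a sketch of a single serialize and parse cycle using Python's standard json module. It only covers one language, so it is a stand-in for the multi-language chain described above, and the record and its field names are hypothetical.

```python
import json

# Hypothetical record; the field names are invented for illustration.
record = {
    "name": "example",
    "count": 3,
    "ratio": 0.5,
    "enabled": True,
    "tags": ["a", "b"],
    "parent": None,
}

# Serialize to text, as one program would emit it...
wire = json.dumps(record)

# ...and parse it back, as another JSON parser would.
restored = json.loads(wire)

# Collect the Python type each field came back as.
restored_types = {key: type(value).__name__ for key, value in restored.items()}
```

Strings, booleans, lists, maps, and null come back as what they went in as. Note that JSON itself has a single number type; the int/float split seen here is how Python's parser chooses to represent it.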

On YAML portability

It is harder to describe the state of portability in YAML. Rather than bash on it, I'll try to explain some things that are not obvious and why they make it hard to implement YAML, contrary to the claim on its home page.

The first issue that probably comes to mind for anyone who has dealt with YAML using multiple parsers is indentation. Since YAML cuts down significantly on punctuation by counting whitespace, all parsers need to conform very strictly to the output format for any of the YAML magic to work. Sometimes you will see YAML indented by four spaces, sometimes by eight (gag), sometimes by just one or two, and all are valid, as long as you consistently use the same spacing everywhere. This is actually not a bad thing. I like this about YAML, and Python as well, as I don't think it is unreasonable to demand consistency, especially in a data serialization language.

Another difficulty is that, much like the English language, YAML has made some exceptions to its own rules, which makes it harder to understand from an implementation perspective. YAML requires that you indent everything to indicate hierarchy and structure, EXCEPT when you are talking about a mapped list (a list whose items are maps), which supports compact inline notation: the first key of each map may sit on the same line as the list item's dash.

True YAML parsers should support objects and pointers, but the fact of the matter is, many of them don't, and are intended to be used purely as a way of relating primitive data types to one another. If some parser does not implement a more complicated function of the YAML specification, then said parser cannot truthfully claim that it is compliant with that specification. This reminds me of an old slogan, "Almost only counts in horseshoes and hand grenades".

On YAML pointers

One of the reasons I originally looked at YAML a while back was for this concept of "symbolic linking" or "pointers". At first glance, the concept of pointing to pieces of data by reference seems pretty awesome if you are dealing with a large document that might contain some repetitive or similar information.

But taking another, closer look, what are you really gaining? At which point will you pass some data through a parser that will emit YAML containing referenced data? Does the parser decide what data is duplicate and how the pointers will work? Are you really going to use some advanced parser functions to generate pointers in your serialized data? Probably not. Furthermore, once deserialized, what does that data look like? It looks like a flat array, because the code that is reading it likely does not care that it was a pointer or reference, it only cares what the absolute value is.

It is hard to "round-trip" YAML pointers. If you serialize something with pointers, when you deserialize it, and then serialize those results, is your pointer still there? Or is your document now just a large, standard-looking mapping of your data?
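
The round-trip concern can be sketched without any YAML library. The example below uses Python's standard json module as a stand-in: JSON has no reference syntax at all, so it shows the extreme case, where sharing is always lost. Whether a particular YAML parser keeps or drops anchors on re-serialization has to be checked per parser. The host and port values are hypothetical.

```python
import json

# One dictionary referenced from two places, the in-memory
# equivalent of "pointing" at a piece of data.
shared = {"host": "db.example.com", "port": 5432}
doc = {"primary": shared, "replica": shared}

# Round-trip the document through a format with no reference syntax.
restored = json.loads(json.dumps(doc))

# The values still match...
same_value = restored["primary"] == restored["replica"]

# ...but the two entries are now independent copies; the "pointer" is gone.
same_object = restored["primary"] is restored["replica"]
```

After the round trip, changing `restored["primary"]` no longer affects `restored["replica"]`, which is exactly the information a pointer-aware format would need to carry through.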

Random Thoughts on obviousness

  • When I look at a piece of JSON, it is very clear to me, visually, without inferring anything, what data type any member of the object is.

  • It's nice to be able to comment a file to explain why a certain value is set to something. YAML provides this, but in JSON there is no native way to accomplish it.

  • Bare words, in my opinion, cause confusion more than they provide convenience in this context. For example, is it obvious that 1 is an integer but 1.0.1 is a string? Because it is definitely obvious that 1 is an integer and "1.0.1" is a string.

  • It is possible to declare in a YAML document that the data contained within it is serialized in some particular version of the specification. However, many parsers ignore it, or even throw errors if they encounter YAML with front-matter (!!)
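
To make the bare-word point above concrete, here is a deliberately simplified sketch of how a parser might guess the type of an unquoted value. The function name guess_type is hypothetical, and this is not the resolution algorithm of any YAML spec version or library; real resolvers follow more detailed rules that differ between versions.

```python
def guess_type(bare_word: str) -> str:
    """Classify an unquoted scalar the way a simplified resolver might."""
    # Try the narrowest numeric type first, then fall back.
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            caster(bare_word)
            return name
        except ValueError:
            continue
    # Anything that is not a number is treated as a string.
    return "string"


guesses = {word: guess_type(word) for word in ("1", "1.0", "1.0.1")}
# guesses == {"1": "integer", "1.0": "float", "1.0.1": "string"}
```

One extra dot silently changes the type, and the reader has to know the parser's rules to predict it. With mandatory quoting, the type is visible in the text itself.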

A quick word on performance

While you might read that JSON is faster than YAML, this isn't necessarily true. Performance here depends much more on the particular parser than on the format: most parsers out there are community implemented and vary in quality, meaning a given YAML parser may well be faster than a given JSON parser, and vice versa.
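
If performance matters to you, the safer route is to time the specific parsers you plan to use against your own data. A minimal sketch of such a measurement, using Python's standard timeit and json modules; the payload is made up, and any other parser's load function could be swapped into the lambda.

```python
import json
import timeit

# Hypothetical payload; substitute a document representative of your data.
payload = json.dumps({"items": list(range(1000))})

# Time 200 parses of the same text. Replace json.loads with the
# load function of whichever parser you want to compare.
seconds = timeit.timeit(lambda: json.loads(payload), number=200)

per_parse_ms = seconds / 200 * 1000
```

The absolute number means little on its own; what matters is the comparison between candidate parsers on the same machine and the same input.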

A few final thoughts

  • Anything outside of actual, usable data is valuable only to a human.

  • Unless you really need the advanced functionality that only YAML can provide, it is probably a good idea to at least support JSON, if not use it as your default, until the state of YAML parsers, their adoption, and their standardization improves.

  • Do the minimum. If you just want to map some data and pass it along, go back to basics.

  • Don't use any serialization language based on visual attractiveness unless you anticipate users constantly reading or writing these documents by hand.

  • JSON and YAML are both great languages. If you don't like YAML, it probably isn't because you don't like the language specification itself, but that you don't enjoy using the current breed of YAML parsers out there today.

 
slide.sh - Spend less time making slides

Over this past weekend, I wrote a lot of code. On Monday I was supposed to demo it in an organized way that would showcase the flow of the program I was working on. I see this done by others all the time with tools like PowerPoint, using lots of screenshots of running code alongside some text explaining at a high level what is going on, to guide the speaker through the presentation.

I've done this myself many times before, but I almost always end up changing something between initially completing the task and presenting the slides, forcing me to re-run the programs by hand, take new screenshots, and import them into my slides. Wouldn't it be nice if the slides just ran my program while I was presenting? What if I didn't have to make all these slides in PowerPoint or OmniGraffle just to showcase some quick developer output in an organized, sequenced, flow-controlled way?

That's where slide.sh comes in. There's not really any magic to it, but it standardizes a practice I've used numerous times over the years by echo'ing formatted text, running some commands with pre-formatted output, and organizing thought by presenting things in a controlled fashion without throwing a huge blob of text up all at once.

Browse the source on GitHub


slide.sh

Spend less time making slides

What is it?

slide.sh is a small, basic, kludgy, hackish, ghetto slide maker that will execute entirely inside of your shell.

There are no graphics, no transitions or effects, no cool line drawing abilities, or anything like that. Seriously bro, it's a shell script.

Purpose

What slide.sh provides is the ability to show simulated pages of ASCII text inside of your terminal. An advantage to this is that you, as some sort of programmer, won't have to take a bunch of screen caps or copy / paste text into some full-featured slide creation program to demo the core functionality of your executable program in a clean and organized way while notating certain things and controlling the flow of your presentation.

Features

  • Pre-formatted text will render exactly as it was produced
  • Text centering for writing titles, page markers, etc.
  • Slide pausing to help demonstrate multi-step processes
  • Slide separators (horizontal rule)

Uses

Some useful things I've done with slide.sh / can think of for it:

  • Demo REST API calls using cURL
  • Demo command-line tool functionality
  • Write some markdown-style slides for basic presentations

Requirements

  • A bash shell
  • tput

Example

The slide.sh demo was created from the following shell script, which just sources slide.sh and then makes some slides.

#!/bin/bash
. ./slide.sh

slide <<EOF
!!center
slide.sh
Spend less time making slides
https://github.com/ryanuber/slide.sh
EOF

slide <<EOF
By default, text appears exactly as you entered it.
EOF

slide <<EOF
!!center
Centering Text

In any of your slides, you can insert a line that reads '!!center'. This
will cause text in the following lines to be centered.

!!nocenter
You can use '!!nocenter' at any time to stop centering lines
EOF

slide <<EOF
!!center
You can use backticks or dollar-parenthesis to execute commands inside
of the slide, like this:

Today is \$(date +%A)

evaluates to...

Today is $(date +%A)
EOF

slide <<EOF
!!center
You can also use variables for repetitive information, like this:

The current working directory is \$PWD

evaluates to...

The current working directory is $PWD
EOF

slide <<EOF
!!center
You can press 'q' at any time to quit gracefully out of the slide deck
EOF

slide 'Check out this custom action message' <<EOF
!!center
You can pass a string argument to 'slide' to define a custom action message,
rather than the default 'next slide' message.
EOF

slide <<EOF
You can add pauses inside of each slide for demonstrating things. You can
advance slide rendering by pressing return. For example, in this slide, there
should be a pause immediately after this sentence.
!!pause
Did it pause? Cool! It should pause one more time following this sentence.
!!pause
You can use !!pause as many times as you'd like.
EOF

slide <<EOF
!!center
Separators

You can separate parts of a slide by using '!!sep' on a line of its own.
!!sep
It is useful for packing multiple thoughts or ideas into a single slide.
!!sep
You could also use it to create a separated header at the top of each slide.
You can add as many separators as you want.
EOF

slide 'Only one more slide to go! ->' <<EOF
!!center

Putting it all together

!!sep

This slide demonstrates all of the functionality working together.
!!pause


I'll show you the time from a few different places around the world.
!!pause
!!nocenter

!!sep
   California      $(TZ=America/Los_Angeles date)
   Panama          $(TZ=America/Panama date)
   Virgin Islands  $(TZ=America/Virgin date)
   Tahiti          $(TZ=Pacific/Tahiti date)
   Athens          $(TZ=Europe/Athens date)
!!sep

!!pause

!!center
Putting slides together is super-fast and easy!
EOF

slide 'End of slides - Press enter to quit' <<EOF
!!center
...And that's all you need to know!
EOF
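
To show how little is involved in the directives used in the script above, here is a rough sketch of a renderer for them. It is written in Python purely for illustration and is not how slide.sh itself (a bash script) is implemented. The function name render, the fixed width parameter, and the "-" separator character are all assumptions, and '!!pause' is simply skipped rather than waiting for a key press.

```python
def render(lines, width=40):
    """Return display lines for one slide, applying '!!' directives."""
    centered = False
    output = []
    for line in lines:
        if line == "!!center":
            centered = True
        elif line == "!!nocenter":
            centered = False
        elif line == "!!sep":
            # Horizontal rule spanning the assumed display width.
            output.append("-" * width)
        elif line == "!!pause":
            # An interactive presenter would wait for input here.
            continue
        else:
            # Center when the flag is set; otherwise pass text through as-is.
            output.append(line.center(width).rstrip() if centered else line)
    return output


slide = ["!!center", "Title", "!!sep", "!!nocenter", "body text"]
rendered = render(slide, width=11)
# rendered == ["   Title", "-----------", "body text"]
```

A real tool would take the width from the terminal (slide.sh lists tput as a requirement) instead of a fixed parameter.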
 
veneer-swagger - Do something nice for your API documentation

One of my focuses on the veneer framework was documentation. I feel like it does a pretty decent job on its own of making all of the information you would need about your endpoints available. What it doesn't do, however, is provide a user-friendly interface that makes you look like an API superstar. I'm not much of a UI designer. In fact, I'm pretty awful at it, so I didn't try to write my own.

Recently I was introduced to iodocs to help in documenting APIs in an explorable, interactive fashion that would encourage integration. Since node.js was not already part of the technology stack in the veneer framework, I decided to look around a bit more and discovered swagger. Swagger fits nicely with veneer because it can consume JSON documentation exposed by your own RESTful web service, which the veneer framework already had! Granted, the default format of the veneer documentation doesn't quite match the Swagger 1.1 specification, but it was easy enough to kludge a decent little extension that would massage veneer documentation into a format that the swagger-ui could consume.

Browse the source on GitHub


What does it do?

veneer-swagger is an extension to the veneer framework that will massage the usual endpoint definitions into usable swagger documentation format. This means you can have web documentation, developer experimentation tooling, and example client code generation, without writing any extra code!

What is Swagger?

Swagger, per its project page at http://swagger.wordnik.com, is:

a specification and complete framework implementation for describing, producing,
consuming, and visualizing RESTful web services.

What you need to know is that it's a super-slick collection of HTML, CSS, and JavaScript that creates an explorable REST API experience without much effort.

How do I use veneer-swagger?

A minimal implementation involves:

  • Downloading swagger-v1.php
  • Including swagger-v1.php in your code, as you would any other API endpoint
  • Installing swagger-ui on a webserver
  • Pointing swagger-ui at [your-api-url-here]/v1/swagger

If your API is publicly accessible, you could even "try before you buy" by visiting the online demo, changing the URL field to point to [your-api-url-here]/v1/swagger, and pressing the "Explore" button. Instant documentation!

Caveats

Since Swagger runs entirely in your browser, it is subject to the same-origin policy: by default, the browser blocks its requests to an API served from a different origin. There are many ways to work around this; a few things you might do include:

  • If you are using Apache, add a header to every response from within an Apache <Directory> block (this requires mod_headers):

    Header set Access-Control-Allow-Origin "*"

  • Within your endpoint code, set an access control header:

    $this->response->set_header('Access-Control-Allow-Origin: *');

  • Run swagger and your API code within the same domain
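
To judge whether you will hit the same-origin policy at all, it helps to remember that an origin is the combination of scheme, host, and port. The sketch below compares two URLs on that basis using Python's standard urllib. The helper names and the example URLs are hypothetical, and this is a simplification of what browsers actually implement.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when a URL does not state one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(first, second):
    """True when both URLs share scheme, host, and port."""
    return origin(first) == origin(second)


# Hypothetical deployments of swagger-ui and an API.
split_hosts = same_origin("http://docs.example.com/swagger-ui/",
                          "http://api.example.com/v1/swagger")
one_host = same_origin("http://example.com/swagger-ui/",
                       "http://example.com:80/v1/swagger")
```

Here split_hosts is False (different hosts, so an access control header or similar workaround is needed), while one_host is True (the paths differ, but the origin is the same).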