Replies: 9 comments 6 replies
-
|
I had the same thought and spent ages creating a converter: I wrote custom JavaScript logic to convert sfz between the various formats:
To make sure it was accurate, I forked the sfz-test suite and ran the sfizz-parser to convert to xml and other formats. I also used the sfizz tests and other custom text files to check for edge cases. Tests are here: What I learned:
What would be great: an official sfz parser for each major language (C, Python, Java, JavaScript). Most sfz players could support sfz.xml files, as most convert sfz to xml to be able to function. |
-
|
Thanks for a really clear and complete case study! Lots to think about. I don't think continuing with XML is a good idea. I was a programmer in the 1990s when the XML craze went through the programming world, but it turned out to be a bust. I started at Google in 2004, and there was a lot of code that supported XML but none of the files were in XML. I asked someone who told me that they initially started with XML but they dropped it because of the very high error rates in editing, even with motivated and intelligent people like early Google product managers doing the work. There are numerous other issues with XML. The big one for me is that it has a different object model than JSON/YAML/TOML and other formats that can be represented as perfect subsets of those types, like CSV and .ini files, because there are four different types of containment: text within tags, tag-in-tag containment, attribute-in-tag containment, and And now, everyone knows JSON but mainly only Java programmers still write XML.
Using a list instead will work perfectly well, and is more natural. If something is named, it's a dict; if order is important, it's a list. I see that's exactly what you're doing with your converter. Aside: JSON itself is flawed for several reasons: it isn't guaranteed to preserve order, there's no way to add comments, and trailing commas are an error, so editing large JSON files is prone to syntax errors. "You know what I mean, just ignore the stupid comma!" (me to several JSON parsers). JavaScript, where JSON comes from, allows both comments and trailing commas. 🤡 It's a shame, because JSON has one property that neither SFZ, TOML nor YAML has: no legal JSON file is a proper prefix of any other JSON file! (XML has that too.) So if you read a file and it parses as JSON, you know that the file has been completely written. |
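The prefix property can be checked with a small sketch. It assumes the top level of the document is an object or array and ignores trailing whitespace (a bare number such as `12` does have the legal prefix `1`). The helper name `parses_as_json` and the sample metadata are made up for illustration.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is a complete, valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical instrument metadata, used only for illustration.
document = json.dumps({"type": "region", "sample": "piano.wav", "key": [58, 62]})

# Every proper prefix of an object-rooted document is missing at least the
# final closing brace, so none of them should parse.
prefixes_that_parse = [
    document[:n] for n in range(len(document)) if parses_as_json(document[:n])
]

print(parses_as_json(document))
print(prefixes_that_parse)
```

A reader that receives a half-written file therefore gets a parse error instead of silently loading a shorter instrument.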
-
|
I am not a fan of xml, but for a format where ordering is required, it is the most compact whilst still being parseable. Line counts by format:
Compact JSON is close to YAML in size, but the ordering can change when converting back-and-forth with sfz. Regarding best formats by style/preference I wrote this in the sfz Discord. There are different use-cases: 1) Non-technical user 2) Technical user 3) Computer scripts. So in summary, .sfz is targeted at use-case 1) and does a reasonably good job for non-technical users. Using a real format would take away some of its usability for non-technical users. Perhaps option 1) should never have existed, and there should be a user interface which non-technical users use to generate 2) or 3). But we have inherited the decisions made with the original format, and I don't think they are inherently bad if we can mitigate by:
I think with these three features the sfz format and tools will be flexible enough for all the use-cases mentioned above! |
-
|
Answers inline!
> - sfz = 28
> - json = 210
> - yaml = 128
> - xml = 28
>
> https://sfzlab.github.io/sfz-website/converter/
With all due respect, you are optimizing for the wrong thing.
SFZ files are tiny compared with sample files. If you save 100 bytes, but
the project does not succeed, it is not a win.
Off the top of my head, if I had to list goals for a metadata format, I'd
say:
- correctness (does it correctly represent the problem domain?)
- reliability (for users editing files in the metadata format)
- interoperability (with other systems)
- expandability (how easy is it for other people to add features?)
- clarity and readability (but without that, the previous four fail!)
And if I were someone who was spending time on a project to do this, I'd
have another very high priority:
- attractiveness to new users
Compactness (if it doesn't contribute to the above) isn't really valuable
as a goal. If bytes count, you should compress the file anyway, and then
all the formats will have very similar sizes (because they have the same
entropy). It's much faster to unzip a file in memory than to load it
from disk or send it over the wire:
https://pesin.space/posts/2020-09-22-latencies/
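A rough way to test the compression argument is to write the same data in two notations and compare sizes before and after compression. This is only a sketch: the 88 regions, the file names, and the SFZ-like text layout are invented for the experiment, and real instruments will give different numbers.

```python
import json
import zlib

# Hypothetical regions: one sample per piano key. Names are illustrative only.
regions = [
    {"sample": f"piano_{key}.wav", "lokey": key, "hikey": key, "pitch_keycenter": key}
    for key in range(21, 109)
]

# The same data written as SFZ-like text and as indented JSON.
sfz_text = "\n".join(
    "<region> " + " ".join(f"{name}={value}" for name, value in region.items())
    for region in regions
)
json_text = json.dumps(regions, indent=2)


def sizes(text):
    """Return (raw size, zlib-compressed size) in bytes."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


sfz_raw, sfz_packed = sizes(sfz_text)
json_raw, json_packed = sizes(json_text)

print(f"sfz : {sfz_raw} bytes raw, {sfz_packed} bytes compressed")
print(f"json: {json_raw} bytes raw, {json_packed} bytes compressed")
```

The repeated key names and punctuation are exactly the kind of redundancy a general-purpose compressor removes, which is why the gap between the two formats should shrink once both are compressed.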
> ...the ordering can change when converting back-and-forth with sfz.
Dictionaries are inherently unordered, so you use lists!
So a structure like this:
{
  "whiskers": {"type": "cat", "color": "orange"},
  "fido": {"type": "dog", "color": "brown"}
}
becomes
[
  {"name": "whiskers", "type": "cat", "color": "orange"},
  {"name": "fido", "type": "dog", "color": "brown"}
]
The second version is a little longer, but it's easier to
manipulate, because all the information is inside the record itself. In a
pure dictionary or the equivalent, you have to pass around the value *and* the
key.
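The dict-to-list move described above can be sketched in a few lines of Python. The function names and the `"name"` key field are choices made for this illustration, not part of any SFZ tool.

```python
def records_from_mapping(mapping, key_field="name"):
    """Turn {"whiskers": {...}} into [{"name": "whiskers", ...}], keeping order."""
    return [{key_field: key, **value} for key, value in mapping.items()]


def mapping_from_records(records, key_field="name"):
    """Inverse of records_from_mapping: move the key field back out of each record."""
    result = {}
    for record in records:
        fields = dict(record)  # copy, so the caller's record is not modified
        key = fields.pop(key_field)
        result[key] = fields
    return result


pets = {
    "whiskers": {"type": "cat", "color": "orange"},
    "fido": {"type": "dog", "color": "brown"},
}

records = records_from_mapping(pets)
print(records)
```

Each record now carries its own name, so a function that receives one record needs nothing else, and the list position carries the ordering explicitly.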
I guarantee you that there's a provably perfect one-to-one
correspondence between a subset of JSON and SFZ.
> Regarding best formats by style/preference I wrote this in the sfz Discord.
> There are different use-cases:
> *1) Non-technical user*
> They don't formally know how to code, but can learn how to place strings
> in a file in order to create a sampled instrument.
> This is what the sfz format is designed for.
Hear me out - your non-technical user might already know JSON, and will be
better served by teaching them JSON if they don't, than teaching them a
format which has no other use, and that they can only learn in one place.
If they don't understand JSON, there are a million tutorials. Their editor
will probably understand JSON. They can import the JSON into their Excel
spreadsheet, horrible but true, people do that.
> You can see they took xml and removed the surrounding syntax. Which makes
> it faster to type too.

Fast to type is another thing low on the list to optimize for. *Easy to
use* is much more important, as is "less to learn".
> *2) Technical user*
> They know how to code, so a full syntax is fine. But at the same time they
> don't want to write hundreds of lines of code.
The difference between JSON and SFZ is not going to be hundreds of lines!
It's going to be perhaps a couple of hundred characters per project.
Even for people who "hunt and peck", their typing speed is almost never a
limiting factor when writing code.
People like fairly terse languages like Python because such languages are
clear and easy to understand. Languages like Lisp or Forth can express the
same code as Python in less than 2/3 the code, but are very often
considered hard to program in because they are hard to understand.
I am "the technical user", and believe me when I tell you that after having
learned likely over a hundred different data formats in over fifty years of
programming, I am highly disincentivized to learn another one, and "less
typing" is not going to help.
I am interested in generating several thousand sound fonts from materials I
already have to explore a set of variations procedurally.
If the SFZ output format had been JSON, I would have been done on Thursday
night, but as it is, I'm still trying to figure out someone's SFZ reading
library.
I have limited time to write computer music software, and I see learning
some other new format as time wasted on busywork and not on writing music.
> So either a compact format such as xml
Imagine you're advertising your universal sample format to people - trying
to get them to use it. If you say, "Compact", they're going to be very
disappointed in the savings of hundreds of bytes, because they have
gigabytes of samples.
A project using your SFZ format, and a project with the same sample data
but a JSON-based sample format would be for every practical purpose the
same size. The size savings would be around 1% *of 1%* - the cost of the
extra RAM for a few hundred bytes is less than one thousandth of a penny.
Compact in that sense is not on people's radar.
-------
Young programmers today have no idea what XML is, and old programmers know
it and don't want it. The last time I got paid to write an XML file was
2003, quite likely I have not touched XML since.
I object to AI on multiple levels and never use it, but you can ask an AI
to write a JSON file for you based on a spec, and it apparently does a
pretty good job.
------
> OR a compile-to language such as write shorter JavaScript using statements
> and loops which then generates xml.
I implore you not to spend time creating yet another tiny, idiosyncratic
programming language, there are so many already!
If you must have some coding, use an existing tiny language!
https://en.wikipedia.org/wiki/Lua is probably the most popular.
People spend decades carefully creating and refining such languages. Profit
from their hard work and the endless bugs they had to fix to get to where
they are now.
-----
Generally, metadata files should not have code in them at all: it is a path
to not just unreliability (because your toolchain doesn't recognize that
that code exists and cannot check it), but security holes.
Code should be stored as code, where it can be approved,
checked, sandboxed. Metadata should *refer* to that code.
Allowing the contents of a data file to be injected into your system as
code, bypassing the usual checks for code, has been a repeated source of
successful attacks on systems for decades. (Come to think of it, a server I
ran fell prey to that, when someone uploaded a "font" (hah!) that contained
malicious executable code to a WordPress installation. Tens of thousands
of spam emails passed through the box in a short time, it took me a day to
rebuild the server from scratch and then weeks before people stopped
thinking we were spammers.)
-----
I started computer music programming in the 1970s. By necessity, people
made up their own formats in the early days, I can still name some of them,
MUSIC V, CMusic, POD, my own CLOD. A few survived, notably HMSL and CMusic,
because a fairly small number of serious composers invested a great deal of
work in their systems (and got excellent results before anyone else could).
But that was a long time ago. Now we have data formats that have been
stable for decades. Browser developers always had the choice between XML
and JSON to send and receive data, and overwhelmingly they chose
JSON because it was easier.
You want these developers to want to use SFZ and to love it.
--------
I want SFZ to succeed; I want a flood of universal sound fonts, and support
for sound fonts; I want to write tools to create and manipulate such sound
fonts.
For this to happen, SFZ needs to dramatically lower its barrier to entry by
using standard file formats.
There are almost 30 million professional computer programmers out there in
the world, and easily 90% of them already know JSON. If they could easily
put together sound fonts with their existing editors, perhaps 1% of these
people would be interested, that's 300k. And that's ignoring hobbyists who
drive a lot of uptake, and again, mostly know JSON.
It would be better for the SFZ team to use a format that these hundreds of
thousands of potential users already knew, than for all of those people to
learn this specific format *and* not be able to use their toolchains.
And I think they mostly just won't do it. People are lazy, I certainly am,
and there is only so much time in the day. Show them a JSON file they can
edit, and they are on firm ground.
-----
Sorry for the long rant. :-D
Have a good weekend!
|
-
|
Love the suggestion. I've been creating my own SFZ parsing and modification scripts in Python over the past few months, but I simply don't want to share them because I don't want to maintain a project that I scoped exclusively for my own needs, especially since I had zero prior experience with SFZ. I guess I'm not the only one, given the current state of tooling around SFZ. Having all the tools and libraries available for working with JSON would be incredible for generating and modifying SFZ instruments with code rather than by hand, which scales poorly once your instrument files start to reach the feature set that's common today anyway. We could even validate JSON-based SFZ files against a JSON schema. @rec Which of the suggested formats would you prefer? I've been using TOML to describe custom metadata of VST plugin parameters and was pleasantly surprised by its simplicity and readability compared to JSON (which I used for the same purpose in another project). Also, it doesn’t have the dreaded trailing comma problem from JSON. Since TOML's data model maps almost entirely onto JSON's (tables become objects, arrays stay arrays), I was able to use JSON tools like validate-json with TOML files converted to JSON on the fly for validation, meaning you can still use JSON tools when working with TOML. |
-
I am also a TOML user, and Python having this as part of the standard lib has helped in its adoption. Also, I like the fact that you can automatically edit TOML and keep the comments around. A++! I'm somewhat anti-YAML because it has a lot of traps in it, one of which I fell into badly once (I stored executable items in people's YAML configs by mistake, I only needed their names! I realized it at some point, and then realized that if anyone else had realized it, they could easily have attacked people's systems using my program....) I do this sort of thing a lot, come to think of it. What I always do myself is simply accept any of YAML, TOML or JSON because in 2025, it's really no extra work to support all three (I even have a library to do it in Python, https://github.com/rec/fil) and then if nothing is specified, I write in TOML. (In fact, I believe that all JSON files are YAML files.) There are seemingly mature, usable and well-tested libraries for all three formats in at least Python, C++ and JavaScript. |
-
In TOML this would be: TOML also has the advantage that you can read in a TOML file which contains comments, modify the data, store it, and keep the comments. This example: could be something like this: |
-
|
Some of the rules the format would need to meet:
Trying out these rules, the files would be something like: example.sfz example.sfz.json example.sfz.xml example.sfz.yaml example.sfz.toml |
-
|
On Sun, Oct 12, 2025 at 5:51 AM Kim T wrote:
> Custom compact json:
> [
>   ["region", [
>     ["sample", "piano.wav"],
>     ["pitch_keycenter", 60],
>     ["lokey", 58],
>     ["hikey", 62],
>     ["lovel", 1],
>     ["hivel", 20],
>     ["locc64", 64],
>     ["hicc64", 127]
>   ]]
> ]
I agree that lists look bad there: but what about:
[
  {
    "type": "region",
    "sample": "piano.wav",
    "pitch_keycenter": 60,
    "key": [58, 62],
    "vel": [1, 20],
    "cc64": [64, 127]
  }
]
Replacing ["lovel", 1], ["hivel", 20] with "vel": [1, 20] is not just more
compact and, for me at least, easier to read ("what's a lovel?"), but it's
also easier to code for: you can have "generic" code that accepts a min/max pair.
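A sketch of what that "generic" min/max code could look like is below. It assumes, purely for illustration, that every list-valued field is a [low, high] pair that expands to `lo<name>` and `hi<name>` opcodes; the function names are hypothetical.

```python
def region_to_opcodes(region):
    """Flatten a region dict into (opcode, value) pairs.

    Assumption for this sketch: any list-valued field is a [low, high] range.
    """
    opcodes = []
    for name, value in region.items():
        if name == "type":
            continue  # the header name, not an opcode
        if isinstance(value, (list, tuple)):
            low, high = value  # raises ValueError if it is not a pair
            if low > high:
                raise ValueError(f"{name}: low {low} is greater than high {high}")
            opcodes.append((f"lo{name}", low))
            opcodes.append((f"hi{name}", high))
        else:
            opcodes.append((name, value))
    return opcodes


def region_to_sfz(region):
    """Render one region dict as a single line of SFZ text."""
    header = f"<{region['type']}>"
    return " ".join([header] + [f"{k}={v}" for k, v in region_to_opcodes(region)])


region = {
    "type": "region",
    "sample": "piano.wav",
    "pitch_keycenter": 60,
    "key": [58, 62],
    "vel": [1, 20],
    "cc64": [64, 127],
}

print(region_to_sfz(region))
```

The range check lives in one place, so a new ranged field gets validation for free instead of needing its own lo/hi handling.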
--
/t
https://tom.ritchford.com
https://tom.swirly.com
|
-
Idea: support standard file formats
SFZ is a tremendous idea, full of musical potential, but, at least from a long-time computer music developer coming to this project for the first time, it is hampered by the fact that the data files are in a non-standard format.
.json, .yaml, and .toml all represent essentially the same thing: associative arrays ("dicts" or "objects") and lists, combined with primitives like strings and numbers. There is a huge quantity of free, open-source tools and paid tools to manipulate these "dict" languages.
But to get involved with writing an SFZ file, a new developer needs to learn a whole new file format, one which will only be useful for this one solution within one problem domain, and one which will not interoperate with their existing toolchain.
And it's a pity, because in my admittedly fairly limited time looking at the format, I didn't see any feature that SFZ files had that wouldn't easily be supported by any of the dict languages.
How to do it
It could be done in a completely backward-compatible way, one step at a time, without breakage. Here's a very high-level sketch.
First, define a one-to-one, invertible correspondence from the existing format to a dict format. I could do that if there were interest. 🙂
Then write a back-and-forth converter once for each language (Python, JS, etc). I could do that for Python.
Initially, existing applications could just convert files from the dict languages into SFZ files on the fly and read those, but eventually they would read standard data files natively.
Nothing would break or have to be any different about existing code or files - eventually, all applications would simply accept more input file format choices.
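To make the first two steps concrete, here is a minimal round-trip sketch for a simplified subset of SFZ. It assumes headers like `<region>`, `name=value` opcodes with no spaces inside values, and `//` line comments; real SFZ allows more (sample paths with spaces, `#include`, `#define`), so treat this as an illustration rather than a parser for the full format. The names `parse_sfz` and `dump_sfz` are hypothetical.

```python
import re

HEADER = re.compile(r"<(\w+)>")


def parse_sfz(text):
    """Parse a simplified SFZ subset into an ordered list of section dicts."""
    sections = []
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop line comments
        for token in line.split():
            match = HEADER.fullmatch(token)
            if match:
                sections.append({"type": match.group(1), "opcodes": []})
            elif "=" in token and sections:
                name, value = token.split("=", 1)
                # Values stay strings so that nothing is lost on the way back.
                sections[-1]["opcodes"].append([name, value])
            else:
                raise ValueError(f"unexpected token: {token!r}")
    return sections


def dump_sfz(sections):
    """Write the list-of-dicts form back out as SFZ text, one opcode per line."""
    lines = []
    for section in sections:
        lines.append(f"<{section['type']}>")
        lines.extend(f"{name}={value}" for name, value in section["opcodes"])
    return "\n".join(lines) + "\n"


source = (
    "<group> lovel=1 hivel=20 // soft layer\n"
    "<region> sample=piano.wav lokey=58 hikey=62\n"
)
sections = parse_sfz(source)
print(sections)
print(dump_sfz(sections))
```

The correspondence here is one-to-one at the data level: parsing the dumped text gives back the same sections in the same order, while comments and whitespace are not preserved. Opcodes are kept as a list of pairs rather than a dict so that order and repeated opcodes survive the trip.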
Thanks for reading.
I understand that this is a rather radical idea, but having a known, boring old file storage format that all existing tools can manipulate will, I believe, dramatically advance this amazing but not yet fully accepted idea of a universal sample format.
Thanks for all your hard work!