[–] 2ion link

Yes, most tarballs do not support random access (there are some metadata extensions that allow this). This makes large tarballs annoying to use on systems with slow disk I/O (even a hard disk may be slow enough to be annoying to work with). This is by far my biggest gripe with the format. Certainly, smaller tarballs are a very handy format as long as you stay inside the Unixy world of computing – and as long as you keep looking out for the various incompatibilities between the different tar implementations.

reply

[–] textmode link

"... there are some metadata extensions that allow this)."

Where can one find these extensions? Are they portable between Linux and BSD?

The 1998 dict project included a utility called "dictzip" for random access to the contents of gzip compressed files.

Dumb question: Is it possible to create a utility or even a hack that performs "random access" into tar archives?

Example use case: the user only wants to untar a small number of selected files from a large tarball such as a source tree. The user has tried both the "-T filelist" option and using memory file systems instead of hard disk drives.

reply

[–] pwg link

> Dumb question: Is it possible to create a utility or even a hack that performs "random access" into tar archives?

Yes. If the tar file is on a seekable medium, just read the headers and build an index of the offsets of the file contents from the header data.

Then use the offsets index to seek to just the item of interest and read out only it and nothing more.
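
A minimal sketch of both steps in Python, using the standard tarfile module (which exposes each member's data offset); this assumes a plain uncompressed, seekable tar file:

    import tarfile

    def build_index(path):
        # One sequential pass over the headers; the data blocks are seeked over.
        index = {}
        with tarfile.open(path, "r:") as tf:
            for member in tf:
                index[member.name] = (member.offset_data, member.size)
        return index

    def read_member(path, index, name):
        # Random access: jump straight to the recorded offset and read the data.
        offset, size = index[name]
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(size)

The index still costs one linear pass to build, but every lookup after that is a single seek plus read.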

Now, does such a utility already exist? The answer seems to be yes: https://github.com/devsnd/tarindexer

reply

[–] Hello71 link

AFAIK, only pixz does this; it stores the index in the xz file.

reply

[–] barrkel link

A zip file is a concatenation of gzipped files. A .tar.gz is a gzip stream of concatenated files. Anything that could do random access into the contents of a zip file entry could do similar things with a tarball.

reply

[–] jahewson link

Not so simple. A bundle of streams is not the same as a stream of bundles.

reply

[–] sakuronto link

What about .gz.tar, in which all the files are gzipped, then tarballed? It seems like it would be a slightly fatter .zip file.

reply

[–] barrkel link

With a transparent random access overlay, the difference mostly disappears, reducing to whether the stream needs to be scanned or whether it's indexed, which is itself orthogonal - zip file directory at the end is redundant.

reply

[–] netheril96 link

So you mean at each "random access", you actually have to scan the whole .tar.gz file to find the location? For large tarballs, that will definitely hinder performance a lot. The difference does not disappear at all.

reply

[–] barrkel link

Apparently you have a comprehension problem.

reply

[–] netheril96 link

Then enlighten me.

reply

[–] gkfasdfasdf link

Zip files have a directory which lists the files in the archive and their offsets, etc. No such feature in a tar archive.

reply

[–] barrkel link

If we're talking about something that indexes gzip streams, it's not a leap for it to also index the inner tar.

reply

[–] Mikhail_Edoshin link

AFAIK a compressor like zip builds a dynamic running table of frequent byte sequences; the resulting archive is written in such a way that when you decompress it, you re-build the table in the process.

So if you concatenate files A, B, and C and then compress the result, then by the time the compressor starts compressing the data of C, it will have that table built from A and B. To extract C, you'll need to re-build the same table, and thus you'll first need to decompress A and B.

In a zip file each entry is compressed individually; this gives random access, but a worse compression ratio, because the table is not re-used between files.
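
A rough illustration of the trade-off with Python's zlib (the data here is made up, and the exact numbers will vary):

    import zlib

    a = b"hello world, " * 1000
    b = b"hello world, " * 1000
    c = b"goodbye world, " * 1000

    # "Solid" compression (tar.gz style): one stream, one shared table.
    solid = zlib.compress(a + b + c)

    # Per-entry compression (zip style): each file starts a fresh stream,
    # so redundancy between files is not exploited...
    separate = sum(len(zlib.compress(x)) for x in (a, b, c))

    # ...but C can be decompressed on its own, whereas pulling C out of
    # `solid` means decompressing A and B first.
    print(len(solid), separate)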

reply

[–] undefined link
[deleted]

reply

[–] pif link

> This makes large tarballs annoying to use on systems with slow disk I/O

Funny how tar was originally developed for tape drives!

reply

[–] coldacid link

Tape drives don't really support random access, though, which is reflected in the design of the tar format and its offspring. That is, in fact, the problem here, and why formats designed for random access instead of sequential access are far better for storing file systems for containers and VMs.

reply

[–] RX14 link

I'm pretty sure the article implies this is for user-facing applications where the user would manually extract it once to a place of their choosing then run it from there. I think you're missing the point of the whole article.

reply

[–] theamk link

But why would you want to extract if you can mount the file directly? For simple archives, extracting is fine. But for larger archives (like a compiler -- 1000 files or more), loop-mounting is much better than extracting:

- Does not slow down your backup by adding thousands of files

- No need to wait for initial file extraction

- You can quickly and easily verify integrity of the whole archive

And if you are using fuse, it does not require any special privileges either!

reply

[–] justincormack link

Mountable formats have the security issue that the kernel is not that great at protecting against hostile images at mount time. On-disk format fuzzing has not been common, and there are definitely bugs.

reply

[–] theamk link

This is solved very nicely with FUSE mounts (and FUSE is surprisingly performant on modern multicore systems).

reply

[–] solatic link

Do you not still pay a significant performance penalty by reverifying the container upon each application load? Especially considering that, if the container is signed, you need to verify the signature itself before trusting the container, and full signature verification - including checking whether the signature has been revoked - involves expensive network calls?

If your operational and security model really frowns on trusting your extraction cache, then perhaps a different workflow is more appropriate - download the container, verify the container, extract, bake the OS plus extracted apps into an image, sign the image, verify the image upon each boot and mount apps read-only. Then you don't need to re-verify anything upon each launch, instead trusting that your image creation process is routinely updating and re-verifying the software in your current images.

reply

[–] theamk link

Verification of a single file is much faster than walking an entire tree, especially when there are lots of small files, for example when there is a compiler or a large Python project inside.

A simple example: my /usr/include is 33037 files, 356M uncompressed. On SSD with cold cache, it takes 6.7 sec to read each file individually, or 0.7 sec to checksum a single 356M archive, a 10x difference.

The difference in the backup time is even more dramatic -- the backup program has to call stat() either 33K times, or just once, a 3,330,000% improvement! The other filesystem tools (What takes all the space? What has changed in the last X hours? Please sync this directory elsewhere.) will have similarly high speed improvements.

So if I had a choice, I would love my dev environment to come in mountable form. Similarly, I don't understand why container runtimes (like Docker) don't use loop mounts more -- it seems to have many advantages and very few disadvantages.

As for signature verification -- I don't care about 3rd party signatures and revocation, I just want to ensure that I am running the same code every time. There are many ways one can damage the extraction cache, especially if it is owned by the same user as the application (like the topic-starter's post described) -- sysadmin errors (`sudo find / -name app-old -delete`), application errors (creating a cache file in the bin dir), disk errors (silent corruption), transfer errors (one file did not get transferred to a new computer). Loop mounting makes disk errors easier to detect, and eliminates the other classes of error entirely.

reply

[–] theamk link

I like simple archives, but could it be something other than tarballs? For the kinds of applications described in this article, tarballs are pretty bad:

Either you extract it from scratch every time you run an app, taking a long time penalty...

... or you extract once to a cache, and assume that nothing changes the cache. This is pretty bad from both an operational and a security perspective:

- backups have to walk through tens of thousands of files, thus becoming much slower

- a damaged disk or a malicious actor can change one file in the cache, causing damage which is very hard to detect.

There are plenty of mountable container formats -- ISO, squashfs, even zip files -- which all provide much faster initial access, and much better security/reliability guarantees, especially with things like dm-verity.

reply

[–] infogulch link

Interesting I didn't know this existed. Is there a way to layer sqlar like docker images? (Besides just tarring them up I guess.)

I wonder if this could be implemented with the WAL/journal system. Make each layer immutably append to the previous layers to make restarting at any layer trivial. I'm not sure if there's such a way to hook into the journal directly like that though.

reply

[–] zaarn link

Should be doable with overlayfs (or similar) or alternatively with some extensions to sqlar.

sqlar is, after all, only a table definition. If you don't need FUSE access, or are willing to write your own, SQLite3 can go a long way toward providing arbitrary neat functionality.

reply

[–] paulfitz link

How about sqlar as a container format? https://sqlite.org/sqlar.html A regular sqlite database file, with anything you like in it. Mountable as a file system with sqlarfs. Written by the sqlite guy.

reply

[–] nextos link

I found it quite easy to switch from linux to linux-libre.

However, they package IceCat instead of Firefox, and that's a much tougher one. Note IceCat is not very well maintained.

Nonetheless, there are a few third-party repos from users with non-GNU-sanctioned software. I hope it becomes a bit like Emacs, where GNU ELPA coexists in harmony with MELPA.

reply

[–] davexunit link

I think eventually we'll have our own firefox package that sticks much closer to upstream and makes minimal branding/config changes. A lot of active community members want it.

reply

[–] nextos link

That would be ideal. Keep up the good work with Guix.

reply

[–] RX14 link

Well pretty much every Wifi card doesn't work in linux-libre, so that's the main thing. I'm sure I'd find a lot more that doesn't work if I tried linux-libre.

reply

[–] namibj link

That's due to regulation, combined with hardware manufacturers correctly choosing to have the driver/host load the firmware instead of keeping it in some on-board permanent storage. Note that there are reasonably performant 802.11n cards with non-reverse-engineered open-source firmware. IIRC they use the ath9k driver, and are the result of the manufacturer opening them up under terms compatible with both the Linux and BSD kernel licenses. They are great for hacking and there are some with 5GHz support. You have to keep in mind that hacking the firmware might violate RF spectrum laws, which is relevant in much the same way as GDPR compliance: if a jurisdiction can exert legal pressure on you beyond sending angry letters and calling you in the middle of the night, you have to consider whether its laws forbid what you're doing.

TLDR: they exist, they are not expensive, they can't do 802.11ac or 802.11ad, and hacking the kernel-license-compatible source might violate FCC or similar regulations and could well be punished harshly if someone complains about what you do and your behavior is provably non-spec-conformant.

Be careful, and choose your hardware wisely so as not to use binary blobs. Also, I assume you use an old CPU if you wanna go the linux-libre route. I have a system where I'm not sure yet which OS it will get, but I already (with help, and soldering) removed the Intel ME from the firmware, and might even physically remove the processor that would have executed it, or do it in software and just cut it off from power or something.

reply

[–] mikegerwitz link

> Note IceCat is not very well maintained.

Its maintainer is working on upgrading to the latest ESR now. If anyone is interested in helping maintain IceCat, please e-mail maintainers@gnu.org.

reply

[–] matthewbauer link

NixOS is a pretty good alternative. There are definitely areas where GuixSD is better than NixOS but also lots more places where NixOS is a lot better than GuixSD.

reply

[–] weberc2 link

I would really like to hear from more people who've used NixOS in anger. We used the Nix package manager (for packaging our application and managing dependencies) in our organization for a while, and it seemed to create a lot of pain, so I'm wondering if we were using it poorly or if the Nix ecosystem just needs to mature.

reply

[–] jolmg link

What GNU politics are you referring to that makes you reconsider using guixsd?

EDIT: Also what's unsexy about GNU? I'm really curious.

reply

[–] tremon link

Its refusal to package firmware binaries, for one, even if that firmware is required to have a useful machine. I'm looking at AMD specifically here, where recent graphics cards (including APU's) don't even do text-mode without the firmware.

(edit: I understand the why of it, and even agree on principle, but it still prevents me from running linux-libre on most of my systems)

reply

[–] rekado link

While Linux-libre is the default for Guix there are no limitations in place that would keep you from using vanilla Linux. In fact, Guix makes it extremely easy to build custom packages, and that includes custom kernel packages.

You can augment the package collection that comes with Guix with a simple environment variable, so the insistence on software libre on the side of the project should not represent a technical hurdle.

reply

[–] Digital-Citizen link

I think you're probably stating this incorrectly. Are you sure you don't mean to say that Linux-libre objects to distributing nonfree firmware?

I'd guess that GNU Linux-libre project maintainers have no objection to distributing free software firmware as part of Linux-libre.

reply

[–] t0nt0n link

Next time you build or choose a system, consider one that can run free software.

I did, and it makes most things quite a bit easier.

Edit: I did so after struggling with hardware requiring nonfree blobs of different shapes and sizes for a couple of years. I was recently lucky enough to get my hands on a system that I can run using linux-libre, and the only "extra" component I have is a USB wifi card.

reply

[–] dragontamer link

> Next time you build or choose a system, consider one that can run free software.

The only workstation that boots with entirely free software is like, the Talos II PowerPC, with a minimum cost of $5000.

Everyone else requires a binary blob somewhere. Either a UEFI blob, BIOS blob, some kind of driver somewhere, or whatnot. Raspberry Pi, AMD, Intel, everybody.

And before the Talos II, I don't think an "Open PC" devoid of proprietary binary blobs even existed. At least, something that is reasonably modern (ie: 64-bit, decent security, decent support with modern OSes)

reply

[–] namibj link

What about pre-ME ThinkPads, after replacing the wifi card with an ath9k/open-source-firmware one? Do the Intel chipset graphics require a blob for simple framebuffer/text-mode operation? Because I can't remember including any blobs in the libreboot I use there, and IIRC I get output before a Linux kernel is able to load device firmware.

It is 64-bit, and runs pretty much anything from Windows 10 (from what I can tell, but not sure, due to CMPXCHG16B) through FreeBSD to Android. Probably even something like QNX.

Yes, you might not call this reasonably modern, but it ticks off the hard facts you listed as qualifiers for being reasonably modern.

reply

[–] TylerE link

Pretty sure that has a BIOS

reply

[–] dragontamer link

Unless it is running the "coreboot BIOS" (which very few things are), then it has a binary "non-free" blob booting it up.

reply

[–] namibj link

I don't remember whether the video BIOS was extracted from the old binary or if it is the open-source replacement, but I'd tend towards the latter as I don't remember searching for the backup/dump of the original firmware.

And yes, it's running coreboot, and at least CLI/Linux-framebuffer Arch Linux works. I didn't yet get to setting up the rest of the system, but considering I bought it specifically for high-security operation, as the ME can be physically removed without losing more than the built-in Ethernet port, I'm not pressed to do it anytime soon.

Edit: I'm pretty sure I followed [0], which leads me to the new conclusion that I did use libreboot, a stricter version of coreboot (think coreboot=Arch Linux, libreboot=GNU Guix), and had to fiddle with the question of whether the open-source video BIOS would work. This confuses me a little, as I remembered buying an X61s, not an X60s, but from the fact that it booted after flashing, I deduce it had to be an X60.

[0]: https://libreboot.org/docs/hardware/#list-of-supported-think...

reply

[–] undefined link
[deleted]

reply

[–] pecg link

GNU stands for a philosophy of freedom, so GuixSD won't provide official repositories for installing proprietary software. Some users don't like that, even though they might be interested in the system's technological approach.

GNU utilities are not only unsexy, they are bloated, messy, and prone to failure; the GNU implementations (coreutils: grep, cat, tail, etc.) of standard UNIX tools are not done with simplicity in mind.

But hey, after all, GNU is Not Unix. Those of us who really appreciate the UNIX philosophy still have OpenBSD, which is the only light in a world of chaos, in my opinion.

reply

[–] jolmg link

> GNU utilities, are not only unsexy, they are bloated and messy, and prone to failure; the GNU implementations (coreutils: grep, cat, tail, etc) of standard UNIX tools are not done with simplicity in mind.

I've heard people say how GNU code is bloated and messy many times before, but never that they're prone to failure. I've never had any failure myself with any GNU code. Can you give some examples of failures you've experienced?

Also, I'm looking at the coreutils source right now, and it's not as messy as I was expecting. true.c is only a pageful at 80 lines, many of which are there simply because of the license comment and the usage() function for --help. cat.c and tail.c also seem reasonably understandable. The biggest complaint I can make is that there are cases where spaces and tabs are mixed in the indentation, but I've long resigned myself to expect that in projects that have more than 1 major contributor.

I do, however, think that glibc and gcc are pretty messy. I tried looking for the definition of fopen() in openbsd's libc and found it in less than 30 seconds by grepping. I still haven't found glibc's. gcc seems to rely heavily on its own extensions, because I don't understand what's going on here:

    int
    main (int argc, char **argv)
    {
      toplev toplev (NULL, /* external_timer */
                     true /* init_signals */);

      return toplev.main (argc, argv);
    }
That looks like a function prototype in a function definition, but it seems to mean an assignment going by the next line. Then in toplev.c, we have:

    int
    toplev::main (int argc, char **argv)
    {
That looks like C++, but the file extension is ".c"...

You know what? Never mind. Comparing the code for true.c and cat.c between coreutils and openbsd, I do rather like how clear openbsd is in its code. Damn. Sexy is a good word. Now I understand why people speak so well of it. I don't even need grep, the source file hierarchy is so clear. Looking back at GNU's true.c, I don't even understand half of what's going on there in those 80 lines, and it turns out that true.c is also the source for false.c, which just does #include "true.c".

TL;DR I agree that GNU utilities are messy. I'm not sure about the bloated aspect, because I do like that the utilities have internationalized documentation built in, but that seems to be bloat by openbsd's standards. And I wouldn't know about them being prone to failure, because I've never had a failure with them.

EDIT: Huh. I wanted to reply to Hello71, but there's no reply link under his post. Anyone know why? Anyway, yeah, I saw a comment in the file mentioning that over a line that referred to stdout. Can't check now because I'm away from the computer. I didn't really understand the reason though.

reply

[–] pasabagi link

I love your post. It's really nice to watch somebody go through an honest and curious appraisal of a position.

reply

[–] int0x80 link

It is C++. The file is .c, but whatever. They use a lot of C++.

I agree with you, however. Having worked with the code, GNU relies a lot on macros and a lot of auto-generated code. The code is a big mess, impossible to tackle if you don't spend a huge amount of time on it.

A lot of symbols are generated through #defines and pasting (X macros), so you can't grep shit, for one.

reply

[–] voxadam link

That reminds me; I wonder how the uutils project[1] is doing. While I still haven't gotten around to giving Rust a shot I think their idea of reimplementing coreutils in the language has merit.

[1] https://github.com/uutils/coreutils

reply

[–] vthriller link

> I've heard people say how GNU code is bloated and messy many times before, but never that they're prone to failure.

Just have a look at the changelog for coreutils [0]. Sure, it's very long, especially if you're not following its releases, and sure, it's full of weird edge cases that you might never have encountered (I'm certainly way too lazy to go looking for those rare bugs that I stumbled upon years and years ago, but there definitely were some), but this, IMO, is a great illustration of how GNU (or, rather, GNU coreutils) code is "prone to failure"—mainly because it sometimes tries to do way too much.

[0] http://git.savannah.gnu.org/cgit/coreutils.git/plain/NEWS

reply

[–] maskros link

Since you mention 'true' ... I can't help but think of Rob Pike's diatribe https://twitter.com/rob_pike/status/966896123548872705

reply

[–] pecg link

The most "iconic" example of failure must be the shellshock bug in bash, though the speed with which it was fixed should be applauded.

reply

[–] Hello71 link

speaking of true --help, did you know that GNU true can exit non-zero? the exact way is left as an exercise to the reader :)

(if you're actually trying it at home, remember that "true" is virtually always a builtin. AFAIK there is no legitimate way to have shell builtin true return non-zero. (overwriting the command doesn't count :P))

reply

[–] jolmg link

Huh. The reply link appeared.

Anyway, yeah, there's a commented line mentioning that:

    /* Note true(1) will return EXIT_FAILURE in the
       edge case where writes fail with GNU specific options.  */
    atexit (close_stdout);
Makes sense. If you:

    $ /bin/true --version >& -
it fails because it was not able to write what you asked for to stdout.

reply

[–] josefx link

By specification it should not print anything or fail in any way http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr... .

reply

[–] JdeBP link

You'll find that the FreeBSD and NetBSD codebases (for base) are similarly structured and similarly written.

reply

[–] jcoffland link

It's not currently cool to like Richard Stallman because he has opinions that run contrary to Silicon Valley.

reply

[–] astrodust link

It's because he's a dick, and not in the good way.

It's not about his opinions, it's about his ineffective and misguided leadership. Why is GNU still fighting the same battles from thirty years ago when new ones have emerged that they're not even paying attention to?

GNU is becoming the PETA of software, and it's not a good look.

reply

[–] rekado link

> GNU is becoming the PETA of software, and it's not a good look.

As a GNU hacker (and co-maintainer of GNU Guix) statements like this make me sad. It is very unfortunate that Richard Stallman's personality is casting a shadow on the GNU project, which was started by him but is really a loose connection of projects that share ideas that were outlined in the GNU Manifesto.

I see GNU Guix in the tradition of other GNU software like Emacs or the Hurd that aim to give users more power and to remove arbitrary limitations. Emacs is probably the epitome of a hackable system that lets the user shape the software according to their own needs to an extent that is extreme and rarely found in any other system.

The Hurd aims to allow regular users to do things that in traditional Unices requires super-user privileges. It aims to remove arbitrary obstacles to free users from the unhealthy power dynamics of the user/admin division.

Guix gives users powerful tools to manage their software environments without having to beg admins, and to easily package software variants without having to depend on professional distributors. At the same time no user can harm another user on shared systems. Guix gives users the ability to take advantage of software freedom, by making it really easy to hack on software in a user-controlled reliable system.

When seen from this perspective, the GNU system that individual software projects are contributing to is a collection of tools that liberate users from helplessness due to unnecessary restrictions. This common goal defines the modern GNU project these days, and I think it is very unfortunate to overlook this because of Richard Stallman and his quirks, his sometimes dictatorial style, or his harmful attitudes towards important social aspects of free software.

I appreciate Richard's past work immensely, but I do not consider him representative of the GNU project that I work on, nor do I think his leadership style is benefiting the project.

Give GNU a chance based on the project's merits and its goals. Long live Free Software --- copyleft and non-copyleft alike!

reply

[–] astrodust link

I've got some simple advice: Get rid of RMS. Get rid of him now.

The longer he's the figurehead of GNU, the longer he has any say in your projects, the longer he'll poison the well. This "joke" fiasco touched off a firestorm of commentary from people that are quite clear that he's been highly problematic for decades now.

You don't want someone toxic running GNU. Microsoft managed to shed their sweaty gorilla and look what's happened to them. They're not fully redeemed, but they stopped fighting and destroying.

Just as the early FSF cared not for tradition, for history, for the investment of time and energy on the part of others, they should not care today if they want to be a radical force for change. Keep that spirit. Tear down anything worth destroying because it gets in the way of what's right.

The important question, the only question, for an organization that promotes actual change is what it can do to improve things tomorrow.

Sadly we've lost Aaron Swartz, but that's the caliber of person you need today. Fearless, energetic, passionate, and fighting the right fights from the front lines. Aaron will be missed, but the FSF and GNU should be looking for, encouraging, motivating the next Aarons no matter what their background is.

reply

[–] jolmg link

They're fighting the same battle, because it's still on and they haven't won.

I'm still wishing for a world where all electronics hardware and software is open source. I can't really visualize an industry like that being economically functional, but I hope someone can. My hope is with GNU.

reply

[–] astrodust link

Imagine if we were still fighting battles from the 19th century, that Prussia was still exchanging musket fire with France.

That's what GNU is doing today with their stubborn fights about licensing when there's far bigger problems emerging.

How about a right to privacy? How about a right to timely patches for their Linux-based phones? How about a right to repair hardware running GPL software? How about a right to know if your device has security faults?

I can make software that mines the personal emails of dissidents, runs facial recognition on hacked webcams, and ruins lives, and that's all fine as far as GNU's concerned so long as I give out the source code to anyone who asks.

That seems...problematic.

reply

[–] JdeBP link

Richard Stallman on privacy, surveillance, corporate abuse of technology, and Android.

* https://www.theguardian.com/commentisfree/2018/apr/03/facebo...

* https://www.gnu.org/philosophy/surveillance-testimony.html

* https://www.gnu.org/philosophy/surveillance-vs-democracy.htm...

* https://www.gnu.org/philosophy/judge-internet-usage.html

* https://www.gnu.org/philosophy/the-danger-of-ebooks.html

* https://www.gnu.org/philosophy/stallmans-law.html

* https://www.gnu.org/philosophy/android-and-users-freedom.htm...

So your position is that GNU should both get rid of Richard Stallman and start addressing this stuff. Clearly, you are not basing this upon Richard Stallman addressing these very things for quite a few years now via the GNU WWW site.

* https://www.gnu.org/philosophy/essays-and-articles.html

reply

[–] astrodust link

He presents these in the most tin-foil hat manner possible and doesn't build a bridge to people trying to live ordinary lives.

Destroying your phone, not using a web browser, and eating vegan or whatever isn't something everyone can or should do.

If he wants to be some obscure mountain-top philosopher, that's fine, but being the GNU head at the same time is problematic.

reply

[–] t0nt0n link

What new battles have emerged that GNU and FSF are not paying attention to?

Also, GNU is not RMS, and RMS is not GNU.

reply

[–] pasabagi link

I know that RMS is not GNU, but the man is a raging egomaniac - and in the way he talks, he takes credit for basically everything he's come into contact with. Unless I'm underestimating the bounds of possibility for one person's contributions, he uses 'I' in a lot of places where it would be fair to say 'we'.

(Note, I came to this conclusion after reading about a bunch of his technical accomplishments, which I can see are awesome, even if the obvious megalomania evidently occasionally dampens their effects.

I think his work is fantastic, his politics are largely reasonable - but I think his self-obsession is often the driver behind a large amount of damaging and counterproductive behaviour.

Politics is the art of compromise - not convincing everybody you're a saint while alienating your natural allies.)

reply

[–] rauhl link

In fairness, he’s correct that Linux is better called GNU/Linux: a Linux kernel really is useless without the GNU userland.

I’m not a fan of him personally, and many of his technical decisions have been questionable, but he’s achieved a lot, and the world is better for the FSF’s existence.

reply

[–] pasabagi link

I'm not sure. Isn't Android proof of a sort that Linux is still worth something without GNU?

I wouldn't have any problem with the 'GNU/Linux' idea if it wasn't so obviously part of a greater pattern - when he talks about it, he talks about GNU being the primary contributor - but he typically uses the singular, even when the plural would refer to GNU, and the singular refers to himself.

I also think the world is better for the FSF, but I can't help but wonder: what would the world be like if the FSF was headed by somebody who felt it more natural to think in terms of 'we', as opposed to 'I'? Even somebody not nearly as technically accomplished, charismatic, and intelligent? I think ultimately, it's the ideas, of knowledge as the common wealth of humankind, rather than the curious personality of RMS, that gave the GNU project its power - and ultimately, it's the limitations of RMS that hold it back.

reply

[–] jcoffland link

> Isn't Android proof of a sort that Linux is still worth something without GNU?

Not at all. Android depends on tons of GNU software.

reply

[–] ChristianBundy link

The FSF is (or seems to be) an extension of RMS, and the same argument could be made for GNU[0].

[0]: https://lwn.net/SubscriberLink/753646/a6ebb50040c5862c/

reply

[–] astrodust link

While they were harassing Linksys about the GPL, the whole IoT thing happened and now we're living in a world full of trashy Linux-based devices that are a hazard to society. Sure, you can get the source code to your internet-based webcam, but because it can't be easily patched, it can also be hijacked by a couple of high-school kids in Alaska so they can sabotage their Minecraft server hosting competitors.

So good job.

As long as RMS is such a prominent figure in the GNU/FSF organization, there's no separation.

reply

[–] elago link

"the whole IoT thing" was ever Stallman's responsibility to stop in the first place?

he wants to champion free software, not every just cause under the sun.

IoT devices shipping with insecure configurations is a "failure" of an infosec champion/thought-leader to step up and save us.

On the free software front, RMS's contributions are mind-blowing to me. I'd be proud if I could ever contribute a fraction of what he did.

reply

[–] astrodust link

The contributions of the GNU team are considerable. RMS in particular? Eh.

The IoT thing was a perfect opportunity to step in, step up, and show some leadership. Billions of devices owned by tens or hundreds of millions of people, all running open-source software!

Instead we get this miserable hell because of his laser focus on licensing instead of responsible software.

Infosec, to their credit, were raising alarm bells from the beginning but nobody had to listen to them because they don't control anything.

GNU, however, does. If they'd extended GPL to include provisions for ensuring that the GPL software on it can be updated in a timely and secure manner, life would be a lot better for people.

reply

[–] mindB link

Isn't that exactly what v3 of the GPL does?

reply

[–] astrodust link

That just prevents the vendor from locking down the software. It doesn't force them to update it in a timely manner.

reply

[–] pxc link

And if the software weren't locked down, anyone (users, communities, other vendors) could step in to provide such updates. That's not some hypothetical, either— compare the rates of OS updates in projects like LineageOS to the distributions of Android shipped with most phones. If vendors couldn't TiVo-ize, there would absolutely be communities and downstream vendors stepping in to provide devices with regular updates. Because the devices are locked down, that can't happen.

And what do you expect the FSF to do? Out-lobby consumer electronics manufacturers to pass laws requiring some kind of security update guarantee? Even if they succeeded, could we call the result empowerment? Getting out from under the thumb of the manufacturer and actually _owning_ the things you own is the point, not the theoretical promise of recourse if the party which practically retains all of their power over you can be proven in court to have misbehaved, only after the abuse has taken place.

This is absolutely the same fight, and if anything the approach you're arguing for is more conciliatory, not more ‘relevant’.

reply

[–] astrodust link

Theoretically being able to update your device and actually being able to update your device are two different things.

There's going to be a billion variants on every little IoT device in the future and all the best intentions and enthusiasm on the part of the free software community will not be enough to provide patches to all of them.

This is something that's the responsibility of the vendor, and the GNU software license could make that a requirement for using the software.

It's not about laws, it's about licensing. If they don't like the license they're free to use someone else's software.

Having inexpensive operating system software you can dump on a cheap device without license fees is both a great thing, and also what got us into this IoT hot mess.

reply

[–] jolmg link

Force them? It's a license. People use licenses to grant rights, not to give themselves obligations.

reply

[–] orangeshark link

Isn't that what GPL v3 is supposed to cover with what they call "tivoization"? They tried to get the Linux kernel to switch to GPL v3, but that failed.

reply

[–] s73v3r_ link

How is that not purely, 1000% on the makers of the shitty webcams?

reply

[–] astrodust link

They're compelled to give out the source code, but apart from that they can do pretty much anything else they want with GNU's blessing.

reply

[–] ronsor link

Stallman is generally not a very... tactful person.

reply

[–] seba_dos1 link

That's right, it might even hurt his mission, but it doesn't make him less right.

reply

[–] astrodust link

Was he right about abortion jokes?

David Bowie made predictions far more profound than Stallman, and they came from a place of genuine concern, not tin-foil hattery of the GNU variety: https://www.theverge.com/2016/1/11/10753158/david-bowie-inte...

I'd rather have people that cared and were on the right path, picking the right battles, than assholes who are technically correct but whose observations are ultimately irrelevant to the larger fight.

reply

[–] M_Bakhtiari link

I don't get the abortion joke debacle. It was blatantly pro-abortion, yet it seems like it's only the pro-abortion people that are upset about it.

reply

[–] JdeBP link

That's because you erroneously think that the conflict was about abortion. It was actually about whether user reference manuals should properly contain jokes about such highly politically charged topics.

reply

[–] undefined link
[deleted]

reply

[–] kiriakasis link

As an outside observer, it looks like GNU is conducting an ideological battle that has been decreasing in public relevance over the years, and so now it looks like they are the ones not being good neighbours.

reply

[–] Digital-Citizen link

It wouldn't have taken much more time for you to back your point with examples so we'd have some idea of what you're talking about. Please also explain the ideology of how proprietary software is not worth fighting with a practical implementation and ethical discussion.

Most of the time when people object to GNU or rms they fail to convey that they understand what software freedom is or how continually relevant software freedom is today. I'd bet that the majority of threads on these (overwhelmingly corporate) repeater sites are easily handled by stressing how important a user's software freedom is. Every DRM, proprietary software (Windows ignores user settings, this new device from $VENDOR spies on its users, etc.) is easily dismissed by getting into the same discussion about how software freedom would allow the user to alter the software, protect their privacy, treat their friends and neighbors better by sharing improved versions of the software, inspect and modify the software (or have someone they trust do it for them), and run the programs when they want (instead of losing access when a proprietor feels like ending "support"). Snowden readily credits free software for his success in leaking sensitive NSA documents to us all (docs which still make media stories years later). Three cheers for software freedom, rms, and Snowden!

Posts like the parent post tell me sites like these are the thing losing relevance by showing how ineffective public moderation is and how unacceptable it is to dare to say something not echoed in corporate tech media.

reply

[–] computerfriend link

I'm not sure how anyone can honestly think that the ideological battle is decreasing in public relevance. It is massively more relevant now.

reply

[–] jpeg_hero link

It starts with that weird ink drawing of a goat for a logo; it just screams “70s green screen”.

And this is coming from a Gen-X open source / Linux guy. What must it look like to the current generation?!?!

reply

[–] keypress link

Isn't it a gnu? I like the logo/art.

reply

[–] peterwwillis link

> Also what's unsexy about GNU?

"Gnu's Not Unix": A recursive acronym used as a pun about an operating system from the 1970s, existing solely as a reflection of an aging neckbearded hippie hacker's personal philosophy about software, that is pronounced "GUH-NEW".

reply

[–] jolmg link

I don't think it's only his philosophy. In fact, before, I would have thought that personal philosophy to be common sense, but it then turns out it isn't. It still bewilders me how it's the status quo that when you buy an expensive piece of electronics, it's never really yours to use as you please. It's more like the companies are lending it to you for a one-time payment. They keep full control. If they want to remove features[1] or brick the product you bought from them[2] or place arbitrary restrictions on features that require no work from them and then charge extra for lifting the restrictions[3], it's totally ok. How does that make sense? Yet it's the dystopia the industry has been turning into day by day, and it's all made possible because of closed source software.

[1] - https://www.techdirt.com/articles/20100331/0128358800.shtml

[2] - https://www.techdirt.com/articles/20150321/13350230396/while...

[3] - One example of this could be Amazon's ridiculous rental of digital books, since it can only work by downloading the file to your device and then charging you more to prevent your device from deleting it. Another example is YouTube Red, which charges to let you download videos the app already downloads for free anyway in order to stream, and also so that it won't pause videos when you move the Android app to the background.

reply

[–] rauhl link

I love that they took the NixOS idea and converted it from brackets to S-expressions, but I do wish that they’d used Common Lisp instead of Scheme. Had they gone with the former, I think that we’d be one step closer to computing’s ultimate goal of a Lisp machine on every desk …

reply

[–] rekado link

Guile Scheme is the GNU system's designated extension language. In GNU there are more applications that support Guile scripting/extensions than there are CL applications.

(I'm a Schemer and I'd love to have a Lisp machine user environment using Scheme.)

reply

[–] RX14 link

I really love the work the guix folk are doing. I'd love to run guixsd on my laptop if it were easy and supported to run plain upstream linux instead of linux-libre. It just seems like such a lovely, easy-to-use project from the little time I've spent playing with it; it's actually a small shame they're part of the "unsexy" GNU project and subject to GNU politics.

reply

[–] rekado link

In Guix every package ends up in its own directory, which may have references to other packages in /gnu/store. An application bundle is really just a package closure, i.e. the directory for the package and all directories it references, recursively. One way to bundle up things is with `tar` (the default of `guix pack`), but Guix also supports other bundling targets, such as Docker. No special metadata files are required.

Relocation currently requires a little C wrapper, which uses Linux namespaces, as the blog post indicates.

If you want something more advanced, such as a bundle that includes an init and services, it's best to use `guix system`, which builds VM images among others.

reply

[–] tannhaeuser link

That article made me warm up to guix and its practical side. Are guix app bundles just bare tar archives with /usr/local prefix semantics or do they need special metadata files? How are compiled binaries with hardcoded and/or autoconf'd prefixes handled for relocation (I guess using Linux namespaces somehow)?

reply

[–] foob link

The packages that Exodus produces are actually quite similar to those introduced in this announcement. Both tools generate simple tarballs that can be extracted anywhere to relocate programs along with their dependencies, and both tools bootstrap the program execution using small statically compiled launchers written in C. They contrast guix pack against Snap, Flatpak, and Docker, but Exodus would probably make a more apt comparison in many ways.

reply

[–] civodul link

Interesting! The trick that Exodus uses (invoking ld-linux.so directly) is very smart. Perhaps an option to add to 'guix pack' in the future. :-)

reply

[–] chx link

For relocatable ELF binaries, there's also https://github.com/intoli/exodus

reply

[–] JdeBP link

It sounds like you are progressing along the same road that led Rahul Dhesi to invent the ZOO file format.

reply

[–] cyphar link

As far as I can tell the only thing ZOO has over tar archives is having a history of each file (using the VMS concepts of file versions) -- meaning that it probably still has some of the problems I outlined above. While that is useful, it is still not as good as it could be. Also, you don't really want file versions with container images, you want to have conceptual "layers" (which would be sort of like having versioned files but it's more like snapshot IDs -- or like ZFS's birth-times).

reply

[–] JdeBP link

One needs to give it more than a superficial glance. ZOO was designed to be randomly accessible, with the directory headers forming a linked list. It actually has an uncompressed index and can take advantage of seekable files. It also supports both long and short filenames; CRCs of the metadata structures (c.f. the recent kerfuffle about xz); and an extensible, versioned, header mechanism that not only could be extended but actually already once was extended to add the long filename support amongst other things.

reply

[–] cyphar link

Is there an actual paper or some high-level summary of the format -- not to mention a modern implementation? The only summary I could find was the one on Wikipedia. I also found the source code of "unzoo" but it's a bit difficult to understand the benefits of a file format if I first have to understand its implementation.

I didn't take a superficial glance out of laziness, it's because I couldn't find any more information about it. But I think you also missed that I mentioned that the style of versioning implemented in ZOO (as far as I can tell based on a Wikipedia page) is not the correct style for snapshot-like versioning.

reply

[–] Rapzid link

It sounds like there are a lot of use cases that overlap with services provided by general file systems. I'd be curious to hear your thoughts on that.

reply

[–] cyphar link

You're right that general-purpose filesystems have solved quite a few of the indexing problems already, unfortunately there are a few things stopping general filesystems on a loopback device from being practical (or safe, or the best idea):

* The container (file) for the filesystem must necessarily be larger than the metadata+data for the filesystem because filesystems really don't like almost-full disks. And unless I'm mistaken sparse files are not usable for loopback devices (so you can't hack your way out of it).

* Most filesystems don't have a snapshot-style history so you would have to pick a specific filesystem from that list (otherwise you'd be forced to make CoW duplicates of the filesystem to create snapshots -- which is interestingly how Docker does layered storage with devicemapper) which has slightly similar problems to layered tar archives.

* The kernel's filesystem parsers are not really considered to be safe against an adversary, from what I've been told by filesystem engineers. So mounting random loopback files with filesystems on them might end badly.

* There is no way of looking at the archive using a userspace tool (without mounting), unless you re-implement the kernel parser for the filesystem. To be fair, this is true for any format, but filesystems are far more complicated and harder-to-parse than most other formats.

* Having a single blob as your entire image history and so on will mean that you can no longer have content-addressable storage for your images without adding something like content-defined chunking on top (which is then another layer of storage on top of your underlying storage).

* Using a Linux filesystem would mean you couldn't use the filesystem on different operating systems very easily. Even if it were compatible with whatever other operating system you are using, userspace has no way of being sure there isn't a bug in either side's parser -- and what happens if one side changes the on-disk format? If the protocol is in userspace then it can be handled there.

* Most filesystems don't let you remap users, so if you wanted to run a container in a user namespace you would need to either rewrite the filesystem structure or mount the filesystem and copy it to another filesystem. To be fair, tar archives require you to do the mapping on extraction which is a similar problem, but far less complicated.

* Everyone would be opinionated about what filesystem to use, which means that you'd have to deal with every filesystem people throw at you, making it harder to be interoperable and adding choices where they aren't necessary. It should be up to the user what filesystem they use for storage, not the image distributor.

Now, this hasn't stopped people from trying to use this. Singularity's internal format is a loopback file with a filesystem inside, and they have privileged suid binaries that mount it. And it does have genuine performance benefits, and if you don't want things like content-addressability then it can work for some usecases.

reply

[–] cyphar link

This is remarkably off-beat for the GNU project. Tar files are far from the ideal tool for container images because they are sequential archives, and thus extraction cannot be done with any parallelism (without adding an index and being on a seekable medium; see the rest of this comment). I should really write a blog post about this.

Another problem is that there is no way to just get the latest entry in a multi-layered image without scanning every layer sequentially (this can be made faster with a top-level index but I don't think anyone has implemented this yet -- I am working on it for umoci but nobody else will probably use it even if I implement it). This means you have to extract all of the archives.

Yet another problem is that if you have a layer which just includes a metadata change (like the mode of a file), then you have to include a full copy of the file into the archive (same goes for a single bit change in the file contents -- even if the file is 10GB in size). This balloons up the archive size needlessly due to restrictions in the tar format (no way of representing a metadata entry in a standard-complying way), and increases the effect of the previous problem I mentioned.

And all of the above ignores the fact that tar archives are not actually standardised (you have at least 3 "extension" formats -- GNU, PAX, and libarchive), and different implementations produce vastly different archive outputs and structures (causing problems with making them content-addressable). To be fair, this is a fairly solved problem at this point (though sparse archives are sort of unsolved) but it requires storing the metadata of the archive structure in addition to the archive.

Despite all of this Docker and OCI (and AppC) all use tar archives, so this isn't really a revolutionary blog post (it's sort of what everyone does, but nobody is really happy about it). In the OCI we are working on switching to a format that solves the above problems by having a history for each file (so the layering is implemented in the archiving layer rather than on top) and having an index where we store all of the files in the content-addressable storage layer. I believe we also will implement content-based-chunking for deduplication to allow us to handle minor changes in files without blowing up image sizes. These are things you cannot do in tar archives and are fundamentally limited.

I appreciate that tar is a very good tool (and we shouldn't reinvent good tools), but not wanting to improve the state-of-the-art over literal tape archives seems a bit too nostalgic to me. Especially when there are clear problems with the current format, with obvious ways of improving them.

reply

[–] catern link

It's not necessarily a good thing for the container to be able to specify locale. Locale should be picked up from the surrounding system; it's just that unfortunately the surrounding system is usually not configured correctly.

And entrypoints/wrappers are definitely possible from a tarball. Just wrap the executables in bin/, replacing them with shell script (or whatever) wrappers pointing to the real executables. That's what Nix/Guix do for languages like Python which require dependencies to be provided by environment variables (as they don't have a way to "close over" the locations of their dependencies).
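
A minimal sketch of such a wrapper, written in Python for illustration (Nix and Guix typically generate small shell scripts for this; all paths here are hypothetical):

    #!/usr/bin/env python3
    import os, sys

    # Hypothetical location of the renamed "real" executable.
    real = "/opt/myapp/libexec/.myapp-real"

    # Provide the environment the program expects before handing over control.
    os.environ.setdefault("LC_ALL", "en_US.UTF-8")
    os.environ["PYTHONPATH"] = "/opt/myapp/lib/python"

    # Replace this process with the real executable, passing arguments through.
    os.execv(real, [real] + sys.argv[1:])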

reply

[–] oconnore link

> Locale should be picked up from the surrounding system; it's just that unfortunately the surrounding system is usually not configured correctly.

And around and around we go

reply

[–] sleepybrett link

Also docker containers are just tarballs of tarballs (one per layer)

reply

[–] geofft link

I realize the title is just a hook for the (very cool!) work in the article, but a couple things that tarballs don't/can't specify that Docker containers can:

- environment variables like locales. If your software expects to run with English sorting rules and UTF-8 character decoding, it shouldn't run with ASCII-value sorting and reject input bytes over 127.

- Entrypoints. If your application expects all commands to run within a wrapper, you can't enforce that from a tarball.

You can make conventions for both of these like "if /etc/default/locales exists, parse it for environment variables" and "if /entrypoint is executable, prepend it to all command lines", but then you have a convention on top of tarballs. (Which, to be fair, might be easier than OCI—I have no particular love for the OCI format—but the problem is harder than just "here are a bunch of files.")

reply

[–] matthewbauer link

Nix has a very similar tool called nix-bundle[1].

[1]: https://github.com/matthewbauer/nix-bundle

reply

[–] matthewbauer link

I think AppImage does this with SquashFS images currently.

reply

[–] tejtm link

To list what is in a tarball: `tar -vtf tarball.tar`

To extract a particular entity: `tar -vxf tarball.tar path_in_tarball_to_entity`

edit: good points on it not being efficient for large archives, just demonstrating it is possible.

reply

[–] nerpderp83 link

A tar is a linked list of file paths and contents; it cannot be indexed to jump to a particular file. A compressed tar first has to be decompressed and then the chain of links traversed. Accessing a file in a compressed tar is O(n) in where the file is placed within the compressed tar stream.

It isn't that it is impossible, it is that it is horribly inefficient.

Zips, on the other hand, unify storage and compression such that one has random access to a particular file, hence most modern file formats are zips with XML or JSON inside.
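
For comparison, pulling one member out of a zip through its central directory is a couple of lines in Python (the archive and member names here are hypothetical):

    import zipfile

    with zipfile.ZipFile("bundle.zip") as zf:
        print(zf.namelist())               # listing comes from the directory at the end
        data = zf.read("docs/readme.txt")  # seeks straight to that one member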

reply

[–] discreditable link

The problem is that to know what files are in the tarball you have to read the whole thing. If the archive is large that's a lot of reading just to get a file list.

reply

[–] nerpderp83 link

Tarballs don't have a TOC and can't easily index into individual entities.

One could create a utility to make tarballs with a TOC and the ability to index them while still remaining compatible with tar and gzip. Pigz is one step in that direction.

reply

[–] matthewbauer link

Gobolinux sort of does this. The main difference is GoboLinux uses “version numbers” while Nix & Guix use hashes. It makes a lot of difference for more complicated stuff.

reply

[–] digi_owl link

True.

I suspect there are ways to introduce hashes to Gobo, if one were so inclined. But so far nobody has.

reply

[–] TylerE link

Nitpick: A vanilla tarball is a concatenation, not a compression.

reply

[–] digi_owl link

A quick FYI, Gobolinux operates much the same way.

1. Binary packages are simply compressed archives (tarballs) of the relevant branch in the /Programs tree.

2. branches do not have to actually live inside the /Programs tree. There are tools available to move the branches in and out of /Programs.

All this because Gobolinux leverages symbolic links as much as possible.

reply

[–] sitkack link

See this older discussion on statically linking Guile [0]; one should be able to bake your source into a C program that statically links Guile 2.2 to create a self-contained executable. If that is too cumbersome, I would use a container.

[0] https://lists.gnu.org/archive/html/bug-guile/2013-03/msg0000...

reply

[–] kuwze link

Does anyone know how this would apply, for example, to sharing a Guile 2.2 application with Debian/Red Hat based distributions? I want to use Guile 2.2 for development, but I am worried because it was only recently released for major distros (at least with Ubuntu I know it was released with 18.04) and it doesn't seem to support the creation of executables.

reply

[–] peterwwillis link

Or one that can list/extract files without reading the entire archive, or one that can use binary diffs, or one that supports encryption, or one that supports long file names, or one that isn't hamstrung by different implementations of different standards on different platforms, or one that doesn't use 512 byte blocks, or one that is actually usable on modern operating systems, ....

reply

[–] peterwwillis link

> This program (named "sqlar") operates much like "zip", except that the compressed archive it builds is stored in an SQLite database

> The motivation for this is to see how much larger an SQLite database file is compared to a ZIP archive containing the same content. The answer depends on the filenames, but 2% seems to be a reasonable guess. In other words, storing files as compressed blobs in an SQLite database file results in a file that is only about 2% larger than storing those same files in a ZIP archive using the same compression.

Uh.... Yeah, I don't need a complicated, incompatible version of Zip that is 2% larger. I'll just use Zip.

reply

[–] JdeBP link

Actually, that time was 1986, if one excludes encryption (which is easy to add).

* https://www.rpi.edu/dept/acm/packages/zoo/2.1/rs_aix31/src/z...

reply

[–] rekado link

Sure. `guix pack` is a neat hack and it isn't tied to any particular archive format.

When using plain Guix you won't need to use any archive format at all; packages simply end up each in their own unique directory and can be used just like that. You can easily spawn a container environment where only the relevant directories under `/gnu/store` are mounted.

It's on my list to add more target formats for `guix pack`, but generally I'd recommend using Guix directly to reap all benefits. `guix pack` is only really useful for cases where you cannot use Guix on the target system.

reply

[–] rekado link

A squashfs backend for `guix pack` exists now:

http://lists.gnu.org/archive/html/guix-patches/2018-05/msg00...

reply

[–] cpburns2009 link

Are you complaining about the complexity of the file format itself? My understanding is that it's pretty simple: a linked list of headers with the contents of each file after each header. Or are you complaining that it doesn't do compression itself like ZIPs do?

reply

[–] GrayShade link

One thing I dislike about tarballs is the lack of random access support.

reply

[–] cpuguy83 link

You can build random access around tarballs; you just need to index the header data.

Shameless plug, this is what https://github.com/cpuguy83/tarfs does.

Granted, you do have to traverse the entire tarball once to build the index.

reply

[–] masklinn link

> You can build random access around tarballs, just need to index the header data.

> Granted, you do have to traverse the entire tarball.

So you can't randomly access a tarball; you can cache the linear access you've already done.

reply

[–] Vendan link

You can read all the headers without reading the whole file by just seeking over the file data....
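
Something like this, for instance: a rough sketch for plain uncompressed ustar archives (it ignores pax/GNU long-name extension records, and "archive.tar" is a placeholder) that builds an index by reading only the headers:

    def index_tar(path):
        """Map member name -> (data offset, size) by reading the 512-byte
        headers and seeking past the file data in between."""
        index = {}
        with open(path, "rb") as f:
            while True:
                header = f.read(512)
                if len(header) < 512 or header == b"\0" * 512:
                    break  # end-of-archive marker (or truncated file)
                name = header[0:100].rstrip(b"\0").decode("utf-8", "replace")
                size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
                if header[156:157] in (b"0", b"\0"):  # regular file entries
                    index[name] = (f.tell(), size)
                f.seek((size + 511) // 512 * 512, 1)  # data is padded to 512
        return index

    print(index_tar("archive.tar"))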

reply

[–] sitkack link

But you would still have to decompress it. How much support for legacy systems do we need vs just making a slightly better version of a jar?

reply

[–] cpuguy83 link

Decompression has nothing to do with tar, though. But I agree, it is painful to deal with tar+gz.

reply

[–] sitkack link

> Decompression has nothing to do with tar

Which is exactly the problem. The same issue occurred with volume and filesystem management, and that resulted in ZFS. We need systems that compose and also elide.

So: something that both archives a set of files and compresses them, without losing affordances over the layer below it in the process.

reply

[–] TylerE link

He's complaining about the simplicity. He wants something with less suck.

reply

[–] spookthesunset link

What do you mean by massive?

reply

[–] stuaxo link

Please, can we move to an archive format that isn't so sprawlingly massive?

reply

[–] t0nt0n link

With Guix you get full introspection of your entire package dependency graph; you can check and manipulate every aspect, and it is still simple and easy to work with. With GuixSD you get this same introspection and overview, but of your entire system. Creating a container, VM or even a Docker image is a simple '$ guix system <container|vm> config.scm' away. And your config.scm is as complex as you like it to be.

The simplest way would be to package the app for Guix; then you could just run '$ guix environment <name-of-package>' and you would be dropped into an environment with all your dependencies and whatever else the application requires in your path, ready for hacking: get your sources and editor and start working.

If you need a vm or similar though I'd translate your example above into a system config where:

- packages include python-2.7 and whatever is in requirements.txt (this may mean you have to package a few things, but again this is usually super easy)

- users and groups are added to the config, as they always are, no extra step necessary.

- exposing ports and networking are handled by options to the qemu script guix produces to launch the VM.

- CMD ./notify.py: create a "simple" service that can be autostarted by the system on boot.

- filesystem access is also handled by arguments to the qemu script.

As always though there are several paths to Rome, and these are just two of them.

Zeromq and libsodium are already packaged on guix, and czmq and zyre look like they would be simple to package. Guix is really quite simple to work with, which I think is the reason so many of the users and devs are running it as our daily driver, even though it is strictly beta (0.14 is the last release, I think).

And pointless, come on - what does that even mean? Does it mean you don't value them? I was quite happy to read about a neat new thing I can use my favorite tool for.

reply

[–] justinsaccount link

> With Guix you get full introspection of your entire package dependency graph

Yes, I know all that. It's neat. I would like to learn more about it.

> The simplest way would be to package the app for guix

I was asking how to package the app for guix, and your response is the simplest way would be to package the app for guix...

> If you need a vm or similar though I'd translate your example above into a system config where: - packages include python-2.7 and whatever is in requirements.txt (this may mean you have to package a few things, but again this is usually super easy) - users and groups are added to the config, as they always are, no extra step necessary. - exposing ports and networking is available as options for qemu script guix produces to launch the vm. - CMD ./notify.py: create a "simple" service that can be autostarted by the system on boot. - filesystem access is also handled by arguments to the qemu script.

Yes, I'm sure it is super easy. How do I do it?

Do you know how to use the dockerfile I posted above? You run

  docker build -t myapp .
  docker run myapp
that's super easy. 9 lines and 2 commands. You can now add docker expert to your resume.

> Zeromq and libsodium are already packaged on guix, czmq and zyre looks like they would be simple to package,

Well, I was working on a fork of things, so I would have needed to install my forks.

> guix is really quite simple to work with

I'm sure it is!

> And pointless, come on - what does that even mean? Does it mean you don't value them? I was quite happy to read about a neat new thing I can use my favorite tool for.

You are correct, I don't really value posts saying how cool and easy something is and how much better it is than other solutions, when they don't actually present a complete solution someone can actually use.

I get that it is not other people's job to teach me how to use something like guix, but do people not understand why things like Docker won?

reply

[–] t0nt0n link

Right, your dockerfile contains a requirements.txt with unknown complexity and number of packages, and your app has no name and no links to its code.

I'd be happy to provide some examples. Say you want your fork of libsodium:

  (define-public my-libsodium
    (package
      (inherit libsodium) ; now anything not defined in this package will be inherited from libsodium
      (source (origin (method url-fetch)
                (uri "url-to-your-sources")
                (sha256 (base32 "hash"))))
     ; Add whatever other fields your fork needs.
  ))
Sure it's slightly more verbose. That's a bit of the cost of having something you can actually rely on, with that degree of hackability.

If you actually want help packaging these things, ask on our mailing list or IRC; we're happy to help with specifics. But you're basically complaining that I didn't give you a concrete solution to a problem with several important details missing. Docker would not be able to instantiate your Python project if it did not know the contents of your requirements.txt.

The thing is, Docker is huge and bloated; is far from secure, and will probably stay that way for the foreseeable future; has a more or less complete lack of introspection; and is not strictly reproducible (sure, it gets quite far along the way, but it really is not).

Guix, on the other hand, is rather lightweight, and you have a fair amount of control over how lightweight it should be; it builds from source, and has a sort of hotpatching system for security fixes; it has introspection and is quite close to bit-reproducible.

Sure, docker is _easy_, as long as it works. And I'd argue that because of its complexity and obscurity it is not practically free software.

reply

[–] justinsaccount link

I don't think that's very verbose. The dockerfile I was using to build the app basically grabbed a specific version of all the deps and ./configure && make install'ed each one.

I'm completely onboard with the idea of reproducible builds.

> The thing is docker is huge and bloated; is far from secure, and will probably stay that way for the foreseeable future

It would be a mistake to fully associate container workflows with docker itself.

https://github.com/genuinetools/img + https://github.com/opencontainers/runc can be used to provide almost the same workflow as docker, without using docker itself.

You could also use the '-f docker' option the post talks about with runc to run the resulting image in an unprivileged container.

reply

[–] masklinn link

> It would be a mistake to fully associate container workflows with docker itself.

It would also be a mistake to fully associate Docker containers with Docker.

Joyent can run Docker containers, securely (zone-isolated), on SmartOS.

reply

[–] cyphar link

Regarding your concerns about Docker, I agree with that (even though I've been working on Docker and in the wider container community for almost 5 years now). However, there are plenty of tools that are compatible with Docker but provide similar benefits.

For instance, (from the openSUSE community which I'm a part of) we have KIWI, which provides builds with full introspection on a package level (similar to what you're doing with Guix). If you build the image inside OBS (our build system), then whenever a dependency of your image is updated, your image will be rebuilt automatically and published in OBS (where it can be further pushed to any Docker/OCI registry you like). The packages are signed, and the image is also "signed" (though it currently signs the image artifact and doesn't use image signing since that is still not standardised). And most packages in openSUSE are bit-reproducible (we build everything in OBS).

The above is far and away better than the current standard in the "official" world of Docker, but unfortunately, because OBS has a UI from the early 2000s (which is when it was written), it doesn't get enough attention outside of the communities that use it (and enjoy using it a lot). Everyone wants Dockerfiles, even though they cannot provide these features (and you cannot get package manifests of your images without running a package manager in the image, which means you cannot get vulnerability information from the manifest).

[ Though I'm mostly talking about openSUSE here, I also happen to work for SUSE on the containers team. ]

reply

[–] pxc link

> However, there are plenty of tools that are compatible with Docker but provide similar benefits.

And Guix is one of them, remember? From the article:

> Add -f docker [to your `guix pack` command] and, instead of a tarball, you get an image in the Docker format that you can pass to docker load on any machine where Docker is installed.

:-)

> The above is far and above much better than the current standard in the "official" world of Docker, but unfortunately because OBS has a UI from the early 2000s (which is when it was written) it doesn't get enough attention outside of the communities that use it (and enjoy using it a lot).

This is so true! I've mostly moved on from traditional, imperative package managers and associated distros in favor of the functional package management paradigm exemplified by Guix, but I still recommend openSUSE to my friends who prefer a more traditional/mainstream distro because of the love I have for the Open Build Service and Zypper.

The web interface for OBS does feel clunky these days, but it's a wonderful tool not just for improving the reliability and quality of software packages, but also for distributing them. Zypper is hands-down the most powerful and complete high-level package management tool I've ever used as part of a binary-based GNU+Linux distro. I love that openSUSE provides an instance of OBS that anyone can use for free to build packages for not just openSUSE but a TON of different distros.

I wish more people would explore, take advantage of, and celebrate OBS just like I wish they'd do the same with Nix and Guix!

reply

[–] gfosco link

I think you've reinforced the point they were making. It's pitched as easier, but clear examples of common usage aren't provided. You've provided a response longer than the 9 line Dockerfile, and we still don't know how to replicate it with guix.

reply

[–] t0nt0n link

I thought giving concrete command-line invocations was rather clear and precise.

I use 'guix environment <somepackage>' and 'guix system vm config.scm' every day. I don't need more, because these two solve most of the problems that were described earlier.

What could I provide that would be clearer, or more common usage, than the examples I use almost literally as they are here?

And that 9-line docker file references at least one other unknown file, and is part of a bigger program. Docker would not be able to reproduce it with the information given in that post. How do you expect me to reproduce something with at least 2 huge unknowns?

That is why you got a more generic answer for the implementation; once you have your implementation, you only need the command lines I provided.

reply

[–] pas link

That Dockerfile simply runs the commands listed therein in a glorified chroot, and then packages the result. The commands could easily be `wget tar.ball && tar xf tar.ball && ./configure --prefix=/bla/bla/docker/ && make -j4 && make install`.

So, the question is, how to package something with guix, and how to run it.

With docker you run something as `docker run [--interactive] [--tty] [--entrypoint=...] <image> [[command] args]`.

Your libsodium fork example is nice, but we still don't know how to package a simple program.

reply

[–] tscs37 link

To some extent I sympathize with GP, because your post is exactly why I'm currently not using nix or guix.

While it's neat that I can do introspection on my package graph, I don't immediately see any benefit for me when I start up my containers.

I would love to see a full guix/nix script of what GP asked for so we can see a comparison; I like to see hands-on stuff, not theoretical.

reply

[–] Ruud-v-A link

> it would be great to use something like guix to define all the libsodium+zeromq+czmq+zyre dependancies and be able to spit out an 'ultimate container image'

You define a package for your own project that depends on libsodium/zeromq/etc from GuixSD. Then you export your own package with 'guix pack'. For an example of what a package definition looks like, take a look in /gnu/packages in the GuixSD repository, for instance libsodium [1] or Vim [2].

I did something similar recently to build an Nginx "application bundle" [3]. It uses Nix (previously Guix, but Nix worked better for me in the end) to build a squashfs image. You can then run the binary on that filesystem with systemd-nspawn, or as a regular service by setting RootImage=. Some advantages over the Docker approach are that you can easily customise the build (e.g. changing the ./configure flags for Nginx without having to manually perform all other build steps), and bit-for-bit reproducibility (if you build the same commit six months from now, on a different machine, you will still get the same image out).

[1]: https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages... [2]: https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages... [3]: https://github.com/ruuda/miniserver#readme

reply

[–] rekado link

> Do you want to convince people that something like guix is better than docker

No, we show that Guix is a tool that gives you a way to work with software environments at a higher level; but at the same time you don't have to give up on application bundles like Docker. You can simply generate Docker images or other forms of application bundles from that higher-level representation.

You are welcome to take a look at this paper that I co-authored where we explain why we use Guix for a reproducible bioinformatics pipeline, and the rigorous, declarative functional package management approach instead of the imperative approach of Docker files:

    https://www.biorxiv.org/content/early/2018/04/21/298653
We're also providing Docker images, but we generate them from a higher-level declarative specification that ensures a high degree of bit-reproducibility.

reply

[–] t0nt0n link

Packaged zyre and czmq. I'll send the patch for their inclusion, but until then here is the code: https://notabug.org/thomassgn/guixsd-configuration/src/maste...

reply

[–] myWindoonn link

From memory, not tested, not spell-checked:

    FROM nixos/nix
    RUN nix-channel --update
    RUN nix-env -i python2.7-{twisted,treq,txgithub}
    WORKDIR /app
    ADD . /app
    EXPOSE 8080/tcp
    CMD python notify.py
The next level would be using the nixpkgs Docker builder directly: https://nixos.org/nixpkgs/manual/#sec-pkgs-dockerTools

reply

[–] davexunit link

It's hard to give you any specific recommendations with so little context, but I will try. For starters, I should point out that you can't really compare Guix directly to Docker. Guix is a package manager, Docker isn't. The article talks about 'guix pack', which makes it possible for Guix to interoperate with non-Guix systems, and one supported system is Docker. You can deploy software with just Guix, too, either on GuixSD or a foreign distro with Guix installed.

Anyway, in your Dockerfile I see that your application uses Python and you do some package management and service management stuff that is mixed together. In Guix, these things are separated. So the first step would be to define a package for your software, and then you would deploy that package. For a real world example of a Python application, here is what the AWS CLI package looks like:

    (define-public awscli
      (package
       (name "awscli")
       (version "1.14.41")
       (source
        (origin
         (method url-fetch)
         (uri (pypi-uri name version))
         (sha256
          (base32
           "0sispclx263lybbk19zp1n9yhg8xxx4jddypzgi24vpjaqnsbwlc"))))
       (build-system python-build-system)
       (propagated-inputs
        `(("python-colorama" ,python-colorama)
          ("python-botocore" ,python-botocore)
          ("python-s3transfer" ,python-s3transfer)
          ("python-docutils" ,python-docutils)
          ("python-pyyaml" ,python-pyyaml)
          ("python-rsa" ,python-rsa)))
       (arguments
        '(#:tests? #f))
       (home-page "https://aws.amazon.com/cli/")
       (synopsis "Command line client for AWS")
       (description "AWS CLI provides a unified command line interface to the
    Amazon Web Services (AWS) API.")
       (license license:asl2.0)))
The package recipe contains all the metadata, build instructions, and dependencies. Now that you have a package, it can be built with Guix and then deployed in a variety of ways. Judging from the Dockerfile, your software is some daemon that listens on port 8080, so:

* You can install the software directly using 'guix package -i your-package-name' and run the notify.py program. Good for trying things out.

* If you are deploying to the Guix system distribution, you could write a service definition so that you can manage the daemon via the init system. The service would take care of creating the notifier user and group, starting the service on boot, etc.

* You could use 'guix pack --format=docker' to export an image suitable for running with 'docker load'

* You could use a different 'guix pack' format (and maybe make it relocatable) for running on some other non-Guix system

I should also add that I don't think the work is fully done yet on handling the entirety of Docker use-cases. It's a work in progress. I can think of a number of things that I want to add to Guix to make this workflow better that I haven't had a chance to hack on yet.

reply

[–] justinsaccount link

That's interesting, but where does it specify which python version is used and the version of all the dependencies?

If the versions are specified in the 'python-botocore' type definitions, how do you install more than one version of a library?

Does guix only track the latest version of dependencies or can you request any version of something?

reply

[–] rekado link

Packages in Guix are just Scheme variables.

The package here uses the `python-build-system`, which defaults to the latest version of Python, but you can override that by specifying (arguments `(#:python ,my-python)), where `my-python` is a variable bound to a package value of the Python variant that you want to use.

You can easily install more than one version of a package as long as you have a package definition for it. You can install different variants (not just different versions) into separate profiles.

Guix is a Scheme library providing lots of variables that are bound to package values. These package values may have links to other packages (that's done with quasiquotation). Together they form a big graph of packages with zero degrees of freedom. Every version of Guix provides a slightly different variant of this package graph. When installing any package you instantiate a subset of this particular graph. Updating or modifying Guix gives you a different graph.

In order to keep things manageable we try to keep the number of variants of any particular package in Guix to a minimum, but you can install older variants by using an older version of Guix; or you can add new variables that are bound to package variants or different versions and install those.

It's very convenient and conceptually simple.

reply

[–] jancsika link

@justinsaccount: can you give t0nt0n a clue what the contents of requirements.txt are, plus anything else needed to create a complete port to guix?

Then it would be great to see t0nt0n or someone else who knows guix do the port so we can fully compare these two approaches.

reply

[–] justinsaccount link

it's not really application specific, just stuff like

  requests==2.18.4
the actual packages generally aren't important.

The cases where that would become interesting are the ones that require some C library dependencies first, like libpq-dev. In those cases something like guix/nix would be nice, because it could be used to pull in the specific external dependencies as well.

reply

[–] justinsaccount link

Articles like this are pointless. I get that guix and nix are neat, and I think that every single time something about one of them is posted, but I don't have the slightest clue how to use either one of them.

Do you want to convince people that something like guix is better than docker? Then take something that is currently distributed using docker and actually show how the guix approach is simpler.

i.e. I have a random app I recently worked on where the dockerfile was something like

  FROM python:2.7
  WORKDIR /app
  ADD requirements.txt /app
  RUN pip install -r requirements.txt
  ADD . /app

  RUN groupadd -r notifier && useradd --no-log-init -r -g notifier notifier
  USER notifier
  EXPOSE 8080/tcp
  CMD ./notify.py
How do I actually take a random application like that and build a guix package of it?

Another project I work on is built on top of zeromq, and it would be great to use something like guix to define all the libsodium+zeromq+czmq+zyre dependencies and be able to spit out an 'ultimate container image' of all of that, but all this post shows me how to do is install an existing guile package.

reply

[–] tannhaeuser link

It's a feature: you must be running tar as root (or equivalent) to restore files to uids/gids other than the effective process uid. Otherwise you could happily overwrite any host system file, including parts of the OS. It's a restriction shared by all archivers.

reply

[–] vinceguidry link

You can use the --same-owner flag and extract the tarball as root in order to preserve ownership. The -p flag ensures that the permissions recorded in the archive are restored as-is instead of being filtered through your umask.

reply

[–] Grue3 link

I like the amazing "feature" where the act of extracting a tar file into a directory can change the permissions on that directory. You have to pass the --no-overwrite-dir flag to disable this.

reply

[–] master-litty link

That seems sensible to me; what else would you expect?

reply

[–] AdmiralAsshat link

I expect it to preserve the ownership and permissions of the original file if I tell it to.

reply

[–] jolmg link

How can a normal user create files owned by another user? If tar allowed that, you could write any file with any permission and any ownership anywhere by first crafting a tar file of those files and then extracting them. It'd render the file permissions and ownership system completely moot.

EDIT: To get the effect you want, run tar as root. That's required to ensure you have permission to override the DAC system in the first place.

reply

[–] rakoo link

How can it have the same owner if it's a different machine and the users aren't the same?

reply

[–] spookthesunset link

Do tarballs store the user/group names as strings or do they store the uid/gid instead?

One of the goofy things about Unix systems is that most tools speak uid/gid, and woah is you if two machines on the network both have “bob” but with different uids.

Not entirely sure if Windows has the same problem; to be honest, if you use Active Directory most of that stuff is auto-magic.

My hunch is that going with the ID vs. the “friendly name” has a bunch of trade-offs, and whichever you pick will come with serious drawbacks.

reply

[–] fapjacks link

Heh I think you meant "woe is you" although what you've got is actually rather delightful.

reply

[–] justincormack link

They can do either - traditional tar formats have uid as a number, but the newer pax format has both numeric and named values.
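
With Python's tarfile you can see both side by side, and pick which one to honour when extracting ("backup.tar" and the destination path are placeholders):

    import tarfile

    with tarfile.open("backup.tar") as tf:
        for m in tf.getmembers():
            # Entries can carry a numeric uid/gid and a textual uname/gname.
            print(m.name, m.uid, m.gid, m.uname, m.gname)

        # When extracting as root, the names are preferred and the numbers are
        # the fallback; numeric_owner=True is the --numeric-owner equivalent.
        tf.extractall("/tmp/restore", numeric_owner=True)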

reply

[–] Hello71 link

in fact, names are the default. "--numeric-owner" must be passed to use numeric values.

reply

[–] saulrh link

I feel you're on the right track with that hunch - Zooko's triangle should apply in some way.

reply

[–] icholy link

Magic ... obviously

reply

[–] delinka link

I think you mean owner rather than permissions. In most cases, you want to maintain permissions/file mode (read/write/execute) but not the original owner.

reply

[–] AdmiralAsshat link

Except it doesn't do either. I've had files that had 666 user:group permissions/owner that I tar into a backup file, then untar, only to find that the file is now 664 with me:me ownership.

It's brought production to a halt on more than one occasion when I've tried to "restore" from a backup by extracting the files and moving them into production without manually fixing them first.

reply

[–] delinka link

Your umask affects mode during extraction. You can pass -p to tar asking tar to attempt to restore exactly the modes in the archive.

If you extract as root, it'll preserve the owner and group. Otherwise, the default is to assign the owner and group of the user running tar.

reply

[–] dozzie link

> I've had files that had 666 user:group permissions/owner that I tar into a backup file, then untar, only to find that the file is now 664 with me:me ownership.

It was PEBKAC, not tar's fault (GNU tar, anyway). Tar does store the original owner and permissions. But the ownership of the unpacked files -- do you really expect your process to set ownership of the files to another user?

The permissions would also be restored to 666 if you ran tar as root; there are several options whose defaults depend on whether EUID is 0 or not.

reply

[–] noja link

What was your umask?

reply

[–] cyphar link

That's a detail of the extraction tool. In umoci (which extracts tar archives as part of an OCI image)[1] you can remap the users or even extract as yourself and then add an xattr which represents the original owner in the archive (which is then read back when creating a new tar archive from the delta of the rootfs).

[1]: https://github.com/openSUSE/umoci

reply

[–] oconnor663 link

Or where the paths in the tarball can start with `..`?
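
That one (the classic path-traversal hole) is also left to the extraction tool to defend against. A rough Python sketch that refuses members escaping the destination ("untrusted.tar" and "outdir" are placeholders; newer Python versions also ship tarfile extraction filters for this):

    import os
    import tarfile

    def safe_members(tf, dest):
        dest = os.path.realpath(dest)
        for m in tf.getmembers():
            target = os.path.realpath(os.path.join(dest, m.name))
            # Anything resolving outside the destination (e.g. via "..") is refused.
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError("entry escapes destination: " + m.name)
            yield m

    with tarfile.open("untrusted.tar") as tf:
        tf.extractall("outdir", members=safe_members(tf, "outdir"))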

reply

[–] AdmiralAsshat link

Do tarballs still have that unfixed/unfixable bug where the extracted files will have the permissions of the person who untarr'd the file?

reply

[–] matthewbauer link

Why poorly? I don’t see anything worse about this.

reply

[–] AnIdiotOnTheNet link

Really? Seems like an awful lot of tooling for what is essentially "Put binary and dependencies in folder. Move folder around at will" in sane environments.

reply

[–] pxc link

The tooling already existed because it's part of a stack that goes from build tool to package manager to operating system configuration manager, with all kinds of features for developers floating around along the periphery. It handles all of these things uniformly, reliably, reproducibly, and in a way that deduplicates shared dependencies.

This article is just showcasing a relatively small bit of tooling on top of all that, which makes it possible to reuse that work to produce containers out of the very same stuff, in a whole range of formats.

`guix pack` and `nix-bundle` are illustrations of how a novel solution (functional package management) to the very problem to which app bundling constitutes utter capitulation (dependency management) can not only retain the virtues the app bundle approach throws away in the hopes of making deployment simple, but even match it in ease of deployment when _none_ of the infrastructure of the package management system is expected to be present on the deployment target.

From where I stand, that's damn impressive.

All of this was achieved without the kind of ‘standardization from above’ that Apple gets to do on its platform. It's true that app bundling could have been a lot simpler if the Linux community lived in a locked box at the mercy of a Vampire King bearing the power to upgrade users' kernels in the dead of night without bothering to ask them, who preempted any diversity or choice in operating system components with a uniform common runtime, and gleefully ripped unseemly APIs out from under developers with every OS release. But instead— thank God!— we have such a wide range of environments under the name ‘Linux’ that I'm ready to agree with you and call it insane. Yet here we see that hackers made it work anyway, without bossing anyone around or compromising on the strengths of proper package management. And that's fucking awesome.

reply

[–] AnIdiotOnTheNet link

Boy, you sure make fragmentation, constant wheel reinventing, and the necessity of complex tooling to perform simple tasks almost sound like a good thing. I suppose it must be for the small percentage of people who value those things over actually being able to do stuff.

Given the near-complete lack of non-oss software support Linux has, it seems like both developers and users rather prefer uniform common runtimes and a lack of diversity in their operating system components. It's almost like a whole lot of things get much easier if there's some kind of standardization.

reply

[–] pxc link

> Boy, you sure make fragmentation, constant wheel reinventing, and the necessity of complex tooling to perform simple tasks almost sound like a good thing.

Why, thank you!

Redundancy of efforts in F/OSS is of course a bad thing. It's perhaps even more tragic in free software than in proprietary software, because in free software, developers have fewer formal barriers to drawing upon the work of others. But it's something free software projects can't simply disable by exerting brute control over their users and contributors. The point is that with tech like this, the hackers behind projects like Guix have triumphed in a tougher struggle than NeXT or Apple ever picked. And they've built technology that copes with a wider range of environments, not via ugly hacks on edge cases, but through a thoughtfully designed build system which renders the whole dependency tree of every program it builds transparent, reproducible, and portable. That they had to build a vehicle for such wild and varied terrain is not what I'm celebrating, the cool thing is that they _did_.

> Given the near-complete lack of non-oss software support Linux has, it seems like both developers and users rather prefer uniform common runtimes and a lack of diversity in their operating system components.

Alternatively, when you refuse to distribute source code, compatibility for you involves greater demands on your platform, because you can't leave downstream distributors to recompile and you refuse to allow your more capable users to fix your software's incompatibilities. It's almost like a whole lot of things get easier when you distribute source code with your application.

Regardless, I think there are a lot of factors that together explain the predominance of free software on free operating systems. Proprietary software companies aiming to hit as large a market as possible with a single codebase turning away from perceived fragmentation in the ‘Linux market’ is certainly one of those many factors.

reply

[–] AnIdiotOnTheNet link

> Alternatively, when you refuse to distribute source code, compatibility for you involves greater demands on your platform, because you can't leave downstream distributors to recompile and you refuse to allow your more capable users to fix your software's incompatibilities.

And yet Windows still manages to run software written for a decade+ old version of it, and users often make compatibility patches for now-unsupported software, all without the source or recompilation. I think a big misstep by the OSS community has been its reliance on the crutch of "you have the source, do it yourself", and that includes making their software even work on a system in the first place. It leads to thinking like "it's ok if we break backwards and forwards compatibility, everyone can just recompile!".

reply

[–] AnIdiotOnTheNet link

Reinventing Application Bundles only 30 years after NeXTStep, poorly.

reply