[–] munin link

This is a gap in this kind of verification work. Systems like coq get around this by having a facility to "extract" code from the proof itself. You would define your algorithm as a Fixpoint, prove properties of that Fixpoint, then extract that Fixpoint into some ocaml code and use it directly in your real programs. This closes the loop you describe, insamuch as you trust the extraction process and the ocaml compiler and runtime (but other work is ongoing to close that loop as well).

reply

[–] pron link

It's not "getting around", and one could write an extraction tool like that for TLA+, too. The difficulty is simply verifying any system, by whatever method, end-to-end, namely verifying that the high-level global correctness properties are preserved all the way down to machine code. We simply have no idea how to do it for real-world, large software, and in those cases it's been done for small, simple software (seL4, CompCert), the process was extremely expensive.

reply

[–] chrisseaton link

The loop is always open though. Who verifies your verification code? Who verifies the processor implementation? All you can do is reduce the gap in the loop surely?

reply

[–] munin link

You can close all of those. You can have large amounts of your verifier itself be verified. You can verify the CPU design (hardware people tell me this used to be standard, and a lot of the formal methods community has roots in verifying hardware).

What you can't close is the language that you use to do all the verification in, itself. You wind up with a core calculus that you have to stare at really hard and trust / prove via other means.

reply

[–] nickpsecurity link

To support your point, Jared Davis and Magnus Myreen did the Milawa prover that builds the logic and implementation in layers starting with a hand-checked, tiny logic. Builds all the way to first-order prover with each layer getting re-checked by those below where feasible. Uses Myreen's method for verification to assembly (or machine code?). So, it's already been done.

http://www.cs.utexas.edu/users/jared/milawa/Web/

The other approach I investigated was proving the logics sound in set theory given it's well-understood, trusted, and widely deployed by mathematicians. Once the logics like FOL or HOL are verified, we just have to verify the specs and checkers (already done to in progress). Run on diverse hardware to prevent glitches there. From then on, mainly just look at logical specs since we should be good. I also found a verified compiler that uses Z specs with Prolog code... a method that might be used with Milawa for bootstrapping trust.

reply

[–] roblabla link

Isn't this susceptible to a trusting trust[0] attack, where your verifier has a bug that makes it verify itself, even though it shouldn't ?

It's always possible there is a loop somewhere. The best you can hope for is make it infinitesimally small.

[0]: http://wiki.c2.com/?TheKenThompsonHack

reply

[–] seagreen link

There's been a simple counter to trusting trust attacks since 2009: https://www.dwheeler.com/trusting-trust/

reply

[–] nonsince link

If I understand the abstract correctly, that relies on having a trusted compiler, which assumably would have to be bootstrapped ultimately from a trusted hand-written compiler in machine code. This effectively counters malicious trusting trust attacks but does not effectively counter trusting trust attacks due to error, because your entire trusted stack has to be correct.

That's not to say there's no way to close the loop here, simply that I don't know that this is it.

reply

[–] dllthomas link

You don't understand the abstract correctly. It relies on having compilers from diverse sources, unlikely to share the same malicious modification. It does not rely on any of them being trusted.

reply

[–] nickpsecurity link

No. The Karger attack on compilers that Thompson talked about was mitigated far as methods to use before Thompson even published it. You just use a verified interpreter, compiler, and/or CPU for the bootstrap. Hardware is couriered with tamper-evident seals. Software built from source or at least use crypto to verify source. It's also least likely attack you'll run into: 0-days in your stuff or compilers optimising away security is more common.

A medium-assurance solution is 3+ compilers with same input and output on diverse hardware by diverse teams unlikely to collude all producing ssme result. If hardware, you can do that with chips plus voting scheme.

reply

[–] mcguire link

A trusting trust attack cannot be eliminated, only be made prohibitively expensive. One way to do that is by using multiple verifier back-ends.

reply

[–] paulddraper link

Considering there is an infinitesimally small chance that a meteor will kill me before I hit "submit", I'll take that chance.

EDIT: Whew!

reply

[–] DanWaterworth link

You always have to make assumptions in any formal system able to express arithmetic as proved by Gödel.

reply

[–] nickpsecurity link

It's been done down to the hardware. Originally by Computational Logic Inc for FM9001 whose prover became ACL2. Then Verisoft went from apps to OS to C subset to VAMP processor (DLX-style). Recently people are doing HOL to hardware that will integrate with HOL/Light whose core is a few hundred or thousand lines to trust. Those people eliminated trust requirements in everything from extraction mechanisms to provers to assembly generation:

http://www.cse.chalmers.se/~myreen/

Follow he and his colleagues work for all that. Especially Milawa and CakeML. If you want a starting point, a small, Forth-like processor verifiable by hand on a node verifiable by eye could run the initial interpreter and prover for the first, real CPU. Plus, common practice is to split work into untrusted generation of artifacts with traces that are verified by a trusted checker that's comparatively tiny and simple. The little CPU would just run the checkers. You can run the generation on anything you like with the speed benefit. :)

reply

[–] unboxed_type link

What about real CPUs (much more complex than 'Forth CPU'), network adapters (has its own CPU), memory controllers, real OS kernels, POSIX library (which is sometimes loosely specified) and so on?

As far as I understand modern truly reliable system (commerical avionics-like) still must be relatively simple to be verified up to hardware level.

reply

[–] nickpsecurity link

Real CPU's are formally verified for certain properties with rigorous tests. That's why they have only 100-300 errors in something with a billion transistors. That practice started after a recall. Now, for examples of full, mathematical verification of about every aspect you're looking at VAMP and AAMP7G.

http://www.spbguga.ru/files/Formal_Verification_of_a_Process...

http://www.ccs.neu.edu/home/pete/acl206/slides/hardin.pdf

VAMP is a pretty-complex, DLX-style CPU. I think they extended it to multicore or concurrency of some kind in another work. Original was single core, though. Passed all the FPGA tests. The AAMP7G combines a stack machine, over a hundred instructions, microcode engine, and separation kernel into one CPU. It was first in recent times in high-assurance security to use microcoding. I tell everyone to do that since it lets us easily change the behavior of the hardware without reverifying everything. Their tools also let you verify software semi-automatically against a model of the hardware ISA. Main stuff is paywalled but this is best free one I could get you. On availability side, they also run three units at once with a voter to reduce the odds the chip will fail. Truly high-assurance exemplar in availability and security.

The prover is ACL2. You can do a trusted checker for that in few lines of code. You might need a lot of RAM and CPU time, though. Even all that can be a few blocks verified by eye and hand that you just plaster over a bunch of silicon then verify correctness automatically. There's also techniques where a partial failure can be magnified into total, visible failure. Tandem/HP NonStop does stuff like that to catch and replace failed components before system downtime happens. So, you can be reasonably sure the components are working even if defects happen during fabrication.

reply

[–] joshuata link

An interesting solution to this problem is verification witnesses[1]. A witness is a machine readable record of a verification task. Several different programs can read the witness and verify it independently. It is especially useful for counterexamples, where it can be a simple program trace to the error state. Given the input program and witness another can check whether the error state is reached.

[1] https://www.sosy-lab.org/~dbeyer/verification-witnesses/

reply

[–] samth link

Right, the concept of the TCB (trusted computing base) is the important thing. Good discussions of remaining TCB in verified code are in the seL4 work, or in this paper: https://www.cs.princeton.edu/~appel/papers/verif-sha-2.pdf

reply

[–] nickpsecurity link

See my link to Myreen's page and follow all those people's publications. Much of the TCB functions in papers like that has been eliminated in other work. It just needs to be cross-checked, integrated, and applied.

reply

[–] samth link

Right, that's one of the nice things about Appel's paper: he shows how to integrate a few different pieces to significantly reduce the TCB (ie, it doesn't include anything about C the language at all).

reply

[–] nickpsecurity link

I think the one you're talking about got the TCB down to a few hundred lines of C. Done in Twelf.

reply

[–] samth link

No, I'm talking about the one I linked to, which is done in Coq.

reply

[–] nickpsecurity link

Oh my bad. I just saw work such as seL4 then assumed it was a seL4 reference. I actually didn't have this one by Appel. Thanks for the paper! :)

Here's the one I was referencing with the tiny TCB:

https://www.cs.princeton.edu/~appel/papers/flit.pdf

OK. So, it was a few Kloc. Still smaller than Coq. They could probably get it even smaller with recent work given translation validation knocks compilers out of TCB. So, in between 803-2668loc.

reply

[–] undefined link
[deleted]

reply

[–] dronemallone link

That "facility" is the Curry-Howard correspondence!

reply

[–] mafribe link

Not necessarily. Isabelle/HOL, an prover not based on CH but on the LCF-architecture, can also do program extraction, see e.g. Program extraction in Isabelle:

https://wwwbroy.in.tum.de/~berghofe/papers/TYPES2002_slides....

reply

[–] samth link

Extraction of terms written using Fixpoint isn't really the Curry-Howard correspondence. Instead, the correspondence is between (for example) proofs of correctness and terms in the Calculus of Constructions.

[Fixpoint terms are also CIC programs, so there's a sense in which they're related, but it's not really about Curry-Howard.]

reply

[–] nonsince link

If the OCaml code is Standard ML-compliant you could run it on Cake, then you only have to trust the conversion

reply

[–] pron link

In principle, you could translate your code -- or even the generated machine code -- back to TLA+ and check it there. This has been done as research projects for C[1] and Java bytecode[2].

But the reality of formal verification is that we simply do not yet have the capability -- with any tool -- to formally specify and verify a large, complex software system end-to-end (namely, from high-level global correctness properties and all the way down to machine code). There are two options: either use tools that can do end-to-end verification like Coq and Isabelle and limit yourself to small software, and even then expend an inordinate amount of effort (like seL4 or CompCert, small programs that have been verified end-to-end, taking years), or use any verification tool (be it Coq or TLA+) for a high-level specification and verification without extending it end-to-end, for a lowered confidence. If you like, you can augment that with code-level specification -- like JML (Java), ACSL (C), SPARK (Ada) etc..

[1]: https://link.springer.com/chapter/10.1007/978-3-319-17581-2_...

[2]: http://ieeexplore.ieee.org/document/6042069/

reply

[–] nickpsecurity link

"In principle, you could translate your code -- or even the generated machine code -- back to TLA+ and check it there. This has been done as research projects for C[1] and Java bytecode[2]."

There's also ASM's and equational methods that allowed specification or verification of instructions or their use to happen in days when the tooling was already adequate. I wouldn't use TLA+ for this unless we're talking about concurrency. Even then, I'd be using it in combination with something else.

You might also find it interesting that one group embedded TLA+ in a sound prover (HOL?) for making the TLA+ analysis sound. They extended it in some way, too. Anyway, the got the benefits of both with that combination. So, some weaknesses in the tooling can be knocked out over time if more people just put labor in.

reply

[–] pron link

It all depends on your precise requirements and aesthetic preferences. I personally find TLA+ more elegant than any other high-level specification language, and very few projects actually require end-to-end verification. Those that do, must be kept small as we simply do not have the ability to do end-to-end verification of large software, no matter what tools or combination of tools we have. We don't even have a theoretical breakthrough to point us at a likely method of achieving that. If you need end-to-end verification, you must accept that the software (or component) verified will be small, and the process arduous. Then again, if you really need end-to-end verification, you must be ready to make that sacrifice. Moreover, to the best of my knowledge, no significant end-to-end verification was ever done in industry without close assistance from expert researchers, except in specialized niches where model-checkers give you sufficient coverage.

Supplementing a high-level verification with code-level specification tools is always possible, but, again, everything here is a matter of the required confidence vs. affordable cost. Additional confidence always comes with additional cost.

reply

[–] nickpsecurity link

That's why you implement it in Ada 2012 w/ SPARK 2014. You can encode the verification conditions as contracts. All the basic ones should be proven automatically. Those that aren't can be turned into runtime checks.

https://en.wikipedia.org/wiki/SPARK_(programming_language)

http://www.electronicdesign.com/industrial/rust-and-spark-so...

Here's an example of combining Event-B with SPARK to split overall verification between tools with each handling what they're good at:

https://pdfs.semanticscholar.org/481c/d4d2409115429f4b824f37...

reply

[–] unboxed_type link

How well Ada/SPARK suits distributed control algorithms I wonder? As far as I understand contracts are good enough for sequential code, but not well suited for parallel/distributed computation?

Runtime checks are good as to prevent something from happening, but they are not able to guarantee the absence of an error in the first place, right?

reply

[–] pjmlp link

A good book about is "Building High Integrity Applications with SPARK".

http://www.cambridge.org/us/academic/subjects/computer-scien...

Ada/SPARK has parallel computations defined at language level, known as tasks.

reply

[–] unboxed_type link

Thanks for the reference.

reply

[–] oconnore link

Huh? The whole point of distributed algorithms is that a set of local algorithms create a global behavior. I received this message so I send that message, etc. The SPARK contracts you write would check that local behavior, and the TLA+ would check that that local behavior results in some global behavior.

reply

[–] unboxed_type link

Well, TLA+ checks your 'contract' by traversing states of your model explicitly which takes a lot of time usually while in ADA we have to deduce feasibility of a contract at compile time. Checking a property of a distributed system at compile time is generally a hard problem, so I expect that ADA`s ability to do this should be rather limited.

reply

[–] oconnore link

You wouldn't write an Ada contract describing the full system, you would write one describing the local behavior you just described in your TLA+ model. The model ensures that the emergent behavior of the system is correct, and the local contract ensures that the local behavior of the various actors is correct -- and therefore results in that emergent behavior. Combined, that's a fairly solid argument that your system does what you intended.

If you had to implement the full proof in Ada, there would be no point to doing the TLA+ work.

reply

[–] unboxed_type link

Oh, now I see your point. Checking a TLA+ model of an algorithm and then implementing each actor in Ada reinforcing it with pre/post conditions perfectly makes sense.

Its just a little out of scope of the current thread, because the author of parent message was talking about using pure Ada/SPARK, without help of TLA+ (As I understand it in the first place), so my comment about using contracts was in that context.

reply

[–] nickpsecurity link

No, I meant implementation in SPARK. The specs were already done in TLA+ in the OP's model. There's also implied porting between two notations.

reply

[–] nickpsecurity link

"You wouldn't write an Ada contract describing the full system, you would write one describing the local behavior you just described in your TLA+ model. The model ensures that the emergent behavior of the system is correct, "

This is how I was seeing it happen, too.

reply

[–] pron link

No need to use Ada, though. Similarly powerful options exist for Java and C. The code-level specification language is JML (Java) or ACSL (C), and there are many verification tools, from SMT solvers, through proof assistant obligations, to concolic or randomized test generation.

Nevertheless, you're never certain doing that, either, as you don't formally tie the code-level specification to the high-level specification. That's OK, though, because end-to-end specification is neither required for most software nor currently possible for anything other than small, relatively simple software.

reply

[–] nickpsecurity link

They exist but SPARK's were much better on automated proving. It was due to SPARK being a simpler language designed explicitly for verification.

reply

[–] pron link

Right, but this is because SPARK is (intentionally) limited. For example, AFAIK, you can't dynamically allocate memory in SPARK. If you write Java code that is as simple as SPARK code, it would also be automatically verified (after all, the solvers are the same). Java also has automated verification that does separation logic, and works well with dynamic memory (Facebook's Infer).

Languages that truly admit full automatic verification (and are also limited) -- including rich global correctness properties -- are synchronous languages for safety-critical realtime, like SCADE. BTW, SPARK contracts are not so rich. There are many useful properties that you can't express.

reply

[–] nickpsecurity link

"If you write Java code that is as simple as SPARK code, it would also be automatically verified (after all, the solvers are the same)."

Java has a different model than SPARK. Especially if you're considering verifying it against the JVM or against assembly. SPARK will probably have an advantage given the papers I've read on JML work. Far as it's overall tooling, I'm aware it has a bunch for proofs and testing as it's one of three I mention to look for in supposedly, high-assurance projects (others being C subsets & Ada/SPARK). Infer is new to me, though, so thanks for mentioning it.

"BTW, SPARK contracts are not so rich. There are many useful properties that you can't express."

Hence, work like E-SPARK that combines Event-B and SPARK. Another I saw uses SPARK, TLA+, and UPAAL for timing analysis on stuff already analyzed at component level. Best to mix and match. DeepSpec with CertiKOS is probably state of the art in that far as popular stuff goes. My Brute Force Assurance concept will do it differently if I implement it where I mix automated analysis and testing tools for code generated in C, Java, and SPARK where each's toolchain catches something the others miss. Semantic mismatches could be a problem but I'm hoping something could come out of it.

reply

[–] pron link

It is unclear to me what requirements you're thinking of when you say "should" or "best to". Given that end-to-end verification is virtually impossible today, any verification method you choose would be incomplete at some level. Exactly which compromises you choose to make depend entirely on the particular correctness requirements of your project and the effort you can afford to spend. There is certainly no one right way to use formal methods, and no one right confidence level a project must achieve.

Also, I don't think it's a good idea to call research projects in formal methods "state-of-the-art". State-of-the-art implies a level of practicality that is often absent even from the most promising research. After some years working with formal methods, it becomes painfully clear that there is a huuuge gap between how well things seem to work in the lab and how they work in real life. That is why formal methods is the only subdiscipline in computer science other than AI that has experienced a research winter, due to unrealized promises and expectations. Formal methods researchers are generally very careful in not building up expectations these days (and the discipline as a whole lowered its goals from those it had in the '70s), but some enthusiasts extrapolate statements they make in an unrealistic way.

As much as I love formal methods, like in machine learning, there is a constant struggle between what people imagine it can do and what it actually can. Everyone using formal methods comes to realize this at some point. Here is what the designers of Spec# had to say:

> [A] conclusion we have drawn from our interaction with developers is that real developers do appreciate contracts... Unfortunately, we have also seen an unreasonable seduction with static checking. When programmers see our demos, they often develop a romantic enthusiasm that does not correspond to verification reality. Post-installation depression can then set in as they encounter difficulties while trying to verify their own programs.

reply

[–] nickpsecurity link

"It is unclear to me what requirements you're thinking of when you say "should" or "best to"."

The goal of making imperative programs correct with formal verification with optionally other methods. SPARK handles it best right now if we're talking knocking out tons of problems with automated provers.

"Given that end-to-end verification is virtually impossible today, any verification method you choose would be incomplete at some level. "

This is true for most software. That's also fine as even informal verification got us quite far. Partial verification with memory- and concurrency-safe language would be great.

"Also, I don't think it's a good idea to call research projects in formal methods "state-of-the-art". State-of-the-art implies a level of practicality that is often absent even from the most promising research."

Maybe, maybe not. The phrases cutting-edge or state-of-the-art usually mean the most recent capabilities coming out of R&D. Many of them are used in experimental prototypes that aren't realistic or not deployed in production form. You'd be correct for those. The claim is incorrect for things such as SPARK which are designed for real-world use, have commercial-grade tooling, and are in industry use. It's cutting edge and quite practical for whatever programs the notation can handle. Even if not full verification, it can help easily verify absence of common errors that lead to crashes or code injection. IRONSIDES DNS picked it for those benefits.

http://ironsides.martincarlisle.com/

"After some years working with formal methods, it becomes painfully clear that there is a huuuge gap between how well things seem to work in the lab and how they work in real life. That is why formal methods is the only subdiscipline in computer science other than AI that has experienced a research winter, due to unrealized promises and expectations. "

I'm with you on that. It's why I distinguish between two kinds of things I promote: things that are worth trying to see what might happen either for theory or practice; things which should work on solving the problem based on prior work. For formal verification, I usually do recommendations on verifying tiny, critical things very similar to prior successes using the tools from prior successes. For instance, I know the Ocaml compiler can be verified similar to prior work since its individual, critical modules are quite similar in their attributes and size. It might be more or less work but it can be done. I can't tell you if you'll need an expert to fully verify your complex, efficient structure in SPARK. That shit is all over the place in difficulty in ways that continually surprise. I'm pretty sure an amateur w/ some training can use a combo of spec-based checks and partial proofs to raise its assurance, though.

I do like your comparison to AI winter. I saw the early papers, esp in high-assurance security, where they thought it was practically all going to get done in a short time. Reality hit. It became niche. It made a comeback with better tooling and hardware but they're more careful now. I still want your references on global, composable correctness possibly being impossible to verify in large programs to be debated by top minds (including skeptics) in the field. Needs way more focus to help us determine how much future activity would be a waste of time/money.

"When programmers see our demos, they often develop a romantic enthusiasm that does not correspond to verification reality."

The problem is they're bait and switched. It happened to me early on. I'd instead like them to see time/labor/defects-detected measurements for formal and informal development. Then, see what level of expertise was available for each. Also see what level of effort went into learning to verify and/or verifying different kinds of things. There's certainly a gap between expectations and reality that needs to be shown up front. Maybe with heuristics or tips on certain categories of program, component, data structure, attribute (eg pointer-heavy), and so on.

reply

[–] pron link

> The goal of making imperative programs correct with formal verification with optionally other methods.

Yes, but correct at what confidence level, given that the higher the confidence the greater the effort, and 100% is generally impossible?

> SPARK handles it best right now

Again, I don't know what you mean by "best". SPARK is a limited language, and there are other limited languages that are even more amenable to verification than SPARK (e.g. SCADE). There's a whole bunch of tradeoffs, and it's unclear how you rank them, saying that one tool is "best".

> Partial verification with memory- and concurrency-safe language would be great.

Again, you're mixing several concepts here, and it's unclear how you rank them. You can ensure memory and concurrency safety with or without language support, and it's unclear what you mean by "partial verification". For example, you could say that guarantees for memory safety and no race conditions are a good sweet spot.

> The claim is incorrect for things such as SPARK which are designed for real-world use

Right, but JML tools are not behind SPARK, and it's unrealistic to expect people to use SPARK for general-purpose programs, as it is not quite a general-purpose language. Also, have you actually used SPARK on a large project yourself? In formal methods, the greatest advocates of certain tools are very often those who have never used them. When you do, you start seeing their limitations, and understand how no specific tool is a panacea and that no tool is "best" for all or even most purposes.

> It might be more or less work but it can be done.

For a very special kind of "can". seL4 is a 10KLOC C program that has been drastically dumbed down, and still took 20 man years! The people who worked on it told me that they believe it could now be done in 5 years, but even 5 years for an extra-simplified 10KLOC program is well beyond practicality for the vast majority of software.

> I'm pretty sure an amateur w/ some training can use a combo of spec-based checks and partial proofs to raise its assurance, though.

That depends on what you mean by amateur and by how much the assurance is raised. For example, SPARK simply cannot express global correctness conditions, let alone check them. Clearly, when you write a database and want to ensure serializability or no loss of data on failure, those are the most important properties by far. This simply cannot be done with any code-level tools -- like SPARK or JML -- certainly not feasibly.

I haven't tried SPARK myself, but I have tried JML with OpenJML, which is very similar, and I agree with Amazon that TLA+ gives you a bang-for-the-buck that is probably an order of magnitude greater than other approaches. Of course, you can then apply JML to code-level.

> I still want your references on global, composable correctness possibly being impossible to verify in large programs to be debated by top minds (including skeptics) in the field. Needs way more focus to help us determine how much future activity would be a waste of time/money.

There is no point to debate something that has been proven, but there's a big difference between what's proven in principle and what happens in practice, going both ways. It is certainly possible that even though correctness provably doesn't compose, it still turns out that for most programs people write, there is an affordable way to verify them. As there is no research -- as far as I know -- that tries to classify what constitutes a "reasonable program", the only way to know is simply to try various methods. To date, we simply have not been able to verify a large program end-to-end, and have not been able to verify small programs affordably. This is a simple fact. It is certainly possible that research will eventually find an approach that works, but it's not like we have a solution for real software today.

reply

[–] nickpsecurity link

"Again, I don't know what you mean by "best". SPARK is a"

The OP is about TLA+ and Rust. Using Rust says they want something imperative, low-level, safe and fast. They also want easier verification. That eliminates all complicated tools (esp proof assistants) plus most of full verification. The parent of my first comment wanted to connect properties proven in TLA+ to the code itself. Maintaining low effort, that narrows it down to Frama-C, Java/JML, and SPARK. If they're a C expert, Frama-C is compelling due to all the other tools that can help them. Java/JML has a similar argument but complex runtime & patent suit-loving vendor (i.e. Oracle). SPARK has high automation, knocks out most of C's problems automatically, has tool support, and a non-malicious vendor. So, solution for this proposal is to try to encode correctness conditions from TLA+ or another analysis in SPARK.

"You can ensure memory and concurrency safety with or without language support, and it's unclear what you mean by "partial verification". For example, you could say that guarantees for memory safety and no race conditions are a good sweet spot."

Full verification is verification of all properties of an algorithm. Partial verification is verifying some of them. You don't really rank the two so much as you prioritize which properties to verify in a partial verification since you lacked time, budget, etc for full verification. Far as memory-safety and race freedom, OP chose Rust which already has that. So, it's a high priority to meet the safety standard OP already desires and has with TLA+ w/ Rust. It's really the OP's preference for easy, specific forms of safety, and efficient imperative language that are dictating my requirements here.

"For a very special kind of "can". seL4 is a 10KLOC C program that has been drastically dumbed down, and still took 20 man years! "

You keep leaving off two things when you mention that. One is that these projects have a habit of reinventing the wheel in tooling using their own languages, runtimes, extensions, new logics, etc. Reuse happens but CompSci likes building new stuff. They usually list around 70% or whatever high number of their effort goes into building new tools/theories. Reusing would drop that down a lot which is where the 5 years comes from. Strictly building on what we have would make that the median or max for a similar project with time coming down further as reusable components are developed. One can see that in ACL2, Isabelle/HOL and Coq ecosystems for projects that actually use what's already there. Or even in CompCert work where new CompSci just keeps adding new components to it to save time/effort.

The other thing is that seL4 targeted C language. That language wasn't designed for verification. By its history, it was barely designed at all starting with BCPL. This was also on shittiest, most-evolution-driven ISA in existence. They also built lots of new tools on top of some of the hardest ones. This thing was set up to be one of the hardest verifications in existence. Proven tools with languages designed to utilize them would reduce the hardness and increase iteration speed. We can see this with Myreen et al's work on CakeML vs CompCert. So much more has happened in less time since they started with a building block (a LISP verified to ASM), built another (a partial SML verified to ASM), and so on to get the big project (decent SML w/ optimizations verified to ASM). Their work was even used in seL4 to verify its assembly. That let it bypass building or needing a certified, optimizing compiler. Likewise, untrusted optimizations + verified, equivalence check let hardware and software verification cheat past a ton of difficulty w/ production cost being sequentially or concurrently running the checker over and over. Lots of potential cheats when designing for easy work w/ lots of reuse. Still hard but not seL4 hard for everything.

"Clearly, when you write a database and want to ensure serializability or no loss of data on failure, those are the most important properties by far. "

Such properties are to be done in another tool then broken down into properties to test on local, code modules. I imagine they'd need at least one specialist for that kind of verification. A database is hard enough that they'd probably be able to hire one. In any case, the OP is using TLA+ to check properties Rust can't. This would be using TLA+ or some other tool to check properties SPARK can't. Similar territory. Also, recall the COGENT work where the key components of an ext2 filesystem was done with half the code in C without painful analysis of mutable state that seL4 had to do. Easier proofs with C code coming out that preserves those properties.

"As there is no research -- as far as I know -- that tries to classify what constitutes a "reasonable program","

There has been under LANGSEC and prior in Abstract, State Machines. They verified them sequentially then their interactions in composition. The thing many keep rediscovering is the model for expressing them has to keep a certain amount of sequential and simple operation. Past as certain point, it becomes impossible to analyze automatically with existing methods. I'm not sure how much work has been done outside the ASM or LANGSEC research on that. We get it incidentally at least during the formal verification efforts: every attempt tells us what can be expressed, proven, etc. Really ad hoc, though.

reply

[–] pron link

> Maintaining low effort, that narrows it down to Frama-C, Java/JML, and SPARK.

Not really. Frama-C aside, JML and SPARK simply cannot express those global correctness properties you naturally express in TLA+. Frama-C is really a set of ad-hoc tools.

> So, solution for this proposal is to try to encode correctness conditions from TLA+ or another analysis in SPARK.

But SPARK simply isn't rich enough to do that in general. You're right that if the TLA+ specification is at an abstraction level that's very close to the code, you can try to express some of those properties in SPARK, but I doubt even that is possible. Concurrent algorithms require liveness properties, and I'd be very surprised if SPARK could express them.

There is something interesting you could try, though: You could express global correctness properties in TLA+, and show (in TLA+) that they are preserved if some simple low-level properties are preserved, and then express and verify those in JML/SPARK. This is very interesting, and I'd love to hear about such an experience (or even try it myself), but my gut feeling is that that it would still be too costly. If someone does do that, however, that's something certainly publication-worthy.

> One can see that in ACL2, Isabelle/HOL and Coq ecosystems for projects that actually use what's already there. Or even in CompCert work where new CompSci just keeps adding new components to it to save time/effort.

But I still don't know of a single nontrivial real-world project that has been affordably verified, even with all that reuse.

> The other thing is that seL4 targeted C language. That language wasn't designed for verification...

Again, everything you say is true, but still there are no examples of affordably end-to-end-verified real-world projects.

> I imagine they'd need at least one specialist for that kind of verification.

Not if TLA+ is used. Amazon (and I, too) specify and verify such properties all the time and they (and I) are not verification experts.

> Also, recall the COGENT work where the key components of an ext2 filesystem was done with half the code in C without painful analysis of mutable state that seL4 had to do. Easier proofs with C code coming out that preserves those properties.

I know the person behind Cogent (he'd also worked on seL4), but he points out that the language targets a very narrow domain. Most programs you can think of cannot be written in Cogent.

> Really ad hoc, though.

That pretty much sums up all of formal methods at this stage, especially end-to-end ones. TLA+ is relatively rare in its universality and depth, but it intentionally gives up on end-to-end both because it is currently infeasible (in general) and because 99.99% of software simply doesn't require such strong guarantees.

reply

[–] nickpsecurity link

"There is something interesting you could try, though: You could express global correctness properties in TLA+, and show (in TLA+) that they are preserved if some simple low-level properties are preserved, and then express and verify those in JML/SPARK. This is very interesting, and I'd love to hear about such an experience (or even try it myself), but my gut feeling is that that it would still be too costly. If someone does do that, however, that's something certainly publication-worthy."

That might have been how E-SPARK connected Event-B and SPARK. TLA+ is easier to use than Evdnt-B and often in similar domains. Ill try to remember this option.

reply

[–] codemac link

As Lamport says: why do we always use blueprints before constructing a building?

I agree there are some missing links, but checking your model before doing detailed implementation is a good idea regardless

reply

[–] unboxed_type link

Absolutely. Anyway this gap is worth mentioning for deeper understanding of a verification problem.

reply

[–] microcolonel link

For seL4, models were used to map all the way from the semantics of the instruction set to the high level model; this included a model of equivalence between the C source code and the machine code. Took them basically from 2006 to 2014 to do that (including the time to author and verify the kernel in particular).

reply

[–] unboxed_type link

Lets say you model-checked some distributed algorithm with TLA+. You then implement it in Rust. How are you going to check that your implementation implements exactly the algorithm you have checked and not some other algorithm which looks very similar?

I think the phrase 'reliable systems' is more appropriate to what you are up to, as opposed to the phrase 'correct systems' which usually corresponds to formal verification.

reply

[–] vog link

Just curious: Are there also attempts in the reverse direction?

That is, are there attempts to improve the Rust language and/or optimizer such that more and more "unsafe" sections can be replaced by safe code that has the same performance? (maybe even compiles down to the same machine code)

reply

[–] ekidd link

> That is, are there attempts to improve the Rust language and/or optimizer such that more and more "unsafe" sections can be replaced by safe code that has the same performance?

I write a fair bit of production and hobby Rust, some of it very performance-intensive. And at this point, 'unsafe' is already very rare in the code I see. The most common uses:

1. For implementing certain core data structures based on pointers. This is pretty rare—there are good implementations of just about everything I need, either from the 'std::collections' library (which uses a moderate amount of 'unsafe') or from third parties (who tend to use much less 'unsafe').

2. For the FFI. Again, this is increasingly rare: More and more key libraries are now available as pure Rust, and the "big" C libraries typically have well-maintained bindings.

3. Theoretically, for performance. But I've written tons of high-performance, benchmarked Rust without ever touching 'unsafe'.

And I do feed a fair bit of my code through fuzzers, and they've never found an exploitable weakness.

So my gut feeling? There really isn't much 'unsafe' out there. And the small amount of 'unsafe' code that exists is probably—based on my experience fuzzing it—"safer" than all but the very best C/C++ code. I can live with that for now, on a strictly practical level.

reply

[–] dikaiosune link

There are already a lot of cases where LLVM can successfully optimize away things like safe Rust's bounds checks in loops.

There are also efforts to include "const generics" which will let you use e.g. array lengths as generics that may be able to teach LLVM new optimizations. Maybe someday in the far future that will grow into some kind of dependent typing extension for Rust? That could potentially allow for safe, bounds-check free array access that would currently require unsafe. But I don't know if that would allow for proving the kinds of things this article discusses.

Ultimately though, the unsafe subset is pretty small and is generally in my experience only interesting when building heavily reused abstractions (the kinds of lock-free algorithms OP discusses for example). It'll be really cool if these efforts expose some ways to shrink reliance on unsafe code, but I'm personally not sweating it.

reply

[–] sanxiyn link

There is interest in enabling safe interface, that is, improving Rust the language so that you can provide safe interface with better performance. A good example is so-called "streaming iterator".

There is less interest in changing Rust so that currently unsafe implementation (not interface) of safe interface can be made safe, that is, verified by compiler. As grandparent stated, and I agree, as long as interface is safe I don't think it is important that implementation is verified by compiler (hence safe) or some external tool or hand proof.

reply

[–] michaelmior link

Maybe I'm missing something, but I could write an interface labelled sage on top of any unsafe code I want. Why doesn't it matter that my unsafe implementation is proven correct?

reply

[–] kibwen link

This is going to sound tautological, but the unsafe keyword is for operations that the compiler can't prove. The unsafe keyword only unlocks a very small number of possible new operations (it doesn't turn off any existing checks), and those operations make unsafe Rust as freeform as C; if you could prove unsafe Rust correct then you could just prove C code correct.

reply

[–] nickpsecurity link

Which Frama-C, Simpl/C, and Astree Analyzer all do. Then that C code is quite trustworthy. Rust has stuff like Simpl in the works but needs something like Frama-C or SPARK for unsafe.

reply

[–] kibwen link

Can you give me an example of something that safe Rust doesn't allow, thereby requiring the unsafe keyword to implement, but that also contains an error that passes the Rust compiler silently but that Frama-C or SPARK would statically detect?

reply

[–] nickpsecurity link

That's a trick question: you know I don't know know Rust enough to answer it. :P However, I fo know the things I mentioned can prove safety properties about unsafe code. Rust compiler can't per these threads. So, If A then B...?

reply

[–] kibwen link

But whether SPARK et al can prove properties about unsafe code isn't the pertinent topic. SPARK and friends are subsets if their respective languages, implying that there are features of those languages that, if their use is not restricted, inhibit the production of correctness proofs. Rust already has such a thing, by default, in the form of non-unsafe Rust, and the notion of "unsafe" in the context of Rust and Frama-C is not guaranteed to be the same. So what I'm asking is for you to demonstrate something that these other languages can prove that safe Rust can't.

By the way, you've been in Rust threads enough years now championing safe languages that you really no longer have any excuse for not learning Rust. :P

reply

[–] nickpsecurity link

"But whether SPARK et al can prove properties about unsafe code isn't the pertinent topic. "

I thought that these worries about unsafe code in Rust indicated people wanted a way to ensure it's as safd as possible. Traditionally, unsafety the language itself cant handle is dealt with via external tools for analysis, proving, and testing. These exist for unsafe code in Ada, C, C+×, Java, and SPARK. Tool-assisted, subsets of each are deployed in safety-critical industries. Anyone doing unsafe codd in them will get more robustness than unsafe code in Rust.

So, you already have protection measures in safe Rust. Other tools, esp SPARK Ada and C enhancements, have them for unsafe code that Rust does nothing to protect. That's a gap in capabilities Rust needs to close. Until then, it's on the table when choosing which to use for unsafe code. Note that Im also a big fan of mixing and matching where we used the C or SPARK provers on unsafe Rust coded to be semantically equivalent. I just know most will prefer Rust to handle this itself.

"By the way, you've been in Rust threads enough years now championing safe languages that you really no longer have any excuse for not learning Rust. :P"

Haha. I do plan to learn it for my Brute Force Assurance concept. Remember I have a brain injury, though, that knocked out memory and good chunk of learning pace on top of full-time job. That plus a shit-ton of broad R&D leaves little time for coding.

reply

[–] solidsnack9000 link

It wouldn't surprise me if even Frama-C and SPARK have a "trust me" annotation.

reply

[–] nickpsecurity link

Pretty much all languages w/ formal specifications do. The reason is that some specification claims are assumed/reasonable, proven outside the tool, or too hard to handle immediately. This can also be used as a maybe later. I'm not sure what theirs are.

reply

[–] dbaupp link

It would be great if it was, but it's also infeasible for a lot of code (at the moment anyway): it takes too much time and effort, and, in the FFI case, requires proving a whole external library correct too.

reply

[–] dikaiosune link

There's a lot of interest in formally verifying things about the unsafe subset of Rust, for example the Rust Belt project: http://plv.mpi-sws.org/rustbelt/. One thing I've not well understood is how these efforts may be affected by some of the conversations around the unsafe code guidelines effort:

https://github.com/rust-lang/rfcs/pull/1643 https://github.com/nikomatsakis/rust-memory-model

reply

[–] burntsushi link

The benefit of Rust is not necessarily to completely avoid unsafe (although, that does happen in practice a lot). The benefit is that unsafe can be buttoned up behind a safe API, hopefully in a way that is Zero Cost. So you might have some unsafe core that you need to pay extra special attention to, but the rest of your code can be in safe Rust.

For example, I maintain a Snappy compression crate in Rust. It uses quite a bit of unsafe internally (mostly for being too clever with unaligned loads/stores), but any users of that crate never need to utter `unsafe`. If there's a bug somewhere, you can be confident it's not in the code that uses Snappy, but rather in Snappy itself, which is presumably a much smaller surface area than "all of the code."

reply

[–] mhh__ link

That's still not really unique to rust, D (at least) does that too. However, good point nonetheless.

reply

[–] burntsushi link

Sure, it's not strictly unique. There are very few language features in any language that are unique. Language design is an evolutionary process. I wish I knew D better so I could provide a better comparison, but if I am to believe[1] and allow myself to wave my hands a bit, then there's a lot more you can do in safe Rust than in safe D. That's important, because if you wind up needing `unsafe` a lot, then you start to lose the possibility of buttoning up a small amount unsafe code behind a safe interface. (This is NOT to say that D's feature here is worthless, but the languages have very different designs and very different trade offs.)

Steve's point is also pertinent. Safe Rust is the default, so it's very easy to audit for places in your code that need extra care.

[1] - https://dlang.org/spec/memory-safe-d.html

reply

[–] steveklabnik link

A big difference is the defaults; Rust made an explicit design decision to make unsafe the default, to mark it with a keyword and to make unsafe a superset of safe. It is totally true that these things aren't unique, but there is a lot of advantage for this specific set of choices in combination.

reply

[–] staticassertion link

> Rust made an explicit design decision to make unsafe the default

You mean 'safe the default'.

reply

[–] nickpsecurity link

C programmers could've had so much fun with that quote. ;)

reply

[–] steveklabnik link

Yes, whoops!

reply

[–] kbenson link

I think the specific portion in that paragraph you should focus on is:

We are giving up a significant benefit of Rust for certain very high-performance chunks of this system. In place of Rust's compiler, we use the TLA+ model checker to gain confidence in the correctness of our system!

With special emphasis on "very high-performance chunks of this system". There's still quire a lot to be gained by having a a lot, or even most, of the system with certain aspects of safety assured. It's not an all-or-nothing proposition, and that's a feature.

Personally I think it's very cool to then take these portions that they can't or won't use the built-in rust safety mechanisms for and do modeling to provide extra levels of assurance. It's probably also quite a boon to this process to have high assurances for much of the system as it may reduce the space (states) which you must model.

reply

[–] solidsnack9000 link

It can be seen in terms of the 80/20 rule. Most of the code doesn't need this kind of unsafety; but with C or C++ it is all more or less treated as though it did.

reply

[–] cabaalis link

Rust-ignorant here; I was reading your "why rust" section. It contains a lot of information about how safe the resulting code is and how that is such a great benefit, and that's why Rust was selected. But then it has this statement: "However, it needs to be noted that when creating lock-free high-performance algorithms, we are going to need to sidestep the safety guarantees of the compiler." .... So why Rust?

reply

[–] eridius link

How does wrapping up code in a closure "fix" the need for comments? And what's wrong with having comments anyway? You haven't demonstrated that comments are an "anti-pattern", all you've demonstrated is that you personally prefer a different style.

reply

[–] EGreg link

Whenever you put a comment, chances are you should have encapsulated the block there into a reusable block of code.

reply

[–] eridius link

Why? You're saying that without any supporting evidence. And frankly, what you're suggesting sounds to me like an unreadable mess. You're going to bloat your code with closures all over the place, simply so you can attach a name to the closure, all to avoid having a simple comment. Naming closures is not a good substitute for comments.

reply

[–] hoodunit link

Naming closures or functions absolutely is a good substitute for a large class of comments. When you write a comment as a function name, that comment gets maintained when the code changes. It also encourages modularizing your code- often when you write a comment it's a sign that your code is trying to operate conceptually on more than one level and it would be more understandable to separate those levels.

I wouldn't say comments are an antipattern but they are a code smell and should be used sparingly and only where the same explanation can't be given through function/variable naming or refactoring.

reply

[–] EGreg link

If you're working with a language that has closures, I started to have a heuristic:

  // if you are writing a comment like this
  // explaining what the next block of code does
  // chances are you should just refactor it
  // into a closure
  _doWhateverCommentSaid(arg1, arg2, arg3);
it's true that this needlessly adds stack overhead, but this way, you are more likely to structure your code properly, re-use it properly (which more than pays for the stack overhead), and move your closure / method to be more and more global as needed.

Perhaps if you have a comment like this:

  f(x); // x matches some assumption
then in fact, you should add validation to f itself, in the form of a function, same as above.

  function f(x) {
     _theAssumption(x);
  }
this habit encourages re-use and proper structuring of code.

in short, comments may be an anti-pattern!

reply

[–] pron link

You just define it :) TLA+ doesn't have a built-in memory model, nor any concept of memory whatsoever. You define your constructs at the level of detail you think is important. Weak memory can be defined using a CPU-local memory that is occasionally reconciled with RAM or other core-local memory.

reply

[–] nicknash link

For lock-free data structures, how does this verification encode the memory model? E.g. high-quality model checkers for the C++11 memory model allow for outcomes inconsistent with the execution order of the code (in addition to outcomes that aren't sequentially consistent, but are consistent with the execution order). They also work on unmodified source. In the past I've seen SPIN/Promela used as a tool for concurrency-checking, but it's silent on memory-models (implicitly sequentially consistent)

reply

[–] dmix link

> PlusCal (formerly called +CAL) is a formal specification language created by Leslie Lamport, which transpiles to TLA+

https://www.wikiwand.com/en/PlusCal

reply

[–] mcguire link

Right, but Prolog doesn't use begin/end's.

reply

[–] mcguire link

"Pluscal has two forms, c and p. They are functionally identical, but c form uses braces and p form uses prolog/ruby-esque begin and end statements that can be a little easier to spot errors with, in my opinion."

Nit: that should be Pascal/Ruby, no?

reply

[–] unboxed_type link

Have you tried it yourself or maybe you know someone who have? Honestly, I think that this development is highly impractical due to very high entrance ticket price for someone not belonging to UW PLSE group ;-)

reply

[–] tschottdorf link

https://github.com/uwplse/verdi is a good example of this (it uses coq and extractions along with minimal glue).

reply

[–] lenage link

thanks for great work

reply

[–] akoster link

Same here!

reply

[–] alexnewman link

I like this a lot

reply