David A. Wheeler's Blog

Sat, 02 Mar 2019

Don’t Use ISO/IEC 14977 Extended Backus-Naur Form (EBNF)

Sometimes people want to do something, find a standard, and do not realize the downsides of using that standard. I have an essay in that genre titled Don’t Use ISO/IEC 14977 Extended Backus-Naur Form (EBNF). The problem is that although there is a ISO/IEC 14977:1996 specification, in most cases you should not use it. If you have to write a specification for a programming language or complex data structure, please take a look at why I think that!

path: /misc | Current Weblog | permanent link to this entry

Mon, 29 Oct 2018

The allsome quantifier

For over 100 years formal logic has routinely represented ideas like “all Xs are Ys” using something called the “for all quantifier”, abbreviated as ∀. This lets people mathematically represent statements like “All Martians are green”. This is really important today, because these mathematical statements can be used to determine if systems work correctly; in some cases this can be used to save lives.

However, there is a problem. It is easy to mistranslate informal statements into formal logic, and these errors can cause serious problems. For example, in formal logic, the normal way to represent the statements “All Martians are green” and “All Martians are not green” can be simultaneously true (namely, when there are no Martians).

Informal statements with this format often embed the assumption that the situation occurs (that is, that there is at least one Martian). This mismatch between formal logic and informal statements could lead to problems, and conceivably to deaths.

I propose a small solution - a new formal logic quantifier called “allsome” (aka “all some”) that is designed to make some of these mistranslations less likely. The allsome quantifier, abbreviated ∀!, simultaneously expresses for all (∀) and there exists (∃) in a way that models many informal statements. I hope that this new quantifier will reduce the risk of mistranslations of informal statements into formal expressions, and that others will eventually agree that allsome is awesome.

For more information, see The allsome quantifier.

path: /misc | Current Weblog | permanent link to this entry

Tue, 23 Oct 2018

Do not install or develop mobile apps unless you have to

If you are thinking about installing another mobile app on your smartphone, or thinking about developing a mobile app, I have a simple recommendation: Don’t do it unless you must do it to get the capability you want. In many cases, using or developing just a web application is the better choice.

First, if you install a mobile application, that mobile application often has a lot of access to information that you probably don’t want it to have. What’s more, many of the organizations developing mobile apps have a business model that involves subtly getting as much personal data about you as possible, and selling that to others. Smartphone operating systems have security mechanisms that try to reduce the impact of these problems, by providing some control over what the application can access, and that’s a good thing. However, it’s hard to stem the tide because smartphone operating systems do have direct access to a lot of data about you. Smartphone operating systems have direct access to your location, fixed personal identifiers like your cell phone number, and often have direct access to information such as your list of contacts. There are cases where it needs to be shared, so in the name of “making things easy” these mobile applications often end up with far more privileges than they should have.

In contrast, web browsers have long had to counter web applications that try to extract data from you. They certainly do not prevent all problems, but they are designed so that they do not give away location, cell phone numbers, or your contact list so easily. Many services are perfectly workable through web browsers instead of web applications, at least for typical uses. This includes sites such as Youtube, Facebook, and Twitter. In addition, once you do not install those apps, you will have room for applications that really do need to be a mobile app… and thus will not need to upgrade your smartphone as often.

And if you’re thinking about developing a mobile app: Don’t do it, at least without seeing if there’s a viable alternative. For many situations, creating mobile applications is a huge waste of money. The United Kingdom essentially banned the development of mobile applications, noting that mobile applications are “very expensive to produce, and they’re very very expensive to maintain because you have to keep updating them when there are software changes.” A related problem is that if you develop mobile apps you typically have to write at least three versions of the software: an iOS mobile app, an Android mobile app, and a web application so people can use large screens. If you’re a commercial organization, a mobile application can not only be costly to develop (in part because you will have to develop the application multiple times), but in practice you’ll have to pay a large cut of your revenue to Google and Apple (typically 15% of your revenue). Are you really sure you want to spend or lose that much money?

What’s more, web applications can increasingly be used instead of mobile/native applications, even in cases where it was impossible at one time. I should first note that I’m using an expansive definition of “web application” here - I just mean anything you can access using a web browser. Historically, the main reason you needed to create a mobile or native application was because you needed the application to work offline. However, service workers now make it practical to create many offline applications and are widely available. Internet Explorer (IE) is the only major browser that does not support service workers, but people who have IE can easily use or install a modern web browser instead (such as Firefox, Chrome, or Edge). There are other ways to develop offline web applications too. Another reason to create mobile or native applications was speed, but recent JavaScript optimization work has made web applications much faster. In addition, WebAssembly makes it possible to create some applications that run far faster and is supported by Chrome, Firefox, Safari, and Edge. When developing web applications to handle all devices you need to use responsive web design, but that is already good practice and widely supported. Since April 2015 Google’s search engine has penalized web sites that are not mobile-friendly; this really is not anything new. Users can easily bookmark your web application to get there later, too.

There are many advantages of just developing a web application. There are a huge number of tools already developed to help you develop web applications, for one thing. Many of those tools are open source software and no-cost. A large number of people already know how to develop web applications, too. Perhaps most importantly, web applications are standards-based; you are not locked into any one organization’s ecosystem.

Most obviously: If you are already going to develop a web application, maybe you don’t need to write two more versions of the same software and try to maintain them too. There are tools that let you try to write software that ports between iOS and Android, but software you do not need to write at all is the easiest to maintain.

Do you need to create a vanity mobile app for your conference so that attendees can see what talks are where? Often the answer is no. Do you need a mobile app so that people can fill in a simple form (such as for government services)? Again, the answer is often no.

Of course, there are perfectly valid reasons to create a mobile application (or other kind of native application) for end-users, and that means there are good reasons to install a mobile app. Some applications require access to specialized device services that are not accessible from a web browser, or have speed requirements beyond what a web browser can currently provide. In addition, some apps are older and are not likely to be rewritten. But today I find that many people are not thinking about the alternatives, and ignoring alternatives is a mistake. Before installing or creating (multiple) mobile applications, ask yourself, do I need to?. If you don’t need to do that, and can use or create a web application instead, you might be able to save yourself a lot of trouble.

path: /misc | Current Weblog | permanent link to this entry

Thu, 23 Nov 2017

FCC Votes against the People and Net Neutrality: Freedom is Slavery

To the surprise of no one, the US FCC led by Ajit Pai has finally issued the order to kill net neutrality.

In short, the FCC is voting to directly harm the US people and instead aid the monopolist Intenet Service Providers (ISPs). More information is on Tech Dirt.

This is inexcusable. Competition is often the best way to get good results, but for various reasons customers often cannot practically choose ISPs; in many cases they are essentially monopolies or duopolies. Where competition does not effectively exist, there must be regulation to prevent the monopolists from exploiting their customers, and the FCC has decided to expressly reject their duty to the people of the United States.

Orwell would be proud of the order’s name, “Restoring Internet Freedom”. Remember, Freedom is Slavery!

I’m sure we have not heard the end of this. This entire process was filled with fraud, with sock puppets proposing to end net neutrality while real people were ignored. All Americans need to make it clear to their representatives that Internet access is important, and that ISPs must be required to neutral carriers, instead of giving preferences to some sites or charging extra for some sites. I recommend voting against any representatives who fail to protect Internet access, as the FCC has failed to do.

path: /misc | Current Weblog | permanent link to this entry

Wed, 22 Oct 2014

Peter Miller and improving make

First, a sad note. Peter Miller died on 2014-07-27 from leukemia. He did a lot of important things, including writing the influential paper “Recursive Make Considered Harmful”. Rest in peace.

I should point out an essay I’ve written about improving make. Make is widely used, but the POSIX standard for it lacks key functions, including vital ones that Peter Miller pointed out years ago. Thankfully, progress has been made. My hope is that progress will continue to happen; I welcome help for improving the standard and implementations of make.

path: /misc | Current Weblog | permanent link to this entry

Thu, 09 Oct 2014

A tester walks into a bar

A tester walks into a bar and orders a beer. Then he orders 999,999 beers. Orders 0 beers. Orders -1 beers. Orders a coyote. Orders a qpoijzcpx. Then he insults the bartender.

This joke (with variations) is making the rounds, but it also has a serious point. It’s a nice example of how testing should work, including software testing.

Too many of today’s so-called software “tests” only check for correct data. This has led to numerous vulnerabilities including Heartbleed and Apple’s “goto fail; goto fail;” vulnerability. The paper “The Most Dangerous Code in the World: Validating SSL Certificates in Non-Browser Software” found that a disturbingly large number of programss’ security depends on SSL certificate validation, and they are insecure because no one actually tested them with invalid certificates. They note that “most of the vulnerabilities we found should have been discovered during development with proper unit testing”.

Good software testing must include negative testing (tests with data that should be rejected) to ensure that the software protects itself against bad data. This must be part of an automated regression test suite (re-run constantly) to prevent problems from creeping in later. For example, if your programs accept numbers, don’t just test for “correct” input; test for wrong cases like too big, zero, negative or too small, and non-numbers. Testing “just around” too big and too small numbers is often helpful, too, as is testing that tries to bypass the interface. Your users won’t know how you did it, but they’ll know your program “just works” reliably.

path: /misc | Current Weblog | permanent link to this entry

Mon, 14 Oct 2013

Readable Lisp version 1.0.0 released!

Lisp-based languages have been around a long time. They have some interesting properties, especially when you want to write programs that analyze or manipulate programs. The problem with Lisp is that the traditional Lisp notation - s-expressions - is notoriously hard to read.

I think I have a solution to the problem. I looked at past (failed) solutions and found that they generally failed to be general or homoiconic. I then worked to find notations with these key properties. My solution is a set of notation tiers that make Lisp-based languages much more pleasant to work with. I’ve been working with many others to turn this idea of readable notations into a reality. If you’re interested, you can watch a short video or read our proposed solution.

The big news is that we have reached version 1.0.0 in the readable project. We now have an open source software (MIT license) implementation for both (guile) Scheme and Common Lisp, as well as a variety of support tools. The Scheme portion implements the SRFI-105 and SRFI-110 specs, which we wrote. One of the tools, unsweeten, makes it possible to process files in other Lisps as well.

So what do these tools do? Fundamentally, they implement the 3 notation tiers we’ve created: curly-infix-expressions, neoteric-expressions, and sweet-expressions. Sweet-expressions have the full set of capabilities.

Here’s an example of (awkward) traditional s-expression format:

(define (factorial n)
  (if (<= n 1)
    1
    (* n (factorial (- n 1)))))

Here’s the same thing, expressed using sweet-expressions:

define factorial(n)
  if {n <= 1}
    1
    {n * factorial{n - 1}}

I even briefly mentioned sweet-expressions in my PhD dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling” (see section A.3).

So if you are interested in how to make Lisp-based languages easier to read, watch our short video about the readable notations or download the current version of the readable project. We hope you enjoy them.

path: /misc | Current Weblog | permanent link to this entry

Tue, 06 Aug 2013

Don’t anthropomorphize computers, they hate that

A lot of people who program computers or live in the computing world ‐ including me ‐ talk about computer hardware and software as if they are people. Why is that? This is not as obvious as you’d think.

After all, if you read the literature about learning how to program, you’d think that programmers would never use anthropomorphic language. “Separating Programming Sheep from Non-Programming Goats” by Jeff Atwood discusses teaching programming and points to the intriguing paper “The camel has two humps” by Saeed Dehnadi and Richard Bornat. This paper reported experimental evidence on why some people can learn to program, while others struggle. Basically, to learn to program you must fully understand that computers mindlessly follow rules, and that computers just don’t act like humans. As their paper said, “Programs… are utterly meaningless. To write a computer program you have to come to terms with this, to accept that whatever you might want the program to mean, the machine will blindly follow its meaningless rules and come to some meaningless conclusion… the consistent group [of people] showed a pre-acceptance of this fact: they are capable of seeing mathematical calculation problems in terms of rules, and can follow those rules wheresoever they may lead. The inconsistent group, on the other hand, looks for meaning where it is not. The blank group knows that it is looking at meaninglessness, and refuses to deal with it. [The experimental results suggest] that it is extremely difficult to teach programming to the inconsistent and blank groups.” Later work by Saeed Dehnadi and sometimes others expands on this earlier work. The intermediate paper “Mental models, Consistency and Programming Aptitude” (2008) seemed to have refuted the idea that consistency (and ignoring meaning) was critical to programming, but the later “Meta-analysis of the effect of consistency on success in early learning of programming” (2009) added additional refinements and then re-confirmed this hypothesis. The reconfirmation involved a meta-analysis of six replications of an improved version of Dehnadi’s original experiment, and again showed that understanding that computers were mindlessly consistent was key in successfully learning to program.

So the good programmers know darn well that computers mindlessly follow rules. But many use anthropomorphic language anyway. Huh? Why is that?

Some do object to anthropomorphism, of course. Edjar Dijkstra certainly railed against anthropomorphizing computers. For example, in EWD854 (1983) he said, “I think anthropomorphism is the worst of all [analogies]. I have now seen programs ‘trying to do things’, ‘wanting to do things’, ‘believing things to be true’, ‘knowing things’ etc. Don’t be so naive as to believe that this use of language is harmless.” He believed that analogies (like these) led to a host of misunderstandings, and that those misunderstandings led to repeated multi-million-dollar failures. It is certainly true that misunderstandings can lead to catastrophe. But I think one reason Dijkstra railed particularly against anthropomorphism was (in part) because it is a widespread practice, even among those who do understand things ‐ and I see no evidence that anthropomorphism is going away.

The Jargon file specifically discusses anthropomorphization: “one rich source of jargon constructions is the hackish tendency to anthropomorphize hardware and software. English purists and academic computer scientists frequently look down on others for anthropomorphizing hardware and software, considering this sort of behavior to be characteristic of naive misunderstanding. But most hackers anthropomorphize freely, frequently describing program behavior in terms of wants and desires. Thus it is common to hear hardware or software talked about as though it has homunculi talking to each other inside it, with intentions and desires… As hackers are among the people who know best how these phenomena work, it seems odd that they would use language that seems to ascribe consciousness to them. The mind-set behind this tendency thus demands examination. The key to understanding this kind of usage is that it isn’t done in a naive way; hackers don’t personalize their stuff in the sense of feeling empathy with it, nor do they mystically believe that the things they work on every day are ‘alive’.”

Okay, so others have noticed this too. The Jargon file even proposes some possible reasons for anthropomorphizing computer hardware and software:

  1. It reflects a “mechanistic view of human behavior.” “In this view, people are biological machines - consciousness is an interesting and valuable epiphenomenon, but mind is implemented in machinery which is not fundamentally different in information-processing capacity from computers… Because hackers accept that a human machine can have intentions, it is therefore easy for them to ascribe consciousness and intention to other complex patterned systems such as computers.” But while the materialistic view of humans has respectible company, this “explanation” fails to explain why humans would use anthropomorphic terms about computer hardware and software, since they are manifestly not human. Indeed, as the Jargon file acknowledges, even hackers who have contrary religious views will use anthropological terminology.
  2. It reflects “a blurring of the boundary between the programmer and his artifacts - the human qualities belong to the programmer and the code merely expresses these qualities as his/her proxy. On this view, a hacker saying a piece of code ‘got confused’ is really saying that he (or she) was confused about exactly what he wanted the computer to do, the code naturally incorporated this confusion, and the code expressed the programmer’s confusion when executed by crashing or otherwise misbehaving. Note that by displacing from “I got confused” to “It got confused”, the programmer is not avoiding responsibility, but rather getting some analytical distance in order to be able to consider the bug dispassionately.”
  3. “It has also been suggested that anthropomorphizing complex systems is actually an expression of humility, a way of acknowleging that simple rules we do understand (or that we invented) can lead to emergent behavioral complexities that we don’t completely understand.”

The Jargon file claims that “All three explanations accurately model hacker psychology, and should be considered complementary rather than competing.” I think the first “explanation” is completely unjustified. The second and third explanations do have some merit. However, I think there’s a simpler and more important reason: Language.

When we communicate with a human, we must use some language that will be more-or-less understood by the other human. Over the years people have developed a variety of human languages that do this pretty well (again, more-or-less). Human languages were not particularly designed to deal with computers, but languages have been honed over long periods of time to discuss human behaviors and their mental states (thoughts, beliefs, goals, and so on). The sentence “Sally says that Linda likes Tom, but Tom won’t talk to Linda” would be understood by any normal seven-year-old girl (well, assuming she speaks English).

I think a primary reason people anthropomorphic terminology is because it’s much easier to communicate that way when discussing computer hardware and software using existing languages. Compare “the program got confused” with the overly long “the program executed a different path than the one expected by the program’s programmer”. Human languages have been honed to discuss human behaviors and mental states, so it is much easier to use languages this way. As long as both the sender and receiver of the message understand the message, the fact that the terminology is anthropomorphic is not a problem.

It’s true that anthropomorphic language can confuse some people. But the primary reason it confuses some people is that they still have trouble understanding that computers are mindless ‐ that computers simply do whatever their instructions tell them. Perhaps this is an innate weakness in some people, but I think that addressing this weakness head-on can help counter it. This is probably a good reason for ensuring that people learn a little programming as kids ‐ not because they will necessarily do it later, but because computers are so central to the modern world that people should have a basic understanding of them.

path: /misc | Current Weblog | permanent link to this entry

Sun, 10 Mar 2013

Readable Lisp: Sweet-expressions

I’ve used Lisp-based programming languages for decades, but while they have some nice properties, their traditional s-expression notation is not very readable. Even the original creator of Lisp did not particularly like its notation! However, this problem turns out to be surprisingly hard to solve.

After reviewing the many past failed efforts, I think I have figured out why they failed. Past solutions typically did not work because they failed to be general (the notation is independent from any underlying semantic) or homoiconic (the underlying data structure is clear from the syntax). Once I realized that, I devised (with a lot of help from others!) a new notation, called sweet-expressions (t-expressions), that is general and homoiconic. I think this creates a real solution for an old problem.

You can download and try out sweet-expressions as released by the Readable Lisp S-expressions Project by downloading our new version 0.7.0 release.

If you’re interested, please participate! In particular, please participate in the SRFI-110 sweet-expressions (t-expressions) mailing list. SRFIs let people write specifications for extensions to the Scheme programming language (a Lisp), and this SRFI lets people in the Scheme community discuss it.

The following table shows what an example of traditional (ugly) Lisp s-expressions, the same thing in sweet-expressions, and a short explanation.

s-expressions Sweet-expressions (t-expressions) Explanation
(define (fibfast n)
  (if (< n 2)
    n
    (fibup n 2 1 0)))
define fibfast(n)
  if {n < 2}
    n
    fibup n 2 1 0
Typical function notation
Indentation, infix {...}
Single expr = no new list
Simple function calls

path: /misc | Current Weblog | permanent link to this entry

Sun, 12 Aug 2012

Readable s-expressions for Lisp-based languages: Lots of progress!

Lots has been happening recently in my effort to make Lisp-based languages more readable. A lot of programming languages are Lisp-based, including Scheme, Common Lisp, emacs Lisp, Arc, Clojure, and so on. But many software developers reject these languages, at least in part because their basic notation (s-expressions) is very awkward.

The Readable Lisp s-expressions project has a set of potential solutions. We now have much more robust code (you can easily download, install, and use it, due to autoconfiscation), and we have a video that explains our solutions. The video on readable Lisp s-expressions is also available on Youtube.

We’re now at version 0.4. This version is very compatible with existing Lisp code; they are simply a set of additional abbreviations. There are three tiers: curly-infix expressions (which add infix), neoteric-expressions (which add a more conventional call format), and sweet-expressions (which deduce parentheses from indentation, reducing the number of required parentheses).

Here’s an example of (awkward) traditional s-expression format:

(define (factorial n)
  (if (<= n 1)
    1
    (* n (factorial (- n 1)))))

Here’s the same thing, expressed using sweet-expressions:

define factorial(n)
  if {n <= 1}
    1
    {n * factorial{n - 1}}

A sweet-expression reader could accept either format, actually, since these tiers are simply additional abbreviations and adjustments that you can make to an existing Lisp reader. If you’re interested, please go to the Readable Lisp s-expressions project web page for more information and an implementation - and please join us!

path: /misc | Current Weblog | permanent link to this entry

Fri, 20 Jan 2012

Website back up

This website (www.dwheeler.com) was down part of the day yesterday due to a mistake made by my web hosting company. Sorry about that. It’s back up, obviously.

For those who are curious what happened, here’s the scoop. My hosting provider (WebHostGiant) moved my site to a new improved computer. By itself, that’s great. That new computer has a different IP address (the old one was 207.55.250.19, the new one is 208.86.184.80). That’d be fine too, except they didn’t tell me that they were changing my site’s IP address, nor did they forward the old IP address. The mistake is that the web hosting company should have notified me of this change, ahead of time, but they failed to do so. As a result, I didn’t change my site’s DNS entries (which I control) to point to its new location; I didn’t even know that I should, or what the new values would be. My provider didn’t even warn me ahead of time that anything like this was going to happen… if they had, I could have at least changed the DNS timeouts so the changeover would have been quick.

Now to their credit, once I put in a trouble ticket (#350465), Alex Prokhorenko (of WebhostGIANT Support Services) responded promptly, and explained what happened so clearly that it was easy for me to fix things. I appreciate that they’re upgrading the server hardware, I understand that IP addresses sometimes much change, and I appreciate their low prices. In fact, I’ve been generally happy with them.

But if you’re a hosting provider, you need to tell the customer if some change you make will make your customer’s entire site unavailable without the customer taking some action! A simple email ahead-of-time would have eliminated the whole problem.

Grumble grumble.

I did post a rant against SOPA and PIPA the day before, but I’m quite confident that this outage was unrelated.

Anyway, I’m back up.

path: /misc | Current Weblog | permanent link to this entry

Mon, 04 Jul 2011

U.S. government must balance its budget

(This is a blog entry for U.S. citizens — everyone else can ignore it.)

We Americans must demand that the U.S. government work to balance its budget over time. The U.S. government has a massive annual deficit, resulting in a massive national debt that is growing beyond all reasonable bounds. For example, in just Fiscal Year (FY) 2010, about $3.4 trillion was spent, but only $2.1 trillion was received; that means that the U.S. government spent more than a trillion dollars more than it received. Every year that the government spends more than it receives it adds to the gross federal debt, which is now more than $13.6 trillion.

This is unsustainable. The fact that this is unsustainable is certainly not news. The U.S. Financial Condition and Fiscal Future Briefing (GAO, 2008) says, bluntly, that the “Current Fiscal Policy Is Unsustainable”. “The Moment of Truth: Report of the National Commission on Fiscal Responsibility and Reform” similarly says “Our nation is on an unsustainable fiscal path”. Many others have said the same. But even though it’s not news, it needs to be yelled from the rooftops.

The fundamental problem is that too many Americans — aka “we the people” — have not (so far) been willing to face this unpleasant fact. Fareed Zakaria nicely put this in February 21, 2010: “ … in one sense, Washington is delivering to the American people exactly what they seem to want. In poll after poll, we find that the public is generally opposed to any new taxes, but we also discover that the public will immediately punish anyone who proposes spending cuts in any middle class program which are the ones where the money is in the federal budget. Now, there is only one way to square this circle short of magic, and that is to borrow money, and that is what we have done for decades now at the local, state and federal level … The lesson of the polls in the recent elections is that politicians will succeed if they pander to this public schizophrenia. So, the next time you accuse Washington of being irresponsible, save some of that blame for yourself and your friends”.

But Americans must face the fact that we must balance the budget. And we must face it now. We must balance the budget the same way families balance their budgets — the government must raise income (taxes), lower expenditures (government spending), or both. Growth over time will not fix the problem.

How we rellocate income and outgo so that they match needs to be a political process. Working out compromises is what the political process is supposed to be all about; nobody gets everything they want, but eventually some sort of rough set of priorities must be worked out for the resources available. Compromise is not a dirty word to describe the job of politics; it is the job. In reality, I think we will need to both raise revenue and decrease spending. I think we must raise taxes to some small degree, but we can’t raise taxes on the lower or middle class much; they don’t have the money. Also, we will not be able to solve this by taxing the rich out of the country. Which means that we must cut spending somehow. Just cutting defense spending won’t work; defense is only 20% of the entire budget. In contrast, the so-called entitlements — mainly medicare, medicaid, and social security — are 43% of the government costs and rapidly growing in cost. I think we are going to have to lower entitlement spending; that is undesirable, but we can’t keep providing services we can’t pay for. The alternative is to dramatically increase taxes to pay for them, and I do not think that will work. Raising the age before Social Security benefits can normally be received is to me an obvious baby step, but again, that alone will not solve the problem. It’s clearly possible to hammer out approaches to make this work, as long as the various camps are willing to work out a compromise.

To get there, we need to specify and plan out the maximum debt that the U.S. will incur in each year, decreasing that each year (say, over a 10-year period). Then Congress (and the President) will need to work out, each year, how to meet that requirement. It doesn’t need to be any of the plans that have been put forward so far; there are lots of ways to do this. But unless we agree that we must live within our means, we will not be able to make the decisions necessary to do so. The U.S. is not a Greece, at least not now, but we must make decisions soon to prevent bad results. I am posting this on Independence Day; Americans have been willing to undergo lots of suffering to gain control over their destinies, and I think they are still able to do so today.

In the short term (say a year), I suspect we will need to focus on short-term recovery rather than balancing the budget. And we must not default. But we must set the plans in motion to stop the runaway deficit, and get that budget balanced. The only way to get there is for the citizenry to demand it stop, before far worse things happen.

path: /misc | Current Weblog | permanent link to this entry

Sun, 10 Apr 2011

Innovations update

I’ve made various updates to my list of The Most Important Software Innovations. I’ve added Distributed Version Control System (DVCS); these are all over now in the form of git, Mercurial (hg), Bazaar, Monotone, and so on, but these were influenced by the earlier BitKeeper, which was in turn influenced by the earlier Teamware (developed by Sun starting in 1991). As is often the case, “new” innovations are actually much older than people realize. I also added make, originally developed in 1977, and quicksort, developed in 1960-1961 by C.A.R. (Tony) Hoare. I’ve also improved lots of material that was already there, such as a better description of the history of the remote procedure call (RPC).

So please enjoy The Most Important Software Innovations!

path: /misc | Current Weblog | permanent link to this entry

Sun, 15 Aug 2010

Geek Video Franchises

I have a new web page on silly game I title Geek Video Franchises. The goal of this game is to interconnect as many geek video franchises as possible via common actors. In this game, you’re only allowed to use video franchises that geeks tend to like.

For example: The Matrix connects to The Lord of the Rings via Hugo Weaving (Agent Smith/Elrond), which connects to Indiana Jones via John Rhys-Davies (Gimli/Sallah), which connects to Star Wars via Harrison Ford (Indiana Jones/Han Solo). The Lord of the Rings directly connects to Star Wars via Christopher Lee (Saruman/Count Dooku). Of course, Lord of the Rings also connects to X-men via Ian McKellen (Gandalf/Magneto), which connects to Star Trek via Patrick Stewart (Professor Xavier / Captain Jean-Luc Picard). Star Trek connects to Dr. Who via Simon Pegg (JJ Abrams’ Montgomery Scott/The Editor), which connects to Harry Potter via David Tennant (Dr. Who #10/Barty Crouch Jr.), which connects to Monty Python via John Cleese (Nearly Headless Nick/Lancelot, etc.).

So if you’re curious, check out Geek Video Franchises.

path: /misc | Current Weblog | permanent link to this entry

Sat, 03 Jul 2010

Opening files and URLs from the command line

Nearly all operating systems have a simple command to open up a file, directory, or URL from the command line. This is useful when you’re using the command line, e.g., xdg-open . will pop up a window in the current directory on most Unix/Linux systems. This capability is also handy when you’re writing a program, because these are easy to invoke from almost any language. You can then pass it a filename (to open that file using the default application for that file type), a directory name to start navigating in that directory (use “.” for the current directory), or a URL like “http://www.dwheeler.com” to open a browser at that URL.

Unfortunately, the command to do this is different on different platforms.

My new essay How to easily open files and URLs from the command line shows how to do this.

For example, on Unix/Linux systems, you should use xdg-open (not gnome-open or kde-open), because that opens the right application given the user’s current environment. On MacOS, the command is “open”. On Windows you should use start (not explorer, because invoking explorer directly will ignore the user’s default browser setting), while on Cygwin, the command is “cygstart”. More details are in the essay, including some gotchas and warnings.

Anyway, take a look at: How to easily open files and URLs from the command line

path: /misc | Current Weblog | permanent link to this entry

Sun, 21 Mar 2010

Using Wikipedia for research

Some teachers seem to lose their minds when asked about Wikipedia, and make absurd rules like “I forbid students from using Wikipedia”. A 2008 article states that Wikipedia is the encyclopedia “that most universities forbid students to use”.

But the professors don’t need to be such Luddites; it turns out that college students tend to use Wikipedia quite appropriately. A research paper titled How today’s college students use Wikipedia for course-related research examines Wikipedia use among college students; it found that Wikipedia use was widespread, and that the primary reason they used Wikipedia was to obtain background information or a summary about a topic. Most respondents reported using Wikipedia at the beginning of the research process; very few used Wikipedia near or at the end. In focus group sessions, students described Wikipedia as “the very beginning of the very beginning for me” or “a .5 step in my research process”, and that it helps primarily in the beginning because it provided a “simple narrative that gives you a grasp”. Another focus group participant called Wikipedia “my presearch tool”. Presearch, as the participant defined it, was “the stage of research where students initially figure out a topic, find out about it, and delineate it”.

Now, it’s perfectly reasonable to say that Wikipedia should not be cited as an original source; I have no trouble with professors making that rule. Wikipedia itself has a rule that Wikipedia does not publish original research or original thought. Indeed, the same is true for Encyclopedia Britannica or any other encyclopedia; encyclopedias are supposed to be summaries of knowledge gained elsewhere. You would expect that college work would normally not have many citations of any encyclopedia, be it Wikipedia or Encyclopedia Britannica, simply because encyclopedias are not original sources.

Rather than running in fear from new materials and techologies, teachers should be helping students understand how to use them appropriately, helping them consider the strengths and weaknesses of their information sources. Wikipedia should not be the end of any serious research, but it’s a reasonable place to start. You should supplement it with other material, for the simple reason that you should always examine multiple sources no matter where you start, but that doesn’t make Wikipedia less valuable. For younger students, there are reasonable concerns about inappropriate material (e.g., due to Wikipedia vandalism and because Wikipedia covers topics not appropriate for much younger readers), but the derivative “Wikipedia Selection for Schools” is a good solution for that problem. I’m delighted that so much information is available to people everywhere; we need to help people use these resources instead of ignoring them.

And speaking of which, if you like Wikipedia, please help! With a little effort, you can make it better for everyone. In particular, Wikipedia needs more video; please help the Video on Wikipedia folks get more videos on Wikipedia. This also helps the cause of open video, ensuring that the Internet continues to be open to innovation.

path: /misc | Current Weblog | permanent link to this entry

Sat, 06 Mar 2010

Robocopy

If you use Microsoft Windows (XP or some later version), and don’t have an allergic reaction to the command line, you should know about Robocopy. Robocopy (“robust file copy”) is a command-line program from Microsoft that copies collections of files from one place to another in an efficient way. Robocopy is included in Windows Vista, Windows 7, and Windows Server 2008. Windows XP and Windows Server 2003 users can download Robocopy for free from Microsoft as part of the Windows Server 2003 “Resource Kit Tools”.

Robocopy copies files, like the COPY command, but Robocopy will only copy a file if the source and destination have different time stamps or different file sizes. Robocopy is nowhere near as capable as the Unix/Linux “rsync” command, but for some tasks it suffices. Robocopy will not copy files that are currently open (by default it will repeatedly retry copying them), it can only do one-way mirroring (not bi-directional synchronization), it can only copy mounted filesystems, and it’s foolish about how it copies across a network (it copies the whole file, not just the changed parts). Anyway, you invoke it at the command line like this:

ROBOCOPY Source Destination OPTIONS

So, here’s an example of copying everything from “c:\data” to “q:\data”:

 robocopy c:\data u:\data /MIR /NDL /R:20

To do this on an automated schedule in Windows XP, put your commands into a text file with a name ending in “.bat” and select Control Panel-> Scheduled Tasks-> Add Scheduled Task. Select your text file to run, have it run “daily”. You would think that you can’t run it more than once a day this way, but that’s actually not true. Click on “Open advanced properties for this task when I click Finish” and then press Finish. Now select the “Schedule” tab. Set it to start at some time when you’re probably using the computer, click on “Advanced”, and set “repeat task” so it will run (say, every hour with a duration of 2 hours). Then click on “Show multiple schedules”, click “new”, and then select “At system startup”. Now it will make copies on startup AND every hour. You may want to go to the “Settings” tab and tweak it further. You can use Control Panel-> Scheduled tasks to change the schedule or other settings.

A GUI for Robocopy is available. An alternative to Robocopy is SyncToy; SyncToy has a GUI, but Microsoft won’t support it, I’ve had reliability and speed problems with it, and SyncToy has a nasty bug in its “Echo” mode… so I don’t use it. I suspect the Windows Vista and Windows 7 synchronization tools might make Robocopy a less useful, but I find that the Windows XP synchronization tools are terrible… making using Robocopy a better approach. There are a boatload of applications out there that do one-way or two-way mirroring, including ports of rsync, but getting them installed in some security-conscious organizations can be difficult.

Of course, if you’re using Unix/Linux, then use rsync and be happy. Rsync usually comes with Unix/Linux, and rsync is leaps-and-bounds better than robocopy. But not everyone has that option.

path: /misc | Current Weblog | permanent link to this entry

Sat, 02 May 2009

Own your own site!

Geocities, a web hosting site sponsored by Yahoo, is shutting down. Which means that, barring lots of work by others, all of its information will be disappearing forever. Jason Scott is trying to coordinate efforts to archive GeoCities’ information, but it’s not easy. He estimates they’re archiving about 2 Gigabytes/hour, pulling in about 5 Geocities sites per second… and they don’t know if it’ll be enough. What’s more, the group has yet to figure out how to serve it: “It is more important to me to grab the data than to figure out how to serve it later…. I don’t see how the final collection won’t end up online, but how is elusive…”

This sort of thing happens all the time, sadly. Some company provides a free service for your site / blog / whatever… and so you take advantage of it. That’s fine, but if you care about your site, make sure you own your data sufficiently so that you can move somewhere else… because you may have to. Yahoo is a big, well-known company, who paid $3.5 billion for Geocities… and now it’s going away.

Please own your own site — both its domain name and its content — if it’s important to you. I’ve seen way too many people have trouble with their sites because they didn’t really own them. Too many scams are based on folks who “register” your domain for you, but actually register it in their own names… and then hold your site as a hostage. Similarly, many organizations provide wonderful software that is unique to their site for managing your data… but then you either can’t get your own data, or you can’t use your data because you can’t separately get and re-install the software to use it. Using open standards and/or open source software can help reduce vendor lock-in — that way, if the software vendor/website disappears or stops supporting the product/service, you can still use the software or a replacement for it. And of course, continuously back up your data offsite, so if the hosting service disappears without notice, you still have your data and you can get back on.

I practice what I preach. My personal site, www.dwheeler.com, has moved several times, without problems. I needed to switch my web hosting service (again) earlier in 2009, and it was essentially no problem. I just used “rsync” to copy the files to my new hosting service, change the domain information so people would use the new hosting service instead, and I was up and running. I’ve switched web servers several times, but since I emphasize using ordinary standards like HTTP, HTML, and so on, I haven’t had any trouble. The key is to (1) own the domain name, and (2) make sure that you have your data (via backups) in a format that lets you switch to another provider or vendor. Do that, and you’ll save yourself a lot of agony later.

path: /misc | Current Weblog | permanent link to this entry

Thu, 08 Jan 2009

Updating cheap websites with rsync+ssh

I’ve figured out how to run and update cheap, simple websites using rsync and ssh and Linux. I thought I’d share that info here, in case you want to copy my approach.

My site (www.dwheeler.com) is an intentionally simple website. It’s simply a bunch of directories with static files; those files may contain Javascript and animated GIF, but site visitors aren’t supposed to cause them to change. Programs to manage my site (other than the web server) are run before the files are sent to the server. Most of today’s sites can’t be run this way… but when you can do this, the site is much easier to secure and manage. It’s also really efficient (and thus fast). Even if you can’t run a whole site this way, if you can run a big part of it this way, you can save yourself a lot of security, management, and performance problems.

This means that I can make arbitrary changes to a local copy of the website, and then use rsync+ssh to upload just those changes. rsync is a wonderful program, originally created by Andrew Tridgell, that can copy a directory tree to and from remote directory trees, but only send the changes. The result is that rsync is a great bandwidth-saver.

This approach is easy to secure, too. Rsync uses ssh to create the connection, so people can’t normally snoop on the transfer, and redirecting DNS will be immediately noticed. If the website is compromised, just reset it and re-send a copy; as long as you retain a local copy, no data can be permanently lost. I’ve been doing this for years, and been happy with this approach.

On a full-capability hosting service, using rsync this is easy. Just install rsync on the remote system (typically using yum or apt-get), and run:

 rsync -a LOCALDIR REMOTENAME@REMOTESITE:REMOTEDIR

Unfortunately, at least some of the cheap hosting services available today don’t make this quite so easy. The cheapest hosting services are “shared” sites that share resources between many users without using full operating system or hardware virtualization. I’ve been looking at a lot of the cheap Linux web hosting services like these such as WebhostGIANT, Hostmonster, Hostgator, and Bluehost. It appears that at least some of these hosting companies improve their security by greatly limiting the access granted to you via the ssh/shell interface. I know that WebhostGIANT is an example, but I believe there are many such examples. So, even if you have ssh access on a Linux system, you may only get a few commands you can run like “mv” and “cp” (and not “tar” or “rsync”). You could always ask the hosting company to install programs, but they’re often reluctant to add new ones. But… it turns out that you can use rsync and other such services, without asking them to install anything, at least in some cases. I’m looking for new hosting providers, and realized (1) I can still use this approach without asking them to install anything, but (2) it requires some technical “magic” that others might not know. So, here’s how to do this, in case this information/example helps others.

Warning: Complicated technical info ahead.

I needed to install some executables, and rather than recompiling my own, I grabbed pre-compiled executables. To do this, I found out the Linux distribution used by the hosting service (in the case of WebhostGIANT, it’s CentOS 5, so all my examples will be RPM-based). On my local Fedora Linux machine I downloaded the DVD “.iso” image of that distro, and did a “loopback mount” as root so that I could directly view its contents:

 cd /var/www     # Or wherever you want to put the ISO.
 wget ...mirror location.../CentOS-5.2-i386-bin-DVD.iso
 mkdir /mnt/centos-5.2
 mount CentOS-5.2-i386-bin-DVD.iso /mnt/centos-5.2 -o loop
 # Get ready to extract some stuff from the ISO.
 cd
 mkdir mytemp
 cd mytemp

Now let’s say I want the program “nice”. On a CentOS or Fedora machine you can determine the package that “nice” is in using this command:

 rpm -qif `which nice`
Which will show that nice is in the “coreutils” package. You can extract “nice” from its package by doing this:
 rpm2cpio /mnt/centos-5.2/CentOS/coreutils-5.97-14.el5.i386.rpm | \
   cpio --extract --make-directories
Now you can copy it to your remote site. Presuming that you want the program to go into the remote directory “/private/”, you can do this:
 scp -p ./usr/bin/rsync MY_USERID@MY_REMOTE_SITE:/private/

Now you can run /private/nice, and it works as you’d expect. But what about rsync? Well, when you try to do this with rsync and run it, it will complain with an error message. The error message says that rsync can’t find another library (libpopt in this case). The issue is that and cheap web hosting services often don’t provide a lot of libraries, and they won’t let you install new libraries in the “normal” places. Are we out of luck? Not at all! We could just recompile the program statically, so that the library is embedded in the file, but we don’t even have to do that. We just need to upload the needed library to a different place, and tell the remote site where to find the library. It turns out that the program “/lib/ld-linux.so” has an option called “—library-path” that is specially designed for this purpose. ld-linux.so is the loader (the “program for running programs”), which you don’t normally invoke directly, but if you need to add library paths, it’s a reasonable way to do it. (Another way is to use LD_LIBRARY_PATH, but that requires that the string be interpreted by a shell, which doesn’t always happen.) So, here’s what I did (more or less).

First, I extracted the rsync program and necessary library (popt) on the local system, and copied them to the remote system (to “/private”, again):

 rpm2cpio /mnt/centos-5.2/CentOS/rsync-2.6.8-3.1.i386.rpm | \
   cpio --extract --make-directories
 # rsync requires popt:
 rpm2cpio /mnt/centos-5.2/CentOS/popt-1.10.2-48.el5.i386.rpm | \
   cpio --extract --make-directories
 scp -p ./usr/bin/rsync ./usr/lib/libpopt.so.0.0.0 \
        MY_USERID@MY_REMOTE_SITE:/private/
Then, I logged into the remote system using ssh, and added symbolic links as required by the normal Unix/Linux library conventions:
 ssh MY_USERID@MY_REMOTE_SITE
 cd /private
 ln -s libpopt.so.0.0.0 libpopt.so 
 ln -s libpopt.so.0.0.0 libpopt.so.0

Now we’re ready to use rsync! The trick is to tell the local rsync where the remote rsync is, using “—rsync-path”. That option’s contents must invoke ld-linux.so to tell the remote system where the additional library path (for libopt) is. So here’s an example, which copies files from the directory LOCAL_HTTPD to the directory REMOTE_HTTPDIR:

rsync -a \
 --rsync-path="/lib/ld-linux.so.2 --library-path /private /private/rsync" \
 LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR

There are a few ways we can make this nicer for everyday production use. If the remote server is a cheap shared system, we want to be very kind on its CPU and bandwidth use (or we’ll get thrown off it!). The “nice” command (installed by the steps above) will reduce CPU use on the remote web server when running rsync. There are several rsync options that can help, too. The “—bwlimit=KBPS” option will limit the bandwidth used. The “—fuzzy” option will reduce bandwidth use if there’s a similar file already on the remote side. The “—delete” option is probably a good idea; this means that files deleted locally are also deleted remotely. I also suggest “—update” (this will avoid updating remote files if they have a newer timestamp) and “—progress” (so you can see what’s happening). Rsync is able to copy hard links (using “-H”), but that takes more CPU power; I suggest using symbolic links and then not invoking that option. You can enable compression too, but that’s a trade-off; compression will decrease bandwidth but increase CPU use. So our final command looks like this:

rsync -a --bwlimit=100 --fuzzy --delete --update --progress \
 --rsync-path="/private/nice /lib/ld-linux.so.2 --library-path /private /private/rsync" \
 LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR

Voila! Store that script in some easily-run place. Now you can easily update your website locally and push it to the actual webserver, even on a cheap hosting service, with very little bandwidth and CPU use. That’s a win-win for everyone.

path: /misc | Current Weblog | permanent link to this entry

Mon, 19 May 2008

YEARFRAC Incompatibilities between Excel 2007 and OOXML (OXML)

In theory, the OOXML (OXML) specification is supposed to define what Excel 2007 reads and writes. In practice, it’s not true at all; the latest public drafts of OOXML are unable to represent many actual Excel 2007 files.

For example, at least 26 Excel financial functions depend on a parameter called “Basis”, which controls how the calendar is interpreted. The YEARFRAC function is a good example of this; it returns the fraction of years between two dates, given a “basis” for interpreting the calendar. Errors in these functions can have large financial stakes.

I’ve posted a new document, YEARFRAC Incompatibilities between Excel 2007 and OOXML (OXML), and the Definitions Actually Used by Excel 2007 ([OpenDocument version]), which shows that the definitions of OOXML and Excel 2007 aren’t the same at all. “This document identifies incompatibilities between the YEARFRAC function, as implemented by Microsoft Excel 2007, compared to how it is defined in the Office Open Extensible Mark-up Language (OOXML), final draft ISO/IEC 29500-1:2008(E) as of 2008-05-01 (aka OXML). It also identifies the apparent definitions used by Excel 2007 for YEARFRAC, which to the author’s knowledge have never been fully documented anywhere. They are not defined in the OOXML specification, because OOXML’s definitions are incompatible with the apparent definition used by Excel 2007.”

“This incompatibility means that, given OOXML’s current definition, OOXML cannot represent any Excel spreadsheet that uses financial functions using “basis” date calculations, such as YEARFRAC, if they use common “basis” values (omitted, 0, 1, or 4). Excel functions that depend upon “basis” date calculations include: ACCRINT, ACCRINTM, AMORDEGRC, AMORLINC, COUPDAYBS, COUPDAYS, COUPDAYSNC, COUPNCD, COUPNUM, COUPPCD, DISC, DURATION, INTRATE, MDURATION, ODDFPRICE, ODDFYIELD, ODDLPRICE, ODDLYIELD, PRICE, PRICEDISC, PRICEMAT, RECEIVED, YEARFRAC, YIELD, YIELDDISC, and YIELDMAT (26 functions).”

I have much more information about YEARFRAC if you want it.

path: /misc | Current Weblog | permanent link to this entry