25 July 2013

The default SL avatar mesh sucks

I've been working on a rigged mesh avatar replacement. The idea is that it's a full-enclosure bondage suit, with some design features inspired by a story on the excellent Gromet's Plaza website; in particular, the feet are replaced by ballet boots, the hands by mittens with no fingers, and the head by a rounded hood. Eventually, I want to apply normal and specular maps to make it look like authentic latex, but that's down the road a little.

I've been using Avastar to make mesh replacements from avatar shapes for a while now. It works well, and reliably, once I'd figured out the requisite workflow. Avastar uses the standard SL avatar mesh, as it must to stay compatible with what clothing designers and the like (its intended audience) make. Unfortunately, that mesh deforms badly as the avatar moves.

This picture shows the biggest problem for what I'm doing:
See the weird edge sticking out on the inner thigh? That's not the only problem: there's a corresponding pocket on the other side of it that turns the edge you see into a wedge shape. That's the worst part, though the crotch deforms badly in front as well, and the hips deform weirdly, though not as badly.

There's a JIRA, STORM-1800, that gives a start on fixing avatar weight issues. I don't know whether it fixes this problem, though I would hope so. Even if it does, though, it doesn't solve my problem.

The real problem lies in the low number of vertices in the standard avatar mesh in that area. I don't know why they skimped this way, but the avatar's old enough that they probably didn't think it was important when it was made, since that was long before SL became popular in the ways it has.

Compare this:
 with this:
The second is much less haphazard, with vertices and edges at places where the avatar needs to deform. As you might expect from looking at these, the second avatar does indeed deform much, much better with movement. It's Utilizator Mode's Avatar 2.0, his idea of what a replacement avatar should be. He gets it right, for everything I've seen.

Unfortunately, I can't use it directly, both for reasons of intellectual property and because the shape isn't what I'm looking to make for this project. He also offers it only in a female version; I want to make a male suit as well, and the proportions would need fixing for that. Considering that he sells the avatar for L$300, it's not reasonable to expect him to release the base avatar mesh in a form others could work with directly (and rip off and sell as their own).

What this tells me is that the only way to get from point A to point B is to rebuild the avatar mesh completely from the ground up, as Utilizator Mode has done. That's a metric buttload of work, one I'm not prepared to tackle for this project and one that's probably beyond my limited artistic abilities anyway. As an artist, I make a decent programmer.

Reweighting the SL avatar mesh is probably going to be of only limited utility because the geometry just isn't there. That's also a nontrivial amount of work, all by itself.

What a nuisance. Poor design decisions, or ones that are simply optimized for a very different set of conditions than the ones that actually prevail once the product is launched, have bitten a number of products over their lifecycles, and the SL avatar is certainly an outstanding example of that.

22 July 2013

A difference in approach

There's been a lot of back and forth between me and Henri Beauchamp, and me and Siana Gearz and to a lesser extent Latif Khalifa, about the difference in development approaches between Firestorm on the one hand and CoolVL and Singularity on the other.

Henri commented, on an earlier entry on this blog,
Using the wrong tools (a code repo) and method (merges, instead of a a true fork and backports), you waste your time while I work alone but much faster.
For your info, the *only* tools I'm using to program are a text editor (nedit), and the 'grep', 'diff' and 'patch' commands.
He also noted, in another comment,
Also, April is quite late for SSB support (I had it fully working and released in January, i.e. 3 months sooner)
There's only one problem, but it's a doozy. As it turns out, his SSB code has a major bug in it. SUN-99 describes a corruption in the Current Outfit folder (COF) that is caused by CoolVL's implementation. It only affects those who use CoolVL and change outfits in SSB regions. The result is multiple copies of the COF. This cannot be fixed by a viewer. It requires Linden intervention. Since the SSB servers depend on it being right, the user's avatar never updates correctly, and thus their appearance is broken for everyone but the user themselves.

Henri has never made it any secret that he disagrees strongly with the whole idea of the COF. In a comment on Nalates Urriah's blog, Henri rips it up one side and down the other, explaining how it was not necessary, that LL's old way of doing the equivalent could have been extended to handle multi-attach (the original reason for the COF), and that his method of doing it is superior.

That all may well be true. Unfortunately, there's one major flaw in his thinking: the viewer interacts with the Second Life servers. Those servers have very specific expectations of how the viewer will behave. Henri's viewer does not behave that way. Instead of using LL's code, which is known to work with the server, he implemented his own, and got it wrong, to his users' detriment.

(And all of this is quite aside from one problem with his implementation that the COF doesn't suffer: his implementation makes what you wear at login be the same thing you were wearing when you last logged out on that computer. If you switch computers and change outfits, when you go back to the original machine, you'll change back. This is not what the user expects.)

The same thing goes for Singularity, in a different area of the viewer. Monty Linden has done a lot of work on the HTTP server and client in the LL server and viewer. The original implementations were broken in a number of ways, and caused many strange and wonderful problems in the viewer and the users' experience. Monty rewrote large portions of that to do things in a much more standards-compliant way. Included in that is a throttling mechanism to prevent one or a small number of users from hogging the simulator resources and making everyone else's responsiveness worse.

Singularity's Aleric Inglewood thought he could do HTTP better than LL. While he started before LL announced their improvements, he stuck with his code even after Monty's was released. The results have turned out differently from what he expected. Now he's struggling to find ways around the throttles, and wiring in fallbacks to the old way of fetching textures via UDP when that fails. Since meshes can only be fetched via HTTP, he's kinda stuck there.

Again, this is because the SL servers expect the viewer to act in a particular way and don't deal well with those that do not. In this case, it's an active defense, instead of a simple breakage, but the results are still the same: a broken user experience.

This kind of incompatibility is one major reason we don't reimplement any functions that go straight to the LL servers. We can and do make user interface changes. We keep our cotton-pickin' fingers out of the server interface code.

Nyx Linden just sent out a long email to the TPV developers' list outlining exactly how the viewer should treat the COF, including things to verify in the code. Since we use the code from the release viewer, I don't expect we'll have to spend much time on that. Henri's got some work ahead of him.

This ties in with a related point: when LL rolls out a new feature, we can implement it by just using their code and splicing in our changes. We don't have to backport it. We certainly don't have to rewrite it, let alone reinvent it. We can just use it. That's why we deliberately chose to hack on the V3 UI, trying to maximize the amount of code in common between V3 and Firestorm, instead of welding the Phoenix UI code on the side.

Henri may be able to do things faster, but a wrong answer is still wrong no matter how fast you get it.

Yes, we do seem to piddle around and take our own sweet time and waste time on QA (according to Niran) and so on...but when we release it, we get it right.

So I'm not particularly interested in how Henri's or Singularity's development model may be better than Firestorm's. I care much more about putting out a viewer that users can depend on, first time every time, and that gets the job done without screwing up users' inventories or hammering the LL servers. No, we're not perfect (see Firestorm 4.4.1). We can always improve. We know enough to realize that. So do our users, and that's what really matters.

10 July 2013

Server-side appearance is rolling

LL has started rolling out server-side baking.

Inara Pey has an excellent treatment of just what it means, especially to folks who aren't running SSB-capable viewers. Phoenix users, this means you.

Still don't want to upgrade? See figure 1.

02 July 2013

Dealing with a screwup: The story of Firestorm 4.4.2

Monday was Canada Day, eh? It was also a big, busy, pain in the ass of a day for the Firestorm team.

The first we heard of the problem was when Jessica Lyon told us in the developer chat that there was a big problem with Firestorm 4.4.1 and she'd convinced LL to give us until Tuesday to put out a new release before 4.4.1 was blocked.

To explain what the problem was, I need to first explain the statistics system. Linden Lab keeps statistics on a wide variety of performance measurements in the viewer. You may know that there's a crash statistics list that TPV developers in the Directory get if their viewer reaches a usage threshold, and that this list determines the order in which viewers are listed in the Directory each week.

There's far more than that, though. When Oz Linden says that 75% of users can run with Advanced Lighting Model turned on and get acceptable performance, that number didn't come from a Ouija board. It comes from actual reports of system configurations and measured performance sent in by every viewer directly to LL in the form of a statistics message.

That statistics message is supposed to be sent in once every 10 minutes. It contains performance and resource utilization numbers. It does not contain any personally identifying information. It's fed directly into LL's statistics system, where it gets crunched.

The problem was that Firestorm 4.4.1 was sending that message every 30 seconds instead of every 10 minutes. The statistics servers were getting hammered badly by having to deal with a sudden 20-fold spike in data. Oz discovered this when he went to generate the statistics report for last week.
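To put that 20-fold figure in perspective, here's the arithmetic as a tiny Python sketch. The two intervals come straight from the paragraphs above; counting per viewer per day is just my way of framing it, not how LL's statistics servers actually tally anything.

```python
# Intervals from the text: the statistics message is supposed to go out
# every 10 minutes, but 4.4.1 sent it every 30 seconds.
INTENDED_INTERVAL_S = 10 * 60
BUGGY_INTERVAL_S = 30
SECONDS_PER_DAY = 24 * 60 * 60


def messages_per_day(interval_s: int) -> int:
    """Number of statistics messages one viewer sends in a day at this interval."""
    return SECONDS_PER_DAY // interval_s


intended = messages_per_day(INTENDED_INTERVAL_S)  # 144 per viewer per day
buggy = messages_per_day(BUGGY_INTERVAL_S)        # 2880 per viewer per day
print(f"{intended} -> {buggy} messages per viewer per day ({buggy // intended}x)")
```

That's for every single running copy of 4.4.1, which is why the load added up so quickly.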

There's a clause, 2.f, in the Third Party Viewer Policy that says:
You must not impose an unreasonable or disproportionately large load on our infrastructure or interfere with our providing the normal functionality of Second Life.
(The statistics message itself is required by 2.h of that same policy.) Guess what we did? LL was completely justified in telling us to fix it or else.

The reason we were spamming the server turned out to be a leftover from our helping to test server-side baking as it got close to its rollout next week. LL needed a big test case, and not just with their own viewer. We were in the final stages of doing QA on 4.4.1, so we made a change to enable enhanced logging of the viewer's responses to appearance update messages so that anyone who encountered a problem would have already collected the information needed to debug it, and LL would have a leg up on actually finding the problem. Part of that change was enabling a debug setting that sped up the viewer statistics message.

We handed out builds with that change made to folks to use in the test. The test was a rousing success by all reports, and LL was quite happy with it.

The problem was that we never undid that change in the release branch of the code. Oops. (All right, it was more of an "aw, SHIT!".)

So we had this problem to deal with. We talked for a bit about how to handle it. There was no actual code change needed. All we had to do was change the default of one debug setting and, for the sake of completeness, the contents of an XML file used to control how the viewer logs data. Unfortunately, there is no good way to get the userbase to make this change en masse. Not only would we miss the many who ignore messages of the day and such, but making the changes would be difficult for many of the rest. (Users of SL are, by and large, not techies. This is a Good Thing.) It quickly became clear that we were going to have to spin a new release.

We knew right up front that we needed to own up to the problem and be open and public about dealing with it. Our users expect, and deserve, nothing less.

We also chose to block 4.4.1 ourselves, rather than have LL do it. We have the ability because Firestorm downloads information from our servers at startup, and one thing it loads is a list of blocked releases. When a blocked release is started, it puts up a message with some explanation text, and then exits when the user clicks OK. This does not require that any information about the user be collected at all, let alone sent to the Firestorm Project servers. If LL were to block it, on the other hand, the user would get a message saying they were not allowed to log on to Second Life with that viewer, with no further explanation. We felt the harm from that user confusion would far outweigh the very limited benefit of having LL play the bogeyman. Unfortunately, while the statistics message spam probably does not hurt OpenSim servers, our method blocks 4.4.1 for OpenSim users as well.
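For the curious, here's roughly what that startup check amounts to, as a minimal Python sketch. It is not Firestorm's actual code (the viewer is written in C++); the function names, the version strings, and the assumption that the downloaded list arrives as a simple version-to-message mapping are all hypothetical, for illustration only. What it does show is the property I care about: the comparison happens entirely on the user's own machine, using nothing but the viewer's version.

```python
import sys
from typing import Callable, Dict, Optional


def blocked_reason(current_version: str, blocked: Dict[str, str]) -> Optional[str]:
    """Return the explanation text if this release is on the blocked list, else None.

    Hypothetical: `blocked` stands in for the list downloaded from the
    project's servers at startup, keyed by version string.
    """
    return blocked.get(current_version)


def startup_check(current_version: str,
                  blocked: Dict[str, str],
                  show_message: Callable[[str], None]) -> bool:
    """Show the explanation and exit if this release is blocked; otherwise return True.

    Only the viewer's own version is compared, locally. This check collects
    nothing about the user and sends nothing anywhere.
    """
    reason = blocked_reason(current_version, blocked)
    if reason is not None:
        show_message(reason)  # stands in for the dialog the user clicks OK on
        sys.exit(1)           # the viewer quits instead of going on to log in
    return True
```

With a list like `{"4.4.1": "Please upgrade to 4.4.2."}`, a 4.4.1 viewer shows the message and quits, while 4.4.2 carries on starting up as usual.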

There's a bug in the GPU table that causes problems for folks using ATI Radeon 6000M, 7000M, and 8000 chipsets. We have a fix for that. There was a fair amount of discussion about including that fix in 4.4.2. We decided eventually not to do so, because it would have taken much more QA than the change that was forcing the update, and that would have taken time we just didn't have. Jess had gotten LL to extend the deadline for the block to the next day; they weren't going to sit still for a 2-day QA cycle, especially pushing up against the July 4 holiday.

Finally, we decided to pull 4.4.1 from distribution immediately. We decided that there was no benefit in allowing people to download a release we knew was broken and that we knew we were going to have to block. We rolled back the version check on the website to 4.4.0 so people wouldn't be nagged to install a version that they couldn't get, and Jess wrote the first of two blog posts telling people we were going to have to force an update.

Then we waited while 4.4.2 was updated and built. We had to do that twice because of an error in the first build that omitted the very SSB compatibility flag that was the reason we pushed out 4.4.1 when we did. Still, we got the builds done, and the QA team turned around a very quick smoke test. We pushed 4.4.2 out the door, published the second blog post, waited a few minutes, and blocked 4.4.1.

Then we all joined the support groups and dealt with the flood of questions and complaints from the users. Oy.

The questions themselves weren't so bad. We knew we were going to have to deal with that. There was a massive volume of them, to be sure, but that's why we all piled in. Chat lag in the Firestorm Support English group was ferocious. (Why, oh why, did LL have to fix SVC-7031? :-) ) The questions were repetitive, and many of them were answered in the blog postings that we'd asked people to read. That wasn't the annoying part, though. Not even the guy who refused to read the blog post and demanded that we answer in 10 words or less why 4.4.1 wasn't good enough any more was truly annoying. (By now, you should understand just how impossible that request is to satisfy.)

There was a very common reaction of "Is this some kind of a joke?" (Or a hoax?) We wish it were, and the fact that people were asking says good things about the users' perceptions of the quality of our code.

There was a fair amount of unhappiness that we were pushing out a new release so soon after 4.4.1. We expected that, and deserved it. We screwed up; this is the price of that screwup.

No, the annoying part was the tinfoil hat brigade. There were people saying "ZOMG, they're capturing our personal information all over again! Just like Emerald!" Uh, no. There was even one guy who was convinced that we'd stolen his credit card info, though we did get him calmed down eventually.

The wait to get 4.4.2 built, tested, and pushed out to the download servers was interminable, but it finally got out there. That was when people discovered that installing 4.4.2 over the top of 4.4.1 was a non-event. They didn't even have to uninstall 4.4.1 first, though many did. A full clean install with manual clearing of caches and the like wasn't needed for those upgrading from 4.4.1. The install was universally reported to be painless and to take about 5 minutes. People even loved the performance, though there's no particular reason it should have improved much. We'll take it.

Now, it's all over but the cussing. People will continue to be surprised over the next several days that they can't log in any more with 4.4.1, and the support folks will have to explain over and over and over. (As I write this, 4.4.2 has been downloaded just over 26000 times; we have far more users than that.) But the LL servers aren't getting hammered any more, and, I hope, people will forgive us for the madness.

We did learn a lesson about our release process: Once we branch for release, nothing goes in without being tracked and verified before we actually spin the release code. We had such a process in place, but it broke down. That won't be allowed to happen again.
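One way to make that kind of tracking harder to forget is to check the shipped defaults mechanically before spinning the release code. What follows is a purely hypothetical Python sketch of the idea: the setting names and expected values are made up for illustration (they are not Firestorm's real debug settings), and it isn't a description of what our process actually looks like.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical setting names and release values, for illustration only.
EXPECTED_RELEASE_DEFAULTS: Dict[str, Any] = {
    "StatsReportIntervalSeconds": 600,  # the normal 10-minute cadence
    "VerboseAppearanceLogging": False,  # test-only logging must be off
}


def release_gate(defaults: Dict[str, Any],
                 expected: Dict[str, Any] = EXPECTED_RELEASE_DEFAULTS
                 ) -> List[Tuple[str, Any, Any]]:
    """Return (name, found, expected) for every setting that doesn't match.

    An empty list means the release branch defaults look right.
    """
    return [(name, defaults.get(name), want)
            for name, want in expected.items()
            if defaults.get(name) != want]


# A branch still carrying test-only changes fails the gate on both settings.
test_build = {"StatsReportIntervalSeconds": 30, "VerboseAppearanceLogging": True}
for name, found, want in release_gate(test_build):
    print(f"{name}: found {found!r}, expected {want!r}")
```

The check is deliberately dumb: it doesn't know why a setting changed, only that it differs from what a release should ship with, which forces someone to look before the build goes out.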

We didn't like doing this. It was a lot of work, and a lot of hassle, and made our users' lives harder than they should have been. Forcing folks who'd upgraded on our strong recommendation to do so again, five days or less later, is not particularly user-friendly. We had no choice, though, and I don't know that there's much we could have done differently once the problem became apparent.

01 July 2013

Tracking LL's big changes

Ever since the release of materials in LL's viewer 3.6.0, there've been calls for us to put it in Firestorm. I'm quite receptive to the idea. Since I worked on the materials project, I know what it can do, and would love to see Firestorm get the capability to use it. I also think that it won't take off until we add it to Firestorm, just as mesh didn't take off until we added it to Phoenix.

There's a big roadblock for us, though. As I pointed out in a thread on SLUniverse:
The problem isn't that the materials code depends on the CHUI code directly. The problem is that the merge process depends heavily on code changes being applied in the same order to the same files. The CHUI changes hit a large part of the viewer codebase. (That's why it took LL a year to get CHUI out the door.) Inevitably, those changes hit files that the materials project changed. When they do, if you don't merge in the CHUI changes first, then you have to do a lot more work to fit the materials project changes into the code - work that you'll have to undo when you finally get around to putting the CHUI changes in, or will have to do over and over if you ignore the CHUI changes altogether.
This is the real reason that TPVs track the Linden viewer so closely: self-preservation. The more we diverge from the LL viewer, the harder we have to work to keep up with LL's changes.
CHUI is, for us, a big no-op in functionality. The changes are mostly duplicative of changes we put in Firestorm a long time ago, though different in design (and not particularly better, either). Still, the changes permeate the codebase, and we have to accommodate them (mostly by bypassing them with calls to our own code, but not always). This is a nontrivial exercise.

We're working on that now. We started working on it in earnest right after releasing 4.4.1. The code compiles and links and runs, but has lots and lots of bugs in it. We're working through those, one at a time, and will get it up to where we think it should be as quickly as we can. Once we do, then we can put stuff we actually want in the viewer.

Nalates Urriah pointed to my comment on her blog. Her post drew two comments, one from Henri Beauchamp and one from NiranV Dean. The two comments are worth replying to for entirely different reasons.

I've said before, and will keep saying, that I admire Henri for his efforts to keep the V1 interface alive in CoolVL, but I think he's fighting a losing battle. He commented,
“I told you, Tonya…” This is what I can say, seeing how the Cool VL Viewer (a v1 UI viewer that Tonya pretended would be unmaintainable in the long term, see: http://sldev.free.fr/forum/viewtopic.php?f=5&t=584#p2430) had its experimental branch with materials implemented only a couple of weeks after the code was opened by LL (and v1.26.9 is now kept exactly on par with LL’s v3.6 materials viewer), and how Firestorm lags big time behind every new feature (the Cool VL Viewer was also the first TPV with SSB, back in January 2013, while Firestorm only gained it recently.
This proves how prominent is a development model over the size of the developers team…
This is one case where Henri didn't have to worry about LL changes. The reason is in the name: CHUI is a UI-specific change, one Henri could largely, if not completely, ignore. That's a rarity in the world of SL viewer development, and he will seldom be so lucky.

Henri's incorrect when he says Firestorm got SSB capability "only recently". The initial SSB-capable Firestorm release was publicly available April 22 of this year, but the code was in the codebase much sooner, back around the end of February.

As for the difference in development models, it's obvious that Henri spends a large amount of time on his viewer, more so than any individual member of the Firestorm team. Again, that's admirable, but I still have my doubts as to how sustainable it is over the long run.

Niran's comment is more revealing:
Just keep in mind that a total retard like me is faster in implementing Materials and SSB and keeping up with LL while still doing heavy UI modifications than Firestorm. Mostly because i dont have a freaking huge Viewer with millions of features to maintain… and because i dont do 2 weeks QA. 2 weeks QA are 2 wasted weeks in which i could have collected a week worth of bugreports, fixed them, worked more on other stuff, make a release, collect bugreports and feedback on that one and, fixed stuff, work even more on other things and update a second time.
"I don't do 2 weeks QA." That says it all right there. Niran's viewer doesn't get any QA, as far as anyone can tell. He slaps it together, tests it himself for some minimal period of time, and throws it out to the world.

QA is never wasted when you're working on software that's intended for someone else, never mind a lot of someone elses, to use. That he derides it as he does shows his true mindset: he hacks on code rather than making good software.

Niran's a programmer, not a developer of production-quality software. He has no understanding of the difference. That's fine for him and his users, but the average user would much rather have a viewer that is actually likely to work well and stably than have the absolute newest shinies. You get there by moving more slowly and carefully than Niran does, actually working to integrate patches from others instead of just dropping them in, and actually doing meaningful QA with more than just one or two testers.

The Firestorm team does what it does the way it does it for good, sound reasons. I think the proof of how well we succeed at it is in the viewer we put out.