Metapuzzles, Backsolving, and Short-Circuiting: A Study of Three Puzzlehunts

(This post will include limited commentary, and potentially spoilers, on three puzzlehunts from earlier this year: the MIT Mystery Hunt, which can be found here; the Galactic Puzzle Hunt, which can be found here; and the Cryptex Hunt, which can be found here.)

Hi all! Since we last spoke, Mystik Spiral was (were?) accepted to Miskatonic University, and my team is very excited. You can see our application video (and lots of others) on the Miskatonic site. In addition to an obvious bias toward ours, I really enjoyed the videos from The AMI-Gos, Boneless Chickthyologists, Friday the 13th Part VI, The Gray Old Ones, and Innsmouth High School Scuba Squad. I have friends on all those teams, which probably made them extra-amusing.

I’ve promised to discuss backsolving policies in the Mystery Hunt, which was going to be Part 6 of my Mystery Hunt recap, but then I participated in two other puzzlehunts that dealt with backsolving in very different ways, and I thought it might be useful to compare and contrast. Of course, that means I’ve been assembling an epic post in my head for weeks, which makes it harder to get things on the page. Let’s see if it lives up to my personal hype.

First of all, let’s get some terminology straight, since I know some people think of “backsolving” as solving any puzzle with partial information, but I think it’s more specific than that. I’ll define backsolving as the practice of solving a metapuzzle (or at the very least, figuring out how the metapuzzle works) and using that information to confirm a correct answer to a feeder puzzle without solving that puzzle. I’ll define short-circuiting as the practice of solving a metapuzzle and understanding how it works, but managing to do so while missing a significant number of feeder answers. (Note that short-circuiting often leads to subsequent backsolving.) Short-circuiting sometimes results from wheel-of-fortuning, a common puzzlehunting strategy in which the solver figures out that a puzzle or metapuzzle is generating a series of letters and fills in many of the missing letters by pattern-matching.
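To make wheel-of-fortuning concrete, here’s a minimal sketch. The word list and pattern are hypothetical; in practice solvers throw patterns like this at tools such as OneLook or a big dictionary file:

```python
import re

# Hypothetical mini word list standing in for a real dictionary.
WORDLIST = ["BACKSOLVE", "BACKSPACE", "BACKSTAGE", "BANKROLLS"]

def wheel_of_fortune(pattern: str) -> list[str]:
    """Return word-list entries matching a partial extraction,
    where '?' marks a letter the solver hasn't extracted yet."""
    regex = re.compile("^" + pattern.replace("?", ".") + "$")
    return [w for w in WORDLIST if regex.match(w)]

# With only some letters extracted, the pattern already narrows things down:
print(wheel_of_fortune("BACKS???E"))
# → ['BACKSOLVE', 'BACKSPACE', 'BACKSTAGE']
```

Once the candidate list is short enough, the solver just picks whichever answer fits the puzzle’s theme and calls it in.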

Okay? Okay.

MIT MYSTERY HUNT 2019

What elements of this puzzlehunt encouraged backsolving?

In the 2017 Mystery Hunt, solving metas was a very powerful thing. We intended quest metas to be a mechanic through which power teams kept themselves from being bottlenecked, but if you solved quest metas quickly (not to mention if you solved character metas quickly and used them to help you solve other character puzzles) you could cut through the Hunt like a hot knife through butter.

There were multiple exploitable features of that structure, but one was that strong teams solve metas faster, so if solving metas opens puzzles faster, the rich get richer. This was one of the motivations behind a structure that based unlocks exclusively on puzzle solving rather than on metapuzzle solving. The problem with this is that if you solve a metapuzzle and don’t get a bonus for it, it massively incentivizes backsolving, as we discovered when many many teams (starting with Left Out, who short-circuited the first meta with ten missing answers, which is insane, by the way) started calling the remaining answers in for everything in sight.

What, if anything, did this puzzlehunt do to discourage backsolving?

We didn’t have any explicit policy (certainly not communicated to teams at large) about backsolving. When we wrote the Hunt, we thought the “mingled metas” mechanism would effectively reduce backsolving, because teams wouldn’t know where to put their orphan answers. But a lot of us on Setec feel that calling in an answer twice is totally part of Hunt, and calling it in nine times is très obnoxious. A lot of teams don’t feel that way, and in their defense, if the goal is to win and you have no way to know other teams aren’t doing that, it’s hard to argue that you should place artificial constraints on yourself.

In practice during Hunt, when we felt a team was attempting backsolves to an abusive extent, we sometimes put them in the “penalty box” by giving them a stern phone call and then not answering their calls for a while. We’ve done this in past years and it usually only affected a few teams… but this year we hit the trigger too quickly and then felt compelled to treat other teams the same way out of fairness. Yet based on comments on an earlier post, we didn’t succeed in consistency here. And there’s a question of whether penalizing people for doing something we didn’t tell them to do is fair. I think there were balls dropped here.

What, if anything, did this puzzlehunt do to discourage short-circuiting?

We thought the mingled metas would make it tougher to solve metas in general, particularly in that you might not know how many feeder answers to expect. We also ended up with more metas than usual where even when you knew how the meta worked, you needed most or all of the answers to carry the process out. This was not intentional on my part as meta captain, but I suspect some of our authors had the 2017 Cleric round in the backs of their minds.

What (in your blog host’s opinion) should be changed?

Backsolving is a part of Mystery Hunt, and I don’t think that should stop. How much backsolving is appropriate is an incredibly controversial question, and I encourage you to voice your opinions in the comments. I’ve already said some of what I think in the “two backsolves vs. nine backsolves” example above. But there’s a whole lot of gray area between 2 and 9… where’s the line? And how do you set up a policy that limits abusive backsolving, but doesn’t penalize teams (especially casual teams) if a puzzle is legitimately challenging for them and they need to try multiple answers?

I would recommend, at the very least, two things to future Mystery Hunt constructors. First, if you’re going to have a backsolving limit policy, make it clear to solvers as part of the Hunt instructions. Don’t make them guess, and don’t assume their sense of etiquette matches yours. Second, there are two things Setec pretty clearly agrees should be prevented by the Hunt software: it should not allow teams to have two pending answers submitted for the same puzzle, and it should not allow teams to have the same answer submitted for two different puzzles. Should teams be able to spam the answer confirmer? Maybe, maybe not. But I maintain that submitting two things that can’t both be right, and cutting in front of everyone else before you find out if either of them is wrong, is a jerk move. Our tech team realized after Hunt started that this would probably have been easy to implement. We just didn’t think of it, because we didn’t realize that SOME OF YOU ARE MONSTERS.
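For what it’s worth, those two checks really are cheap to enforce. Here’s a minimal sketch (the class and method names are mine, not any actual Hunt software’s):

```python
# Sketch of a pending-answer queue enforcing the two rules above:
# no two pending answers for the same puzzle, and no single answer
# pending on two different puzzles, per team.
class AnswerQueue:
    def __init__(self):
        self.pending = []  # (team, puzzle, answer) tuples awaiting judging

    def submit(self, team: str, puzzle: str, answer: str) -> None:
        for (t, p, a) in self.pending:
            if t != team:
                continue
            if p == puzzle:
                raise ValueError("You already have an answer pending for this puzzle.")
            if a == answer:
                raise ValueError("That answer is already pending on another puzzle.")
        self.pending.append((team, puzzle, answer))

    def resolve(self, team: str, puzzle: str) -> None:
        """Called once HQ judges the submission, freeing up the slot."""
        self.pending = [x for x in self.pending
                        if not (x[0] == team and x[1] == puzzle)]
```

Note that nothing here stops a team from submitting again after a judgment comes back; it only blocks the two can’t-both-be-right cases while submissions are in the queue.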

CRYPTEX HUNT 2019

What elements of this puzzlehunt encouraged backsolving?

To start with, the fact that you had to solve every single puzzle. I know this because… well, let’s back up.

The hunt was run through ClueKeeper and built around a collection of puzzles embedded in a fake magazine. Incidentally, I really liked this combination; in a puzzle set like this, one of the potential annoyances is not knowing if this page is one puzzle or two; having a separate interface that listed which magazine elements had corresponding puzzle answers (and their enumerations) felt like it made the playing field more fair.

Last year, the Cryptex Hunt (eventually) got a couple of full posts from me in this blog… that was back when I had small amounts of free time. So this year, instead of a full post about how I failed to win Cryptex Hunt, you get a partial post about how I (co-)WON Cryptex Hunt! Wooooo! This was not something I expected when I fell asleep. I solved a bunch of puzzles right away when the hunt began on a Friday evening, but I started to stall out, and the leaderboard showed a big lead for [flips my Jangler/Projectyl two-sided die] Projectyl, with several other solvers slightly ahead of me. I didn’t feel close enough to finish before passing out, so I went to bed, expecting a winner to be crowned by the time I woke up.

One wasn’t. I guess you know that since I already told you I won, unless you thought I somehow finished the hunt in my sleep, which might be a better story, so feel free to skip ahead and pretend that happened. Actually, in the morning I noticed no one had solved the puzzle I believed was the meta, and I had some ideas on how it might work, so I decided to put more focus on that. Also, Jackie was now awake and able to contribute, and she made some progress on puzzles I wasn’t getting anywhere on.

I thought I was placing enough answers into the meta grid that my idea must be right, but I proved not everything could go in. Then I realized that if one of the letters was misplaced, everything could fit; so I pinged the main hunt author, and he confirmed that the grid was wrong, which was released as an erratum shortly after. That fix was enough to solve the meta, and we were the first to do so! But nothing special happened! I poked my head into the escape room Slack and asked if solving the meta meant we won. Nope. The goal was to solve every puzzle. (Errol seemed surprised… I’m not sure if he was surprised the meta could be solved without all the inputs, or just that anybody would try? Neither surprises me, but this is what I do.)

And so the backsolving began. Because enumerations were given and we had a few letters in some answers, we were able to guess two of them right away. Another took a bunch of guesses, because the answer we wanted, while common, was not an option that came up on OneLook (it was a title). One answer resisted backsolving, and we had a large chunk of that puzzle solved… Eventually Jackie cracked the last step of it, and we completed the hunt.

So to answer the question I asked in the first place, if a hunt allows you to solve a metapuzzle but won’t let you be done unless you go BACK and SOLVE all of the puzzles that fed into it, then yes, that incentivizes backsolving.

What, if anything, did this puzzlehunt do to discourage backsolving?

Since the hunt was on ClueKeeper, there wasn’t anybody on the other end of the line to throttle excessive backsolving. CK itself does have some auto-throttling; if I submitted three wrong answers in a short period of time (guess how I know this), it would lock me out for a minute. What effect does that have on a solver? I can say that it slows you down, but it also means you feel no guilt whatsoever submitting an answer 61 seconds later. That may or may not be the desired effect.
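As I experienced it, the throttle behaves roughly like a sliding-window lockout. Here’s a sketch of the mechanic (the constants and names are my guesses from the solver’s side, not ClueKeeper’s actual implementation):

```python
import time

class Throttle:
    """Lock a solver out for `lockout` seconds after `max_wrong`
    wrong answers within a `window`-second span."""
    def __init__(self, max_wrong=3, window=60.0, lockout=60.0,
                 clock=time.monotonic):
        self.max_wrong, self.window, self.lockout = max_wrong, window, lockout
        self.clock = clock        # injectable so the logic can be tested
        self.wrong_times = []     # timestamps of recent wrong answers
        self.locked_until = 0.0

    def record_wrong(self) -> None:
        now = self.clock()
        # Forget wrong answers older than the window, then count this one.
        self.wrong_times = [t for t in self.wrong_times if now - t < self.window]
        self.wrong_times.append(now)
        if len(self.wrong_times) >= self.max_wrong:
            self.locked_until = now + self.lockout
            self.wrong_times.clear()

    def may_submit(self) -> bool:
        return self.clock() >= self.locked_until
```

The guilt-free part follows directly from the design: at second 61 the lock has expired and the counter has been cleared, so nothing in the system remembers your earlier spree.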

What, if anything, did this puzzlehunt do to discourage short-circuiting?

Nothing, really. But if you did short-circuit (as we did), the “gotta solve ’em all” requirement made that accomplishment less helpful, unless you complemented it with backsolving.

What (in your blog host’s opinion) should be changed?

Mileage will vary here, but one of the things I find satisfying about puzzlehunts is that it’s a mad dash from Point A to Point B. Good metapuzzles don’t require you to have all the inputs, and the process of solving around the missing data is something I really enjoy. The goal of solving everything on the page feels more like a crossword, where you’re not done until you’ve filled in every box, than a puzzlehunt. I enjoy crosswords, but I enjoy them more for the themes; once I know the theme, completing the grid is only appealing to me as a timed competition (or practice for timed competition).

In a structure like last year’s Cryptex Hunt finale, where you only see one puzzle at a time and you need to solve each one to unlock the rest, it makes sense that you have to solve everything to finish. But in an event like this one, where everything was provided at once, finishing the metapuzzle and then having to go back and clean up was a bit weird. Backsolving as a strategy to advance in a puzzlehunt feels like part of a game; backsolving to check the boxes to be declared the winner was anticlimactic. (Apart from that objection, I thought it was a well-designed event that was slickly presented and led to an exciting race that kept changing leaders. And I’m excited to have a gorgeous cryptex in our house soon!)

GALACTIC PUZZLE HUNT 2019

What elements of this puzzlehunt encouraged backsolving?

The Galactic Puzzle Hunt is a lot like the MIT Mystery Hunt on a smaller scale, which makes sense since it was created by a Mystery Hunt team. As I understand it, the 2017 GPH was actually created because Galactic Trendsetters had time on their hands after the 2017 Mystery Hunt ran short. Add this to the fact that the 2018 WALL-E-themed GPH was clearly inspired by the Escape From Zyzzlvaria theme my team used for the 2009 Mystery Hunt, and I think we can all agree that I am the spiritual founder of the Galactic Puzzle Hunt.

Anyway, there are a lot of gonzo puzzlehunt ideas, in terms of both puzzle content and overarching meta structure, that have rarely fit anywhere other than into the Mystery Hunt, and I admire the fact that GPH throws a lot of these ideas at the wall and most of them stick. I love both the Puzzle Boat and Mark Halpin’s Labor Day puzzle suites, but I think GPH has established itself as the premier online-only puzzlehunt in terms of quality and ambition.

Anyway, my point is that GPH is a lot like the Mystery Hunt. There are a lot of reasons backsolving is beneficial during Mystery Hunt. So there are a lot of reasons backsolving is beneficial during GPH.

What, if anything, did this puzzlehunt do to discourage backsolving?

GPH has what I think is a brilliant policy for wrong answers. The various Australian online puzzlehunts (MUMS, SUMS, CISRA we hardly knew ye) often allow either 100 answer attempts per puzzle, or per day. This is a very reasonable cap if the puzzles are clean; they aren’t always, and I remember at least one Aussie puzzle on which my team spent 50 guesses before getting it correct, and we weren’t backsolving. I don’t remember it fondly.

GPH instead allows only 20 guesses per puzzle, but they note in their rules that if you use 20 guesses, and they judge you used them legitimately in an attempt to solve the puzzle (as opposed to, say, trying the name of every US state), they’ll give you more. This seems like a sane way to curb abusive backsolving attempts while still allowing for the possibility that a nasty puzzle might call for a high quantity of submissions. I think 20 may still be a high limit for an event with live submissions, but the GPH model works just fine for an electronic-submission event.

Having said all this, backsolving is only relevant if you have unsolved puzzles by the time you solve the meta. That didn’t happen a lot in this year’s GPH because…

What, if anything, did this puzzlehunt do to discourage short-circuiting?

My team, Killer Chicken Bones, was in second place near the end of the weekend before two things happened: (a) most of my team disappeared (half the team knew in advance they’d be leaving for two separate work trips in Hawaii, and another member had separate work obligations… all this was before I had an ugly health scare later that week), and (b) due to the length of the hunt, the organizers started passing out meta hints and free answers like candy. All of this made me a lot less engaged in the event and led to our team not finishing, but we probably would have kept more momentum going if we hadn’t gotten to the point where we had four puzzles left to solve, and NO METAPUZZLES AVAILABLE. Despite the fact that two metas were among what we had left to solve. What?!

As it turns out, the structure was designed such that you would get to see a meta once you solved a certain (pretty high) proportion of the puzzles going into that meta. During the hunt, we had no idea this was what was happening; we just thought they required a lot of total puzzle solves to access the metapuzzles. Later I heard that one of the last metas we opened was the first meta another team opened. This certainly blocked both backsolving and short-circuiting. You can’t solve a metapuzzle with 6/10 answers if you can’t even see the metapuzzle!

This was especially problematic with this year’s structure/theme, which revolved around learning a constructed alien language. (I could go on about that for hundreds of words, but this is closing in on 3K already. In short, I liked the concept, and I enjoyed associating concepts with words/roots; I did not enjoy trying and failing to work out the tense/case infix conjugations, and ultimately I thought the depth and detail of the language design was too big for its britches.) All of the metapuzzles involved digesting certain aspects of this language, which took time. I find that in most puzzlehunts, I look at the meta(s) early and spend a lot of time thinking about it in parallel as I solve other puzzles. That wasn’t possible here because of how late the metas opened. Once we finally opened one, we had to start from scratch. Ultimately, we failed to solve the Artists meta, which certainly had the most work to do after opening it, and if it had been available earlier, we could have used our resources more efficiently.

What (in your blog host’s opinion) should be changed?

There is a traditional “eighty percent rule” for metapuzzle writing. The idea of this is that a good metapuzzle should be solvable with any 80% of the answers and ideally not much less. This forces teams to solve a reasonable number of feeder puzzles (and thus not short-circuit the meta too drastically) while allowing them to solve around a puzzle that might be broken, too hard, or simply not their cup of tea.

This is a hard balance to nail (I’ve failed at it many many times) but the solution is not to hide the metapuzzle until teams already have 80% of the answers. What makes a metapuzzle special is that it’s initially impossible, and it gradually becomes more approachable as you obtain more feeder answers from elsewhere in the hunt. If you don’t show teams the meta until they have enough answers to solve it, it’s not really a meta… it’s just one more puzzle. Albeit one that you have to solve to advance, and thus one you wished you’d had more time to look at.

GPH has done really cool and creative things with their unlock structures, and I have faith that they will keep trying new stuff (until they win Mystery Hunt and get to try stuff on an even more epic stage). I hope, at least in terms of what I like about puzzlehunts, that they consider this “don’t show them the meta until they have most of the answers” mechanic to be something that did not stick to the wall.

SO WHAT DO YOU THINK?

Well, first of all, I can’t believe you read this far. The hunts described above are three different hunts of different lengths run for different audiences, so it’s not surprising that they all dealt with metapuzzle “enforcement” differently, and the results varied. I wrote one of these hunts, I won another, and I crashed and burned in the third, and my instincts on how constructors should deal with backsolving and short-circuiting were sharpened by all three experiences. What are your opinions and instincts about it? Let’s chat about it in the comments. I’ve put a few hours and a few thousand words into writing this, so I’m going to bed, and I assume that by the time I wake up, Projectyl will have won the blog post.

28 thoughts on “Metapuzzles, Backsolving, and Short-Circuiting: A Study of Three Puzzlehunts”

  1. My preliminary opinions on backsolve and submission limits:

    * I agree there should be a limit on spamming the answer checker, though I don’t think it should be as low as one at a time, unless there’s a way to remove an answer submission and replace it. A limit of three seems kinda reasonable. (Having a total submission limit supersedes this, of course.) I wonder if, for an automated answer checker, it could be implemented simply as a punishing cooldown (e.g. if you attempt to submit more than X answers in Y time, you can’t submit anymore for Z time where Z > Y).

    * A limit on backsolve attempts on multiple puzzles may be advisable but should depend on the structure. No limit should be imposed if backsolves don’t actually unlock anything, of course, but if they do unlock stuff, a cooldown might be useful. Still, I’d allow it for three puzzles at a time. (Again, having a total submission limit probably supersedes this.) That said, this might not be as good as a submission limit on each puzzle itself, since the process of figuring out a meta and then back-devising puzzle answers can be quite entertainingly clever itself.

    * Regardless of the above, there should be a user-reportable backsolve status checkbox. To keep this honest for the purpose of keeping stats, there should be no penalty for having this box checked, plus a clear indication that it is only for data.

    * I don’t have a specific opinion on short-circuiting on metas, as I haven’t personally solved or written them much, but with regards to short-circuiting in general, I generally find them interesting, akin to sequence-breaking in videogames. As a writer, I’m usually more likely to make stuff too obtuse than too short-circuitable, but I’m curious to find out about sequence-breaking opportunities. If they’re sufficiently elegant/entertaining in my opinion, I might leave them in, since the puzzle-solving experience should be about cleverness and fun more than merely tedium and work.

  2. *reaches end of post* I… wha? Huh. Okay, lemme see what I can do. The post has MIT, CRYPTEX, and GALACTIC, so sort-by-length isn’t ruled out yet. What would that give us…


    M I T
    - - - -
    - - - - -
    - - - - - -
    C R Y P T E X
    G A L A C T I C
    - - - - - - - - -

    Oh, there we go, look at the central letter or bigram (depending on whether the length is even or odd). I?????PAC?… That’s clearly going to spell out INNERSPACE, which is nice and thematic to the mechanism. Not really a lot of info for backsolving, but I guess it goes like that sometimes.

  3. I’ve been participating in the Mystery Hunt for nearly 15 years now, usually not really competitively. In general, the teams I’ve hunted with are a bit shy when it comes to calling in educated guesses as answers to puzzles (whether they’re backsolved or we’re attempting to forward-solve or we’re wheel-of-fortuning). Sometimes the person in charge of our team gets frustrated if we call in three plausible but not probable answers for a puzzle in an hour. At the same time, there may be other teams throwing a dozen plausible but not probable answers at a puzzle in even less time. It’s hard to gauge what’s appropriate or not because we don’t actually know how other teams operate.

    I like that GPH gives us a maximum number of answers so that we can have an idea of how reasonable we’re being with calling in potential answers. I similarly like how ClueKeeper has a way to throttle teams. I wish that the Mystery Hunt organizers each year would provide *some* guidance on what’s reasonable or not. I don’t have strong feelings about what that should be (a maximum number of answer submissions per puzzle, or a waiting period after you’ve submitted too many incorrect answers in a short amount of time, or even just a written suggestion that teams should not submit more than 5 wheel-of-fortuned answers per hour on a single puzzle so that we all have a shared sense of what is reasonable). I think that even if this was just an unenforced suggestion, most teams would be happy to be playing by the same rules.

  4. For Mystery Hunt specifically, if you put in a limit on answers per puzzle, some care would need to be taken as to how team captains can stop random team members from calling in answers. There’s just a huge difference between a team of 6 and a team of 60 in enforcing discipline. I’m not sure how that could work logistically.

  5. I disagree on the characterization of submitting an answer combo you know can’t possibly be right as “a jerk move”.

    Effectively, once you start backsolving, you are now playing a giant game of Mastermind, except that the colors are the search space of all possible answer words. Is it a jerk move to guess “Red Red Red Red” even when you know the secret code cannot possibly be all red? I see it as a viable strategy to narrow down the search space in an organized fashion.

  6. Noah: “There’s just a huge difference between a team of 6 and a team of 60 in enforcing discipline. I’m not sure how that could work logistically.”

    I would argue that this is a matter of how you choose your team. If somebody can’t be trusted to act in the team’s best interest, maybe they shouldn’t have been invited to join the team? (For what it’s worth, Setec is a team of roughly 60, and I do not believe we’ve had anyone go rogue and just start spamming answers like crazy.)

    Glenn: “I agree there should be a limit on spamming the answer checker, though I don’t think it should be as low as one at a time, unless there’s a way to remove an answer submission and replace it.”

    I’m curious why you think it should be more than one. The only justifications I can think of are (a) if you mistype an answer or realize something right after submitting it, you shouldn’t have to wait a few minutes before correcting it (I disagree, since I don’t think a short wait is unreasonable to incentivize accuracy), or (b) that you might not be able to control your team (see above). The ability to delete from the queue might be a nice feature, but I think the limit should be one either way.

    Just to be clear, I’m not saying you shouldn’t be able to submit more than one answer at a time, period. If you solve two puzzles in quick succession, you should be able to submit answers for both of them. I’m arguing for an automatic restriction on submitting more than one answer for the same puzzle, or the same answer to different puzzles. In both cases, that second stab is either due to a mistake or due to a desire to use the answer checker as backsolving confirmation, and while neither is inherently sinful, I don’t see why it’s unreasonable to let the solver wait a few minutes to submit the second guess.

    Wei-Hwa: “Effectively, once you start backsolving, you are now playing a giant game of Mastermind, except that the colors are the search space of all possible answer words.”

    That may be your perspective as a solver, but you’re doing so with the answer checker, the principal purpose of which is to confirm whether the solver has solved a puzzle correctly. In Mastermind, the entire mechanic of the game is that you begin by submitting guesses that you know probably aren’t correct. This is especially problematic in a call-in system (which people can debate the merits/drawbacks of) where your use of the answer checker as a logic puzzle delays other teams’ confirmations based on actual puzzle solving attempts.

    I believe the constructor has the right to decide how teams are meant to use their confirmation system, and it’s probably best for them to communicate that. But just because checking answers could be used to narrow your search space doesn’t mean that’s the right thing to do… I think we’d both agree that calling in all the five-letter words in the dictionary for a puzzle is obnoxious, even though it might eventually work.

    • I was originally inclined to agree with you on restricting backsolve guesses, but then someone on my team pointed out that backsolving can itself be an entertaining challenge.

      I feel that brute-force backsolving is a bad thing, but if a solver can narrow it down to a handful of possibilities (let’s say two) then I think it might be okay for them to guess them. Perhaps organizers could even be more lenient if solving the feeder puzzle no longer mechanically helps anything.

  7. Setec is a far far outlier in terms of being well-organized, disciplined, older, and strict about who can join the team. Especially teams that have current students and teams that are dorm-based are just going to have a hard time doing this. Ditto teams with remote solvers. We’ve certainly had mild-to-moderate instances of people calling in more bad answers than the quartermasters would prefer, and this can be dealt with after the fact by tracking the person down and asking them to stop. But if there’s a limit on the number of answers, then you really want to catch that before they’ve screwed over the whole team.

    Basically I don’t think we should be pushing teams towards “no frosh allowed” rules, and I think that’s a danger with some of these ideas unless they’re carefully implemented.

    An example of a mechanism that might be better is that after, say, 5 wrong answers, when you call back you give the person on the phone the option to cancel the call-in before you confirm or disconfirm the answer. Alternately you could give teams two logins and only allow call-ins from the trusted account.

    • I’m not advocating for “no frosh allowed” as much as I’m advocating for “tell your frosh Hunt has guidelines and that they shouldn’t be morons, and if they’re going to insist on being morons, ask them to leave.”

      I agree that Setec may be an outlier, but I’ve also been on two other teams without encountering a serious rogue answer submitter (including one that was brand new and didn’t have institutional memory to fall back on, although all the members were essentially hand-picked). Atlas Shrugged was the largest and certainly the most frosh-ridden, and some members of that team were unpleasant to be around (and I know for a fact that some of them found me unpleasant to be around), but I don’t recall anyone spamming the queue; I think they all understood the team existed because of its leaders, so they respected the leaders.

      I do think you make a good point that “total allowed answers on one puzzle” should never be a limit for Mystery Hunt (and I agree), because as you noted, an irresponsible team member could cause irreparable damage. But if it’s just a limit on “simultaneous submissions on one puzzle,” you can immediately tell if someone’s submitting when they’re not supposed to, and then you track them down and chide/dismember accordingly.

      • If you’re hunting out of a dorm (or any other setup that’s not one room) it’s not necessarily easy to figure out who is spamming the queue. (Unless your team has set up some kind of internal server to track that, which Plant does, but I think we’re unusual.)

        The first time I hunted I was invited to Random by a friend and never interacted with any “leadership” types. They wouldn’t have known who I was, or had any way to make sure I was well-trained on how calling in worked.

        At a bare minimum one would need “we tracked down the person who was causing the problem and yelled at them” to be reason enough to get your answers back.

        If someone at Atlas Shrugged were spamming the queue would you have known?

      • “If someone at Atlas Shrugged were spamming the queue would you have known?”

        Yes, because someone on the constructing team would have reached out to the team captain, and Rhode would have come down like an avalanche. If your team is too big for the social contract to apply, make your team smaller.

        The more backsolving gets talked about, the more I start to think we should remove the online answer submission for Mystery Hunt. If people can’t understand “this team put in tens of thousands of hours of work to make you this thing that you get to do for free, maybe don’t continue to demonstrate that you don’t give a shit about the puzzles they wrote by spamming backsolved answers at them.”

      • At any rate, I’m not arguing that teams shouldn’t do their best to encourage better behavior, and even sometimes ban people for behaving badly. I’m just saying that it’s really rough to permanently penalize a team for behavior that they don’t have a good way to monitor or control until after-the-fact, and where this difficulty of monitoring and controlling is due to design decisions made by the organizing team (call-out instead of call-in, no tracking of who is submitting answers, no way of letting someone look at puzzles without also authorizing them to call in whatever they want, etc.)

      • Tanis said:

        If people can’t understand “this team put in tens of thousands of hours of work to make you this thing that you get to do for free, maybe don’t continue to demonstrate that you don’t give a shit about the puzzles they wrote by spamming backsolved answers at them.”

        That’s not always an A-follows-B situation, though, regardless of what it may feel like. I have it on my list to go back and give a fair shake to all of the Mystereo Canto puzzles that I haven’t seen, both forward-solved and back-solved by my team, because I am fully appreciative of their high quality and their overall joie de vivre. And I say that as a member of a team (that year) that wanted to win pretty badly and was making full use of the backsolving-opens-more-puzzles-and-gives-us-a-broader-attack-surface feedback loop in the process.

        For that matter, just because a team forward-solved a puzzle doesn’t mean that they appreciated *that particular puzzle*. It may simply be something that was not to their tastes. It may have been resource-intensive-but-absolutely-necessary-to-keep-going when most of their team happened to be asleep. I agree with the sentiment from elsewhere that a team and its solvers have a social-contract-esque obligation to respect the puzzles and their authors; I do not, however, believe that a team or its solvers have an obligation to enjoy a particular hunt or its puzzles in precisely the way that the hunt-running team specifies. It is worth remembering that in many cases (I would estimate a slight majority of them), the puzzles that end up getting backsolved are the ones on which a team is stalled, slogging, or in some other way just not having fun, in which case it’s probably not a bad thing that the team is attempting to backsolve those puzzles, at least from an overall fun perspective.

        None of this excuses *spamming* backsolves (or spamming short-circuited partial forward solves, which seems to elicit less attention), which is an entirely separate social animal, and has much to do with the paradoxical interactions between Mystery Hunt being a social, leisure, and competitive event simultaneously. The competitive aspect of hunt (as well as the desire to open as many puzzles as you can as quickly as possible) will likely always result in competitive teams wishing to solve/backsolve/short-circuit as quickly as they can within the parameters set by the hunt organizers, even if the combinatorics mean that teams will be using as many as 20 or 30 backsolve attempts to backsolve effectively at a given point. The *temptation* to spam backsolves is always going to be there, and it will not only be present but exacerbated when measures to make backsolving more difficult are in place. I’m not sure any sort of social sea change is possible in this regard at this point, but it is certainly worth discussing nevertheless.

      • For me there’s a huge difference between backsolving and abusive backsolving. But there were definitely also people on Setec who felt bummed that puzzles they spent lots of time on got bypassed. (As a back solver myself, I just consider that part of the process when I write for Hunt… But I also understand I’ve gotten to do that eight times now, and if somebody writes one puzzle for one Hunt ever, it’s kind of a downer if it seems like everybody’s skipping it. I suspect Uncommon Bonds in 2019 didn’t get solved for realz very often.) When somebody calls in the same thing six times, it sends the message to the constructors, “We’re not trying to skip this one puzzle that’s not our jam… We’re trying to skip everything regardless of whether it stops up the call queue.”

        This actually suggests there’s a place for the “gotta solve ’em all” dynamic I didn’t love in the Cryptex Hunt; if you have to solve everything, every puzzle at least gets engaged with. But then again, (a) people who primarily solve Mystery Hunt might be particular about contributing to that thing that they like, where backsolving is an expected thing to some degree, and (b) in a gotta solve ’em all hunt, if one of the puzzles turns out to be too hard or broken, you can expect that to be the main thing solvers remember afterwards.

      • I’ve generally hunted on teams that are more laissez-faire on answer submission, so as an extra data point, here are times I’ve seen multiple answers for a single puzzle.

        * People working on the same puzzle across several rooms or continents have solved it, and two of them simultaneously decide to submit before either realizes the other is already intending to.
        * The above happens, except they only have a partial answer and are trying to wheel-of-fortune it, and nobody spotted the pending answer because, again, simultaneous submission.
        * Someone makes a typo in answer submission so it gets resubmitted with the correct spelling. (i.e. if you know the pending answer is wrong, why shouldn’t you submit the right one right away, instead of waiting for a call-back telling you your answer is wrong?)
        * People are backsolving and haven’t been told to be reasonable on backsolve attempts.

        I would say that submitting 2 or more backsolve attempts on the same puzzle with the knowledge that at most one of them is correct is, for sure, a dick move. But there are plenty of non-malicious reasons to have multiple pending answers. I would say usual hunt solving leads to ~2 pending answers at most, just from lack of coordination.

        I don’t see any problems with not allowing the same answer to go to 2 different puzzles – that seems like the more common form of backsolve abuse to me.

  8. I think instead of getting rid of online answer submission, it might be simpler to restrict online answer submission to a subset of users. Or at least have some way to *track* online answer submission. For example, you could have an extra required box for call-in where the person calling in puts their name/nickname/username. Back in the call-in days, the leadership could see in real time who was calling in what and talk to them if it was a problem.

  9. I was glad you mentioned the 80% rule, because I think that if the metapuzzle really is designed to require 80% of the answers to solve (and not significantly more or less), then you sidestep most of these issues. That is, backsolving is less of a concern when only 20% of the feeder answers are available to be backsolved, and then you don’t really have to worry about artificially rate-limiting guesses.

    Most metapuzzles have enough parameters that can be tuned (do you give the puzzles in order or out of order, how helpful is the flavortext, how suggestive is the layout and design of the metapuzzle) that you can get reasonably close to that 80% mark no matter the inherent difficulty of the gimmick. The trick is (and I think we fell short in 2017 on this point) to clearly establish that 80% design criterion, and spend enough time test-solving, including under final layout and design conditions, to see how close to the mark you’re landing.

    I’ll emphasize that “including under final layout and design conditions” is I think an often-overlooked component; I can think of many metas over the years that have been made significantly harder OR easier based on small final design choices that appeared innocuous, but had a (surprisingly) large effect on the always-mysterious process of solvers being able to suddenly “grok” a metapuzzle.

  10. I know I am very much in the minority, but I much prefer an opaque penalty system that is totally dependent on the whims of the running team. If given clear submission rules, most teams will simply push right up to the limit of these rules. I have no doubt that with a rule such as “teams must wait a minute before submitting another answer to a specific puzzle,” that there will be teams that will code up a bot that will submit a given string of guesses for each puzzle every sixty seconds. With an unpredictable system determined by humans, teams may think a bit harder about spamming the queue with far-fetched guesses, since they don’t know what sort of penalty this will earn them. Perhaps then they will be more likely to save submissions for more educated guesses, or in dire cases, when they actually solve the puzzle.

    Any rule-based system is a system that can be gamed. Performance in the Mystery Hunt should be determined by the team that is best at solving puzzles, not the team that most efficiently spams the answer submission system.
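
    For concreteness, here’s a minimal sketch of the kind of fixed-rule cooldown described above, plus the trivial way a bot games it (all names and the 60-second figure are my own, purely illustrative): submit a queued guess at exactly the allowed cadence, forever, without ever breaking the letter of the rule.

    ```python
    import time

    # Hypothetical per-puzzle rate limit: one guess per team per puzzle
    # per 60 seconds.
    COOLDOWN_SECONDS = 60
    _last_guess = {}  # (team, puzzle) -> timestamp of last accepted guess

    def try_submit(team, puzzle, answer, now=None):
        """Accept a guess unless this team guessed this puzzle too recently."""
        now = time.monotonic() if now is None else now
        key = (team, puzzle)
        last = _last_guess.get(key)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False  # rejected: still cooling down
        _last_guess[key] = now
        return True  # accepted (whether it's *correct* is checked elsewhere)

    # The gaming bot: schedule one queued guess per cooldown window.
    # Every single submission is legal under the stated rule.
    def bot_schedule(guesses, start=0.0):
        return [(start + i * COOLDOWN_SECONDS, g) for i, g in enumerate(guesses)]
    ```

    A human-judgment penalty system can’t be gamed this mechanically, which is exactly the point above.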

    • See also: https://rady.ucsd.edu/faculty/directory/gneezy/pub/docs/fine.pdf which begins:

      The deterrence hypothesis predicts that the introduction of a penalty that leaves everything else unchanged will reduce the occurrence of the behavior subject to the fine. We present the result of a field study in a group of day-care centers that contradicts this prediction. Parents used to arrive late to collect their children, forcing a teacher to stay after closing time. We introduced a monetary fine for late-coming parents. As a result, the number of late-coming parents increased significantly. After the fine was removed no reduction occurred. We argue that penalties are usually introduced into an incomplete contract, social or private. They may change the information that agents have, and therefore the effect on behavior may be opposite of that expected.

  11. Personally I quite like the system where a meta unlocks after X% of its feeders have been solved. Ideally X can be a little less than 80, so there’s still some room for feats of short-circuiting. This expands the space of viable metapuzzles a lot. And, most importantly, it eliminates the risk that you’ve missed a short-circuit weakness and a team will get to ignore an *entire round* of puzzles because of it. Otherwise there’s too much riding on the constructing team’s assessment of one single (meta)puzzle’s difficulty.
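
    For illustration, here’s a tiny sketch of that gating rule (the 70% threshold and names are hypothetical, not from any actual hunt): the meta only becomes visible once a set fraction of its feeders is solved, so at most the remaining fraction of answers can ever be backsolved.

    ```python
    # Hypothetical meta-unlock gate: reveal the meta once a fraction of
    # its feeder puzzles is solved. Setting the threshold a bit below
    # the ~80% the meta "needs" leaves room for short-circuiting while
    # ensuring a team can't skip an entire round of puzzles.
    UNLOCK_FRACTION = 0.7  # e.g. unlock at 70% of feeders solved

    def meta_unlocked(solved_feeders, total_feeders, fraction=UNLOCK_FRACTION):
        """Return True once enough feeders are solved to reveal the meta."""
        if total_feeders == 0:
            return False
        return solved_feeders / total_feeders >= fraction
    ```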

    • I believe GPH did this last year too, at least for the opening round (I forget if the harder rounds worked this way too or if they had their metas immediately visible).

      It was a bit anticlimactic when the opening-round meta in GPH took us about ten minutes to solve without having to think particularly hard, though we had the flip side in the later rounds, where we barely managed to solve one meta in the last hours of the hunt (and most of them had been open for a while before that).

      • GPH in 2018 gated the metas behind a similar metric to this year’s hunt. We definitely cracked a few of the mid-round metas as soon as they unlocked because we had theories from the round name itself on how it worked.

      • I may have noticed this less (and/or been annoyed by it less) in 2018 because it was more transparent which things led to which metas. But there was also the subtle difference that points were added over time or via clicking, so in theory, you might gain access to a metapuzzle with a lower percentage of answers… just later.

        This is a good moment to note that I appreciate (and based on our design decisions, most of Setec leadership appreciates) transparency in Hunt unlock structure. I think it’s okay and even good if some elements are initially surprising (wait, there are three more characters? wait, this town connects to THREE towns, and each town has an extra puzzle in it?) but ideally after making a decent amount of progress, you should know why things have been opening and have some idea of what future actions will accomplish. Whereas in GPH 2019, I didn’t know why the initial metas opened (and why the last two didn’t for a long time) until I chatted with other teams after the event.

  12. One thing which is worth mentioning about backsolving in general is that for particular unlock and advancement structures, it acts as a safety valve in the event that puzzle progress or puzzle or round unlocking is happening considerably more slowly than anticipated.

    Because of this, design decisions made with either the goal or the net effect of reducing backsolving or making it more difficult will accordingly require more diligent attention to whether the expected difficulty level of the puzzles and metas is the actual difficulty level, and to understanding the effects of various paces and types of progress through the chosen unlock and advancement structure.

  13. I think the issue is really about baseless guessing, not about backsolving or short-circuiting.

    Looks like commenters defend (and enjoy!) backsolving and short-circuiting in many circumstances. Both are a valuable part of the puzzle hunt experience, and both are a built-in byproduct of what people say makes a satisfying metapuzzle. Solvers really like the feeling of being able to short-circuit a metapuzzle and backsolve an answer.

    I think the problem, for hunt authors, arises when backsolving and short-circuiting aren’t accompanied by enough of the regular “puzzle solving” process.

    Like, if you told me you like solving puzzles and you were eager to try the ones I write, and so I wrote one for you, then I’d really like you to try to solve it. If you ignored the work and structure behind the puzzle, favoring a dictionary attack instead, I’d be frustrated. (Psst, Wei-Hwa, this is the thinking behind why it’s “a jerk move”.)

    Similarly, for a metapuzzle, I’d like solvers to meet one of those on its own terms. Those terms usually include attempts at short-circuiting along with some predictive backsolving! Consequently, if my feeder puzzle gets backsolved because the metapuzzle was solved in the intended way, then I’m merely disappointed instead of frustrated.

    It really is the baseless guessing that gets people down. The attitudes around it are basically the same as the last big comment party on this blog. There, the “problem” was “There’s a new puzzle in Valentine’s, and I can’t tell if it goes to a meta I already solved or if it’s going to give me something new.” If it’s a “problem”, it’s because you want to put your effort towards making progress on the hunt. On the other hand, dude, it’s a puzzle and you like puzzles, it should be fun.

    The same attitude is going to rule the day on baseless guessing: Are you trying to solve puzzles? Or are you trying to put your time into the efforts that will advance you fastest?

    It seems like that’s the main question worth answering here, moreso than whether it’s okay to call in an answer when there’s one, two, or nine puzzles it could fit with.

    • Brian wrote: “Like, if you told me you like solving puzzles and you were eager to try the ones I write, and so I wrote one for you, then I’d really like you to try to solve it. If you ignored the work and structure behind the puzzle, favoring a dictionary attack instead, I’d be frustrated. (Psst, Wei-Hwa, this is the thinking behind why it’s “a jerk move”.)”

      But you’re not writing a puzzle for me. Instead, you’re writing a puzzle that is 1% of an event, and I’m 3% of a team that’s trying to solve the event, not your individual puzzle. The chance that I see the specific puzzle you wrote for me is tiny. This is like if you had crafted a super-elegant crossword clue, put it in a crossword, asked me what I thought of it, and I responded, “oh, I didn’t look at that clue; I got the answer from all the crossings.” Sure, it makes sense for you to be frustrated, but if you wanted me to appreciate your clue, maybe don’t put it inside a larger puzzle system where a solver can get the final goal without having to look at your clue.

      If my goal in playing Hunt was to maximize the solving of puzzles, I wouldn’t be on a team of 50 people playing during the Hunt; I’d be solving them afterwards at my leisure with maybe a small group of like-minded folks. Your question is right on point: “Are you trying to solve puzzles? Or are you trying to put your time into the efforts that will advance you fastest?” The answer is that I’m finding the point between those two extremes that is going to maximize my fun. And that could be 90% solving puzzles and 10% advancing fastest.

      For some events, I prioritize 100% solving puzzles. And that means, no large team and no splitting up the team to work on different puzzles at the same time. Sometimes that’s fun. And sometimes that’s frustrating, especially if the puzzles are hard and there’s no release system (the most recent GPH was an example of that).

    • To be clear — at no point did I ever say I don’t understand why Dan considers it a “jerk move”. I completely understand why some of you think it’s a jerk move; I just happen to disagree.
