Hyperphone
"Did you know the Greenland shark doesn't reach reproductive maturity until it's over 100 years old?"
"Yeah, it's crazy! What evolutionary pressures could possibly have produced that trait? Maybe it has to do with..."
"Well, I was thinking that..."
Apparently this post extremely fails to communicate what a hyperphone even is. I want to leave the post as is because I think the motivation is really important. But you may wish to skip to the section "Hold on, what even is a hyperphone?" at the end.
1. Aside: Tools for thinking Parathesizers
Tools that help you think would be cool to have. By thinking, I mean what you're doing when you're solving problems that aren't satisfactorily addressed by your existing skills; the activity that leads to insight, ideas, creating language; the activity that defines, propagates, and solves constraints by generating new structure. A tool for thinking would be a tool that fits into this process, somehow enhancing it, or being available so that the thinking process can amplify its own strength by using the tool.
This seems, if not disjoint from, very distinct from how people use the phrase "tools for thought". Although the phrase is abstract, the actual tools described that way are more like tools for keeping files, and tools for conducting business. (Not to belittle those goals!) A spreadsheet is more a tool for thinking than any note-taking system I've seen.
I'll use the term "parathesizer". Analogous to "synthesize" = "together-put", a parathesizer is a tool that puts things alongside each other. This is the most I'd hope to get out of software tools, for the time being. Synthesis needs a mind, but the software can help with bringing things together from afar, making some contact, correlating things that are by default cumbersome to correlate. (What might be a better word?)
See here for some more parathesizers.
2. Short provocation for a hyperphone
Sometimes there's a moment in conversations where someone says something, and that thing holds so much significance for the speaker and/or the listener, that one-track linear time can't contain all the ensuing thoughts and questions. It'd be cool if those moments could be given more space to unfold and to take a prominent place in the conversation.
3. Longer motivation for a hyperphone
Decoupling speaking and listening
Spoken conversation has some magic to it that I don't know how to get from any other single medium (e.g. text). Text as a medium for conversation allows asynchronicity; you can comment on, point to, pin, and reread any part of the written record. It seems to me that at some points in conversations about ideas, it would be better to decouple the temporal progressions of speaking and listening, and enable non-linear progressions, as in text.
Multiple insistent movements
Alice says something. Bob is triggered about what Alice says, and interrupts Alice to express something from his triggeredness. So far this is all good and fine, we're just bringing things up to the surface so we can speak and think and feel them together. But then something Bob says from his triggeredness triggers something in Alice. So, ok, now Alice wants to interrupt Bob. If she does so, then she could be accused, sometimes justly and sometimes unjustly, of adversarially interfering with Bob's processing. On the other hand, maybe Alice's triggeredness seems to her to have really been bound up with the fact that Bob interrupted what she was originally saying. In that case, it's Bob who perhaps is interfering with Alice's processing.
So now we are in an escalating conflict with ambiguous instigation, multiple urgent needs, and no clear priority or method for processing together. See also: esc. What to do? If Alice and Bob have a hyperphone, then here's what happens:
When Bob gets triggered, he presses pause on Alice. Alice keeps talking, but is alerted that Bob pressed pause. (She's learned, after a couple hours with the hyperphone, to smoothly decide whether to keep going or to wait for Bob.) Bob talks out what he's newly seeing. When he's done-for-now, he goes back and continues listening to Alice's original thought; and he can pause again, and speak again, if——as is likely, with a triggeredness——he didn't fully process it out on his own, but was able to see more by going back to the originally triggering line of thought. At the same time, Alice can listen to Bob's processing——or not, depending on context and Alice's and Bob's own specific norms, and depending on whether Bob labeled that branch as "actually, I figured it out and don't think I bid for you to listen to this as if I'd said it to you in real time". And so on.
Ephemeral glistening sentential artifacts
We're talking and thinking together. You've glimpsed something out of the corner of your eye——an apparent contradiction in your beliefs, a phenomenon that seems charmed, a trailhead, a handle on something that's been evading you, a way to tilt your perspective sideways and see things from a novel orientation. You struggle to say it, you make a few drive-by attempts that gesture in the direction or bite off a piece, but you haven't yet grasped the thing firmly enough that you're confident you'll remember the idea clearly——firmly enough that you can pull yourself up and sit on it, to reach further from there.
Then, in the interplay of our ideas, in a burst, you pull the pieces together, strung together like Odysseus's bow, in tension, but with a basic rough-and-ready integrity, at least for now. You speak a sentence that's an artifact. A faceted gemstone, hanging in the air between us, for us both to look at.
We'd wish to walk around the floating gemstone. We'd wish to view it from all sides, refract light through it from different directions, feel its edges, adjust its facets, compare it to our other ideas. We may, through collective effort, be able to sustain the gem there in the air at least for a time.
But then you produce another gemstone, and I produce one too. Now we can't hold them all up. Our ability, not just to urgently immediately attend, but even to remember at all, is stretched thin. Gems dissolve and are blown away like powder and spidersilk and needle-thin icicles.
Each gem could have taught us a cornucopia, patterns emerging with their stately cadence one after another from the gem. Each gem was hard earned, with silent squints and long-honed taste. Each gem lost tells us that gems are not valued.
A hyperphone can't hold up a gem in midair, all on its own, while we go off elsewhere. But a hyperphone can record how we spoke the gem and label it and then give the gem back to us. If you drop your gem, you'll then have to be like a third interlocutor interloping, hearing the gem from a third-person perspective; but you're also the first-person, and you'll recreate the gem much more quickly. So a hyperphone relaxes the constraints a little; it's an assist for gemweavers, like if, when you're juggling, you're allowed to toss me a ball to hold for a second whenever you want.
Sparks of thought are precious and shouldn't be so easily left to be lost.
Abstract example
Say Alice and Bob are conversing.
Normally: At most one of Alice and Bob can speak, and at most one can listen, at one time. Bob can only hear X at the rate and time that Alice says X. If at one moment, Alice and Bob each want to [speak and be heard] or [follow a thought], then at least one of them will have to wait. To recall a previous idea, Alice has to remind herself of the idea, and then "manually" re-evoke the context for Bob. (Or, what is also a loss, she only half-listens to Bob, without him knowing, degrading common knowledge and losing Bob's thought.)
With a hyperphone, Alice and Bob can, at the same time: speak and be (eventually) heard, follow a thought, sit in silence, listen to anything that has already been said (at any speed). In particular, if Alice says something of particular significance, then Bob and Alice can both immediately burst out with many trains of thought, and Alice's igniting thought as well as the resulting flames of thinking can all later be pointed to and revisited.
There's a lot of important stuff that comes from a fully coupled conversation. For example: the constraints might be a forcing function to compress ideas and make them handier; the limited channel encourages strongly applying taste to which branches to follow; and a very productive mode is fast back-and-forths to home in on the "domain specific language" needed to communicate stuff across minds that use different mental vocabularies. Also, using a hyperphone adds some overhead and constraints of its own. So I'm imagining starting from the assumption that the conversation will by default be fully coupled, and then enabling and organizing excursions into asynchronicity as breathing room for when the ideas start arriving too quickly and bigly.
Octopus mind
Our conscious experience——which is to say, narrativized broadcasted deliberate global-workspace thinking——is unitary. But behind that, going on in the background, there are many threads of noticing / pushing / tinkering / searching / asking / imagining.
To participate in a fully synchronous conversation, we filter and decide ("away-cut"), picking just one thread. If we wish to multiplex between our own timethreads, out loud in a conversation, we are burdening our interlocutor: they either have to burdensomely spin up multiple threads of interpreting our threads, or they have to filter away threads, effectively ignoring us. Since interpretation is slower than expression, this is a bottleneck.
With a hyperphone, synchrony is relaxed into quasi-synchrony. At the price of a little less liveness, a little less immediate feedback, a bit more costly coordination to drill down when back-and-forth is needed for the homing signal, what is purchased is a larger buffer to store multiple timethreads being expressed for future interpretation. Given the larger buffer, octopus minds are somewhat less constrained to filter away some arms' activity.
A wider bandwidth for timethreads opens up the possibility of collaboratively working on thoughts that require more channels of creativity. Indra's net creates a centripetal force, pulling thoughts into the convex hull of the preexisting conceptual scheme. To explore multiple simultaneous modifications requires more channels of creativity. Like the difference between evolution piling up isolated tweaks and a designer leaping to an island of effectiveness, across a valley of ineffectiveness, via multiple simultaneous changes, a multichannel conversation can rewrite conceptual schemes in ways that are more difficult in single channel conversations.
Hyperphone as a parathesizer
If a conversation is a garden of forking paths, then in normal speech, interlocutors are synodotors: they walk together. What does a hyperphone parathesize? It parathesizes spots on divergent paths. The interlocutors separate, go to two different places in thinking, and then come back together. The two different places have been brought alongside each other by the hyperphone (and the speakers).
4. The UI challenge of hyperphone
It seems that there are major user interface problems that would have to be solved to make a usable hyperphone.
Streams
I'm imagining something like: at all times, both speakers (who are perhaps in separate audio spaces, speaking through computers) are recorded. At first there is a single displayed stream, which is both speakers' streams. When speaker S1 wants to diverge, ze disengages from concurrency with speaker S2. That means that
- S1 stops hearing S2's stream (and a marker is left, denoting the point up to which S1 has heard S2's stream).
- S2 stops hearing S1's stream.
- S1's stream stops being associated with S2's current stream as "concurrent" or "two-way channel open".
- S1's stream and S2's stream are each now individual (non-concurrent) streams.
S1 can at any time reengage, i.e. undo these changes, so that there's a single paired stream. While disengaged (from concurrency with S2), S1 can click around to listen to any prior recorded chunk of audio (from either speaker), at any speed. S1 can also start speaking again without reengaging. This can be in a new thread, or tagged as branching from some point in any stream (or even in response to multiple points).
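To make the disengage/reengage mechanics a bit more concrete, here is a minimal Python sketch of one possible data model. Everything named here (`Chunk`, `Call`, `heard_by`, `markers`, and the method names) is hypothetical, invented for illustration. It ignores real audio entirely and only tracks who has heard which recorded chunk, under the assumption that speech recorded while the streams are concurrent is heard live by both speakers.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One recorded span of speech (everything is always recorded)."""
    speaker: str
    start: float                                        # seconds since the call began
    end: float
    branches_from: list = field(default_factory=list)   # indices of chunks this responds to
    heard_by: set = field(default_factory=set)          # who has listened to this chunk


@dataclass
class Call:
    """Hypothetical two-speaker call with a single concurrency flag."""
    speakers: tuple = ("S1", "S2")
    chunks: list = field(default_factory=list)
    concurrent: bool = True
    markers: dict = field(default_factory=dict)          # who -> time of disengaging

    def record(self, speaker, start, end, branches_from=()):
        # Assumption: while concurrent, both speakers hear the chunk live;
        # otherwise only the speaker has "heard" it so far.
        heard = set(self.speakers) if self.concurrent else {speaker}
        self.chunks.append(Chunk(speaker, start, end, list(branches_from), heard))
        return len(self.chunks) - 1

    def disengage(self, who, now):
        """`who` breaks concurrency at time `now`, leaving a marker."""
        self.concurrent = False
        self.markers[who] = now

    def reengage(self):
        """Return to a single paired stream."""
        self.concurrent = True

    def listen(self, listener, index):
        """The listener plays back a previously recorded chunk."""
        self.chunks[index].heard_by.add(listener)

    def unheard(self, listener):
        """Indices of chunks the listener has not yet heard."""
        return [i for i, c in enumerate(self.chunks) if listener not in c.heard_by]


call = Call()
a = call.record("S2", 0.0, 10.0)                        # heard live by both
call.disengage("S1", now=10.0)                          # S1 diverges; marker at 10 s
b = call.record("S2", 10.0, 25.0)                       # S2 keeps talking
c = call.record("S1", 10.0, 18.0, branches_from=[a])    # S1's branch off chunk a
call.reengage()

print(call.unheard("S1"))   # [1]  (S2's chunk b)
print(call.unheard("S2"))   # [2]  (S1's branch c)
```

Keeping a per-chunk `heard_by` set, rather than only a single marker, is one way to let each speaker later click around and listen in any order; whether that is the right granularity is an open design question.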
Organization of the display
It's not clear to me how to organize all this on a screen, as there might be a strange branching structure, multiple independent threads, long conversations, and so on. There's also a large source of complexity: what's relevant to each speaker is different, and they should be able to navigate separately, but on the other hand, it's important to have common knowledge, for example of who has read what. Streams could be visualized as chunks that are light or dark for S1 depending on whether or when S2 has listened to them. If there are manual annotations that reposition elements, though, then the speakers could end up with pretty different displays.
One can imagine all sorts of additional features, such as waveforms, AI speech-to-text and text-to-summary tagging, manual annotations of e.g. priority or response, pinning key moments to the display, and so on. But these should be in response to immediately felt need, rather than what sounds cool.
If you're interested in hyperphone, feel free to contact me, my gmail is: tsvibtcontact
5. ntext
[Update July 2024: ntext now seems like a waste of time; if someone wanted to build this they should just go for the actual thing with audio.]
Since dealing with audio is hard and also the UI will be complicated, a better starting place might be text. It's easier, and some of the UI challenges overlap with those of hyperphone.
"ntext" has two meanings: it's equivalent to "co-context"——it's the software that creates the conditions for context to be appropriately tracked. Also, it's $n$ columns of text, as follows.
A problem with text chat is the lack of threads. "What?" you say, "basically every chat app has threads." Yes, but they're all dumped into one column, interleaved, as in Signal. You may have channels, but those can't be viewed at the same time. Maybe you have two columns (as in Slack or Discord), but still:
- You can't view, and fluidly interact with, more than two threads in one context.
- Threads aren't first class. To find a thread in Discord, you have to find the message that started the thread. There's the single real main thread, and then other second-class threads.
- You can manually annotate replies. But you can't annotate that one message replies to multiple messages.
- And, crucially, you can't (AFAIK?) click on a message to see a column that displays the upward reply chain, thereby giving you the context of that reply, without having to scroll back through all the messages from all the interleaved conceptual threads that got dumped into the one UI thread.
- You don't have separate "I saw this" autodetection for organic (annotated reply) threads.
Another missing feature is pinned messages, which seem quite important for making the context richly available, and for communicating what seems important to people.
One can also imagine "stack traces". They could simply be manual: one notes "milestone" messages, and then can later show the messages reply-upstream of this message, but only the milestones. Or, show all recent milestones. Or the stack trace might be usefully automatable: milestones are those messages from which multiple branches grow, or those messages which a language model thinks are tangents / digressions / changes of subject, or something.
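As a rough illustration of the reply chains and stack traces described above, here is a small Python sketch. The names (`Message`, `upstream`, `auto_milestones`, `stack_trace`) are hypothetical, and it makes two assumptions: a message may reply to several earlier messages, and message ids increase chronologically. It covers only the manual flag and the "multiple branches grow from it" rule, not the language-model variant.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    id: int
    text: str
    replies_to: list = field(default_factory=list)   # may name several parents
    milestone: bool = False                           # manual "milestone" flag


def upstream(messages, msg_id):
    """Ids of all messages reply-upstream of msg_id, oldest first."""
    seen, stack = set(), list(messages[msg_id].replies_to)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(messages[parent].replies_to)
    return sorted(seen)


def auto_milestones(messages):
    """Ids of messages from which two or more replies grow."""
    counts = {}
    for m in messages.values():
        for parent in m.replies_to:
            counts[parent] = counts.get(parent, 0) + 1
    return {mid for mid, n in counts.items() if n >= 2}


def stack_trace(messages, msg_id):
    """Upstream messages that are manual or automatic milestones."""
    marked = auto_milestones(messages)
    return [i for i in upstream(messages, msg_id)
            if messages[i].milestone or i in marked]


msgs = {m.id: m for m in [
    Message(1, "original question"),
    Message(2, "first angle", replies_to=[1]),
    Message(3, "second angle", replies_to=[1]),
    Message(4, "key distinction", replies_to=[2], milestone=True),
    Message(5, "combining both angles", replies_to=[3, 4]),
    Message(6, "follow-up", replies_to=[5]),
]}

print(upstream(msgs, 6))      # [1, 2, 3, 4, 5]
print(stack_trace(msgs, 6))   # [1, 4]
```

Storing `replies_to` as a list is what makes threads "organic": the upward chain for any message can be shown in its own column without scrolling through everything that was interleaved with it.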
Assuming these features don't exist, presumably that's because people don't want them enough to be worth the UI clutter and the program complexity. Fair enough, but those are some of the demands of certain high-cognitive-load conversations, which are where the thinking happens. Or at least, it seems to me that there ought to be useful features of this flavor, given how often I'm in text conversations with an overflow of ideas on multiple threads, dumped into a single stream.
If you're interested in ntext, feel free to contact me, my gmail is: tsvibtcontact
6. Update July 2024
It remains the case that I might possibly build this but that it would go much faster with collaborators and/or money, especially collaborators who know well the fundamentals of how to deal with audio.
Here is a doc with some design details:
https://docs.google.com/document/d/1XGiOGiPNWXtHV6LIo9NBX2jAwI9HzsIysy44pIa_GVg/edit
Here are some pictures from that doc:
7. Hold on, what even is a hyperphone?
10 second pitch
A hyperphone is an audio chat app that works like two people simultaneously YouTube livestreaming and listening to each other, giving the benefit of recording and playback controls.
What a hyperphone is
A hyperphone is an audio chat app. It can be used exactly like a phone call with exactly the same experience.
But a hyperphone also lets the conversation be a little bit asynchronous. That means if Alice and Bob are talking, they don't always have to be hearing what the other person is saying exactly when the other person is saying it. Two simple examples:
- Alice says something, but there's some problem with the internet connection, so some of the stuff gets lost in transmission. With a normal phone, what Alice was saying is just lost. With a hyperphone, it gets recorded, so Alice's machine retransmits it.
- Alice says something, but Bob didn't get it the first time because it was complicated or he was distracted.
In both cases, Bob can skip back a bit and (re)listen to what Alice said. At the same time, Alice can keep talking. Bob can catch up to Alice by listening at 1.5x speed, say.
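How long does catching up take? A quick back-of-the-envelope sketch, assuming Alice keeps talking continuously in real time and Bob listens without pausing: at playback speed s, Bob closes the gap at (s - 1) seconds per second, so a lag of d seconds takes d / (s - 1) seconds to clear. The function name below is made up for illustration.

```python
def catch_up_seconds(lag_seconds: float, speed: float) -> float:
    """Wall-clock time for a listener to catch up to a speaker who keeps talking.

    Assumes the speaker talks continuously at 1x and the listener plays
    back at a constant `speed` greater than 1.
    """
    if speed <= 1.0:
        raise ValueError("speed must exceed 1.0 to ever catch up")
    return lag_seconds / (speed - 1.0)


print(catch_up_seconds(30.0, 1.5))   # 60.0
print(catch_up_seconds(30.0, 2.0))   # 30.0
```

So skipping back 30 seconds costs Bob about a minute of 1.5x listening before he is live again; pauses in Alice's speech would shorten that.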
A more complicated example:
- Alice says something super interesting that makes Bob have an idea he really wants to say. But Alice isn't done with her thought. With a normal phone, Bob has to either hold his thought or interrupt Alice. With a hyperphone, Bob can "pause" Alice for a while, and then later hit play again.
- Concretely, Bob can start talking immediately in a separate channel (either he presses a button, or the hyperphone automatically does it somehow). So now Bob's not hearing Alice, and Alice isn't hearing Bob.
- When he's done, he can go listen to what Alice had gone on to say.
- And when Alice is done talking, she can go listen to what Bob had said in the separate channel.
This example can't be handled by literally having two YouTube livestreams, one by Alice for Bob to hear and the other by Bob for Alice to hear. We need Alice's playback of Bob talking to pause when Bob starts talking in parallel with Alice, but then have what he said be available to her later (so we can't just have Bob mute himself).
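The difference between the three options can be sketched in a few lines of Python. This is a toy model with made-up names (`deliver`, the policy strings), not a description of any real app: it only answers, for each thing Bob says, whether Alice hears it live, gets it later, or never gets it.

```python
def deliver(speech, policy):
    """Return (heard_live, available_later) from Alice's point of view.

    speech: list of (text, in_side_channel) pairs in the order Bob spoke.
    policy: "livestream", "mute", or "hyperphone".
    """
    heard_live, available_later = [], []
    for text, in_side_channel in speech:
        if not in_side_channel or policy == "livestream":
            # Plain livestream: Bob's words play over Alice while she talks.
            heard_live.append(text)
        elif policy == "hyperphone":
            # Recorded and held back until Alice chooses to listen.
            available_later.append(text)
        # policy == "mute": side-channel speech is simply lost.
    return heard_live, available_later


bob = [("uh huh", False), ("wait, here's an idea", True), ("ok, go on", False)]

print(deliver(bob, "livestream"))
# (['uh huh', "wait, here's an idea", 'ok, go on'], [])
print(deliver(bob, "mute"))
# (['uh huh', 'ok, go on'], [])
print(deliver(bob, "hyperphone"))
# (['uh huh', 'ok, go on'], ["wait, here's an idea"])
```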
That's it.
What a hyperphone is not
It's not a text app. It's an audio chat app.
It's not about branching or threads, though it involves branching and threads. A hyperphone doesn't want there to be branching and threads. It's an audio chat app, so you're supposed to be talking with a person——you say stuff to them that they listen to, and you listen to the stuff that they say to you. You're always trying to get back to synchronous talking; the threadedness is to accommodate gracefully when perfectly synchronous talking isn't so great.
It's not about making a tree, or doing something complicated with pointers and references and whatever. Maybe there's something good like that, but a hyperphone is much more basic. First things first.
There are more complicated things that might be nice to have, that a hyperphone could provide. E.g.: Alice wants to bookmark a point and not forget to come back to it. So she clicks and presses a button, and now that audio block / point in time gets pinned to the user interface. But it's hard to tell in advance which things like this would actually be helpful.