Diaspora Community - Federation Discussion

Sun 13 Oct 2013 2:27PM

Public post federation

Jason Robinson

The lack of public post federation in Diaspora is IMHO a make or break feature. The whole network is a little broken as small pods are cut off from most of the posts on the network due to the way current federation works.

Here is my proposal for solving this issue, please see wiki post here.

It is not a comprehensive solution that can just be implemented now. It is a high level suggestion for going forward with talking about such a feature.

Jason Robinson Mon 14 Oct 2013 6:28AM

@jonnehass yeah I have no real grasp of the D* protocol so I didn't mention too much about the specifics in the proposal, just the idea of the relay.

To me the central hub makes sense for a reliable source of network data. I mean we don't decentralize the project page and the wiki either - the central hub isn't any different from those resources.

Elm Wed 23 Oct 2013 7:15AM

Still, I also believe tag federation across pods would be a good feature for Diaspora…

Brad Koehn Thu 14 Nov 2013 12:52PM

I have a different way to solve this problem. I'll try to get a page on the wiki this weekend.

Brad Koehn Fri 15 Nov 2013 3:30AM

OK, alternative proposal here: https://wiki.diasporafoundation.org/Tag_aggregation

Brad Koehn Fri 15 Nov 2013 11:40AM

Let me know if I should start a new Loomio proposal. I'm new at this.

goob Fri 15 Nov 2013 4:48PM

What advantages does this method have over the scheme I proposed in this thread (it talks about tag federation about half-way down)?

I'm not a coder at all, so this is just a concept rather than anything more detailed, but I hope it might help to improve D*'s federation, especially for new and one-user pods.

Brad Koehn Fri 15 Nov 2013 5:11PM

@goob My concern with that approach is that it's very inefficient (it looks like O(n²), for you computer science types, where n is the number of pods; the amount of network traffic grows rapidly as the number of pods increases), and it implies that all pods are equally trustworthy sources of information about the podsphere.
Also it seems to me that your proposed solution would require a lot of new code to be developed, alongside new messaging semantics. This is based on a very quick analysis so I could be way off base here.

I'm trying for an incremental model that scales well with minimal new coding required. Also in my proposal tagged posts are federated in a very efficient model that scales linearly (O(n + m) where n is the number of posts and m is the number of pods).
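
To put rough numbers on the scaling argument, here is a small sketch. It assumes that "every pod talks to every pod" costs one exchange per ordered pair of pods, and that the aggregated model costs one upload per post plus one fetch per pod. Both cost models and both function names are illustrative assumptions, not measurements of either proposal.

```python
def all_to_all_messages(pods: int) -> int:
    """Assumed model: every pod contacts every other pod once, n * (n - 1), so O(n^2)."""
    return pods * (pods - 1)


def aggregated_messages(posts: int, pods: int) -> int:
    """Assumed model: each post is uploaded once and each pod fetches once, so O(n + m)."""
    return posts + pods


# Compare how both counts grow as the number of pods increases.
for pods in (10, 100, 1000):
    print(pods, all_to_all_messages(pods), aggregated_messages(posts=500, pods=pods))
```

Under these assumptions, going from 10 to 1000 pods multiplies the all-to-all count by roughly ten thousand, while the aggregated count only grows by the number of added pods.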

goob Fri 15 Nov 2013 6:43PM

Thanks for your reply. That sounds plausible (damnit!).

Brad Koehn Fri 15 Nov 2013 11:19PM

@Goob no damnit at all! I wouldn't have thought of using the pod to help locate other users or index other posts were it not for your proposal. There's probably a better idea out there than the one I proposed too; I just hope I can contribute something.

Jason Robinson Sat 16 Nov 2013 11:40AM

Great @bradkoehn ! And @goob all ideas are welcome - they all add to our ability to build the best solution.

There are many similarities between my idea and Brad's - especially the part that relates to a central hub to store pod information. I am really sure we need this, for many reasons. I'll put something in the wiki and separate this into its own place since it is kinda separate, though required by the public post federation/aggregation. Once we have a spec we'll just need to vote since I know some community members don't like this idea even if it is totally opt in :P

As for the idea from Brad, I'm quite sure it would do the job and would be happy if either idea was implemented. Initially I was questioning the idea of pull instead of push, but I guess PubSubHubbub takes care of that problem (even though we then rely on those external services; the default diaspora uses is from Google).

I do think though that my relay server idea is lighter because there is no need to save posts. It also handles redundancy - pods are not tied to any particular aggregator and thus even if all but one of them are down the post will be delivered to all listeners.

Security I guess in both would be the same. Except I see some worry that an aggregator could be populated with non-authentic posts, and even if no pods accept the posts, some other source might do. Since the aggregator would have an open interface, it wouldn't take long for someone to build an app to show posts in the diaspora network going through the aggregator. In this situation it would be trivial to inject posts into the aggregator, unless the aggregator checks all of them. In the relay idea this is not a risk since the aggregator doesn't store posts.

Any other opinions on these ideas?

Flaburgan Mon 18 Nov 2013 4:09AM

I don't know enough to talk about the technical side, but I do know one thing: I'm strongly opposed to anything which would involve Google services (even if it's public data). We saw how they turned off Google Reader, or suddenly decided to make the Google Maps API a paid service. I don't want to depend on a company for a feature as critical as this one.

Maciek Łoziński Tue 19 Nov 2013 8:21PM

There are many P2P networks and routing protocols out there. In my opinion we should go down this path.
What if every pod was a relay for its own users' followed tags?

Elm Wed 20 Nov 2013 3:53PM

I'd prefer the distributed path even if some seeds/pods would lose a bit of non-federated info. (Not sure I understand how it would work out, though.) @macieklozinski: could you clarify how it would work for tag federation?

Maciek Łoziński Wed 20 Nov 2013 9:43PM

I'll try to do some deeper research on possible p2p solutions.

Maciek Łoziński Wed 20 Nov 2013 10:53PM

I'm not sure if it fits well with Diaspora's protocols, but I could suggest something like this:

When user A shares with user B on another pod, user B's pod becomes a "neighbor" of user A's pod.
User B's pod "subscribes" to user A's pod for all tags that the users of user B's pod follow.
Each pod keeps a list of its neighbors and the tags they subscribe to.
When a user on a certain pod makes a public post, it's sent to all neighbors subscribed to tags present on this post.
If a pod receives a public post from another pod and does not have this post in its database, it passes it to all neighbors subscribing to tags present in this post, and saves the post to its database.
If a received post is already present in the database, nothing happens.
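
The rules above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `Pod`, `add_neighbor`, `publish` and `receive` are hypothetical names, delivery is a direct method call instead of a network request, and the post is saved before it is forwarded so that a cycle of neighbors cannot loop forever.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    followed_tags: set = field(default_factory=set)   # tags followed by local users
    neighbors: dict = field(default_factory=dict)     # neighbor name -> (Pod, subscribed tags)
    database: dict = field(default_factory=dict)      # post id -> tags of the stored post

    def add_neighbor(self, other: "Pod") -> None:
        """`other` becomes a neighbor and subscribes for every tag its users follow."""
        self.neighbors[other.name] = (other, set(other.followed_tags))

    def publish(self, post_id: str, tags: set) -> None:
        """A local user makes a public post."""
        self.database[post_id] = set(tags)
        self._forward(post_id, tags)

    def receive(self, post_id: str, tags: set) -> None:
        """Handle a public post pushed by another pod."""
        if post_id in self.database:
            return  # already known: nothing happens
        self.database[post_id] = set(tags)  # save first, so cycles terminate
        self._forward(post_id, tags)

    def _forward(self, post_id: str, tags: set) -> None:
        """Pass the post to every neighbor subscribed to at least one of its tags."""
        for pod, wanted in list(self.neighbors.values()):
            if wanted & set(tags):
                pod.receive(post_id, tags)


a = Pod("a.example", {"linux"})
b = Pod("b.example", {"linux"})
c = Pod("c.example", {"linux"})
a.add_neighbor(b)   # a user on b shares with a user on a
b.add_neighbor(c)
c.add_neighbor(a)   # closes a cycle a -> b -> c -> a
a.publish("post-1", {"linux"})
a.publish("post-2", {"cats"})
print(sorted(c.database))
```

In this run `post-1` reaches pod c through pod b even though c is not a direct neighbor of a, while `post-2` stays on pod a because nobody subscribed to its tag.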

goob Thu 28 Nov 2013 4:04PM

I think there's a big difference between having a central hub which contains information pertaining to the D* network but which is separate from the network itself (such as the project website, poduptime, etc), and a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here. With a central hub as an integral part of the network, the network would no longer be fully distributed.

If a central hub of any sort is actually needed in order for post/tag federation to work properly, I suggest it be restricted to holding meta-data, such as a list and IP addresses of pods or relay servers. This could be the same central hub which helped people to choose a pod to register at, as poduptime does at the moment.

It would then only be referred to when a new pod or relay server was brought online. The new pod would then call hub.diasporafoundation.org (for example), which would give it some pods/relay servers to contact from which it could pull post data. The actual transmitting of post data would be done by the pods/relay servers themselves, with no involvement from the central hub.

This is similar to one of the proposals I made in this discussion on adding pull to Diaspora's push model (the proposal concerning tags).

I'm not sure relay servers separate from the pods themselves would be needed; I think there is a way of making pods federate public data more effectively without using a separate network of relays, if they are connected correctly together.

Note that in the following, when I talk of connections/sharing between pods, I'm not talking about the normal connections between pods which exist, but a kind of meta-network to push public data around more effectively, of the kind Jason talks about in his proposal.

I would suggest using a kind of 'cell structure', in which each pod is connected directly with several other pods in the network, and through that structure build up a list of public posts and tagged posts data to pass on to other pods. This avoids the problem of scalability faced if 'every pod knows every pod'. If the relay connections between pods are made correctly, public data will be federated to every pod quickly, via indirect routes (Pod A shares it with the several pods to which it has direct connections; those pods share it with the pods with which they have direct connections; and so on). If there is redundancy built in to this network, it won't matter if several pods in this network are down; the data will get fed around to the whole network eventually in any case.

It might be that each pod needs to be connected only to two other pods in the network for this to work, like the classic Communist cell structure – as in the graphic below (not perfectly illustrated, but it gives you an idea):

[Image: Cell structure network]
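
Since the graphic is not available, here is one possible reading of the two-connections idea as a sketch: every pod is linked to exactly two others, forming a ring, and a post is relayed hop by hop. The ring layout and the names `ring` and `spread` are assumptions made for illustration only.

```python
from collections import deque


def ring(pods: int) -> dict:
    """Each pod is linked to exactly two others: its predecessor and its successor."""
    return {i: {(i - 1) % pods, (i + 1) % pods} for i in range(pods)}


def spread(links: dict, origin: int, down: frozenset = frozenset()) -> dict:
    """Breadth-first relay from `origin`; returns the hops needed to reach each live pod."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        pod = queue.popleft()
        for peer in links[pod]:
            if peer not in hops and peer not in down:
                hops[peer] = hops[pod] + 1
                queue.append(peer)
    return hops


links = ring(8)
print(max(spread(links, origin=0).values()))               # worst-case hops, all pods up
print(len(spread(links, origin=0, down=frozenset({3}))))     # pods reached with pod 3 down
print(len(spread(links, origin=0, down=frozenset({2, 6}))))  # pods reached with pods 2 and 6 down
```

With only two links per pod the ring survives any single pod being down, but two failures can split it, which suggests that the "several other pods" variant described above would be needed for real redundancy.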

I'm sure there is a way of coding into the D* software itself so that it builds a network of connections such that each time a new pod is brought into the network, the network recalibrates its connections so that this new pod is made a part of the sharing network, without reference to any external source such as relay servers or a central hub. Likewise each time a pod drops out. However, I would have no idea how to do this! I hope someone out there will do, and that my partly developed concept will spark ideas for practical solutions in their mind.

If a central hub is needed to help new pods get connected, I think we should have a mirror or two on other servers just in case the project site is down when a pod is brought online.

Jason Robinson Thu 28 Nov 2013 6:11PM

@goob , will read the rest of your long comment later, but you should maybe read my proposal too ;)

a central hub which is an integral part of the D* network and receives/sends/stores data from that network, such as post data, which is what is being proposed here.

I have proposed no such thing. This is the reason I stopped the whole vote for the central hub because not many people even understood my proposal.

goob Thu 28 Nov 2013 6:47PM

My mistake. I did read your various proposals and wiki articles, but there's been so much to read and digest that I got confused. I read that suggestion somewhere on one of the several threads on this/related topics, then while writing I got my wires crossed and thought it was you who had proposed it.

Just ignore the last seven words in that extract. The point stands, no matter who proposed it, or even if no one has proposed it yet!

Jason Robinson Thu 28 Nov 2013 8:06PM

OK read the whole post now. I think we are thinking on similar lines. However, as a software developer I always think of one of the golden rules of software design - making sure each component has one purpose and that only. Incorporating everything and the rest too is possible - hey we could make diaspora also serve files and incorporate an IRC server + maybe do some test automation services on top. But it's a bad idea. Diaspora server as it is now exists to provide the UI for the server. The federation stuff is actually being pushed out of the main component just so that diaspora will be more flexible. Why would we want to bundle up more non-UI related features then?

IMHO, the system to federate posts around should be decentralized, but it should also be its own mini-network of volunteers. This is exactly what my relay servers proposal is about. :) A bunch of relays taking care of the public post handling in a decentralized way - and pods will not even have to decide which relay to use, giving total redundancy even if all relays except one are down.

I still feel many people misunderstood this which is why when I finish the statistics hub, I'll start working on a POC relay and see if I can provide the hooks on D* side (the more difficult part for myself, being ruby).

Also, as you said, we could federate the metadata for relays around totally without a central hub. Sure it's possible, but imho it's a bad idea. It adds nothing to decentralization and does not benefit anyone in any way except adding complexity. A simple list on the project site would do fine, since pods would only need to pull it in every so often to refresh their list.

Decentralization is a good thing and awesome - but it's not a magic word to use with everything and assume that it makes things better.

Jason Robinson Thu 28 Nov 2013 8:11PM

Btw, my original proposal said storing the "wants tags" list on the central hub. This is not really necessary if such data is stored on the relays instead. It just would mean more posting of said lists around since all relays need to know asap or the pod will miss posts. Storing the list on the central hub would make for less bouncing of lists around - if the central hub is down it doesn't matter since relays have the latest list and will then refresh once the central hub is back up.

At no point in the proposal was I proposing that traffic stops when the central hub is down :)
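
The "hub down does not stop traffic" behaviour could look roughly like this. It is a sketch only: `CachingRelay`, `FakeHub` and their methods are hypothetical names, and the hub is simulated in memory so the example runs without a network.

```python
class CachingRelay:
    """Relay that keeps the last known "wants tags" list when the hub is unreachable."""

    def __init__(self, fetch_wants):
        self._fetch_wants = fetch_wants   # hypothetical callable that asks the hub
        self.wants = {}                   # pod host -> set of wanted tags

    def refresh(self) -> bool:
        """Replace the cached list if the hub answers; otherwise keep the old one."""
        try:
            self.wants = self._fetch_wants()
        except ConnectionError:
            return False
        return True


class FakeHub:
    """Stand-in for the central hub (an assumption, not a real service)."""

    def __init__(self):
        self.up = True
        self.wants = {"pod-a.example": {"linux"}}

    def fetch(self):
        if not self.up:
            raise ConnectionError("hub unreachable")
        return {host: set(tags) for host, tags in self.wants.items()}


hub = FakeHub()
relay = CachingRelay(hub.fetch)
relay.refresh()                          # hub is up: the list is cached
hub.up = False
hub.wants["pod-b.example"] = {"cats"}    # a change the relay cannot see yet
relay.refresh()                          # hub is down: the cached list is kept
print(sorted(relay.wants))
hub.up = True
relay.refresh()                          # hub is back: the relay catches up
print(sorted(relay.wants))
```

The trade-off mentioned above shows up here too: while the hub is down the relay keeps delivering from its cached list, but a pod that registered new tags in the meantime is only served after the next successful refresh.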

goob Sat 21 Dec 2013 11:47AM

Does anyone know what it is specifically in the code or structure of the Diaspora network which is causing public post federation to work unreliably?

If it is because Diaspora relies on push notifications to transmit data between pods, could this be solved by allowing a pod to send pull requests to other pods in the network for any data missed when it comes online after some downtime or after being overloaded and unable to receive communications from other pods, or after it is brought online to the network for the first time? I propose a potential solution to this under 'Non-communication' (the third point) in the discussion about adding pull to the push model for federation. While there are concerns about scalability for the other points (getting new pods fully connected and federating tags) in my post, hopefully enabling a pod to send a pull request to other pods when it comes online so it can pick up data (including public posts) it missed while it was offline would help federation of public posts at least in some circumstances where it currently fails.
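
A minimal sketch of that catch-up pull, assuming each pod can answer the question "which public posts do you have that are newer than this timestamp?". `PullingPod`, `posts_since` and `catch_up` are hypothetical names and nothing here reflects the existing Diaspora protocol.

```python
from dataclasses import dataclass, field


@dataclass
class PullingPod:
    posts: dict = field(default_factory=dict)   # post id -> (timestamp, body)
    last_sync: float = 0.0                      # time of the last successful catch-up

    def posts_since(self, timestamp: float) -> dict:
        """Answer a pull request: public posts newer than `timestamp`."""
        return {pid: post for pid, post in self.posts.items() if post[0] > timestamp}

    def catch_up(self, peers: list, now: float) -> int:
        """After downtime, pull what was missed from known peers; returns the number fetched."""
        fetched = 0
        for peer in peers:
            for pid, post in peer.posts_since(self.last_sync).items():
                if pid not in self.posts:
                    self.posts[pid] = post
                    fetched += 1
        self.last_sync = now
        return fetched


peer = PullingPod(posts={"old": (5.0, "seen before"), "new": (15.0, "missed while offline")})
me = PullingPod(posts={"old": (5.0, "seen before")}, last_sync=10.0)
print(me.catch_up([peer], now=20.0), sorted(me.posts))
```

Only the post published after `last_sync` is transferred, so the cost of a catch-up depends on how much was missed and on how many peers are asked, which is where the scalability concern comes in.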

If we could identify the various factors causing federation of public posts not to work properly in different circumstances, it would, I'm sure, be a big help in solving the problem.

Jason Robinson Sat 21 Dec 2013 1:13PM

@goob it's not that public post federation does not work properly, it's that it's not implemented at all. Currently posts just end up on various pods - there is no technical design to say that any public post should be available to any subscriber on any pod.

Personally I want to start prototyping the relay concept - I think once I have a working demo it might be liked ;)

goob Sat 21 Dec 2013 1:24PM

OK, thanks. That definitely sounds like a design flaw!

goob Mon 17 Feb 2014 10:04PM

By the way, @bradkoehn, I think it would be worth making your proposal here in Loomio, as proposals on the wiki tend to get overlooked.

Ryuno-Ki Mon 17 Feb 2014 10:30PM

Thanks, @seantilleycommunit, for granting me writing permission here :)

@jasonrobinson: Say, I'm a bad programmer guy.

Relay receives a post from a pod

Relay already has a cached list of pods and what hashtags they want so relay will deliver post to pods that are interested in one or more of the hashtags in this post. Relay is not for public message keeping - it will delete any posts as soon as they have been pushed out.

Could I misuse your proposal in any way by running a modified version of the code, which does not delete any posts?

It would come in handy, if you could list the information, you want to "store" in the directory/hub, to better judge this proposal.

Jason Robinson Tue 18 Feb 2014 7:56AM

@ryunoki well, since all the posts are public that would go through the relay, what does it matter if someone would? :)

You could already do it, just start saving posts from large popular pods by following a few hundred popular tags.

Actually, going the relay way you would only get a subset of posts - the more relays, the fewer posts go through each relay. Say 5 relays, you would only get approx 1/5 of public posts from opt-in pods, and even then only those with one or more hashtags.

By proposal, the hub would not store anything else than data related to which pods should receive which tags. So something like a dictionary with pod host and N tags that it wants.
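
As a sketch of what such a dictionary and the relay step could look like (the hostnames, the `TagRelay` class and the `send` callable are all made up for illustration; this is not code from the proposal):

```python
class TagRelay:
    """Pushes each incoming public post to interested pods and keeps no copy of it."""

    def __init__(self, wants: dict, send):
        self.wants = wants   # pod host -> set of hashtags that pod wants
        self.send = send     # hypothetical callable(host, post) that performs the push

    def relay(self, post: dict) -> list:
        """Deliver `post` to every pod wanting at least one of its hashtags."""
        targets = sorted(
            host for host, tags in self.wants.items() if tags & set(post["tags"])
        )
        for host in targets:
            self.send(host, post)
        return targets       # only the list of hosts is returned; the post is not stored


wants = {
    "pod-a.example": {"linux", "privacy"},
    "pod-b.example": {"cats"},
    "pod-c.example": {"privacy"},
}
delivered = []
relay = TagRelay(wants, send=lambda host, post: delivered.append((host, post["id"])))
print(relay.relay({"id": "post-1", "tags": ["privacy", "foss"]}))
print(delivered)
```

A relay operator could of course swap `send` for something that also writes posts to disk, which is the misuse asked about above; the sketch only shows that the design itself needs no post storage.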

Ryuno-Ki Tue 18 Feb 2014 9:36PM

I'm trying to consider worst cases to improve the proposal, Jason. That's all :)

Maciek Łoziński Mon 17 Mar 2014 1:44AM

my federation protocol proposal:
https://github.com/loziniak/diaspora_federation

Rasmus Fuhse Mon 17 Mar 2014 7:59AM

In my opinion the big question is:

Is it better to make following of hashtags part of the protocol (like Maciek's proposal)?

Or is it better to make a search endpoint part of the federation, which can be used to search for postings with hashtags (and maybe users and other stuff)?

Both ways will work, but which way is more reliable and more performant? Is there any third option?

Maciek Łoziński Mon 17 Mar 2014 9:30AM

Can you tell more about the search-based approach? When the search would be performed and how often? By whom and on which servers? What exactly would be searched for?

Rasmus Fuhse Mon 17 Mar 2014 10:59AM

There is more than one possible search-approach. Jason for example would like some central search-server(s) like friendica or redmatrix have. But it would also be possible to have a search-endpoint on each pod that might be called by "neighbor" pods periodically or only if a user requests a search. But those details are not the question at this early stage, I think. The big question is still what do we want: pushing the news or pulling the info?

Maciek Łoziński Mon 17 Mar 2014 1:51PM

Maybe, rather than wondering and debating, it would be better to try one way, and if it doesn't work out, try another. There are quite a few ideas for developers to choose from. Maybe we should let them decide what is easier/faster to implement?

Jason Robinson Mon 17 Mar 2014 7:02PM

Interesting proposal @macieklozinski - not a bad concept imho. Would love to hear from the more federation-stuff experienced devs.

Although imho I still think federating public posts should be outsourced outside pod software itself. Podmins are already complaining about heavy sidekiq processes - keeping public post federation in the core code would be a big burden to all pods.

Would need to do a simulation to calculate really :P

But I agree that we should just do something :) Any sane implementation would be cool.

Mark Williams Sat 5 Jul 2014 2:05AM

I first tried Diaspora a few years ago by joining a pod with lots of users on it, and loved that right after creating my account I had posts appear in my feed that matched the tags I was interested in. I finally returned to D* a couple weeks ago to help with development, and was disappointed after setting up my own pod how lonely it feels without public posts from the rest of the network being pushed to me! So I'm glad to see so many here who agree that federation of public posts is a very important feature for D*.

Doing this right is not trivial, but I think a DHT-based (Distributed Hash Table) solution might be the right fit. I'm not an expert on the various flavors of DHT out there, but after doing some research it looks like Pastry might be a good choice. In particular, there is already a publish/subscribe application called Scribe designed for it, and an open source implementation called FreePastry. In a nutshell, the Pastry+Scribe combination provides O(log(n)) average routing hops between nodes, high tolerance of nodes entering/leaving the network, automatic load balancing of topic subscription management and notification multicasts across the network, and the ability to structure the routes between nodes in a way that minimizes overall latency/bandwidth (or other relevant metric.) The idea would be that every D* pod would run a node in the DHT network, which would allow the overhead associated with managing subscriptions and disseminating public posts to subscribers to be automatically shared among all the network's nodes.
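
To make the "automatic load balancing of topic subscription management" part a bit more concrete, here is a toy sketch of the underlying idea: hash both pods and hashtags into the same circular identifier space and let the pod whose identifier is closest to a hashtag's key manage that hashtag. This is not the FreePastry or Scribe API, it leaves out routing tables and the O(log(n)) routing entirely, and the truncated SHA-1 identifiers and function names are assumptions.

```python
import hashlib

ID_BITS = 128
ID_SPACE = 2 ** ID_BITS


def node_id(name: str) -> int:
    """Map a pod host or hashtag onto a 128-bit identifier (truncated SHA-1, an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:16], "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two identifiers on a circular identifier space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def responsible_pod(hashtag: str, pods: list) -> str:
    """Pick the pod whose identifier is closest to the hashtag's key."""
    key = node_id(hashtag)
    return min(pods, key=lambda pod: ring_distance(node_id(pod), key))


pods = ["pod-a.example", "pod-b.example", "pod-c.example", "pod-d.example"]
for tag in ("#linux", "#privacy", "#cats"):
    print(tag, "->", responsible_pod(tag, pods))
```

Because the assignment depends only on the identifiers, a pod leaving the network only moves the hashtags it was responsible for; all other hashtags keep their current pod.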

I am going to run some simulations using FreePastry+Scribe to verify this approach for a "hashtag subscription" feature for D*, but before digging in too deeply I have a few questions:

1) FreePastry is written in Java 5 and its architecture takes advantage of Java threads and asynchronous IO. It might not be a trivial exercise to port this to Ruby+Rails, and I think in any case it would be best to keep any new DHT component cleanly decoupled from the main D* application. What is the development team's stance towards adding a JVM instance (OpenJDK 6/7) as a new tier to the pod design? I think it would complicate pod setup and configuration a little, but probably not too much.

2) FreePastry's implementation uses its own TCP connections for messaging, and UDP for keep-alives. Its architecture is very modular, and so it's probably possible to proxy all its communication through D*'s existing https-based communication scheme if absolutely necessary. But in the interests of performance and clean design, the much better approach is probably to let the Pastry tier handle its own P2P network communications, and let it communicate through a web services API locally with the Ruby+Rails tier and/or directly with the local database for everything else. In terms of security, https isn't needed, since we're dealing with public posts; all that needs to be done is make sure that the payloads carried by Pastry are cryptographically signed. The downside to the separate P2P communication is that it would add to the firewall setup requirements for a pod (although FreePastry already has the ability to use uPnP to open its own ports to the internet, where supported.) How does the development team feel about the idea of requiring additional ports to be opened between pods and the internet?

3) Are there any parts of the database that are designed to be usable as an interface, i.e. not meant to be controlled and accessed exclusively by the Ruby+Rails and Sidekiq tiers? (For example, is it "legal" to write posts directly to the database without going through the Ruby+Rails app?)

With a DHT-based P2P network to leverage, other useful functions could eventually be added in a scalable way to D*, for example (a) load-balancing of requests for relatively large content like images, so that for example an image in a post from a tiny pod that gets wide distribution in the D* network doesn't result in that pod being swamped with requests for the image from the entire network, (b) network-wide features like user search/discovery, (c) helping other D* functions to scale as the network grows, such as propagation of public posts from originators to followers.

Thanks for your time, I appreciate any feedback or advice you may have!

Melroy van den Berg Wed 6 Aug 2014 9:49PM

@markwilliams2 I love the idea, and I'm also researching the problem and possible solutions. See point #2 on my list: https://wiki.diasporafoundation.org/User:Danger89

I hope that we can get in contact with each other to discuss this further and finally try to implement a working prototype.

Melroy van den Berg Thu 7 Aug 2014 7:34AM

https://www.youtube.com/watch?v=B_HTdrTgGNs

Melroy van den Berg Thu 7 Aug 2014 8:12AM

Let's put it like this: I think that to make Diaspora a good decentralized social network, the relational database should be removed and replaced by an Apache Cassandra database (for example), or at least a database which is horizontally scalable with high availability & reliability.

This is also known as a 'NoSQL database environment'. This means, in fact, that the current project as it is should be rewritten almost entirely (!) to compete against existing social networks like Facebook, Google+, Twitter, etc.

So... Good luck :)

Jason Robinson Thu 7 Aug 2014 8:16AM

@melroyvandenberg I think you will find little support for a complete rewrite - unless you do it yourself ;) You can always fork and replace the DB.

diaspora* started with MongoDB which didn't work for some reason. Do you mind explaining in more detail why you think a NoSQL database would be better than a relational database, for diaspora*?

Melroy van den Berg Thu 7 Aug 2014 8:24AM

@jasonrobinson I'm trying to dive deeper into the distributed hash table (DHT), which makes it possible to search users within the network (regardless of the pod). But the same will work for both public messages and hashtags, etc.

A non relational database and using hashing (key-value) will make this possible. That is the current problem of Diaspora, the decentralized network isn't really connected, a pod floats in the Internet currently.

Melroy van den Berg Thu 7 Aug 2014 8:29AM

Maybe this site gives you a better explanation of the implementation details of the idea of DHT (funny sentence):
http://www.rackspace.com/blog/cassandra-by-example/

Jason Robinson Thu 7 Aug 2014 8:38AM

That is the current problem of Diaspora, the decentralized network isn’t really connected, a pod floats in the Internet currently.

I think that is the whole point that pods "float" in the internet. I'm quite sure the current model isn't something that would be powerful and scalable enough to take on a network like Facebook - but the diaspora* server isn't really something that is supposed to do that IMHO. It's just server software - and there is no requirement to connect to the wider network of diaspora pods.

To make it really big the server should be just nodes that people can run that automatically enhance the network. Now things are different. Each pod is very independent with absolutely no constraints placed on how to run it or even on what configuration.

The diaspora* network really only federates on the protocol level. What uses the protocol doesn't matter. There is already Friendica (made in PHP) that talks the diaspora* protocol. There is also a Python version (Pyaspora) that also talks diaspora*.

goob Thu 7 Aug 2014 10:22AM

diaspora* started with MongoDB which didn’t work for some reason.

Sarah Mei (a previous developer for Diaspora) wrote this article about why MongoDB didn't work.

makes it possible to search users within the network (regardless of the pod)

It is already possible to search for users on other pods. Melroy, the problems you're encountering (including some of those on your to-do list) may be because you've set up a new pod for yourself very recently. One of the software's problems is the case when a new pod connects to the network for the first time - at first it doesn't have established connections with other pods, so things such as search and following #tags return no results. This is a real problem, and it is something that could usefully be tackled.

It may be that changing the database isn't the answer to your problem: simply running your pod for a while, making connections with other pods, will bring the results you're looking for.

Melroy van den Berg Thu 7 Aug 2014 10:28AM

@goob fair enough, however... I've got 2 registered persons, who don't do anything.. nothing happens with the system, it will not 'connect with other pods', meaning it will not share information among other pods including myself.

That is the problem, the base of a social system should be sharing. That is where DHTs kick in; however, this requires a whole different way of thinking.

Jason Robinson Thu 7 Aug 2014 10:38AM

@melroyvandenberg what is your d* handle - or add me at jaywink@iliketoast.net

Yeah as @goob said, this is a huge problem. We need some addition to the protocol to support network wide searching (imho hackish and a large burden to the network) - OR a central hub that would be queried (opt-in publishing handle there).

Unfortunately the opposition to any "central helpers" is kinda strong here - maybe that sentiment will change :)

Jason Robinson Thu 7 Aug 2014 10:40AM

Public post federation (pushing) around the network is also one thing - a few proposals have been made to tackle that here and here at least - but no implementation yet.

Melroy van den Berg Thu 7 Aug 2014 10:47AM

More info on wiki:
http://en.wikipedia.org/wiki/Distributed_hash_table

goob Thu 7 Aug 2014 10:55AM

That is the problem, the base of a social system should be sharing.

The basis of Diaspora is sharing. Every pod in the network shares data (where appropriate) with every other pod. The problem you are experiencing - and it is a major problem, which needs solving - is how to get your pod to start connecting and sharing data with enough other pods to receive all relevant content.

I don't think it's a problem with the type of database being used - it's a problem of your pod, new to the network, knowing what other pods are part of the network and how to find them in order to be able to access their databases.

goob Thu 7 Aug 2014 10:58AM

Have a look at our tutorial series on getting started, which will tell you how the network should work, once your pod is connected to other pods.

I made a proposal to help in certain situations, one being when a new pod is added to the network, here. Apparently some of my proposals would not be scalable as the network grows in size, but there are various ideas knocking around related to this problem of the experience of new pods. If you can help solve this problem, that would be fantastic.

Melroy van den Berg Thu 7 Aug 2014 11:38AM

I still think the solution could be DHT, please read the Bittorrent DHT spec about nodes / node ID's and route tables:
http://www.bittorrent.org/beps/bep_0005.html

And read '5.4 Bootstrapping' of the Facebook Cassandra PDF:
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf

EDIT, even Patrick McFadin says it:
http://youtu.be/B_HTdrTgGNs?t=1h3m49s

Mark Williams Thu 7 Aug 2014 4:18PM

Glad to see there's still interest in this topic. I've made some progress on a prototype solution for public post federation using Pastry; I'll update this thread when it's ready for testing on real pods.

In case this needs more clarification: the local database for a pod, regardless of whether it's a relational database or a key-value store (and hard experience taught the D* team that a key-value store is not the appropriate choice), is completely distinct from the DHT-based distributed network layer (Pastry, in this case) I'm proposing to use to implement federated public post features. This is a "third way" which avoids the type of naive network search that puts a large burden on the network, while also avoiding the need to introduce central hubs.

IMO central hubs would weaken D* by creating a dependency on special, well-connected, resource-intensive nodes, which would be too expensive for the average person to run. This would erode much of the benefit of a decentralized D* network; it would make it much less democratic, more susceptible to interference and obstruction. It would also be a less robust solution than distributing responsibility for doing federated processing across many/all D* pods.

Jason Robinson Thu 7 Aug 2014 5:05PM

@markwilliams2 looking forward to seeing your update!

Trolli Schmittlauch Fri 8 Aug 2014 9:11AM

I guess @melroyvandenberg 's point is that if we want to federate all public posts, currently this would require every pod to receive and save all public posts locally.

Saving posts locally is ok as long as only the posts from contacts in aspects of users of the pod are saved. This isn't much data, at least on small pods. But saving all public posts requires a lot of more space and resources, doesn't it?

@melroyvandenberg 's idea seems to be dynamically getting the posts from other pods only if requested by the user. I don't know whether this can work, but this is my understanding of the issue.

Florian Staudacher Sat 9 Aug 2014 10:25PM

I'm a little skeptical about introducing a whole new concept like a DHT. It may seem appropriate, but I also detect faint amounts of "if all you know is a hammer, everything looks like a nail" ... ;)

So, what about the pseudo PubSub stuff we have going for relaying interactions on posts to/from different pods? I suppose that could be extended into a full-blown PubSub system where users could actually subscribe to contents on other pods...

Jason Robinson Sun 10 Aug 2014 7:10AM

@florianstaudacher the requirement IMHO is that everything is transparent. If a user follows a tag, then ideally posts from all around the network should show up, just like on Twitter. To me that is the only way it would work: normal users will just be confused if they have to do additional work to "enable" content from other pods.

Melroy van den Berg Thu 14 Aug 2014 11:50AM

@jasonrobinson Exactly my point.

Maciek Łoziński Thu 9 Oct 2014 11:08AM

@melroyvandenberg, do you suggest that pods should be connected on a database level (Cassandra) instead of, or in addition to, the protocol level?

Melroy van den Berg Thu 9 Oct 2014 12:41PM

@macieklozinski
I think that would indeed be a wonderful solution, except that you would have to think about privacy when transporting this data between the pods, just as privacy matters for the data that is sent using the current protocol.

This way we could see posts from every pod in the world, meaning the pods would finally be fully connected (just like one big network). Which is your main goal after all, right?

Jason Robinson Sat 30 May 2015 2:16PM

Started thinking of a hacky solution, requiring no changes to core code, that would allow users to participate via relay servers that look like pods. Terrible in terms of privacy compared to the original proposal, but it would be easier to implement :P

If it worked and was wanted, core code could then be introduced to make it transparent.

aj Sat 30 May 2015 2:40PM

i kind of like the way diaspora forms communities and groups in sort of an organic way, if there were a common aggregate of all public posts like a global pubsubhubbub or whatever it would maybe kind of change the way of it...

Jason Robinson Sat 30 May 2015 5:33PM

That works fine for medium to large pods, but single user and small pods are lonely places until they share with lots of contacts. One shouldn't be required to follow hundreds of users just to see public posts.

aj Sun 14 Jun 2015 10:46PM

ya starting my pod i more or less had to find contacts on jd and then search for the same contact from my pod to add it, a real pain... would be great if a new pod could get a feed of public posts from one of the larger pods, at least for few weeks after being added to the network

Jason Robinson Sat 11 Jul 2015 4:16PM

Updated proposal specifications - and now with some PoC code

See here: https://wiki.diasporafoundation.org/Relay_servers_for_public_posts

I'll continue working on the code part and hopefully aim to submit a PR to diaspora core for at least post relaying within my summer holiday (so under 2 weeks).

Jason Robinson Wed 15 Jul 2015 8:52PM

hey @jhass @dennisschubert and others. What do you think about the proposed pod settings this stuff would need? I'm kinda ready to implement the last part of the relay ie querying single pods and pushing posts out to them. So I could also do the PR towards diaspora - for the inbound configuration part first, then second the outbound configuration.

Jonne Haß Wed 15 Jul 2015 9:03PM

statistics.json/NodeInfo is about metadata, not protocol extensions, that is protocols shouldn't make decisions based on its output.

I'd say just add a .well-known route, /.well-known/x-diaspora-relay or something.

Jason Robinson Thu 16 Jul 2015 9:47AM

@jhass Fair enough, that might make sense. I'm supposing it should be constructed by the diaspora-federation gem?

But in general, you or other core members don't object to the extra configuration for diaspora as proposed, assuming the change is that it is reflected in .well-known, not nodeinfo?

Jonne Haß Thu 16 Jul 2015 9:56AM

Well, it's needed for the feature to work, right?

Jason Robinson Thu 16 Jul 2015 10:18AM

@jhass as written currently, yes. The other option is each podmin configures their web server to serve a manually maintained file :P

But really, if the configuration would not be accepted into diaspora, as a fall-back I would centralize the idea to the-federation.info and add to it a form for podmins to register their subscription preferences. The send part is more difficult; that would require a patch commit that podmins could pull in if they wanted.

I personally don't see any harm in including the configuration in diaspora itself. I believe this is a good way forward to attempt to fix some of the issues caused by the federation model ie small pods not able to receive enough public posts for it to make sense to set up a single user pod. Also, this would allow pods to customize their scope to specific interest areas, if the solution gets wider adoption within the network.

Jason Robinson Thu 16 Jul 2015 6:47PM

I'll make it .well-known/diaspora-relay. RFC 5785 doesn't require prefixing names with x- and it doesn't seem common looking at the registry. I'll also submit a registration request for this .well-known.

Thinking about it, we should really put version and protocol generic information in .well-known/diaspora, not in headers/statistics as currently is done. But not in scope of this :)

Jason Robinson Thu 16 Jul 2015 6:48PM

Or actually, it should probably be .well-known/social-relay, not diaspora-relay. Nothing in the relay concept the way I think about it is diaspora specific, except the initial implementation is geared towards diaspora.

Jason Robinson Thu 16 Jul 2015 7:54PM

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://the-federation.info/social-relay/well-known-schema-v1.json",
  "type": "object",
  "properties": {
    "subscribe": {
      "type": "boolean"
    },
    "scope": {
      "type": "string",
      "pattern": "^(all|tags)$"
    },
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "uniqueItems": true
    }
  },
  "required": [
    "subscribe",
    "scope",
    "tags"
  ]
}
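
For illustration only, here is what a document served at .well-known/social-relay might look like, together with a hand-rolled check of the three required properties. The checker is a sketch that does not use a JSON Schema library, and it assumes the intent is that scope is exactly "all" or "tags"; the function name and the example tags are made up.

```python
import json
import re


def validate_social_relay(doc: dict) -> list:
    """Return a list of problems; an empty list means the document passed."""
    errors = []
    for key in ("subscribe", "scope", "tags"):
        if key not in doc:
            errors.append("missing required property: " + key)
    if "subscribe" in doc and not isinstance(doc["subscribe"], bool):
        errors.append("subscribe must be a boolean")
    if "scope" in doc:
        scope = doc["scope"]
        # Assumption: only the two exact values are meant to be accepted.
        if not isinstance(scope, str) or re.fullmatch(r"all|tags", scope) is None:
            errors.append('scope must be "all" or "tags"')
    if "tags" in doc:
        tags = doc["tags"]
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            errors.append("tags must be an array of strings")
        elif len(set(tags)) != len(tags):
            errors.append("tags must be unique")
    return errors


# Hypothetical pod that only wants posts about two tags.
example = json.loads(
    '{"subscribe": true, "scope": "tags", "tags": ["linux", "opensource"]}'
)
problems = validate_social_relay(example)
```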

Jason Robinson Thu 16 Jul 2015 8:02PM

social-federation can now generate it.

Jason Robinson Thu 16 Jul 2015 8:42PM

Actually, @jhass @dennisschubert do you want the .well-known/social-relay generation in diaspora core or diaspora-federation gem? It's not really part of the federation, more like add-on system to push posts around, so I'm kinda hesitant to push it there. Can I just add a new route/controller/presenter etc, like the current statistics.json is done?

Or should I make a gem well-known-social-relay? :P

Jonne Haß Thu 16 Jul 2015 8:44PM

I guess having it in the core would be okay for now, should be fairly easy to push elsewhere if needed.

Flaburgan Sun 26 Jul 2015 9:58AM

Okay I read the specification and discussed with @jasonrobinson on IRC about it.

First of all Jason, thank you very much for dealing with this important problem of diaspora*.

Although this proposition solves most of the problem, there are some points we should be careful about:

[Warn] External dependency for a core feature is dangerous. Sending messages to other pods is the core feature of a pod. Using external app servers to do that means the network would depend heavily on a few servers, which can be attacked or poorly maintained. This looks dangerous to me.
[Warn] On the same topic, to use a centralized list of pods is a potential vector of attack / problem. We're losing part of the force of diaspora* here.
[Warn] Pods are not equal anymore. Until now, the difference between pods was on side features like services enabled or chat. With this proposition, we would have to explain to users that, depending on where they choose to register, they will not have the same content available. This is the opposite of what we always said, and this is exactly the problem we are trying to solve: we don't want users to choose a pod because they have to go there, because that's where the content is.
[Blocking] The interactions on posts that are transmitted by relay are not federated. This point is a blocking point to me. It completely breaks the usage of diaspora* and means a lot more complaints about the federation being broken. I don't see the point of displaying a post if I know that only the users of my pod will see my reaction on it. Most of the time, I want to reply to the author of the post.

For those reasons, I think your proposition is not a good solution. I'll try to propose something else soon.

Jason Robinson Sun 26 Jul 2015 11:45AM

For those reasons, I think your proposition is not a good solution. I’ll try to propose something else soon.

This is not a proposition any more. It stopped being one when I changed the original one not to depend on the core so much. Right now it depends on only the carbon copying of posts outwards - even if that which is now in develop was reverted I could do a single commit patch which podmins could pull in if they want.

So this is pretty much live now, just not fully functional. I already see diasporapr.tk sending posts out to the relay :) I'll push the latest changes to the relay live early next week so posts will be relayed for the first time and start real world testing.

[Warn] External dependency for a core feature is dangerous.

It's not a core feature. The core feature is to NOT deliver posts by design to all pods. And that works and will continue to work.

[Warn] On the same topic, to use a centralized list of pods is a potential vector of attack / problem. We’re losing part of the force of diaspora* here.

Part 2 would be decentralizing the relays themselves. Initially, yes, each pod configuring a single relay makes it weaker. But it is less weak than pod email delivery or hosting, which is the weakest part of diaspora, with users being locked into a single server for life. And since this is not a core feature, like user login is...

[Warn] Pods are not equal anymore.

They are even less equal now. Right now it makes sense to join a large pod, to see many public posts. Setting up your own pod doesn't make sense. Using relays will make pods more equal.
But, the relay will also enable pods to be more strongly themed, for example a pod could subscribe to only linux and open source posts, ignoring all the other stuff.

[Blocking] The interactions on posts that are transmitted by relay are not federated.

Well, reshares have the same problem. And the interactions can be solved; we just have to decide which way to go: relay them, or only use relays for the initial post delivery. I think only using this in the real world will tell which is better. Anyway, it needs to happen before 0.6 is released, and also before that the participations bloat needs to be dealt with and the federation tuned to be more efficient. I will be submitting something for both of these for consideration.

I don’t see the point of displaying a post if I know that only the users of my pod will see my reaction on it.

Not entirely true. Since a pod which gets a post via a relay will fetch the author contact (by diaspora protocol design), interactions will be sent to the original pod as if the pod had delivered the post. The problem is that afaict the original pod will only relay the interaction as normal, not to other pods that depend on relays. This is how I understand it:

pod A <--- author of post
pod B <--- contact of pod A author
pod C <--- not in contact with pod A author
pod D <--- not in contact with pod A author

So when pod A author sends a post it will be delivered to pod B user directly and pod C and pod D users via relay (assuming both subscribe in this case).

Initial relay concept doesn't relay interactions, so when a user on any of the pods comments:

pod A will receive it
pod B will receive it (since pod A relays it)
pod C will not receive it (unless done from pod C)
pod D will not receive it (unless done from pod D)

But this situation is fixable by defining whose responsibility it is to do what. Of course, either the relay should take care of whatever "broken" links it creates OR it should create participations so that interactions flow as they should. Though as said, the current reshare concept also has these kinds of bugs.

Thanks for your comments. While I'm looking forward to seeing a proposal to the core that would solve federating all posts to whoever wants them but still allow pods not to receive all posts, I really doubt that kind of solution is doable in the core, and it wouldn't even make sense to bloat the core with it.
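
The pod A to D example above can be written down as a tiny model. This is only a sketch of the behaviour as described in this comment (contacts get everything from the author's pod, relay-only pods get the initial post and nothing else); the function names are made up and it does not reflect actual diaspora or relay code.

```python
def post_recipients(contact_pods, relay_subscribers):
    """Pods that receive the initial post: direct contacts plus relay subscribers."""
    return set(contact_pods) | set(relay_subscribers)


def comment_recipients(commenting_pod, author_pod, contact_pods):
    """Pods that end up with a comment when the relay does not relay interactions.

    The commenting pod has the comment locally, the author's pod receives it,
    and the author's pod relays it to its contacts only.
    """
    return {commenting_pod, author_pod} | set(contact_pods)


AUTHOR = "pod A"
CONTACTS = {"pod B"}               # in contact with the pod A author
VIA_RELAY = {"pod C", "pod D"}     # subscribed through the relay only

everyone_with_post = post_recipients(CONTACTS, VIA_RELAY)
comment_from_c = comment_recipients("pod C", AUTHOR, CONTACTS)
missing_comment = everyone_with_post - comment_from_c  # pods that show the post without the comment
```

With a comment made on pod C, `missing_comment` contains only pod D, which is exactly the "broken link" that either the relay or extra participations would have to repair.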

Flaburgan Sun 26 Jul 2015 11:51AM

I wrote what I have in mind on https://wiki.diasporafoundation.org/Follow_other_pods_tags

I'll now read your answer ;)

Flaburgan Sun 26 Jul 2015 12:09PM

In my opinion, it is a core feature to deliver the message. Currently, this feature is incomplete because it doesn't allow following tags on other pods. So, we have to patch the core, not build another tool to balance its weakness. Related, the fact that pods are not equal now is due to this incomplete federation, not because of a setting. This is really important to me. In the first case, it only means we need to improve the software, when in the second case, it means the equality is broken by choice. A bad thing in my opinion.

Part 2 would be decentralizing the relays themselves.

With the "perfect situation" becoming one relay per pod? And then making relays forward interactions? I can't shake the feeling that we're building another network on top of the diaspora one instead of patching it here.

Jason Robinson Sun 26 Jul 2015 2:47PM

In my opinion, it is a core feature to deliver the message. Currently, this feature is incomplete because it doesn’t allow following tags on other pods. So, we have to patch the core, not build another tool to balance its weakness.

Well I still disagree - the core doesn't have to be a do-everything solution. It's bloated as it is and already takes too many resources to run. Granted, the relay system will increase the load across the network, but it will increase it less than if all the pods did all the work.

Related, the fact that pods are not equal now is due to this incomplete federation, not because of a setting. This is really important to me. In the first case, it only means we need to improve the software, when in the second case, it means the equality is broken by choice. A bad thing in my opinion.

Well, as we are talking about a decentralized place, pods should be allowed to be not equal if they want to be.

I read your tag based proposal and it could be a nice improvement to the core. However, as you note, it would not help in the case of new pods which would still have to do a lot of manual work to register with this and that pod. The relay system only requires a new pod to register with a pod list - and relays could even use many pod lists or even be pod lists themselves.

Also I don't believe this is true:

Every interactions is possible on the posts received with that solution, so answers (comments), likes and reshares will be received by the original pod which created the post and all the others which received it

Assuming you mean that participations would also follow the tags in the post, then this is true only to the point where users don't stop following tags. If the last user stops following a tag on a pod, the relations would stop going through.

All in all, that could be a nice addition to consider for the core (with maybe the addition that only active users' tags are considered, not everybody's) but IMHO it doesn't solve the broken network problem like the relay does. It only makes the broken network problem dissipate faster; for brand new pods the effect is the same. The solution would also be heavier on every single post delivery.

Flaburgan Sun 26 Jul 2015 5:20PM

I read your tag based proposal and it could be a nice improvement to the core. However, as you note, it would not help in the case of new pods which would still have to do a lot of manual work to register with this and that pod.

That is true, but it is a different issue in my opinion; this is what I would call "network discovery". It is not only about tags, we may want to find users too, for example.

About the tag following problem and my proposition, if the pod knew every other pod on the network, the problem would be solved. So we can choose to solve this by simply fetching the list of pods from the-federation.info, as you propose to do for the relays.

Assuming you mean that participations would also follow the tags in the post, then this is true only to the point where users don’t stop following tags. If the last user stops following a tag on a pod, the relations would stop going through.

Not sure what you meant here. What I meant was, if you write a post about #diaspora from your pod, and I receive it because my pod told yours that it is interested in diaspora*, and then Jonne answers on your post from his pod, I will receive Jonne's answer because your pod knows it sent me the message so it is able to forward Jonne's comment.
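
To make that flow concrete, here is a minimal sketch in which the originating pod remembers where it sent each post and uses that record to forward later comments. Whether diaspora* tracks delivery like this is exactly what is in question in this thread, so the class, its methods and the hostnames are hypothetical.

```python
class OriginPod:
    """Hypothetical pod that records which pods each public post was sent to."""

    def __init__(self):
        self.sent_to = {}  # post id -> set of pod hostnames

    def deliver_post(self, post_id, post_tags, tag_subscriptions):
        """Send a post to every pod that asked for at least one of its tags."""
        recipients = {
            pod
            for pod, tags in tag_subscriptions.items()
            if set(tags) & set(post_tags)
        }
        self.sent_to[post_id] = recipients
        return recipients

    def forward_comment(self, post_id, from_pod):
        """Pods that should get a comment: everyone who got the post, minus the sender."""
        return self.sent_to.get(post_id, set()) - {from_pod}


origin = OriginPod()
subscriptions = {"flaburgan.example": {"diaspora"}, "other.example": {"cats"}}
origin.deliver_post(1, ["diaspora"], subscriptions)
forwarded = origin.forward_comment(1, "jonne.example")
```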

active users tags

I don't get what you're talking about?

Jason Robinson Sun 26 Jul 2015 5:40PM

and then Jonne answers on your post from his pod, I will receive Jonne’s answer because your pod knows it sent me the message so it is able to forward Jonne’s comment.

You mean pods would explicitly track who they've sent posts to? I think the way it currently works is that contacts (sharing and shared with) are checked when deciding where to send. I don't think posts "remember" where they have been sent. I might be wrong :)

active users tags

I don’t get what you’re talking about?

Just a small detail. For active tags it makes sense to only look at tags followed by active users. Otherwise a user that logs in once and follows a tag will cause the pod to forever follow that tag. The relay subscription prefs work (if set so) using the users active in the last 6 months.
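
A small sketch of that filter, assuming each user record carries a last-seen timestamp and a list of followed tags (both field names are invented for the example) and approximating six months as 182 days:

```python
from datetime import datetime, timedelta


def active_followed_tags(users, now, window_days=182):
    """Collect tags followed by users seen within the window (roughly 6 months)."""
    cutoff = now - timedelta(days=window_days)
    tags = set()
    for user in users:
        if user["last_seen"] >= cutoff:
            tags.update(user["followed_tags"])
    return sorted(tags)


now = datetime(2015, 7, 26)
users = [
    {"last_seen": datetime(2015, 7, 1), "followed_tags": ["linux", "diaspora"]},
    # Logged in once long ago: this user's tag should no longer count.
    {"last_seen": datetime(2014, 1, 1), "followed_tags": ["basketball"]},
]
subscribed = active_followed_tags(users, now)
```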

Jason Robinson Sat 3 Oct 2015 7:14PM

Added some notes and ideas regarding the participations relaying to our Paris board. Would love to discuss at least for some brainstorming.

Richard Decal Mon 14 Dec 2015 7:45PM

Re: Jason's "pods should be allowed to be not equal if they want to be."

I strongly believe that which content any user wants to subscribe to should be decided at the user level rather than the pod admin level. If one user wants to follow basketball posts, and another wants to follow Linux posts, they should make that decision rather than it be imposed on them by some stranger. I don't want to join a pod only to find out the admin severed my access to one of my interests because they don't share that interest.

Jason Robinson Mon 14 Dec 2015 7:56PM

@richarddecal

If one user wants to follow basketball posts, and another wants to follow Linux posts, they should make that decision rather than it be imposed on them by some stranger.

I couldn't agree more. Currently, the defaults are probably not the best, for the diaspora* relay code. Could probably change them before the relay hits "mainstream" in 0.6, currently it's only in development pods.

The defaults are:

inbound:
  subscribe: false
  scope: tags
  include_user_tags: false
  pod_tags:

So, the podmin must change "subscribe" to true to enable the functionality and "include_user_tags" to true, if user tags should be collected. I think the latter should be changed to "true" by default.

I'd enable the whole relay functionality for user tags on by default but that would never pass :) And since the relay is third-party stuff, it's prob a good idea to keep it off by default.

Jason Robinson Mon 14 Dec 2015 7:57PM

(also, the code allows mixing, podmin can define tags and still have user tags being subscribed to)
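
As an illustration of how these settings could combine, the sketch below turns an inbound configuration plus the tags followed by users into the subscribe/scope/tags document discussed earlier in the thread. It is not the diaspora* implementation; the function name is made up, and treating pod_tags as a list is an assumption.

```python
DEFAULT_INBOUND = {
    "subscribe": False,
    "scope": "tags",
    "include_user_tags": False,
    "pod_tags": None,  # assumed to be a list of tag names when set
}


def build_relay_document(inbound, user_tags):
    """Merge podmin-defined tags with user tags, if the podmin allows it."""
    tags = set(inbound.get("pod_tags") or [])
    if inbound.get("include_user_tags"):
        tags.update(user_tags)
    return {
        "subscribe": bool(inbound.get("subscribe")),
        "scope": inbound.get("scope", "tags"),
        "tags": sorted(tags),
    }


# With the defaults nothing is subscribed to, whatever users follow.
default_doc = build_relay_document(DEFAULT_INBOUND, ["linux"])

# Mixing: the podmin pins one tag and also lets user tags through.
mixed = dict(DEFAULT_INBOUND, subscribe=True, include_user_tags=True, pod_tags=["diaspora"])
mixed_doc = build_relay_document(mixed, ["linux", "basketball"])
```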

Jason Robinson Sat 9 Jan 2016 1:06AM

My proposals to solve the participations and relay decentralization issues.

Deleted User Fri 22 Jan 2016 12:04PM

If we use relays, would information on which relays are up be available on a site such as podupti.me, which already shows when pods are down (for whatever reason)? It could help developers improve the network and identify problems.

Alex Stacey Tue 17 May 2016 10:11AM

Hi guys. I'm #newhere :smiley: and don't know much about the existing architecture of d* but I read through lots of this thread yesterday with interest, and have a couple of comments...

If I understand correctly, some proposals involve pushing public posts to pods that have users following certain tags. This seems problematic to me as it ignores past public posts. For example, if a user starts following #privacy and happens to be the first user on that pod to do so, they will only get future posts; they won't be able to look through the history of that tag. I don't think that would be the expected (or desired) behaviour.

The alternative that came to mind (which may well have been suggested already) is that each pod could publish a list of the tags that they have public posts for, and then they could be pulled in when needed. So, using the example above, when the first user starts following #privacy, the pod then (somehow) finds all of the other pods that have public posts for that tag and pulls them in. Something like that also gives more power to certain pods to decide what they want to pull in. Some pods might want to ignore #nsfw for example.
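
A minimal sketch of that pull-based idea, assuming each pod publishes the set of tags it has public posts for. How the lists are discovered and fetched is left open (the "somehow" above), and the function name, hostnames and data layout are invented for illustration.

```python
def pods_to_pull_from(tag, already_followed, published_tags, ignored_tags=frozenset()):
    """Return the pods to fetch history from when a tag gets its first local follower.

    Nothing is pulled if the pod already follows the tag (history is assumed
    to be present) or if the podmin has chosen to ignore the tag.
    """
    if tag in ignored_tags or tag in already_followed:
        return []
    return sorted(pod for pod, tags in published_tags.items() if tag in tags)


# Hypothetical published tag lists from three pods.
published = {
    "pod-a.example": {"privacy", "linux"},
    "pod-b.example": {"nsfw"},
    "pod-c.example": {"privacy"},
}
sources = pods_to_pull_from("privacy", set(), published)
blocked = pods_to_pull_from("nsfw", set(), published, ignored_tags={"nsfw"})
```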

Anyway, just my thinking while reading this thread. Excuse me if I'm repeating what has already been said.
