
Add ReactDataTracker to Addons #3920

Closed (wants to merge 1 commit)
Conversation

@jimfb (Contributor) commented May 20, 2015

Internally, there has been serious discussion about exposing an API that allows data reads and writes to be tracked, as an alternative to the Observe API proposed a couple months back. This new API allows React to automatically track a component's dependencies, thereby allowing React to efficiently re-render a component when one of the dependencies changes. Usage is a little more intuitive (and a little more flexible) than the Observable API (it is also more intuitive if you are using Observables, see #3858, so it's a win-win), but since the data bindings are tracked automatically, it's a little less explicit. For anyone following along at home, take a look at the unit test in this commit for example usage.

Posting this commit/PR mostly to continue/inform the discussion. We aren't ready to merge this quite yet, as the final disposition of this API proposal is still undecided.

Fixes: #3858

@chicoxyzzy (Contributor)

Why not move observeRead and observeWrite into the ReactDataTracker module? That way we'd be able to import only ReactDataTracker without the whole ReactWithAddons.

    import { observeWrite, observeRead } from 'ReactDataTracker';

    class Person {
      constructor(name) {
        this.setName(name);
      }

      setName(name) {
        this.name = name;
        observeWrite(this);
      }

      getName() {
        observeRead(this);
        return this.name;
      }
    }

@chicoxyzzy (Contributor)

Also, it might be better to take the Map polyfill from core-js.

@jimfb (Contributor, Author) commented May 21, 2015

@chicoxyzzy Good question. The answer is: you can import only ReactDataTracker without the whole ReactWithAddons! The methods you will want to look at are startRead, endRead, startWrite, and endWrite, all of which are provided by ReactDataTracker (to be called before and after reads and writes, respectively). The start and end methods are actually a little more powerful/flexible (though a tiny bit more complicated to use). If you call multiple startWrites before the final endWrite, then all your writes can be batched together, so by using ReactDataTracker directly you actually get a little more flexibility/control/power to optimize performance!

The implementation of observeRead just calls startRead and endRead immediately after one another, and similarly for observeWrite. The reason for adding only two observe functions (a read and a write) to the addons was purely to make the usage a little simpler (less error-prone). They are absolutely trivial to implement in your own code if you decide to use ReactDataTracker directly. Keep in mind, however, that ReactDataTracker is currently an internal API. We may end up exposing these APIs in a final release (TBD; the primary use case would be to support data-binding transforms that users might want to apply to their datastores).
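
For illustration, something along these lines (a rough sketch, not the actual code in this PR; the argument shape of the start/end methods is assumed to match the observe variants):

// Rough sketch only; the exact signatures are assumptions, not the code in this PR.
var ReactDataTracker = require('ReactDataTracker');

// observeRead/observeWrite are just a start/end pair back to back:
function observeRead(target) {
  ReactDataTracker.startRead(target);
  ReactDataTracker.endRead(target);
}

function observeWrite(target) {
  ReactDataTracker.startWrite(target);
  ReactDataTracker.endWrite(target);
}

// Batching: the outer startWrite/endWrite pair wraps both setters, so components
// that depend on `person` are notified once instead of twice.
function renamePerson(person, firstName, lastName) {
  ReactDataTracker.startWrite(person);
  person.setFirstName(firstName); // each setter calls observeWrite(person) internally
  person.setLastName(lastName);
  ReactDataTracker.endWrite(person);
}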

Re: core-js polyfill, good advice. I'll look into it and may update the diff. We may end up scrapping the polyfill altogether in favor of expando props, so I'll sync with Sebastian about this before churning the diff too much.

@waldreiter (Contributor)

@jimfb I like it. 👍

What would happen if, during rendering, a getter is used by something other than the component itself? For example, if a component asks for a greeting:

class Person {
  constructor(name) {
    this.setName(name);
  }

  setName(name) {
    this.name = name;
    React.addons.observeWrite(this);
  }

  getName() {
    React.addons.observeRead(this);
    return this.name;
  }

  getGreeting() {
    React.addons.observeRead(this);
    return 'Hello ' + this.getName();
  }
}

@jimfb (Contributor, Author) commented May 21, 2015

@cody Great question! The answer is that it does the expected/desired thing. Multiple invocations of observeRead are deduplicated appropriately. Functions can be called recursively, can have intermediaries, etc. It all just works. Also invoking observeRead outside of a render/reconciliation is effectively a no-op, exactly as you would expect.

So you can have your component render function call a helper function like this:

function calculateChecksumOfName(person) {
  return MyMd5Library.calculateChecksum(person.getName());
}

React will still detect the dependency on person.getName() and re-calculate the MD5 checksum when the person's name changes. And you can also use this function outside of React (in your other code) with no problems.

This means that in your getGreeting() function, you don't need to call observeRead, because the read will be observed when getGreeting() calls getName(). But of course, it's not a problem that you did put an observeRead there!

@waldreiter (Contributor)

@jimfb Okay, you have convinced me. I had misunderstood something. 😃

@gaearon (Collaborator) commented May 22, 2015

Some context would help me understand this better.

How is this API expected to be used in real apps? Is Person just a naive example to illustrate the API? It's been so great living with plain objects and arrays as models, I wouldn't want to go back to fat or mutable models.

Would this API be used by entities like Flux stores? Is raw performance the goal of such a low-level API? I'd expect it to be hard to track startRead/endRead correctly in userland. Does this try to solve the perf problem of deep-but-small updates?

If this corresponds to a store, there's not much granularity there. (e.g. can't track individual entities) If this corresponds to individual entities, how is this compatible with immutable models whose references change all the time?

I think elaborating on this:

primary use case would be to support data binding transforms that users might want to apply to their datastores

would be very helpful.

Inline review comment on the diff:

var person = new Person("jimfb");
React.render(<PersonView person={person} />, container);
person.setName("Jim");
expect(container.children[0].innerHTML).toBe('Jim');

Maybe add an additional expect for "jimfb".

@jimfb (Contributor, Author): Yep, good idea, will do!

@jimfb (Contributor, Author) commented May 22, 2015

Yes, person is just a naive example. In practice, you would likely be firing reads and writes on data that was loaded over a network and managed by a Flux store. You should still use props and immutable data whenever possible/practical, but in real-world apps sideways data loading is sometimes necessary (see the motivation for the Observable API, which this new API could supersede).

Yes, the start and end variants of observe read/write do intend to optimize performance by allowing the system to do batching. Since these low-level functions are difficult to use by hand, we currently only expose the higher-level observe variants.

The ability to transparently/seamlessly change the granularity is one of the advantages of this API. Flux store implementors can observe reads/writes at any granularity. As long as they are consistent with regards to where they fire reads and writes, the implementor can choose to fire them at the store level, at the 'root lookup node' of some immutable data, or at an individual element level. They can change their decision at any time without affecting the implementation of components that depend on those stores. This plays nicely with immutable data and allows you to observe anything that is mutable. Datastore implementors can change the mutable/immutable boundaries without changing their public/component API.
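
For example, a hypothetical store (names made up for illustration) could fire its reads/writes on itself today and switch to per-record granularity later without changing its getters:

var React = require('react/addons');

// Hypothetical store that fires its observations at the store level; it could
// later switch to per-record tracking without changing its public API.
var PersonStore = {
  _people: {},

  get: function(id) {
    React.addons.observeRead(this); // or observeRead(this._people[id]) for per-record granularity
    return this._people[id];
  },

  set: function(id, person) {
    this._people[id] = person;
    React.addons.observeWrite(this); // or observeWrite(person)
  }
};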

This API is intended to be easy enough that a user could use it by hand, but the long-term hope is that they would use a data library (analogous to an immutable data library like Immutable.js), or that they would use a JavaScript transform to enhance their datastore code. Internally, we have a transform demo that finds all classes which extend a particular flag class and automatically adds the calls to startRead, endRead, startWrite, and endWrite. Using the transform makes your code ultra-simple, but it is also a little magical, so we are unopinionated and users should do what makes sense for them. It is similar to JSX, in that you get some fancy magic if you use a transform, but you shouldn't feel obligated to use one.

@waldreiter (Contributor)

@jimfb I have another question. If I understand it right, then the ReactDataTracker will add overhead to every render and unmount of all components, whether or not they use this feature. What is the cost of that?

@josephsavona (Contributor)

Some extra context: this is analogous to how Relay works.

GraphQL and Relay represent application data as a cyclical graph, because objects in a system are typically interconnected (ex: a user can be the author, liker, and commentator of the same story). But you can't directly represent a graph as a persistent data structure: a single change invalidates the whole graph and little memory can be shared. If nothing is shared between the old and new version of data, there is no way to know what actually changed and what didn't. By extension, there's no way to write an efficient shouldComponentUpdate. Ex:

// Previous data:
user: {
  name: 'Joe',
  posts: [...],
  friends: [
    {name: 'Jim'}
  ]
}

// New data (friend's name changed):
user: {
  name: 'Joe',
  posts: [...],
  friends: [
    {name: 'James'},
  ]
}

At first glance it might appear that posts would be unchanged, since only the friend's name changed. But it's a graph, so all the objects are new. There is no way to know whether the name change affects those posts or not; we would have to re-render everything and find out.

The alternative is to record the list of record IDs that were read during the course of rendering each component, set up a listener at each component for any changes to its set of read IDs, and update the component when the data for those IDs changes (eg via setState). Components whose listener did not fire can return false in shouldComponentUpdate, and the render time becomes proportional to the changes.

That's (in part) what this proposal is about.

  • Note that the alternate, indirect graph representation is a self-referential Map of IDs to Records, where records can link to one/many other records. While this approach allows for immutability & structural sharing, it does not solve the above problem, because we still don't know what part of the graph did and did not change.
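
A minimal sketch of that bookkeeping (hypothetical names, just to illustrate the idea; this is not how Relay actually stores data):

// Hypothetical bookkeeping: map each component to the record IDs it read
// while rendering, then notify only the components whose IDs were written.
var readIdsByComponent = new Map();

function recordRead(component, id) {
  if (!readIdsByComponent.has(component)) {
    readIdsByComponent.set(component, new Set());
  }
  readIdsByComponent.get(component).add(id);
}

function recordWrite(id) {
  readIdsByComponent.forEach(function(ids, component) {
    if (ids.has(id)) {
      component.setState({}); // only affected components re-render
    }
  });
}
// Components whose listener never fires can keep returning false from
// shouldComponentUpdate, so render time stays proportional to the changes.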

@josephsavona (Contributor)

cc @gaearon ^^

@jimfb (Contributor, Author) commented May 22, 2015

@cody No, if you don't use this feature, there is no overhead. Certainly nothing that could show up in any measurement.

Inline review comment on the diff:

}

for (var componentIndex = 0; componentIndex < componentsToNotify.length; componentIndex++) {
  componentsToNotify[componentIndex].setState({});
A Collaborator commented: Maybe use ReactUpdateQueue.enqueueForceUpdate(this)? This seems like it wouldn't work with components that have a strict shouldComponentUpdate, and it will throw on non-Component-descending components.

@gaearon (Collaborator) commented May 22, 2015

@josephsavona Thanks for sharing these details, this proposal makes way more sense to me now.

@kevinrobinson

@josephsavona This is interesting, much respect for hashing these ideas out in the open, it's really valuable. :)

I'm surprised that Relay, even though it works with a full graph structure itself, ends up feeding a cyclical structure to components. I thought one of the primary motivations was to make explicit in the component precisely what data should be side-loaded. In this example, why would you be traversing back from a friend like 'Jim' to the current user 'Joe'? I understand the general utility of a cyclical graph, but I expected that, since a component can specify its needs precisely, traversing the graph in multiple directions (beyond just top-down from the root node) would be irrelevant.

Also, if Relay feeds data to components as a cyclical graph, doesn't that also mean it's not serializable without a transformation first? This seems like it would be undesirable, but if so I trust there are good reasons, so just curious to learn more. :)

In general, this seems somewhat different from the overall design of React, where shouldComponentUpdate is an optimization hook. Has performance been enough of a concern on low-end devices that approaches like this to fine-grained change tracking become appealing?

@jimfb (Contributor, Author) commented May 22, 2015

@kevinrobinson Relay can flatten the graph because React components will ultimately form a DAG, but then the problem becomes: "how do you know which DAG nodes to update when the source data changes?". It turns out to be the exact same problem as knowing which components to update. Also, in the more general case, you can't flatten the data ahead of time unless you know how deep the component hierarchy is going to look (something that is impossible to know for a general React app).

Regarding serialization of cyclic graphs: JSON does not directly support cyclic graphs, but it is possible to implement your serialization on top of JSON, or using your own protocol/format. Either way, cyclic data is actually pretty common in large applications.
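
For example (purely illustrative), a cycle can be expressed with IDs instead of object references, which serializes fine:

// Illustrative only: a cyclic friendship expressed with IDs is JSON-safe.
var json = JSON.stringify({
  users: {
    '1': {id: '1', name: 'Joe', friendIds: ['2']},
    '2': {id: '2', name: 'Jim', friendIds: ['1']} // the cycle lives in the IDs
  }
});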

Performance is always a consideration. For instance, internally, we are working on an animation API that allows React to render super smooth animations. This is not possible without super high-performance bindings. But having said that, performance is not the primary motivation for introducing this API; please take a look at the related issues (#3858 and #3398) for background/motivation information.

@kevinrobinson

@jimfb Thanks for the thoughtful and precise response!

you can't flatten the data ahead of time unless you know how deep the component hierarchy is going to look

This is the bit that's most confusing to me. I thought each component gave a GraphQL expression to Relay to communicate exactly this information?

Here's a bit of my current understanding, since I'm not 100% sure I'm following so maybe this will help check. :)

I'm picturing that Relay has an internal data structure to represent the cyclical graph. I'm assuming it's pure data with keys/ids that describe circular references.

During render, the render tree produces a tree of GraphQL queries that Relay can merge into a single GraphQL query. In order to figure out what to request from the server, it needs to diff to see which pieces of the current graph are fresh and local, and which pieces the merged GraphQL query is asking for that it can't provide. After that, it makes a request for the new bits the components need. When that request comes back, we can transform the previous graph by adding or updating nodes and edges.

From the original tree of GraphQL queries, we know which components will need to be updated based on which ones contain nodes or edges in the merged GraphQL query to the server. Are you saying that the operations on these graph structures are too expensive?

Makes sense to look at this for animations 👍, although for the general use cases described in the other threads, this seems like a more invasive change.

@waldreiter (Contributor)

@jimfb

No, if you don't use this feature, there is no overhead. Certainly nothing that could show up in any measurement.

I only ask so many questions because I really like the ReactDataTracker. Measuring performance is hard, so this might be completely wrong. Here, 10k components that don't use ReactDataTracker get rendered at 60 fps:

var comp = [];
for (var i = 0; i < 10000; i++) {
  comp.push({});
}

var start = Date.now();

for (var fps = 0; fps < 60; fps++) {
  for (var i = 0; i < comp.length; i++) {
    ReactDataTracker.startRender(comp[i]);
    ReactDataTracker.endRender(comp[i]);
  }
}

var end = Date.now();
console.log(end - start);

With iojs (v1.3.0) it is over 100ms on my computer.
With node (v0.10.31) the polyfill for Map is needed and that costs 8 seconds.

@jimfb (Contributor, Author) commented May 23, 2015

@kevinrobinson Wow, you've really been following along and keeping up with all the technologies! This is a great discussion!

You are correct that each component does give (via GraphQL) this exact information to Relay, but incorrect in your assertion that this happens at render; it actually happens statically before the first render, thus allowing Relay to pre-fetch all the information that the component could ever need. As a result, some perfectly legitimate apps can't be described using Relay (eg. a recursive TreeView, where each node is a person and the nodes can be expanded to see the person's friends). Our goal with this API is to solve the Relay use case, but also to provide a more general solution (eg. allow for apps like a recursive TreeView). This is what I was implying when I said that you would need to know "how deep the component hierarchy is going to look".

One could argue that this limitation could be overcome by doing as you originally suggested, computing the query at render time. And this is true, but... suppose you did generate an immutable DAG representing your data as needed by a component. Now suppose your underlying data changes, and you need to provide a new DAG; how do you update the DAG? You want to re-use as many DAG nodes as possible, in order to preserve triple equals equality and thus a fast re-render, so you can't just re-generate the DAG. What you really need to do is track which parts of the DAG need to be "fixed" as a result of the data change... but... that's just as hard as figuring out which components need to be updated. In fact, it's actually the same problem, wrapped in a different box. Rather than have each user of React re-implement the same diffing logic, we can use this new API proposal to allow React to figure it out for you! (As an added bonus, by having React do the diffing at the component level, you can automatically avoid re-rendering all the parents of the modified nodes).

@cody Yes, the final implementation will change substantially before this is ready to be merged. The source code of this PR is mostly for API demo purposes. The comment in ReactDataTracker (right above the first usage of the Map polyfill) explicitly calls out the perf ramifications of using Map (see the first bullet point in that comment). A final implementation will not have a Map polyfill in the hot codepath of render for components not using this API. I can assure you that the performance impact of this API for components not using the API will be asymptotically close to zero before this code ever ships/merges.

@josephsavona (Contributor)

@jimfb - I couldn't have explained it better myself!

@kevinrobinson thanks for the great questions! @jimfb covered just about everything, but a couple extra notes: My comment above was just to illustrate the problem - ultimately, in a cyclical graph when one thing changes, everything changes - and it isn't meant to describe how Relay actually stores data or provides it to components.

One of the design goals with Relay is to make it so that applications do not have to worry about the data representation or efficient updates; we push this complex work into the framework. The internal data representation is abstracted away (so that we're free to store data however is most efficient) and the data provided to components is opaque: we guarantee that it will match the shape of the query fragment. Whether we provide a circular data structure or a DAG or something else doesn't matter to the application because the contract is the query format.

@kevinrobinson

@jimfb @josephsavona awesome explanations, thank you!

it actually happens statically before the first render, thus allowing Relay to pre-fetch all the information that the component could ever need

This is more about Relay in general, but is still a bit confusing to me - how can this approach work with any branching code in the render method? Say there are different types of stories in the news feed, and some stories are rendered as VideoStory while some are AdCarouselStory and others are PictureStory. I'm assuming these are all rendered with different components and require slightly different data, and that this branching can't be done until some data is read about each story. In real life, I assume that the list of stories coming back from the server has been optimized to inline the different data needed for the initial render of each of these different stories. This was what Laney's talk (awesome slides!) seemed to suggest as well, that there would be a single CommentItem that would ask for all the data any of its subtypes might need. I'm curious how doing this entirely statically could scale up to larger component trees though (which isn't to say this approach isn't a great improvement even if it doesn't).

Note that the alternate, indirect graph representation is a self-referential Map of IDs to Records, where records can link to one/many other records... we still don't know what part of the graph did and did not change

As for this particular API, I'm still not entirely following why issues like this are fundamental problems. Maybe an example use case of a component/query that's not top-down and requires the cyclical structure would help? Reading code with actual data structures seems like it'd be the most useful, so I'll also just be curious to follow along and see where you all end up going here. Thanks! :)


@leebyron (Contributor) commented Jun 1, 2015

@josephsavona - GraphQL and Relay represent application data as a cyclical graph, because objects in a system are typically interconnected (ex: a user can be the author, liker, and commentator of the same story). But you can't directly represent a graph as a persistent data structure: a single change invalidates the whole graph and little memory can be shared.

This of course depends on how you represent the graph. If you represent it with cyclical direct pointers then you're absolutely correct. If you represent it as an adjacency list then you can represent it as a DAG, and thus immutably, which dramatically simplifies the "what changed" story. If you use an adjacency list then the "pointers" to other objects just become keys back to the top of the graph. e.g. if you change your profile picture the link "Lee's friend is Joe" will remain the same, so the "Lee" object will be the same post-update, but "Joe" will become a new value with the new profile picture applied.

Now, when you use an adjacency list like this, any React component that "walks an edge" needs to subscribe to updates from the data store as each node in the graph ("Lee" and "Joe" in this example) is an independent path into the data store DAG which can change independently.

This could make for a less pleasant API, though. I imagine that you might want a getter function like "getFriend", so instead of doing dataStore.get(lee.friendID) you might do lee.getFriend() - you would need a more custom data type to do this, though. The more important drawback of this API is remembering to subscribe to updates from the store - this is where @jimfb's data tracker is interesting.
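
Roughly the shape I have in mind (hypothetical store, just to make the example concrete):

// Hypothetical normalized store: edges are keys back into the store, not pointers.
var dataStore = {
  records: {
    lee: {id: 'lee', name: 'Lee', friendID: 'joe'},
    joe: {id: 'joe', name: 'Joe', profilePicture: 'old.png'}
  },
  get: function(id) {
    return this.records[id];
  }
};

// A custom data type could wrap this so you write lee.getFriend() instead of
// dataStore.get(lee.friendID); walking an edge re-enters the store.
function getFriend(record) {
  return dataStore.get(record.friendID);
}

// Changing Joe's profile picture replaces only the 'joe' record; 'lee' stays the
// same, but a component that walked lee -> joe must also subscribe to 'joe'.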

However, @jimfb, I have one major point of critique for this pull request as it stands: I want to be able to implement this completely in library space - what I would like to do to fulfill the API I described above is pretty different from what you've done in this PR, and since you're baking this PR into the React core, I won't have the ability to do that. I believe the right way to add the right pieces to React is to add an API to query for the currently rendering component (to query when "reading") and to subscribe to unmounts (to clean up).

@leebyron (Contributor) commented Jun 1, 2015

@swannodette should help make sure this change makes Om less custom, since it's really similar to ref cursors.

@swannodette

Right, so I've implemented the general concept that this PR proposes not one but two times now :) The first time with ref cursors, and I'm doing so again with Om Next sans cursors. Each time I have done this I've had to supply custom base components (that every Om user must use/extend) that we watch for mount/unmount, so we can record what data path they are associated with and perform precise updates later. React provides no reasonable hooks, so we've had to go our own way each time.

I have no specific comments about the specific API design proposed, but I agree with @leebyron's assessment that any such facilities should be exposed so that this feature and similar features can be written purely as libraries. I believe this is an area of React that warrants much broader experimentation and innovation and it's not going to happen via private APIs.

@gaearon (Collaborator) commented Jun 1, 2015

This of course depends on how you represent the graph. If you represent it with cyclical direct pointers then you're absolutely correct. If you represent it as an adjacency list then you can represent it as a DAG, and thus immutably, which dramatically simplifies the "what changed" story. If you use an adjacency list then the "pointers" to other objects just become keys back to the top of the graph. e.g. if you change your profile picture the link "Lee's friend is Joe" will remain the same, so the "Lee" object will be the same post-update, but "Joe" will become a new value with the new profile picture applied.

This is pretty much how I'm solving nested entities with normalizr — instead of reference cycles, I always use IDs and have entity bags appear at the top. It does require additional subscriptions though.
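
Roughly the shape that gives you (illustrative, not normalizr's exact output):

// Entity "bags" at the top; nesting is expressed with IDs.
var state = {
  entities: {
    users: {
      '1': {id: '1', name: 'Lee', friends: ['2']},
      '2': {id: '2', name: 'Joe'}
    }
  },
  result: ['1']
};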

@tgriesser

Now, when you use an adjacency list like this, any React component that "walks an edge" needs to subscribe to updates from the data store as each node in the graph ("Lee" and "Joe" in this example) is an independent path into the data store DAG which can change independently.

you would need a more custom data type to do this though.

@leebyron If I understand correctly, this sounds similar to what I've implemented with the "custom classes" Immutable fork. I extend it to keep a reference to the "root store" (a wrapper for an Immutable.Map) in the inheriting Map constructor, and retrieve the path into the DAG on the custom data type (each custom Map must implement a getPath method).

Now, when you use an adjacency list like this, any React component that "walks an edge" needs to subscribe to updates from the data store as each node in the graph ("Lee" and "Joe" in this example) is an independent path into the data store DAG which can change independently.

In my case, the subscription happens in getInitialState - on mount, the returned state object is iterated and immutable objects implementing this "observable path" convention are subscribed to their path on the root store. When a change occurs at that path, I re-call getInitialState (expected to be free of side effects), which looks into the root and calls setState with any updated "root store" values. I've also implemented an "observing" metadata API to only subscribe to specific keys in a Map. Finally, on unmount, all subscriptions into the root store are automatically disposed.

getInitialState() {
  return {
    // subscribes: ['totals', job.id] -> Immutable.TotalsMap
    totals: this.currentJob().totals(),
    // subscribes: ['report', job.id, 'element_costs'] -> Immutable.ReportMap
    report: this.currentJob().report().observing('element_costs')
  }
}

This has worked excellently, but this use of getInitialState is a complete hack around the fact that something like @sebmarkbage's proposal in #3398 doesn't exist. In fact, #3398 sounds like pretty much exactly what I hope might be implemented; I was surprised to see the amount of hesitance about the API.

Also, from @sebmarkbage in #3398:

observe() executes after componentWillMount/componentWillUpdate but before render.

IIRC a more generic hook here (after componentWillMount/componentWillUpdate but before render) was the main piece missing to be able to implement this in library code, and could be a good first step toward more userland experiments as @swannodette mentions.

Perhaps componentWillRender?

@jimfb (Contributor, Author) commented Jun 1, 2015

@kevinrobinson In Relay, a component's data needs are specified as a query, and the query may specify deep dependencies based on data (example query: "for each news feed item, get all comments, for each comment, get the author, for each author, get the name and profile image"). Obviously you need to know who commented in order to know which profile images to fetch, but that's why the GraphQL queries are so expressive. And yes, a component needs to fetch all the data for any branch that it will take in render; the programmer needs to figure that out in advance (it is not automatic). That was one of the design decisions made in Relay. But let's move this tangential discussion to another thread; the details of Relay aren't required for an understanding of this API. The key insight is that when underlying data changes, determining the ramifications of those changes on the component tree is non-trivial (and that's what this API aims to fix).

@leebyron Absolutely, great points! Regarding the API: in general, I totally agree, it's great to be able to push code into user/library space whenever possible. In this case, however, we thought it through, and there are several reasons we would want this in the core instead of user/library land. One is that by managing subscriptions internally, the "side effects" are an internal implementation detail (we keep our options open to refactor later and make the entire core immutable and render pure). Another is that there are optimizations we can build into the core which are not possible if this is implemented in userland. Jordan made an internal fb-only post a few hours ago (about timer propagation), which is worth reading. Another is that we might want to use this functionality internally (for things like context pruning?), which would require that this functionality be in the core anyway. In summary, we want to push things into user/library land whenever possible, but it was a conscious design decision to expose this higher-level API instead. Feel free to grab me and Sebastian if you'd like to discuss more.

@swannodette points out that he has written this same logic multiple times. Internally to Facebook, we've also seen this exact logic implemented several times (Animations, Relay, etc). That's exactly the motivation behind this API (it is a pattern we see people implementing over and over again, and it's difficult to get it right/optimized).

@swannodette

@jimfb To be clear, the API as described so far is completely useless for Om. We will continue to do our own thing.

@leebyron (Contributor) commented Jun 1, 2015

I'm out this week, but I'd love to meet up and discuss more with you guys. Concretely, what I want from React core in this instance are the low-level API hooks which would allow a ReactDataTracker in a generic form. As presently implemented, it would require mutable data, which I am obviously concerned about; it won't help me build the kinds of APIs I want to support for React + Immutable.js, and I believe it also will not help @swannodette simplify the development of Om, for the same reasons.

Specifically the observeRead and observeWrite APIs rely on observing a long lived object reference which is at odds with how immutable values are represented. A more powerful API would be one level lower so the observeRead and observeWrite APIs are part of a library external to React, but the API hooks to enable them are built into React core in a way that could be utilized directly by other libraries. I agree strongly with @swannodette that this is an area ripe for experimentation and we should be supporting tools for enabling that kind of experimentation.

@jimfb (Contributor, Author) commented Jun 1, 2015

@leebyron Yeah, I agree, that's ideal, I just don't know how to do it without disrupting our other/future plans. With regards to the dependency on long-lived object references: this is easily solved by emitting reads/writes on whatever manages the data (e.g. the datastore) instead of on the actual immutable data objects themselves. This shouldn't be incompatible with immutable data in any way. But yeah, let's chat more when you get back!
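
A quick sketch of what I mean (hypothetical store; assumes Immutable.js values and the addons API from this PR):

var Immutable = require('immutable');
var React = require('react/addons');

// Hypothetical store: the store itself is the long-lived observation target,
// while the values it hands out stay immutable and are simply replaced on write.
var TodoStore = {
  _todos: Immutable.List(),

  getTodos: function() {
    React.addons.observeRead(this);
    return this._todos;
  },

  addTodo: function(todo) {
    this._todos = this._todos.push(todo);
    React.addons.observeWrite(this);
  }
};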

@swannodette True, any system which tracks dependencies outside of React wouldn't benefit from having React track dependencies (sorta self-evident). In that sense, this API is no better or worse than the alternative Observe API (for your use case; both proposals are semantically equivalent in this way; they both allow components to respond to a new-data event). But as I mentioned above, there are reasons we don't want to expose this as a lower-level hook.

@kevinrobinson

@jimfb Thanks for explaining the Relay bits further, and apologies for the tangential questions. What @leebyron was describing is how I figured Relay actually worked, and why I think I was misunderstanding why "determining the ramifications of those changes on the component tree is non-trivial" and asking about the internals of how you're representing the query and data. :)

If the problem here is that Relay is feeding React components large object structures with getter methods, and it is difficult to write an efficient shouldComponentUpdate, this seems like something that Relay should be able to solve outside of patching React's internals to support tracking finer-grained reads.

To propose a naive solution for the Person example here: what if an externalTracker were passed to components, letting them all implement shouldComponentUpdate to return false, and then funneling all reads through externalTracker.read(this, 'key'), or a sugared-up version like externalTracker.read(this) that could return the full JS object graph for the component, instrumented for reads as you like? The externalTracker can track writes on its own, then look up which components read that data and forceUpdate() them from the outside, the same way you're doing here. And it can periodically clean up to clear references to components that have since been unmounted.
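
A rough sketch of that idea (all names hypothetical):

// Hypothetical externalTracker: components funnel reads through it, always return
// false from shouldComponentUpdate, and get forceUpdate()d from the outside.
var externalTracker = {
  _readers: {}, // key -> array of components that read it

  read: function(component, data, key) {
    var readers = this._readers[key] = this._readers[key] || [];
    if (readers.indexOf(component) === -1) {
      readers.push(component);
    }
    return data[key];
  },

  write: function(data, key, value) {
    data[key] = value;
    (this._readers[key] || []).forEach(function(c) { c.forceUpdate(); });
  }
};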

For the more typical case of side-loading from an external store, setting the component's state, and implementing shouldComponentUpdate, at some point the side-loaded data needs to be set as component state. This seems like a natural hook for instrumenting what needs to be tracked for changes, rather than on the entity itself as you've done here. Tracking this in the machinery of side-loading also lets that be split out from render, rather than having to be collected as a side-effect of render.

I believe the right way to add the right pieces to React is to add an API to query for the currently rendering component (to query when "reading") and subscribe to unmounts (to clean up).

@leeb This sounds like an interesting avenue - this and what @elierotenberg suggested in #3398 (comment) sound like the most appealing direction for encouraging community experimentation. Or @jimfb, if this has been a recurring problem inside FB, perhaps sharing more stories there would help others understand the motivation.

Another tactical suggestion would be to take what you have here and do the same thing with a fn wrapping the component or just the render method.

trackingRender = function(render) {
  var tracker = new ReactDataTracker(render);
  tracker._cacheValid = false;
  return function() {
    console.log('before render');
    var renderedComponent = tracker.read();
    console.log('after render');
    return renderedComponent;
  };
};

// in your component, wrap `render` or wrap the whole class and inject this tracking code.
render: trackingRender(function() {
  // ...
})

If the desire is to make doing this more first-class, perhaps the more general change in the same spirit as this PR that would allow other libraries to experiment would be to support wrapping all render calls in the entire component tree by passing a wrapper fn to the top-level render call. That would support side-effecting or transforming the return value of render, but of course it opens a lot of space for foot-shooting. :)

@jimfb (Contributor, Author) commented Jun 2, 2015

@kevinrobinson Yep, absolutely. That's a completely valid alternative API (a semantic derivative of what Lee was proposing). The issue is that externalTracker and/or the wrapper function require external mutations and are therefore impure (i.e. render performs mutations external to React core, so we can't refactor React to use immutable data/workers without breaking those APIs).

By having React control the subscriptions, all side effects are internal to React (therefore, React can undo them or choose not to apply them, or whatever). render remains, conceptually, pure. This is important because many of our experiments and planned future work will revolve around making the core immutable, rendering in workers with immutable data, aborting renders part way through, etc. This is not possible unless render remains pure and side-effect-free, at least from an external perspective. Those alternatives were rejected for this reason, and other related reasons that prevent core level optimizations.

@mweststrate

Hi @jimfb,

As far as I can deduce from this thread, I think we've implemented something quite close to the intentions of your PR in the MOBservable library. It allows React components to sideways-load data if annotated with a simple mixin (or decorator). Similar to the PR, and similar to Meteor Tracker, it tracks reads during the render phase, but a lot more implicitly; in particular, you don't need to explicitly call a tracker from inside your data structure, as long as you annotate your data objects. Besides plain values, MOBservable also supports observing complete arrays and objects, and updates are always processed atomically (without needing flushes) to avoid unnecessary re-renders.

Since, like Meteor Tracker, it doesn't rely on an API exposed by React, it IMHO more cleanly separates responsibilities, as these reactive data structures are useful regardless of whether they are used in a React component. Anyway, if that sounds interesting, here is a diff from rewriting TodoMVC to use reactive structures instead of immutables, and here is an in-depth motivation including a performance analysis.

I referred to this lib in #3398 as well, so you might have already seen it. I don't want to be spamming, but I would argue that it is best for React to leave as many problems in userspace as possible, to keep the learning curve small, because I think that is exactly what makes React so awesome (compared to Angular, for example ;-))

@jimfb (Contributor, Author) commented Jul 13, 2015

@mweststrate Yep, saw it, and read through the docs. Although I haven't played with MOBservable, I agree it looks pretty similar to this proposal for React. I would even go so far as to suggest that its implementation is likely to be better than this proof-of-concept implementation, which I wrote up in a few hours (we would fix our implementation before a final release). Conceptually, I agree with you 100%. However, there are reasons such an API really needs to be supported by the React core in order to work efficiently/correctly.

The problem is: either you need to do subscriptions in advance or lazily. Let's explore both options...

If you do subscriptions in advance: you necessarily must oversubscribe to your data, since you need to subscribe to anything you could ever need to read in render. You often get into situations where virtually every component on the page is subscribing to almost everything, and now you're back to where you started (slow performance).

If you do subscriptions lazily: you can solve the oversubscription problem, but now you need to change your subscriptions on the fly based on which observables your component is reading during render. This works (it's the solution used in this RFC/PR), but it means one of two things:

  • You are mutating state during render (which is against the rules of React render because React requires that render remains pure).
  • You are the React core, and managing subscriptions transactionally within the core. This allows all 'side effects' to be within the core. Having the subscriptions within the core is mathematically equivalent to having render return the set of subscriptions. Therefore, render remains pure as per the API rules, and everything works.
    This is why such an API, if supported, MUST be supported directly by the core.

MOBservable is a super cool proof of concept, but it is necessarily insufficient, because it is only possible/legal to do dynamic read subscriptions if you are the React core. I encourage you to continue iterating and exploring the area, because we pull a lot of our inspiration from the community's work. Having said that, if this is a style you'd like to see us support, let us know so we can prioritize it!

@mweststrate

Hi @jimfb,

Thanks for the explanation. I think the only way for subscriptions to perform is indeed to subscribe lazily. This does indeed introduce side effects. I thought about how this fact could make the React abstractions leaky during the design of MOBservable, but I couldn't come up with anything, since it's basically just a short circuit to forceUpdate and doesn't influence the rendering itself directly. So it might not make much difference in practice. Nonetheless, I can perfectly understand that you want to keep your abstractions pure.

To achieve the desired purity, however, I think you could slim down your API a bit. Unless you want to build a complete observable framework on top of React, the API bridge should be as small as possible. I think the pseudo code below could make it possible for external libs to be less aware of the fact that React relies on their observables. I think the following API might fit in more cleanly (but it's just a thought experiment):

// register an observable tracker
ReactDataTracker.registerTracker(tracker:ITracker)

interface ITracker {
    startRead();
    endRead():IObservable[];
}

// pseudo code:
reactComponent.render() {
    this.currentSubscriptions.forEach(sub => sub.dispose()); // see note below
    ReactDataTracker.tracker.startRead(); // given there is one, or there might even be multiple different trackers in the system
    render();
    var observables = ReactDataTracker.tracker.endRead();
    this.currentSubscriptions = observables.map(o => o.observe(this.forceUpdate.bind(this)));
}

I think this API would be easier for libs to implement. It uses a general Observable API, after all, without needing different observable implementations to be concerned with registering writes. The reads are easier to register, as libraries like MOBservable and Tracker already have a central place where reads are tracked. With the proposal above, libraries don't have to support a second way to register reads and writes but can keep their existing subscription management system. Although not all tracking concerns are pulled into React, React itself is now responsible for managing subscriptions. I think there is a similarity to the DOM: not all side effects happen inside React, but they are at least controlled by React.

On a side note: there is a lot of performance to gain by not disposing subscriptions before rendering; it is cheaper to diff the list of observables first, since the list of observed things usually stays pretty constant over time. MOBservable gained significantly from such an optimization.
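
Something along these lines (illustrative; observe/unobserve stand in for whatever subscription API the observable library exposes):

// Illustrative: keep the (usually large) overlap subscribed, and only touch
// observables that appeared or disappeared since the previous render.
function updateSubscriptions(component, oldObservables, newObservables) {
  oldObservables.forEach(function(obs) {
    if (newObservables.indexOf(obs) === -1) {
      obs.unobserve(component); // disappeared: unsubscribe
    }
  });
  newObservables.forEach(function(obs) {
    if (oldObservables.indexOf(obs) === -1) {
      obs.observe(component); // new: subscribe
    }
  });
}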

@sebmarkbage (Collaborator)

@jimfb We've talked internally about an issue: how could you use this API to stop listening to a data source based on its value? What's the alternative solution when you can be sure you're not going to need the value anymore, e.g. the timer/alarm scenario? I guess you can never be sure, since you should be able to do time travel etc., unless something in state changes.

@jimfb (Contributor, Author) commented Jul 13, 2015

@sebmarkbage Yeah, I think you solve that the same way you would if it were standard JavaScript and you wanted to stop reading from a stream. You need to have some state variable to indicate when you're done.

For a timer/alarm/server-subscription, which requires an imperative unsubscribe, I think we'd just use a wrapper like in #3858 to be notified of the unsubscription and imperatively dispose of the subscription.
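
As a rough sketch of the state-variable idea (hypothetical timer prop whose getValue() fires a read):

// Hypothetical timer whose getValue() fires a read; once `done` is set, render
// no longer calls the getter, so the dependency is dropped on the next render.
var TimerView = React.createClass({
  getInitialState: function() {
    return {done: false};
  },
  render: function() {
    if (this.state.done) {
      return <div>Done</div>;
    }
    return <div>{this.props.timer.getValue()}</div>;
  }
});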

@gaearon (Collaborator) commented Mar 26, 2016

It appears that there was no consensus on this. I’m closing as we are trying to avoid stale pull requests and be more decisive, and discuss in the issues instead. Please feel free to continue the discussion in #3858.

@gaearon closed this Mar 26, 2016