Version 2 RFC #18

Open
@DaddyWarbucks

Hey all!

This library has moved to feathers-ecosystem. Along with that move, I would like to make some updates and improvements. The library is amazing, but hasn't received the adoption I believe it deserves. I have used it in many production apps with a few different patterns and have landed on what I believe to be some nice additions to the library.

Before reading further, you should be familiar with DataLoader (also see this great video about the source code). You can also check out the docs directory for some detailed examples of how the current implementation works and its benefits.

First things first, this package should be renamed to feathers-dataloader or feathers-loader. There is already a complementary (yes, they are complementary and not competitors) package, feathers-batch. So to avoid confusion, this package should be renamed. For the remainder of this blurb, I will refer to the "BatchLoader" as "DataLoader" to reflect this change. Long story short, BatchLoader === DataLoader.

The Problem

Using dataloaders requires the developer to instantiate a new class every time they want to use a loader, and these classes require lots of config. This causes the developer to duplicate configuration (mainly local/foreign key relationships). It also forces the developer to know ahead of time which loaders are going to be used in any given request. For example,

const usersLoader = DataLoader.loaderFactory(
  app.service("users"), // service
  "id", // foreignKey
  false // multi
);

const postsLoader = DataLoader.loaderFactory(
  app.service("posts"),
  "user_id",
  false
);

This may not seem painful at first. It's explicit and can offer some great performance benefits. But as you use more loaders and your loaders get more complicated, the problem gets unwieldy. Because calls to load() are batched, params must be static. You cannot pass different params to each load. For example,

// A loader to load the post author. We want to load the whole user
const authorsLoader = DataLoader.loaderFactory(
  app.service("users"),
  "id",
  false
);

// A loader to load users onto comments. We don't want the full user and just need
// the name and email
const usersLoader = DataLoader.loaderFactory(
  app.service("users"),
  "id",
  false,
  { query: { $select: ['name', 'email'] } }
);

Bummer, we had to create two loaders because params must be static within the class instance. This problem grows even further when you want to have variable params per load() call. This is not currently possible.

const usersLoader = DataLoader.loaderFactory(
  app.service("users"),
  "id",
  false
);

const query = post.private ? { $select: ['name'] }  : { $select: ['name', 'email', 'bio'] } 
const user = await usersLoader.load(1, { query }) // not possible
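
To make the "params must be static" constraint concrete, here is a minimal sketch of the batching idea. It is not the library's code: `createBatchLoader` and the in-memory `fakeUsers` service are hypothetical names used only for illustration. Every key requested in the same tick is resolved by one `find()` call, and one call can only carry one set of params.

```javascript
// Hypothetical stand-in for a Feathers service; only find() is sketched.
// `calls` records the params of each find() so the batching is observable.
const calls = [];
const fakeUsers = {
  async find(params) {
    calls.push(params);
    const rows = [
      { id: 1, name: "Ann", email: "ann@example.com" },
      { id: 2, name: "Bob", email: "bob@example.com" },
    ];
    const ids = params.query.id.$in;
    return rows.filter((row) => ids.includes(row.id));
  },
};

// Collects every key requested in the same tick and resolves them with ONE
// find() call. The params are fixed when the loader is created, because a
// single find() can only carry a single set of params.
function createBatchLoader(service, keyName, params = {}) {
  let queue = [];

  const flush = async () => {
    const batch = queue;
    queue = [];
    try {
      const query = {
        ...params.query,
        [keyName]: { $in: batch.map((item) => item.key) },
      };
      const rows = await service.find({ ...params, query });
      for (const item of batch) {
        item.resolve(rows.find((row) => row[keyName] === item.key) ?? null);
      }
    } catch (error) {
      batch.forEach((item) => item.reject(error));
    }
  };

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        queue.push({ key, resolve, reject });
        // Schedule a single flush for the first key queued in this tick.
        if (queue.length === 1) queueMicrotask(flush);
      });
    },
  };
}
```

With this shape, `load(1)` and `load(2)` issued together share one query, so there is nowhere to put a per-call `$select` without splitting the batch. That is the limitation the rest of this RFC tries to work around.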

The Solution

Create a syntax and pattern that makes using dataloaders easier and more approachable. Specifically, dataloaders should be lazily created as needed. This solution introduces two new classes ServiceLoader and LazyLoader.

The ServiceLoader lazily creates DataLoaders for a particular service. It creates a new DataLoader for any given set of id and params. This allows the developer to create one loader for a service, and then use any combination of local/foreign key relationships and params.

const usersLoader = new ServiceLoader(app.service('users'));

const user1 = await usersLoader.load(1) // creates a new DataLoader
const user2 = await usersLoader.load(2) // uses the same DataLoader as above

const user3 = await usersLoader.load(3, { query: { $select: ['name', 'email'] } }) // creates a new DataLoader
const user4 = await usersLoader.load(4, { query: { $select: ['name', 'email'] } }) // uses the same DataLoader

// The service loader allows for multiple local/foreign key relationships in the same loader
const user5a = await usersLoader.load(5) // uses the users service's default "id" key
const user5b = await usersLoader.load({ id: 5 }) // define the foreignKey ad hoc (same as above because "id" is the default)
const user5c = await usersLoader.load({ user_id: 5 }) // use any key

// Furthermore, you can use variable params
const query = post.private ? { $select: ['name'] }  : { $select: ['name', 'email', 'bio'] } 
const user = await usersLoader.load(1, { query }) // this is now possible!
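
One way the lazy creation could work internally is a map from (key name, params) to an underlying loader. The sketch below is an assumption about the mechanism, not the proposed implementation: `ServiceLoaderSketch`, the injected `createLoader` factory, and `demoFactory` are hypothetical names, and `JSON.stringify` stands in for a real deterministic stringify.

```javascript
// Hypothetical sketch: one underlying loader per (key name, params) pair.
class ServiceLoaderSketch {
  constructor(service, createLoader, defaultKey = "id") {
    this.service = service;
    // Assumed factory shape: (service, keyName, params) => { load(key) }
    this.createLoader = createLoader;
    this.defaultKey = defaultKey;
    this.loaders = new Map();
  }

  load(idOrObject, params = {}) {
    // Accept either a bare id or an "id object" such as { user_id: 5 }.
    const isObject = typeof idOrObject === "object" && idOrObject !== null;
    const [keyName, key] = isObject
      ? Object.entries(idOrObject)[0]
      : [this.defaultKey, idOrObject];

    // NOTE: JSON.stringify is sensitive to property order; a real
    // implementation would need a deterministic stringify here.
    const cacheKey = JSON.stringify([keyName, params]);
    if (!this.loaders.has(cacheKey)) {
      this.loaders.set(cacheKey, this.createLoader(this.service, keyName, params));
    }
    return this.loaders.get(cacheKey).load(key);
  }
}

// Trivial factory for demonstration only: it does no batching and simply
// records each time an underlying loader is created.
const created = [];
const demoFactory = (service, keyName, params) => {
  created.push({ keyName, params });
  return { load: async (key) => ({ [keyName]: key }) };
};
```

In this sketch, `load(1)` and `load(2)` reuse one underlying loader, while adding a `$select` or switching to `{ user_id: 5 }` creates another. In practice the factory would return a real batching loader.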

The LazyLoader is a super simple class that lazily constructs ServiceLoaders.

// use this in app.hooks() before all
const initializeLoader = context => {
  const lazyLoader = new LazyLoader();
  context.loader = lazyLoader.loader;
  // or
  context.params.loader = lazyLoader.loader;
};

// Now the developer does not have to setup any loaders throughout the rest of the request. 
const user = await context.loader('users').load(1)  // lazily creates the ServiceLoader and then a DataLoader
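
As a rough illustration of how little the LazyLoader needs to do, here is a sketch under stated assumptions. `LazyLoaderSketch` and its injected `createServiceLoader` factory are hypothetical; the proposed class is constructed without arguments, so it presumably resolves services some other way.

```javascript
// Hypothetical sketch: create one service-level loader per service name,
// on first use, and hand back the same instance afterwards.
class LazyLoaderSketch {
  constructor(createServiceLoader) {
    // Assumed factory shape: (serviceName) => service-level loader
    this.createServiceLoader = createServiceLoader;
    this.serviceLoaders = new Map();
    // Bound so `loader` can be detached and stored on context or context.params.
    this.loader = this.loader.bind(this);
  }

  loader(serviceName) {
    if (!this.serviceLoaders.has(serviceName)) {
      this.serviceLoaders.set(serviceName, this.createServiceLoader(serviceName));
    }
    return this.serviceLoaders.get(serviceName);
  }
}
```

Because the map lives on the instance, creating one instance per request keeps the cache scoped to that request.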

This accomplishes a very convenient, natural, service-like interface. The developer really doesn't need to know anything about ServiceLoaders, DataLoaders, etc, etc. If the developer knows how to use a service, they basically know how to use a loader.

// Remove this in your hooks
const user = await app.service('users').get(1);

// Replace it with this
const user = await context.loader('users').load(1);

That's the general gist! I wanted to make dataloaders as convenient and approachable as possible, and I think this mostly accomplishes that. I recently replaced my older implementations of loaders in a production app with this new version, and it was a drop-in replacement that went really well!

So, some things I need feedback on:

  1. Should I remove the default key and only allow passing an "id object"? In an attempt to make this as "service like" as possible, I allowed the load() method to take two different shapes. You can pass just an id, or an "id object".
// Use the service's id that was defined in its options. This makes sense and is convenient.
loader.load(1);

// The developer needs to be able to define the local/foreign key relationship. So they can pass an object with
// exactly one property where the property is the name of the foreign key.
loader.load({ id: 1 });
loader.load({ user_id: 1 });
loader.load({ some_foreign_key: 1 });

// While it's "convenient" to just pass the id, it could be a source of confusion, which is what I am aiming
// to avoid...and passing { id: 1 } is not that painful. 
// loader.load(1); // remove
loader.load({ id: 1 }); // must always pass "id object"
  2. Is the extra params concept bad? Any other solutions? The ServiceLoader uses a deterministic stringify function to create keys for the underlying cache map. This also means that the developer cannot put functions into the params. I want to keep the experience as service-like as possible, but it is certainly different to split your params into two arguments. For example,
loader.load({ id: 1}, { query: { $select: ['name', 'email'] } });
// under the hood we stringify this
const key = stableStringify([{ id: 1}, { query: { $select: ['name', 'email'] } }]);

// This will throw an error if you pass functions, models, transactions, etc
loader.load({ id: 1}, { query: { $select: ['name', 'email'] }, user, transaction, loaders, etc });

// So instead, the `load` method takes a third argument `extraParams`. These params are not
// stringified into the key but are passed to the service. This could be error prone...it could
// return different results on each call even though the key is the same. The developer has
// to be aware of this. In 99% of my use cases I haven't had to use it and when I have it works
// as expected, but there is margin for error there.
loader.load({ id: 1}, { query: { $select: ['name', 'email'] } }, { user, transaction, loaders, etc });

Is there another solution where we combine the id object and query into one argument? The above solution works pretty well and makes sense to me, but I feel like someone out there may have a better solution.
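
For reference, a deterministic stringify of the kind described above can be sketched in a few lines. This is a hypothetical stand-in (`stableStringifySketch` is not the function the library uses): it sorts object keys so equivalent params produce the same cache key, and it throws on functions.

```javascript
// Hypothetical deterministic stringify: sorts object keys so that equivalent
// params always produce the same cache key, and rejects functions outright.
function stableStringifySketch(value) {
  if (typeof value === "function") {
    throw new TypeError("Functions cannot be part of a cache key");
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringifySketch).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    // Respect toJSON (e.g. Date) so such values do not all collapse to "{}".
    if (typeof value.toJSON === "function") {
      return stableStringifySketch(value.toJSON());
    }
    const entries = Object.keys(value)
      .sort()
      .map((name) => `${JSON.stringify(name)}:${stableStringifySketch(value[name])}`);
    return `{${entries.join(",")}}`;
  }
  // Primitives; `undefined` becomes the string "undefined", which is still
  // deterministic for use as a key.
  return String(JSON.stringify(value));
}
```

With this, `{ $select: [...], $limit: 1 }` and `{ $limit: 1, $select: [...] }` map to the same key. Anything holding a function somewhere in its graph fails fast, which is the case the `extraParams` argument exists to handle.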

  3. Other cached methods: should they stay or go? When building this, I actually built in cached get() and find() methods. These have their use cases, but I don't want the classes to be too convoluted.
loader('users').get(1)
loader('users').get(1) // cache hit

loader('users').find()
loader('users').find() // cache hit

I have found use cases for these. The get() method is almost always easily replaced with load(), but there are some cases where you want to throw an error if the id is not found, or where the request must be a get and not a find (which load does under the hood) for some hooks/auth reason. The cached find() definitely has value because there are cases where you simply don't have a local/foreign key relationship, but can still benefit from caching.

// For example, I have a `directions` service that calls the Google directions service,
// which costs money per request. It also does not have a local/foreign key relationship
const directions = await context.loader('directions').find({
  query: { origin: result.origin_location, destination: result.destination_location }
})
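
A cached find() does not need batching at all; memoizing the promise by its params is enough. The following is a sketch under assumptions, not the library's implementation: `createCachedFind` is a hypothetical helper, and it assumes params are JSON-serializable and built with a consistent property order.

```javascript
// Hypothetical sketch: memoize find() results by their params.
function createCachedFind(service) {
  const cache = new Map();

  return function cachedFind(params = {}) {
    // Assumption: params are JSON-serializable with a consistent key order.
    const key = JSON.stringify(params);
    if (!cache.has(key)) {
      // Cache the promise itself so concurrent callers share one request.
      const promise = Promise.resolve(service.find(params));
      // Do not keep failures cached, so a later call can retry.
      promise.catch(() => cache.delete(key));
      cache.set(key, promise);
    }
    return cache.get(key);
  };
}
```

For something like a paid directions lookup, two identical queries within the same request would then hit the external API once.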

Bonus thoughts

  1. I want to show an example of the most complicated and the simplest this can be. I want this to be as convenient as possible, but if the developer needs ALL the options...they have to pass them somewhere. My goal is simplicity, but at the end of the day those options still have to be available.
// Easy Peasy
const user = await context.loader('users').load(1);
// or...depending on where we land on question #1
const user = await context.loader('users').load({ id: 1 });

// Full config
const user = await context
  .loader('users', { maxBatchSize: 20, cache: false }) // you can actually pass low level DataLoader config here
  .load({ id: 1}, { query: { $select: ['name', 'email'] } }, { user, transaction, loaders, etc })
  2. I want to point out that this isn't all about populating/joining. This concept has lots of potential for any service calls throughout a request. For example, validation, data creation, etc.
// Validate the incoming user_id is valid
const validate = async context => {
  const user = await context.loader('users').load(context.data.user_id);
  if (!user) {
    throw new Error('Invalid User ID')
  }
};

// Populate users onto posts
const withResults = withResult({
  user: (post, context) => context.loader('users').load(post.user_id)
});

app.service('posts').hooks({
  before: {
    create: [validate]
  },
  after: {
    all: [withResults]
  },
})

const data = { user_id: 1, body: '...' };

app.service('posts').create(data);
// So we get lots of performance gains here...by using `load()` in the validation,
// the lookups to the user service are batched/cached. Furthermore, the lookups in the 
// `withResult` are totally free! We are populating the same user as the user_id we validated.
  3. Loaders are most advantageous when we pass them to nested service calls. The common example would be loading a user and comments onto a post, then passing the loader to the comments that further load the commenter.
// use this in app.hooks() before all
const initializeLoader = context => {
  if (!context.params.loader) {
    const lazyLoader = new LazyLoader();
    context.params.loader = lazyLoader.loader;
  }
};

const user = context.params.loader('users').load(post.user_id)
const comments = context.params.loader('comments').load(post.id, null, { loader: context.params.loader });
// We have already loaded some users, so when passing the loader to the comments service
// which will load on their own user_id, the loader is already primed and gets some for free

This gets really exciting/convenient when considering feathersjs/feathers#2074.

Conclusion

Thanks for reading all this! Any and all feedback is welcome! I would specifically like some feedback on the 3 questions above, but anything else is definitely appreciated too.
