I have read that it is best practice to only return an ID when querying for results, and then populate metadata from the database. Is this true? I am worried about performance.
4 に答える
In my opinion, it is almost always best to store and return the fewest fields possible — preferably just the ID, unless you explicitly need a feature such as highlighting.
Storing a lot of data in your index can have a negative impact on your search performance as your index grows. There is no data that loads faster than no data. Plus, looking up objects by their IDs should be a very cheap operation in your primary data store of choice.
Most importantly, if your application is using an ORM to interact with its data store, then the sheer utility of reusing all your domain modeling consistently throughout your application would be hard to overstate.
Returning values straight from your search engine can be useful. But, short of using the search engine as a primary data store, I would need a very compelling reason to fragment my domain logic by foregoing an ORM.
IMO, If you can retrieve the search results and the data within a single call would be a huge boost to performance in comparison with getting just the ids and making a DB call to retrieve the metadata for the same.
Also, Solr/ES provides in built Caching solutions so the response would be faster for subsequent queries. For DB you may have to use a Solution or probably some other options.
this all depends on your specific scenario.
In some cases, what you say might be true. For instance, Etsy does exactly that (or at least used to do that), they rationale is that they had a very capable mysql cluster and they know very well how to manage it, and is very fast, so Solr returning only the id was enough for them.
But, you might be in a totally different scenario, and maybe calling the db will take longer than storing everything needed in Solr and hitting just Solr.
In my experience Solr performs bad on retrieving results when you either have highlighting on, or the fields you retrieve are very large and the network serialization/deserialization transfer overhead increases. If that is the case, you might be better off asynchronously retrieving these fields from the DB.