What is the best way to speed up queries in big data? MySQL?
I need to speed up queries that are being run from web applications. These go through many records to pull the correct data. I would like to limit this to only look into the past 3 months, or from a zero date where I know the data is correct up to that date.
Indexes are the key to speed up queries. If you have large tables you need to find the fields you require and create indexes on them. This will speed up the SQL query massively.
You should manage your database in such a way thay you can access data from multiple tables in single query. It is the only way to speed up queries in big database.
As Rory notes - depends a lot both on the nature/amount of data as well as type of queries.
You may not realize this but within SQL databases these can be internally structured in other than 3rd normal form including Dimensional & EAV.
I'm working with elastic search these days - could be a viable nonSQL solution for you - especially if you are working with text.
Rory gave you a good rundown of what you need to do. Know that optimizing SQL becomes a science unto itself. Recall that Rory mentioned avoiding multiple joins. Sometimes it is possible to stage the data, denormalized, in a auxiliary table and run your queries agains that.
I know in the past, I have had to graduate from MySQL to a different database in order to better handle the data volume.
That's really going to depend on the type of data you're collecting and the type of table you already have set up. Also depends on the containing rows of data and how large the queries are. How many gigs of data per million rows and how many gigs in index per million rows?
The short answer to your question though, is yes. The best way to speed up queries would be through the SQL.
I would use something like this to generate limitations by date/user.
FROM (SELECT st.*, MAX(st.effective_date) OVER (PARTITION BY st.user_id) max_dt,
MAX(st.effective_sequence) OVER (PARTITION BY st.user_id, st.effective_date) max_seq
FROM local_table lt JOIN sometable@remotedb st ON (lt.user_id = st.user_id)
WHERE effective_date = max_dt
AND effective_seq = max_seq;
Next, you need to optimize your site to speed up queries. Avoid Multi TVF's. Position a column in an index. Avoid multiple joins in single queries. Don't use non-correlated scalar sub queries. Get rid of unused indexes. Revise your schema. Many steps you can take for optimization to improve SQL performance.