Yesterday I helped someone who was seeing a lot of "server has gone away" error messages on his website. While investigating this problem, I noticed several things amiss, which appeared to be related but really weren't. The biggest measurable sign was
PLAIN TEXT CODE:These two status variables are actually unrelated (see the manual page that explains them). The first was related to the errors the client was seeing: the server was closing inactive connections after a while, and I fixed it by increasing the wait_timeout configuration variable.
The second error does not indicate that an active connection is closed at all. Rather, it shows that a connection cannot be made for some reason. Perhaps it's networking, or perhaps there's an issue with permissions or something else. The first thing I did was look for packet loss between the database server and the web server; the network appeared to be working fine.
With that ruled out (at least, to my satisfaction) I turned to tcpdump to see what was happening with these connections. I ran the following command in one window of my screen session, so I could see when a connection was aborted:
PLAIN TEXT CODE:And then I started tcpdump in another window:
PLAIN TEXT CODE:After I saw an aborted connection, I cancelled tcpdump and looked at the resulting file. Inspecting the session with tcpdump -r showed that there was a complete TCP session; nothing bad was happening at that layer. So I used the strings utility to look at the text sent in the packets:
PLAIN TEXT CODE:I've anonymized the offending IP address. However, I checked the server's grant tables and indeed. that IP address (which is a machine in the local network) is not allowed to connect.
I don't actually use tcpdump much, but this was a fun little exercise that I thought I'd share with you.
Entry posted by Baron Schwartz | No comment
Add to:
|
|
|
| 
We’ve discussed today how we should implement MySQL Version advisory in mk-audit tool. One obvious questions was to look at the end of life - it is often bad idea to run MySQL versions past end of life as even security bugs may not be fixed in these (though do not get paranoid, if you’re running MySQL in isolated environment the risk may be low).
So how does EOL schedule looks ?
MySQL defines Active Lifecycle and Extended Lifecycle for release where first one is 2 years since initial GA release and second is further 3 years of life in “critical bug fixes only” mode with releases available for premium (Silver+) Support offerings.
For MySQL Community users this means only releases within Active Life Cycle will be made. For example MySQL 4.1 had end of its Active Lifecycle in the end of 2006. and indeed Latest MySQL 4.1 available for the public is 4.1.22 while as Manual Says there were number of further releases with last one in March 2008 containing fixes for security and critical bugs.
It is also worth to note even though MySQL 5.0 successor (MySQL 5.1) is still not released as GA, MySQL 5.0 Active LifeCycle will end in end of 2008, unless there are changes means. If same policies as of MySQL 4.1 are followed we’ll soon see stop in MySQL community releases of MySQL 5.0 most likely before MySQL 5.1 will proven MySQL 5.0 replacement.
There is no blame on MySQL - it is no fun to support these old versions both for Support team (remembering these all old versions limitations) and for development team, and it costs, so somebody has to pay for this and this is exactly what premium MySQL Support levels are for.
My main point is - make sure you understand MySQL Release Policy and so what to expect whenever you’re MySQL customer or community user.
Shameless Plug: I guess hundreds of Percona customers are reading this blog so I should say how Percona treats old versions. We obviously recommend to upgrade when it makes sense while at the same time we have no restrictions in terms of supported versions. If customer chooses to run older version he may have more problems and these may take more time to deal with, so the bill would be higher. We are also happy to provide builds based on updated trees and backport fixes from the newer releases if MySQL has chosen not to backport bug because of its severity. We believe in freedom of choice.
Entry posted by peter | No comment
Add to:
|
|
|
| 
After my previous post there were questions raised about Index Merge on Multiple Indexes vs Two Column Index efficiency. I mentioned in most cases when query can use both of the ways using multiple column index would be faster but I also went ahead to do some benchmarks today.
I'm using couple of simple tables:
PLAIN TEXT SQL:I have populated this table with random data for i and j having both of them having 1000 of distinct values, independent on each other. I also created couple of other tables with same data just with really low cardinality with i and j having just 3 values each. The table contained about 18M rows though was small enough to fit in the systems memory.
I've benchmarked simple queries using where clause which covers multiple columns:
PLAIN TEXT SQL:As some of them there way too fast if run once I ran them multiple times and measured time appropriately. To remove the overhead of starting MySQL etc from equation I also measured execution of "SELECT 1" query using same script and subtracted this time from result in the table.
time for ((i=0;i<100;i+=1)); do mysql test -e "SELECT sum(length(val)) FROM t1000idxmerge WHERE i=2 AND j=1"; done > /dev/null
In the result table I compute per query results and present results in milliseconds.
| Query | 1000 - 2 indexes | 1000 - 2 columns | 3 - 2 indexes | 3 - 2 columns |
| Q1 | 20 | 0.2 | 6940 | 2530 |
| Q2 | 25 | 0.3 | 7400 | 7500 |
| Q3 | 25 | 0.3 | 7200 | 3830 |
| Q4 | 70 | 3800 | 4700 | 4700 |
| Q5 | 25 | 2980 | - | - |
Note1: Q1 will not use Index Merge technique for low cardinality table but instead pick to do single index scan. I'm not aware of the optimizer hint which would allow to force index merge as you can do with index accesses in general.
Note2 Q2/Q3 can't use Index Merge however as it is currently implemented so they would use single index range scan. The 2 indexes however benefits to Q3 because it can only use first keypart of index (j,i) to resolve BETWEEN part of the clause which may not be very selective.
Note3 You may be surprised why 2 column index is faster for Q3 in case of low cardinality even though MySQL can't use index well. You're right MySQL can't and MySQL does not - Full table scan is performed and in this case turns to be faster than scanning 1/5th of the table using index. Also Full Table Scan is preferred for Q4 in all cases but in case of high cardinality multiple index configuration.
Note4 Q5 was just run on high cardinality tables to show what difference large BETWEEN can make.
Conclusion: For benchmarked queries we can see Multiple Column index beats Index Merge in all cases when such index can be used. It is also worth to watchout a MySQL may decide not to do Index merge (either intersection or union) but instead do full table scan or access table picking only one index on the pair.
Entry posted by peter | No comment
Add to:
|
|
|
| 
Quite common beginners mistake is not to understand how indexing works and so index all columns used in the queries…. separately. So you end up with table which has say 20 indexes but all single column ones. This can be spotted with a glance view. If you have queries with multiple column restrictions in WHERE clause you most likely will need to have multiple column indexes for optimal performance. But wait. Do not go ahead and index all combinations. This would likely be poor choice too ![]()
Entry posted by peter | No comment
Add to:
|
|
|
| 
So you need to design highly available MySQL powered system… how do you approach that ?
Too often I see the question is approached by focusing on expensive hardware which in theory should be reliable. And this really can work quite well for small systems. It is my experience - with quality commodity hardware (Dell,HP,IBM etc) you would see box failing once per couple of years of uptime which is enough to maintain level of availability needed by many small systems. In fact they typically would have order of magnitude more availability issues caused by their own software bugs, DOS attacks and other issues.
However as your system growths the reliability goes down. If you have 100 servers with each failing every 2 years this is about a server a week which is bad and if you’re into thousands and tens of thousands of servers server failures are becoming common place so it is important to make sure failing server does not affect your system and also what you can recover from server failure easily
So you should assume every component in the system can fail (if it is Server,Switch,Router,Cable, SAN) etc and you’re ready to deal with this. It does not mean you always have to ensure you stay fully operational after any failure but at least you should understand the risks. For example you may want to choose to keep single Cisco router because it has its own internal high availability on the component level which makes it extremely unlikely to fail, because you have 4 hour onsite repair agreement and because it is just freaking expensive. Though may be redundant less expensive systems could be better choice.
I would highlight again every component can fail it does not matter how redundant it is inside. The SAN is very good example - I’ve seen Firmware glitches causing failure in the SAN which was fully redundant on the component level. It is not every hardware component but also any code may fail as well. This is actually what makes your own code often the weakest link in availability.
Depending on failure rate you also should be thinking about automation - for frequent failures you want to recovery (like getting spare Web server and putting it online) to be automatic or done with simple manual command. For complex and rare failures you may have less automation - if certain type of failure happens once per couple of years for many evolving systems there is very high chance the old automation tools may not work well (this is of course unless you always test all automated failure scenarios regularly).
So if we’re designing the system so it can tolerate hardware failures should we bother about hardware quality at all ? The answer is yes in particular for classic database/storage systems. Few systems are design with so much error detection and automated handling in mind as Google File System.
In particular you want to make sure Error Detection is on the good level. For example if you’re running the system without ECC memory chances are your data will melt down and you will not notice it for long time (in particular if you’re using MyISAM tables) which can cause the error to propagate further in the system and make recovery much more complicated than simply swapping the server. This is exactly one of the reasons many high scale installations prefer Innodb - it is paranoid and this is how you want your data storage to be. This is also why Sun is so proud about checksums on the file system level in ZFS.
What is about RAID when ? As strange as it may sound but you should not relay on RAID for your data safety. There are many ways to loose data on RAID system even if you’re running RAID6 with couple of hot spare. The RAID is just dramatically reduces chance of data loss in case of hard drive failure and this is good because recovering database servers is not fully automated in most cases. Plus there may be system performance impact and (in particular if you use MySQL Replication for HA) the switch to the new server may not be 100% clean with few updates lost. RAID, especially with BBU also makes a good sense to get extra performance out of the box.
Some installations are using RAID0 for slaves - in these cases there are typically many slaves and recovery of the slave is very easy and causes no business impact. This is fine assuming you do the math and the performance gains or cost savings are worth it.
Another good RAID question is if Hot Spare should be used. I normally do not use it because it a large waste, especially as most of systems have even number of drives, so if you’re looking for RAID10 setting up hot spare costs you 2 drives. Having hot spare does not add a lot to high availability - if you have proper RAID monitoring in place and keep couple of spare hard drives on the shelf in the data center we’re speaking about couple of extra hours running in degraded mode. Even if you do not have spare hard drive you can often pool the one from the spare server and have the “warranty man” to replace it instead.
It is also a good question if you need redundant power supplies. In my experience they rarely fail so having redundant power supplies does not increase availability when it comes to hardware failures that much and so if you just look from this angle it may be justified only for the most critical servers. Do not forget redundant power supplies also increase server power usage a bit. Redundant power supplies however are helpful if you have multiple power feeds, so server can stay up if one of the phases has a power loss. Another benefit is - in redundant power supply will often allow to do some power work (like moving server to different circuit) without downtime which may be or may not be something important for you.
Finally I should mention about spare component. These are paramount if you’re designing highly available system. Having spare drives on the shelf, spare switches, spare servers (which are same as better as servers which are in production) is paramount. It is important promotion happens easily and there are no performance gotchas (ie 8 core server can be slower than 4 core with MySQL). It is best if you just put couple of spare servers in each purchase batch so they are absolutely same configuration but I know it is not always possible. Dealing with spares is yet another reasons to avoid the “server zoo” and have limited set of purchased configurations which are reviewed yearly (on other regular interval) rather than finding different best configuration each week.
Having spare servers also means you often do not need most expensive support agreements and Business Hours Monday-Friday is good enough for you - you’re not waiting for support for production anyway just fall back to another server and use it. Of course you can imagine cases when problem could affect all servers of the same type but it is not that frequently seen in practice.
To avoid multiple servers failing at the same time it is of course important to QA/Stress test servers before you put the load on them. I’ve seen multiple cases when something would go wrong and all servers of same configuration will experience the same problem. Proper QA/Stress test reduces the chance of this but you better to be testing with load similar to what you expect in production.
Requirement to have Spare hardware is also the reason why commodity inexpensive hardware is often better choice. If you have couple of $1M in production you need another $1M server as a spare and this is expensive. If instead you have 10 pairs $10K boxes having couple of spares would cost you only $20K plus I found it in many cases much easier to convince “finance” people to buy something cheap which is not used most of the time when to spend a lot of money on the server which will be where sitting doing idle.
How many spare servers you need - you would see it in practice. As I mentioned at least one for any hardware class you have. If you have many failures you need more of course. You may also decide to keep more spare systems when you can use them to help capacity management, especially if you have multiple applications which do not share hardware but share the data center. You may have “spares” to provide extra on demand capacity for web servers or memcached quite easily, or say increase number of slaves if you have unexpectedly high number of reports launched by users etc.
Entry posted by peter | No comment
Add to:
|
|
|
| 
Today I worked on rather interesting customer problem. Site was subject what was considered DDOS and solution was implemented to protect from it. However in addition to banning the intruders IPs it banned IPs of web services which were very actively used by the application which caused even worse problems by consuming all apache slots which were allocated to the problem. Here are couple of interesting lessons one can learn from it.
Implement proper error control In reality it took some time to find what was the issue because there was no error reporting for situation of unavailable web services. If log would be flooded with messages about web services being unavailable it would be much easier to find.
User Curl PHP Has a lot of functions which can accept URL as parameter and just fetch the data transparently for you. They however do not give you good error control and timeout management compared to curl module. Use that when possible it is easy. You can implement your own class to fetch required URL with single call while having all needed timeout handling and reporting to match your application needs. If you’re using PHP functions make sure default_socket_timeout has proper value or set it per session.
Set Curl Timeouts Set both TIMEOUT and CONNECT_TIMEOUT as these apply to different connection stages and just setting timeout is not enough.
Beware of PHP sessions “files” handler I already wrote about this topic, but when troubleshooting this all takes another angle. Default file handler means file gets locked while PHP request is being served. In this case because of network stall request could be taking 100+ seconds. Users are inpatient and do not wait so long pressing reload multiple times… which just adds to the list of users waiting on session file lock. This not only makes apache slots consumed at much higher pace but makes it harder to find what exactly is causing the lock because most of offending processes you can find from apache “server-status” will be just waiting on file to be unlocked. I used “gdb” to connect to the process showing high number of seconds since start finding where it is stuck. If it is somewhere in curl module (or mysql - waiting on long query to come back) - this is our query if it is waiting on the session file lock we can get that file and use fuser to see what other processes are using that files - these would be either waiting on locks or owning the lock and so one of them is the process we’re looking for. Things are much easier with say memcached session storage - this does not cause any locks for parallel session use so only the process which actually stalls waiting on external resource will show high number of seconds since request start.
Entry posted by peter | No comment
Add to:
|
|
|
| 
There are quite a few “tuning primers” and “my.cnf generators” and “sample my.cnf files” online. The ultimate tool for generating an optimal my.cnf is not a tool. It’s a human with many years of experience, deep knowledge of MySQL and the full application stack, and familiarity with your application and your data.
I don’t know exactly the percentage, but quite a few of the servers I take a look at have been “optimized” with some tuning primer or question-and-answer script that spits out “optimal” parameters for my.cnf.
Most of the time these servers are far from optimal. Sometimes the my.cnf parameters are extremely wrong, to the point of causing a severe performance penalty.
If it were as easy as writing a tool to do this, don’t you think Maatkit would have mk-optimal-mycnf already? In my opinion — as someone who knows very well the complexity of creating a good my.cnf — it’s practically impossible. Much harder than syncing data, or manipulating a replication hierarchy, or any of the other things Maatkit can do already. And I doubt I’ll ever even feel motivated to try creating such a tool.
Don’t bother with scripts. Don’t waste your time with most of the advice you see on the web in forums — much of it is fundamentally wrong, even when it seems to come from an informed source. Don’t put too much faith in the my.cnf samples that come with your operating system; many of them have very bad advice in the comments, such as instructing you on how to set up replication in ways that guarantee breakage.
If you want solid advice, ask someone who knows what they’re doing (and can prove it). Or buy our book.
But even more fundamentally, you should not focus so much on my.cnf. It is not the be-all and end-all of performance. Tuning your server settings has far less impact on performance than tuning your schema, indexing, queries and — you guessed it — thinking deeply about your application architecture. Server settings are a distraction and a waste of time for most people.
Most my.cnf files I see only need minor tweaks, which give only so-so performance improvements. Tuning my.cnf only helps a lot when my.cnf has extremely bad parameters. The kind you’ll get from tuning primers and automated my.cnf optimization scripts.
Entry posted by Baron Schwartz | No comment
Add to:
|
|
|
| 
Today I was working with the client loading data to MyISAM tables at very high rate. Hundreds of millions rows are loaded daily into single MySQL instance with bursts up to 100K of records/sec which need to be inserted (in the table with few indexes). It was good not all records had to go to the same table and so in theory using multiple thread to do inserts in multiple tables would not be bound by table locks and would be able to use multiple cores efficiently to get good insert rate… or so it seemed.
In reality inserting in parallel into different tables when indexes fit in memory results in very bad contention causing hundreds of thousands of context switches per second with far less work done than one would hope. As we already discussed
MyISAM key cache has serious mutex contention issue as there is global mutex which is held for the time of key_block being copied from key_cache to the thread local space.
Happily MyISAM allows you to create multiple key caches
. We use “keycache per table” this case and caused number of context switches to drop almost tenfold and performance almost doubled.
Another interesting result was - adding PACK_KEYS=0 to most actively updated tables almost doubled throughput again (and I would expect to see it more than double if all tables would be converted with this option. This is of course because some of indexes on the tables were on CHAR/VARCHAR columns. Honestly I did not expect so large difference for Inserts. I have already wrote about very large impact on joins this option has but I did not expect modification of packed keys would be so much more expensive.
After doing such changes we got insert rate to MySQL close to 200K rows/sec using standard multi value inserts which is pretty good number for indexing tables, especially considering application was doing some updates along the way too.
Entry posted by peter | No comment
Add to:
|
|
|
| 
I should be faster with publishing slides but things are how they are. The slides from my OSCON2008 talk are now published at Percona Presentation Pages.
Enjoy ![]()
Entry posted by peter | One comment
Add to:
|
|
|
| 
Quite commonly in the applications you would need to use some kind of “status” field - status of order - “new”, “confirmed”, “in production”, “shipped” status of job, message etc. People use variety of ways to handle them often without giving enough thought to the choice which can cause problems later.
Perhaps worst, though quite common thing is to define such field as VARCHAR(255) . Even though the stored value is often short the full specified length can be used for internal processing, such as when creating temporary table or sorting.
f
Before we go to the list of variants which can be used lets talk about two important types of these status fields. First is BOOLEAN or YES/NO type. You would frequently see it as columns VISIBLE, DELETED , CAN_ACCESS etc. In case things are as easy as YES/NO using TINYINT UNSIGNED NOT NULL is quite a good idea, though using ENUM is also fine choice as you’re not planning to change allowed values for such field anyway.
For status fields with more than two values there are commonly following choices:
VARCHAR(N) - This is the most straightforward approach though it is also most inefficient one from performance standpoint. However how much of performance are you going to lose? This depends on number of rows as well as their average length. If you have 1000 of rows you would not save much of space by using TINYINT instead of VARCHAR. Same if you have 1000 bytes per row using 10 byte VARCHAR(10) will not waste too much space. The other problem you may run into is slower JOIN performance especially for MyISAM Tables. Though it is solvable as well.
The benefit of using this solution is obviously self documenting. Values such as “New”, “Shipped”, “Confirmed” are much more descriptive than 1,2,3 and do not need to be separately maintained. However this benefit also comes comes as a drawbacks as it is very easy to store not intended statuses in the system and it is harder to spot them. For example you may have somewhere in the code use “Shiped” value which can cause hard to catch errors. Also if you’re using actual values you may have temptation to display them directly in your application interface which can cause extra problems if you need to support multiple languages. In any case if you’re picking this approach you should make sure the values you use are short. Pick 8, may be 12 characters and use single byte character set to keep the size as short as possible.
ENUM - This is what VARCHAR(N) columns get converted during the process of application optimization. It is very nice as it often allows to save on the storage and get extra performance without application modification. ENUM takes just 1 byte for small sets so it is rather efficient. The problem with ENUM is adding new values using traditional way (ALTER TABLE) causes table rebuild which is not acceptable for many environments. It is possible to hack this around though I would not like to do this as a standard production practice. It is also worth to note ENUM comes with couple of gotchas - first if you store value which is not allowed by ENUM list it will be always converted to ” - empty string, the additional ENUM columns can have. The other gotcha is sort order which is done for ENUM in the order in which values are specified in the ENUM declaration - not according to their string values. So make sure to add values specify strings for ENUM in sorted way to get same sorting as before.
TINYINT - Or you can just use TINYINT which has very small footprint, fast for queries and has no problem with adding values to it, however it has also drawback of integer values not being self documenting and can’t be used in interfaces. TINYINT is a good choice if you can maintain a discipline and have the table which has integer to string mapping
or at least have constants with descriptive names defined, so you do not use numbers in the application directly. I prefer to have the table though because correctly architecting your queries you can use such table with very low overhead while it can become very useful if you plan to extend statuses with further meta data, for example descriptive help message, information when this status is added or flag to mark this is deprecated status value and can’t be used.
In general there are reasons to use all 3 approaches in the applications, just do not do it by randomly picked advice but think what kind of application do you have and what are the most important goals for you. Hope this article will help you to do the right pick.
Entry posted by peter | No comment
Add to:
|
|
|
| 
Wow!. New MySQL Community release - MySQL 5.0.67 is just out which as manual says first community release since 5.0.51b. I just recently complained about community release irregularity and I’m glad to see the new release after all.
I only hope this will not be one time event but MySQL will follow its own promises of regular schedule of source and binary MySQL Community releases.
My even deeper hope, though, is MySQL will finally become rational and will stop walking their customers (users of MySQL Enterprise version) over mine field and allow community to check code changes first because of pushing them to the customers. MySQL Enterprise branch had enough serious problems which have been caught quickly by community testing. This also will mean turning back from “what is easy to sell” to “what customers need” as a product driving priority.
Entry posted by peter | No comment
Add to:
|
|
|
| 
One common question I guess is how much should I see performance improved in case I increase memory say from 16GB to 32GB. The benefit indeed can be very application dependent - if you have working set of say 30GB with uniform data access raising memory from 16GB to 32GB can improve performance order of magnitude by converting very IO bound load to CPU bound, it is well possible to see limited gains - if your working set already fits in 16GB you may not see any significant gains upgrading memory to 32GB. Interesting enough similar can happen for very large working set - for example if your main queries do full table scan of 100GB table it does not matter if you have 16GB or 32GB the load is going to be way too much IO bound anyway.
Interesting enough because of MySQL scaling issues it is also possible to see performance to go down as you increase buffer pool size. Some threads which would be safely sleeping waiting on IO completion are now finding their data in buffer pool so they start to compete on hot latches and performance go down.
Now back to original question - how do we predict the benefit from increasing the memory and so cache sizes ? I typically start by looking at the type of load we’re dealing with. If it is CPU bound and there is little IO wait we typically do not expect to gain much by increasing the memory.
This however have to be watched carefully. Performance does not always stays the same and the goal may not be optimizing average performance. It may be heavy data processing batch job which is IO bound and which runs too slow (and affects other transaction) and may be increasing memory is helpful to solve this problem.
If it is IO bound (high IO system utilization, low CPU usage) one should think about how much CPU capacity is available. If your CPU is 25% busy you will unlikely get more than 4x even if you eliminate all IO completely (unlikely because there is increased CPU overhead going IO path as well), so account for that.
Besides pure CPU based computation you should account for locking. Consider for example bunch of transactions updating single row in the table. Having such workload you would likely see no IO and a lot of CPU idle and not because of internal Innodb limits but because your application logical serialization problems.
So what if we have very IO bound application without serialization issues (say reporting slave) which is very IO bound showing 100% IO subsystem utilization on 5% of CPU usage ? This is the true challenge becomes because MySQL has no tools to analyze working set (we have per query working set in our patches but it is not enough). We have couple of ideas how to do global working set profiling but it should wait for now.
At this point I typically use my intuition to try to guess how much data application to get some ballpark figure and often it is enough.
If you would like to be more scientific there are couple of other things you can do. First - you can test by scaling down the data. If you have data for say 500.000 users on the 16GB server get it down to the half of that and you will often be close to seeing performance 32GB server would have. You however have to be careful and understand how data is used in your application. If you say have data for 10 years and load data for 5 years only to compare performance you may get misleading results if reports are typically done for last few months. Basically in such exercise your goal is load data so the working set would be half of original so it would have cache fit similar to one on the larger system you’re trying to compare to. Using this approach you also should be careful with your estimations and take IO subsystem into account - even with same cache hit ratio more data and more load means there are higher demands for IO subsystem performance.
By far the best method is trying, if you can afford it - just get memory upgrade and see how it affects performance. With many vendors you can get the memory upgrade or the whole system to try and return it back if it does not fit your needs. This approach especially works well if you have many slaves (or many shards) in which case you can see performance or capacity improvements quite easily.
Entry posted by peter | No comment
Add to:
|
|
|
| 
People are asking me if Percona will support Drizzle and what is in general our position regarding this project.
First about Support. We surely will support the customers if they select to run the Drizzle instead of MySQL Server. For us it is same as supporting MySQL Server with custom patches, which we do. In general our Support Policy is very open ended - we would support wide variety of systems, and we’re just being open about our experience with such system and ability to help if need arises.
Will we Recommend Drizzle ? We recommend what makes sense to the customers. If MySQL is not the best choice for the customers we’ll be open about it. Drizzle will need to prove it stable and being better fit for certain group of customers and we will recommend it it in such cases. It is similar to storage engines - will we recommend Falcon or Maria instead of Innodb ? Sure we will, in cases then they will work better. Drizzle is in active development right now and it will take some time before it stabilizes to be ready to be used by wide groups in production.
Will we Contribute to Drizzle ? Our focus with Percona Patchset having MySQL improvements which are ready to be used in production now, which speed things up, improve operations or help analyze performance. We release all out patches as GPL and we would be happy for them to be included in Drizzle or any other MySQL Forks or Patch Sets. Whenever we will have some people actively working on Drizzle remains open question. As customers will be interested in having Drizzle work better on them we surely will do it. We’re also likely to be testing Drizzle to understand sweetspots and problems.
Entry posted by peter | No comment
Add to:
|
|
|
| 
I did not usually go to “Elite” servers on Dell web site but looking at customers system today I went to check Dell Poweredge R900. This monster takes up to 4 Quad Core CPUs and has 32 memory slots, which allows to get 128GB of memory with 4GB of memory chips. This means upgrade to default configuration to 128GB of memory will cost you just $9600 (list price). I’ve been able to configure on a web the system with 8*2.5″ hard drives RAID and 2 CPUs (just as we usually configure PowerEdge 2950) with 128GB of RAM for about $16000. This means talking to Dell Sales rep it can purchases within $15000. This may sounds as a lot but if you’re memory constrained it is cheaper per GB than buying 32GB box for $6000
So am I scale-up advocate ? No. But it is quite frequently systems are designed to have “working set to fit in memory” to perform well and such systems can avoid good consolidation factor for such application, or would allow them to delay sharding.
This box also has 4 CPU sockets which means 16 fast cores and 128GB of memory becomes commodity - Quite a challenge for MySQL to take
I have not had a chance to play with such box myself besides couple of customer production installations but it looks pretty sweet.
Entry posted by peter | No comment
Add to:
|
|
|
| 
We recently examined a customer’s system to try to speed up an ETL (Extraction, Transformation and Loading) process for a big data set into a sort of datamart or DW. What we typically do is ask customers to run the process in question, and then examine what’s happening. In this case, the (very large, powerful) database server was almost completely idle, with virtually no I/O activity or CPU usage. So we looked at the server where the ETL process was running. It was running at 25% CPU usage and was writing some files to disk, but not waiting on I/O.
What’s going on here? Where’s the bottleneck? The process is slow, and neither machine is really doing much work. Why?
Maybe you guessed the network. Nope, not the network either. There was plenty of spare network capacity.
If I told you the ETL machine was using exactly 25% of its CPU capacity, would you guess that it had 4 CPU cores and one of them was running at 100% usage? This is what was happening. The ETL app was single-threaded and CPU-bound.
Of course, we measured the entire process, so we could say authoritatively what was going on. But this problem actually took only a few minutes to diagnose. The point here is to look beyond the database server for what seems to be a database problem. This is why we call ourselves “full-stack performance tuning experts.” We try not to have tunnel vision. This reminds me of another problem I helped debug a few weeks ago — a really slow website was due to a curl call that was hidden in the code, and timing out because of DNS issues.
One of the best things you can do to improve your performance is build profiling into your application, or run the application under a profiler (Google for “profiling <language>”) and find the places where it consumes the most time. It’s especially valuable to profile “external resource calls” such as calls to the database, web services, and so on. These approaches can make it much easier to find the slow parts.
We devoted part of a chapter to profiling in our book. We explain a lot of useful techniques to help you build “light-weight” profiling into the application from the start (a very smart thing to do).
Sometimes our clients believe they already know the source of the problem, but they can’t prove it. They ask us to either prove them right or show them the real problem. The way we do this is to prefer measurements to guesses.
Of course, finding the problem is only part of the battle. Fixing it is another matter. But “how to fix every performance problem” doesn’t fit into a single blog post!
Entry posted by Baron Schwartz | No comment
Add to:
|
|
|
| 
Recently, we added information schema support to Google’s userstats patch.
There are three information schema tables added: user_statistics, table_statistics, index_statistics.
One can now use select * from information_schema.user_statistics along with show user_statistics.
Links:
Download 5.0.62 version
Download 5.1.26 version
Entry posted by Evgeniy | No comment
Add to:
|
|
|
| 
Quite frequently I see customers looking at recovery as on ability to restore data from backup which can be far from being enough to restore the whole system to operating state, especially for complex systems.
Instead of looking just at data restore process you better look at the whole process which is required to bring system to the working state, including data consistency requirements and times. This has to be considered for different data loss scenarios which may happen.
Let us look at simple example - a master with 1TB of database size replicating to 50 servers in 5 different Data Centers via single Replication Relay server in each. Forget the single point of failure for the second and just think what problems we may have to deal with.
First lets look at the master. What may happen to it ? We can have Master having soft crash in which case it will be unavailable for some time but we can get all the data back… or sort of. In practice you have to be very careful such as using sync-binlog innodb_flush_logs_at_trx_commit=1 and only using Innodb tables to be OK in most cases. There are still some edge cases such as modifying meta data stored in MyISAM tables which can get master out of sync with slaves in case of soft crash. Unless you got into one of these rare cases slave should be able to continue after Master is back online.
Do you have to wait for master to recover ? This is where your data consistency requirements come in play. Remember replication is asynchronous so whenever you switch to the slave in case of master failure you may loose transactions. Google semi-synchronous replication patches can help with it… but they are not yet in the stock MySQL. Yet another way is using DRBD to get a standby MySQL server or at least synchronously replicated master binary logs. If you can’t loose any single transaction you’ve can’t simply switch to the slave.
What if you can ? The switch to the slave in this case is not very easy too - all slaves can be on different positions at the master and you need to pick the most up to date to promote. Plus you need to recompute positions as they should be on promoted slave and slave should have –log-slave-updates enabled so it somethat has copy of master logs. In many cases I’ve seen people do not do that and simply point slaves to the starting position of the promoted master - this is dangerous because you’re risking all slaves to be inconsistent withe each other, plus if server was seriously behind you’re risking to get major inconsistence because relay logs will be lost if you just re-point slave. So at least you should wait for slave to process all its relay logs before re-pointing it.
Interesting enough Google has solution for us again which comes as “log mirroring” patches which make sure Slaves has copy of logs as they are on the master.
Now what do you do in case of hard crash this is when the data is lost on the master ? This is when you have master data lost, such as you have RAID or disk failure. Though it also can be things like Innodb corruption or soft crash which you can’t recover promptly enough.
In this case most typically you would plan recovery by switching to the slave (as described) or standby server via DRBD or SAN.
As you can see we never mentioned recovering from backup so far. It will be needed in the worst case of data loss which is trashing the data which gets to all slaves via replication. This can be caused by user or application error or security breach.
What choices do you have in this case ? Your main options are using Backup or Slave with delayed replication (which you already could have set up with mk-slave-delay from Maatkit.
Delayed slave is especially helpful if application can operate with just master as in this case you can switch very quickly (just skipping bad statements and catching up)
The main challenge in such failure is the fact you have many trashed copies to deal with. If you have just one or several small tables corrupted you can reload them. One option is to reload them on the master (and they will be replicated down to all slaves) the faster however (especially if you have many tiers of replication) is to bring all slaves to the same point in time and load data locally with SQL_LOG_BIN=0 set for session.
If the large portion of data trashed you may need to recover full database on all slaves which is best done in binary mode for large data sets. Such global recovery can also put very high stress on your network and backup storage and take a lot of time. It also may be extremely difficult to get the large backup in timely fashion over long distance network, meaning it is best to have local backup (and delayed slave if you use one) in each data center you have.
The complexity of recovery is another “liability” of compex replication tree setup. On the contrary sharded master-master pairs (or master with few slaves) are much easier to deal with.
Recovering the data with replication you always have to keep replication positions in mind. Such if you recover master you need to recover slaves to matching snapshot - either it has to be same state (which is hard to manage) or you need to ensure you understand the position on the master to which backup corresponds to. This becomes more complex if you have complex replication hierarchy as slave only knows its position on its own master not on the “root” master.
Note there are also some solutions based on “Continuous Data Protection” class of backup which can be very helpful to go back in time with your data. One of vendors offering solution for MySQL is R1Soft. Though I have not had a chance to look at it in details.
What is about slave loss ? The slave loss is normally less of the problem. You can reclone slave from the master, another slave or restore from backup. So this is just question of having decent capacity planning (such as being able to shut off 2 slaves and still operate normally), have LVM setup if you want to avoid shutting off slave or master to clone the data and making sure the logs on the master go far back so you can restore from several of backup generations and do point in time recovery.
Timing recovery is also important. Especially for write intensive environments it may take many days to catchup from weekly backups by binary logs so make sure to time it properly.
In the real life environments can be even more complicated - one may use partial replication, replication to different storage engine, add some tables beyond tables which are being replicated which all has to be accounted for for in the process of replication.
It is also worth to note beyond these 3 main recovery scenarios there are number of other cases which you have to deal with (which often can be resolved by doing recovery be one of these 3 protocols, but you can also take as shortcut) - for example you may have master or relay binary log corruption. Master or Slave running out of space, Slave crashing (and loosing its position on master), Replication breakage (or running out of sync) due to MySQL bugs or wrong use.
Interesting enough very few people have their data recovery practices ironed out so they can answer how they would handle at least these 3 data loss cases for each of servers they have deployed. Even fewer have gone beyond theory and have tested the processes or have regular testing in place.
Entry posted by peter | No comment
Add to:
|
|
|
| 
Have you ever seen the replication stopped with message like this:
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
This is relay relay log corruption and you can check details in the MySQL Error log file. The error message describes few reasons and indeed because there is little validation (ie no checksums) in the replication there are multiple reasons for bad event to show up in relay logs.
Really this is only one of various error messages you could see if relay log corrupted. You could also see malformed queries (with some junk), complaining about event to big etc if there is a garbage in relay logs.
If relay logs are corrupted it is surely worth to check what could cause it - it could be network (especially if replicating over unreliable long distance networks), MySQL bugs on master or slave, hardware problems and few others. In any case it is worth investigating.
Investigating is what you do later but how do you fix the problem first ? The important question you need to have answered - are logs corrupted on the master ? If logs on the master are OK you can just run SHOW SLAVE STATUS on slave experiencing error and use CHANGE MASTER TO to re-point replication to Relay_Master_Log_File:Exec_Master_Log_Pos:
PLAIN TEXT SQL:This will purge existing relay logs re-fetch all events which have not been executed yet. Doing this command make sure your master is operational and it still has all the logs needed to re-fetch events.
How would you know if logs are OK on the master ? Well in this case there were probably 5 another slaves which did not have the problem - which means Master is most likely OK. In any case it is little harm to try restarting from the same position - if logs are bad on the master you would get the same error message again and can continue with investigation.
What if logs on the master are corrupted ? In this case you have couple of choices (and you also potentially have multiple slaves to deal with). You can use mysqlbinlog (or you favorite hex editor if mysqlbinlog does not work) to find the next event start and potentially recover "corrupted" event to be manually executed on the slaves.
Skipping around event makes master and slave potentially inconsistent and you should access the risks depending on applications (and on amount of events which were corrupted) you may want to let replication continue from the new position or resync the slaves to the master.
How can you recover the slave ? As all slaves are likely to be affected in this case you can't clone another slave. You also can't use classical method of recovery from backup - because you would need relay logs to roll forward, and they are corrupted. You can either re-clone the data from Master. (This is where LVM or similar techniques can help you a lot) or skip bad events as described and when use Maatkit mk-table-checksum to check what tables are out of sync and when use mk-table-sync to resync them.
Last method works in particularly well in case you can afford to run for a while with slaves which are a bit out of sync, which is quite often better than having just master available (also having extra load of data copied from it).
Entry posted by peter | No comment
Add to:
|
|
|
| 
JOINs are expensive and it most typical the fewer tables (for the same database) you join the better performance you will get. As for any rules there are however exceptions
The one I'm speaking about comes from the issue with MySQL optimizer stopping using further index key parts as soon as there is a range clause on the previous key part. So if you have INDEX(A,B) and have a where clause A BETWEEN 5 and 10 AND B=6 only the first part (A) of the index will be used which can be seriously affect performance. Of course in this example you can use index (B,A) but there are many similar cases when it is not possible.
I have described couple of solutions to this problem - using IN list instead of range or UNION which however require rather serious application changes and also can result in huge IN lists and suboptimal execution for large ranges.
Lets take a look at very typical reporting query which queries data for date range for multiple of groups (these can be devices, pages, users .... etc)
PLAIN TEXT SQL:As you can see from the EXPLAIN this query is expected to analyze over 300.000 of rows which is relatively fast for this (in memory) table but will become unacceptable as soon as you get to do random disk IO.
Note this is also interesting case of EXPLAIN being wrong - it shows key_len=7 which corresponds to the full key while only first key part is used.
Let us now replace the range with IN list in this query:
PLAIN TEXT SQL:So we get same result but approximately 50 times faster. In this report we had just one month worth of data - what if you would have a year ? 5 years ? What if you get say thousands of groups at the same time ? Performing such query MySQL has to build (and do lookups) for all combinations which is 31*10=310 in this case. But if it gets to hundreds of thousands this method starts to break (and newer MySQL versions will stop using this optimization method if there are too many combinations to check).
Instead you could use JOIN to get list of days matching range from some pre-generated table and use the join to retrieve the rows from original table:
PLAIN TEXT SQL:As you can see it does not work while I know I used exactly this trick to optimize some nasty queries.
It looks like equality propagation is working here (note the number of rows for second table in join is estimated same in original query) and we get the range clause on "info" table instead nested loops join - exactly what we tried to avoid.
It is easy to block equality propagation by using some trivial function:
PLAIN TEXT SQL:So we stopped equality propagation but now have another problem - for some reason MySQL decides to only do "ref" on the date only instead of using range on day and list of groups for each join iteration.
This does not make sense but this is how it is.
I also tried to increase cardinality by having all rows to have different group_id and it still does not work.
The trick however does work if you have just one group_id (and in this case you do not even need to trick around equity propagation to make it work)
PLAIN TEXT SQL:For original query form with single group_id query was taking 0.95 sec. The query with BETWEEN range replaced with IN list was instant 0.00 sec same as the query using join with day list table.
So we finally managed to get better performance by joining data to yet another table though why it does not work for multiple group remains question to check with MySQL Optimizer team
Entry posted by peter | No comment
Add to:
|
|
|
| http:
[bugs optimizer ]
View original post
|Add to del.icio.us
| Share
July 31st 2006 was my last day working for MySQL and August 1st I started what later was incorporated Percona with Vadim joining me September 1st as co-founder.
Two years is a significant anniversary for any startup - surviving (and being profitable) for 2 years can be seen as validation of our business model and strategy and we’re quite happy about this.
So what is our strategy ? I left MySQL with idea of building company which will be fair in rewarding their employees for their contribution, in particular engineers which do a lot of heavy lifting in technology companies. I really liked many of Monty’s ideas as he implemented during early years of MySQL (you can see many of these same ideas described in Hacking Companies article). We’re not just like that but we’re very close in spirit which you can describe as lets smart engineers to gather and do cool stuff together.
The second part of our strategy is being fair to the customers and providing them with great service at fair prices. We decided from the start we’re making money as consulting company being for work it takes to deliver service rather than focusing on maximizing leverage by selling software or subscription.
We develop software to be able to provide better services with lower cost for the client. This makes sense because we can help more people and builds efficiency as our competitive advantage.
Third part which is important for us as a founders (and we try to hire people which share our values) is giving back to community. It works as a great marketing vehicle for us but it just feels right. We feel open source software is a great way to give back to community for technology company. We’ve sponsored MMM, Maatkit, Released Innodb Recovery Tools (we probably would have made a lot of money keeping this inhouse, but it just does not feel right to leave people in need without a tool to get data back if they can’t pay), Sponsored some Sphinx development. We also published variety of patches for MySQL. Though our giving back to community does not stop there. On the technical Landscape we try to provide a lot of information via Blog, Forums or Presentations. We also contribute to other worth causes like gathering money for Ivan surgery.
Where do we plan to go ? We’re helping customers building and maintaining high quality applications. Currently our focus around MySQL and surrounding technologies but this is so because it is “pick of the web”. We’re constantly looking at emerging technologies to see what can be used for building large scale web application, which is there core of our interest is. We see what other challenges our customers have and we have consultants joining us with different backgrounds which allows us to provide additional services such as capacity planning, migrations, web layer optimizations, MySQL Customizations/Optimizations etc. We want people having their own great ideas to join us and develop them in entrepreneur friendly atmosphere.
In these two years we’ve grown from 2 person company to company employing over 20 full time employees in Europe and US. We’re still virtual company having no office where people would work.
The MySQL was a great school to show how this is possible.
We’re staying profitable all the time attracting no external money as venture fundings or the loans. This allows us to develop company on our own pace and have no obligations to deliver huge returns to anyone. We believe as consulting company we do not need these to maintain comfortable growth pace without putting undue pressure on our employees and retaining team values.
For us with Vadim the the change was the serious one. As we started delivering high quality services was out main challenge and as engineers this was something we knew pretty well how to do. As the company grew our roles change to include a lot of challenges in organizing administrative sales process, ensuring we’re paid and paying our consultants, managing people and leadership on leading the company. We’re learning a lot as we go and we’re listening to advice of Mentors we can find. We’re also growing team by looking not only for great engineers but also for people with great management and administrative skills.
Yesterday Monty visited us for dinner and I told him it is 2 year anniversary since I left MySQL. He asked us if we’re happy with the choice or have regrets - we have none and looking forward the next two years. Getting your own company up and running is a lot of hard work but is is a lot of fun too.
Entry posted by peter | No comment
Add to:
|
|
|
| 
We prepared RPMs of our release for RedHat 5 / CentOS 5 x86_64 platform.
http://www.mysqlperformanceblog.com/mysql/RPM/RHEL5/5.0.62/
There was question what patcheset includes and if there is manuals.
We have:
Entry posted by Vadim | No comment
Add to:
|
|
|
| 