June 15, 2015
In episode 42 of Apply Filters we talk through a few updates from our projects and dig into an issue that has been plaguing the WordPress community for some time now: Taxonomy Term Splitting. A recent article by Boone Gorges describes the problem very well and explores some of the potential ways to solve it.
This episode is sponsored by DreamHost and their new DreamPress 2 managed WordPress hosting platform. DreamPress 2 is different from most managed WordPress hosting platforms in that it gives you two virtual machines (Web VPS and MySQL VPS) for each site you have installed on the platform. For more information, check out dreamhost.com/applyfilters.
We got a really fantastic iTunes review this week from McDunna. This really helps us confirm that we’re on the right track with the show and is exactly what we’re trying to achieve every episode. Keep them coming.
- Updates to Easy Digital Downloads
- Coaching the local high school’s ultimate Frisbee team
- WP Offload S3 – started using Toggl for time tracking which was enlightening about where I’m spending the majority of my time
- Just finished up our first company meetup around WordCamp Miami
- WP Migrate DB Pro 1.5 update is in final testing, and the release is coming soon
Today’s Topic: Taxonomy Term Splitting
Post relationships in core
- Potential roadmap for taxonomy meta and post relationships
- Combine `wp_terms` and `wp_term_taxonomy` tables
- Backward compatibility
- Table aliases `$wpdb->terms` and `$wpdb->term_taxonomy` should point to the same merged table
- MySQL view of the removed table
- An update on the taxonomy roadmap (4.1)
- Removed UNIQUE index on the `slug` column of `wp_terms`
- Stopped creating new shared terms
- Taxonomy term splitting in 4.2: a developer guide
- Shared terms are split when one is updated
- Fixes: Updating a term in one taxonomy affects the term in every taxonomy
- Breaks plugins/themes that store term IDs
- Eliminating shared taxonomy terms in WordPress 4.3
- All existing shared terms will be split upon upgrade
If you have a plugin that is potentially affected by these taxonomy changes, please let us know. Whether your plugin has been fixed or not, shoot us an email. Thanks.
If you’re enjoying the show we sure would appreciate a Review in iTunes. Thanks!
PIPPIN: Welcome back to Apply Filters, Episode 42. Today, Brad and I have got a few things to discuss, some recent things that we’ve been working on, recent trips, upcoming trips, conferences, etc. We want to have a discussion about some of the changes to the taxonomy term splitting that happened in WordPress 4.2 and some more that’s happening in WordPress 4.3.
However, before we get started, let’s hear a word from our sponsors.
BRAD: This episode is sponsored by DreamHost and their new managed WordPress hosting platform, DreamPress 2. I wanted to learn more about DreamPress 2, so I reached out to someone over at DreamHost who you may be already familiar with.
MIKE: I’m Mike Schroeder, and I am the WordPress platform lead at DreamHost. I work both on developing products for or around WordPress, like DreamPress, and I also donate about half of my time to work on WordPress core and other related community projects.
BRAD: I asked Mike about DreamPress 2 and how it’s different than other managed WordPress hosting platforms.
MIKE: One thing that makes it a little bit different is that we give you two separate virtual machines that just belong to you, both the Web VPS and also MySQL VPS. That Web VPS will automatically scale its resources for the RAM utilization needs that you have.
BRAD: Hang on. There’s a Web VPS, and what’s the other one?
MIKE: MySQL. Yeah, there’s a separate VPS just for your MySQL.
MIKE: And it has, well, you know, MySQL, great hardware that’s specifically engineered to work well with MySQL. That goes along as a companion to your Web VPS. Those resources aren’t shared between those two.
BRAD: And they’re not shared with any other customers or anything. They’re your VPSs.
MIKE: That’s correct, and you get a set of those. You get a set of those for each of your sites.
BRAD: Wow, for each site. Wow!
MIKE: Yeah. Yeah.
BRAD: That’s interesting.
MIKE: If you added two domains, you have four VPSs.
BRAD: There you have it, folks. You get two dedicated VPSs for each site on DreamPress 2. For more information, check out DreamHost.com/ApplyFilters. Now back to our show.
Okay. Just before we start getting into what we’ve been up to, I’d like to read an iTunes review, a really cool one that I found when I was browsing through them. This one is from McDunna, and it goes, “I’ve been programming since 1980 and developing custom WordPress sites since 2009, and still I learn something from every episode of Apply Filters. Brad and Pippin clearly know what they’re talking about, and they share experiences with tools, plugin development, coding standards, and the business of WordPress. There’s enough banter to keep the show lively, but they don’t waste your time going down personal rat holes. I especially appreciate that they post references on their website.”
PIPPIN: Wow. That’s awesome.
BRAD: Pretty cool, right?
PIPPIN: Yeah, it’s great. Also, I like it because it’s exactly what we’re trying to do. And so, to hear that we’re succeeding, at least to some degree or other, is encouraging.
BRAD: Yeah, for sure. What have you been up to, man?
PIPPIN: The last couple of weeks have been really working on a couple of big changes in Easy Digital Downloads for an upcoming version. We’re working on version 2.4, which is slated to be released around the end of June, so here within the next three weeks or so. We’ve got several major changes coming, major improvements, as well as a bunch of little things, but I want to talk about three of the distinct changes that we’re making because they presented really interesting challenges. Two of them presented interesting challenges.
The first one is, we introduced a change to our REST API. EDD includes a simple REST API that allows you to retrieve product data, sales data, customer data, coupon data, and a few other things. It is a read-only API right now, but it still allows you to consume information. It’s what powers our mobile apps, so we have an Android app, and we have an iOS app that allows you to see sales and customer data on your phone or your iPad.
We also have some other apps. We have a reporting app for the iPad. It hooks into the program called Status Board from the company Panic. Then we have a bunch of other plugins, widgets, etc. that tie into the API to display product feeds, for example, on other websites, kind of like if you would parse in an RSS feed.
We have all of these things interact with our API. The API has become a very, very important part of the plugin. It is foundational to a lot of the stuff that we do.
We realized a major problem with it, so we had a couple of improvements that we wanted to make. It started with a very simple change that we wanted to add to improve the data that’s available in the API for coupons related to a payment. If you go and retrieve sales data, it will tell you any coupons (we call them discount codes) that were applied to that order.
But, what it doesn’t show is it doesn’t show the amounts that were applied for each coupon. It doesn’t say: This coupon gave $5 off. This one gave $10 off. This one gave–, etc. And so, we wanted to make that change.
We realized, however, that we actually couldn’t make that change, even though it was a really, really simple change, because it would break anybody’s device or application that was reading that data in the previous way because we were changing the data format. And so, we realized that we can’t make these kinds of changes to our API because we don’t currently have API versioning.
For anyone who’s done a lot of work with APIs from services like Stripe or PayPal or really anybody who has an extensive REST API, you’re probably familiar with the idea of API versioning where you could have version 1 of an API, you can have version 2, version 3, version 4. As the API changes, new versions are released, but the old versions stay available and the data format that’s returned, the structure, doesn’t change.
Well, we realized that we didn’t have a way to do that. We didn’t have this concept of API versioning. We simply had the API, and however we chose to return the data is what everybody got no matter whether you started using it two years ago or yesterday. That was a problem.
BRAD: Right. I’m familiar with APIs being retired too because that happens sometimes.
PIPPIN: Sure, entire versions of them.
BRAD: Yeah. Twitter has retired old versions of its API and–
PIPPIN: Yep. MailChimp.
BRAD: –broken things.
PIPPIN: MailChimp is getting ready to retire version 2 of their API because they’ve introduced version 3.
BRAD: Right, and so is version 2 just not going to work anymore?
PIPPIN: You would have a new version of it. Well, so we realized that we have to have API versioning if we really want to progress our API. At some point we want to add the ability to have a write API as well and not only a read. When we do that, we’re going to have to introduce a new version of the API. A lot of things are going to have to change.
And so, we went ahead and tackled the problem. We said let’s introduce API versioning, but we can’t break anything. Nothing can break for existing people, so everybody that’s currently using the API, they’re using version 1.
When we upgrade, we’re going to introduce version 2. But, version 2 is only going to be used by people that install the plugin after version 2 was introduced. We had to do that, but it also got a little complicated because we had to do things like we had to parse the URLs and see whether or not a version was applied.
If they didn’t supply a version, okay, let’s go figure out what version we need to show them based on whether they installed the plugin before or after versioning was introduced. All of these different things were challenging, but really fun. It’s one of my favorite changes I’ve made to the code base in the last, probably, six months or more.
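The fallback Pippin describes (use the version in the URL if one is supplied, otherwise choose one based on when the plugin was installed) can be sketched roughly as follows. This is a minimal illustration in Python, not EDD's actual code: the `/api/` route shape, the cutoff date, and every name here are assumptions made for the example.

```python
import re
from datetime import date

# Hypothetical values for illustration only; none of these come from EDD.
VERSIONING_INTRODUCED = date(2015, 6, 30)
SUPPORTED_VERSIONS = ("v1", "v2")
LATEST_VERSION = "v2"


def resolve_api_version(path, installed_on):
    """Return the API version to serve for a request path like /api/v2/sales/."""
    match = re.match(r"^/api/(v\d+)/", path)
    if match and match.group(1) in SUPPORTED_VERSIONS:
        # The client asked for a known version explicitly.
        return match.group(1)
    # No usable version in the URL: sites installed before versioning
    # existed keep the original response format so nothing breaks.
    if installed_on < VERSIONING_INTRODUCED:
        return "v1"
    return LATEST_VERSION
```

With this shape, an existing integration that never sends a version keeps receiving the version 1 format, while a new install defaults to the newest version unless it asks for an older one.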
BRAD: What’s satisfying about it?
PIPPIN: Satisfying is taking a problem, spending a lot of time talking about how to do it, the best way to do it. We ended up sitting down as a team and having a couple of conversations, just in Google Hangouts, to say, “Okay, look. What are our options? We could version the entire API. We could add revisions to endpoints. We could do this. We could not do this. We could just break everything.”
Just having all of these different discussions about what are the consequences, what do we want to do, what are we going to strive to do. That was the most challenging. Writing the code itself was not that difficult, but it was figuring out the approach to take. That’s the big thing that I’ve been doing recently.
Then, not to take up too much more time, but I had two other things that we’re pretty excited for. We’re introducing Amazon payments for Easy Digital Downloads. We’re introducing it as a gateway that’s included with the core plugin, so everybody gets it.
BRAD: How much did Amazon pay you to do that?
PIPPIN: We did actually have sponsored development from Amazon.
BRAD: Oh, nice.
PIPPIN: Which was nice, but it also meant the big advantage to having a sponsored development from them was that we actually get a contact from Amazon to work with. So, if we have any problems, we get to work with them. They work with us to make sure that the implementation is solid, that it matches their standards, etc. And so, when this rolls out, it’s going to be really solid.
BRAD: Cool. Did you solicit them or did they come to you?
PIPPIN: They reached out to us, but we had actually been planning on doing it for a while. It just hadn’t happened yet.
BRAD: Right, and then you know they’re going to sponsor your development, so hey, no later time like the present.
PIPPIN: Absolutely. Yeah, and it also gave us a good reason to put it into the core plugin as opposed to making it a paid add-on, which I’m excited for. It’s going to make it so instead of everyone just having PayPal by default, now you’re going to have two options: PayPal or Amazon.
Also, Amazon, as a merchant, as a host, as a payment processor, are really globally trusted really well. And so, it’s going to be a great option for just about anybody that wants to set up a store.
Then the last thing was batch CSV exporting. Our export options in EDD have had problems for a while because they did everything in a single process. If you wanted to go export customers, it just did one big query, translated that query into a CSV, and said, “Good luck,” pretty much.
If you have 20,000 customers and you tried to export it, it failed. And so, for EDD 2.4, we’ve introduced batch processing. And so now you can export 20,000 customers, 100,000 customers, and it all works. That was kind of a fun project to work on.
BRAD: Was it failing because of timeouts, I’m guessing.
PIPPIN: Right. Yeah, because if you try to do everything in one process, you may be able to query that much data from the database, but then to loop through every single one, add it to a CSV, and then download that CSV, it takes time. It takes memory. It takes processing power. While my host, Pagely, who I have a nice VPS through, could handle it just fine, a lot of people would have issues if they’re on a shared host from HostGator or BlueHost or even a pretty decent host. If you get enough data in there, it’s going to start failing.
BRAD: Yeah. Actually, a lot of the managed hosts have this problem because they use the PHP-FPM and Nginx technology stack. There’s no way really to tell that stack from PHP, like from your batch exporter. There’s no way to tell it, “Hey, I’m going to be taking a while here, potentially, so don’t time me out.” There’s no way to tell it that.
PIPPIN: No, and that was a problem that we ran into a lot, like to the point that we actually had to add a description underneath the export option in the UI that says, “By the way, if you have a lot of data, this might fail. Sorry. Not much we can do for you.”
We fixed this by adding batch processing. It actually also allowed us to do something really cool. Before, because we were limited by timeouts and things like that, we tried to keep our export very simple. We tried to get just the data that we can easily access. And so, we avoided doing complicated join queries or, if we needed to do multiple queries for each row, those just weren’t very good options because they took a lot more processing power to do that.
But now, because we’re doing things in batch processing, we can get any data we want. We can build giant CSVs. We can build really advanced reports because, when you’re doing batch processing and everything is done via Ajax, we’re not going to run into those problems. And so, if we need to do one line per batch, that’s fine. We can do that. It ended up working out really, really well.
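The batch approach can be sketched as a function that handles one step per request and reports whether the export is finished. This is a minimal Python illustration, not EDD's implementation: the function names, the tiny batch size, and the in-memory customer list (standing in for a paged database query) are all assumptions.

```python
import csv

BATCH_SIZE = 3  # deliberately tiny for the example


def export_step(customers, step, path):
    """Write one batch of rows to the CSV at `path`.

    Returns True when this step wrote the final batch.
    """
    start = step * BATCH_SIZE
    batch = customers[start:start + BATCH_SIZE]
    # The first step creates the file and header; later steps append.
    with open(path, "w" if step == 0 else "a", newline="") as handle:
        writer = csv.writer(handle)
        if step == 0:
            writer.writerow(["id", "email"])
        writer.writerows(batch)
    return start + BATCH_SIZE >= len(customers)


def run_export(customers, path):
    """Stand-in for the browser loop: one call per request until done."""
    step = 0
    while not export_step(customers, step, path):
        step += 1
    return step + 1  # number of requests that were needed


def read_rows(path):
    """Read the finished CSV back as a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))
```

Because each request only touches `BATCH_SIZE` rows, the time and memory per request stay bounded no matter how many customers there are, which is why the per-row work can become more expensive without hitting a timeout.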
PIPPIN: Yeah, so that’s enough about me. Brad, what have you been up to?
BRAD: A few weeks ago, I started coaching the local high school ultimate Frisbee, and so that’s been taking up a crazy amount of my time.
PIPPIN: You were a pretty serious ultimate Frisbee player yourself, right?
BRAD: Yeah, I still play. I played in a tournament last weekend, actually. But, yeah, I’m kind of getting too old to be playing at the highest levels now, so I’m kind of in the old guys division now. But it’s still fun. I still like to play, so I’ll be going to nationals this year in Canada.
PIPPIN: Wow. Congrats.
BRAD: Out in Winnipeg at some point.
PIPPIN: It’s funny to hear that you guys have ultimate Frisbee leagues. Where I am, ultimate Frisbee exists, but it’s more of a bunch of friends got together, and we played in a field. There’s no organized Frisbee here.
BRAD: I bet’cha there is, and you just don’t know about it.
PIPPIN: If I go over to Kansas City, which is about three and a half hours from here, a large metropolitan area, then there are some, but not where I live.
BRAD: Right, right. Okay, yeah. I gotcha. I was going to say there’s definitely.
PIPPIN: No, there’s definitely. There are some in the U.S.
PIPPIN: But I think it’s a lot more probably on the coasts, east and west.
BRAD: Yeah. Yeah, for sure. Anyway, I didn’t really think ahead of how long it was going to take me to coach these kids, and each practice is two hours, so two hours twice a week, four hours a week. And this is during work time, right, like after school until 5:00.
I’m just like: this is eating my time. And it’s like, oh, man. It’s a good thing it’s over. School is almost over, right, so it’s going to be done soon.
It’s just funny. I hadn’t planned to give up that much of my time, and so I’m glad it’s going to be–
PIPPIN: Yeah, but I’m sure it’s worth it in so many ways, and not just getting out there and playing and coaching. But I bet it also helps your work side of things in terms of it makes the time that you are working more valuable and more focused.
BRAD: Oh, yeah. I’ve been real focused in the last few weeks, for sure, because I can’t afford to waste any time.
BRAD: Yeah, it’s been all right, and it’s going to be done soon, so it’s all good.
PIPPIN: Very cool.
BRAD: I’ve also been working on the Amazon S3 and CloudFront. Now we’re calling it WP Offload S3. The site for that, I’ve been working on off and on.
PIPPIN: Right, so that’s getting its own standalone site, right? It’s not part of Delicious Brains.
BRAD: Oh, it is. It’ll be under DeliciousBrains.com/WPOffloadS3.
PIPPIN: Right, but it’s like a site within a site.
BRAD: Yeah, it’s like a little micro site of the umbrella site, I guess. But it’s funny. We started tracking our time a few weeks ago using Toggl. I don’t know if you’ve ever used that.
PIPPIN: Yep. I used to use Toggl. It was the time tracker I used when I was doing contract development.
BRAD: Yeah. I haven’t tracked time since then either, so it’s been a couple years, which is really cool. The record timeline feature where you have this little desktop app, and you enable this record timeline, and it records what apps you’re using. If you forget to turn on your timer, you can go back and see, like, oh, I was using my Web browser and I was using it in GitHub, so I guess I was reviewing issues or something.
PIPPIN: That’s awesome. I always found that was one of the biggest issues with time tracking outside of, like, tracking for just contract projects is that I would forget to track, or I would get distracted. And so then I realized that I spent an hour on this, but I really spent 30 minutes browsing Reddit.
BRAD: Yeah, or you’ve got a timer running, and then you start doing something else and you forget to change it.
BRAD: You’re like, when did I actually start doing something else? You have no idea.
BRAD: Now you can actually look back on your timeline. It’s super cool.
PIPPIN: That’s really cool.
BRAD: It’s super cool, and it’s kind of given, like, renewed my faith in time tracking because I found that was a huge problem.
PIPPIN: You’ve mentioned this now, I think, over the last couple of episodes that you guys were looking at doing time tracking and starting it. What’s your overall goal with it? Is it just to get a better idea of what everyone is working on, or are you trying to refine processes?
BRAD: It’s the former, so I just wanted to know how much time everyone spends on support, and how much time our guys are spending writing their articles for our blog. I just wanted more insight into how much time is being spent in the aggregate on certain things. Then I can decide in the future, maybe it’s worth it to hire a support person, a dedicated support person now because we’re spending 40 hours a week on support, or something, you know. Those kinds of things are pretty valuable.
I’ve also noticed that it’s just, for me personally, really good to keep me on task because, when you go to browse Reddit for half an hour, that doesn’t get tracked. Your timer is not running because you’re screwing around, right? It kind of keeps you on task a little bit better, I’ve found. I’m noticing an improvement there.
It’s funny. I was going through my time that I’ve tracked for the last couple weeks for this podcast. I just wanted to know what I was working on. I’m like, holy crap. I’m a project manager now. I don’t spend much time doing any coding. I’ve been working on the site a little bit, but that’s about it. Everything else has been just managing the team, managing the project. Yeah.
PIPPIN: It’s interesting when that realization hits you.
BRAD: Yeah. I kind of knew, but I was kind of in denial, I think.
BRAD: I’m totally still a developer.
PIPPIN: I think I realized that or started to maybe three to six months ago. It just kind of hit me. In ways, it made me sad. In other ways, it was really validating because, when you realize that you’ve stepped into that role, it means that you’ve done something cool. There are other people working with you for a bigger vision than just what you personally want to work on.
BRAD: Yeah, absolutely. It’s also made me realize that maybe it’s time to start looking for a project manager to take over some of these tasks from me, and so that I can do things that maybe I’m better suited to do.
PIPPIN: Yeah, definitely.
BRAD: That’s another byproduct.
PIPPIN: It might show you that you are the project manager, and you need to find someone to take over the other stuff that you were doing.
BRAD: Yeah, exactly.
PIPPIN: There you go. It could be either way.
BRAD: This is all a byproduct of tracking time, so I’ve found it to be a really good exercise, and I think we’re going to continue.
PIPPIN: Do you think it’s something that you will continue to do?
PIPPIN: Or will you stop tracking it after you’ve kind of figured out what percentages of your time is spent where?
BRAD: I think it’s good to just keep doing it because keeping yourself on task is good all the time, right? It will be good into the future. Then, I think we can still use the data for making decisions in the future about other things, so I can’t imagine why I would stop doing it at this point, unless the team really hates it and it’s causing grief. Then that would be a reason to stop it.
PIPPIN: Sure. It always made me reluctant to track time because I was always afraid that it was going to be a burden or a distraction on top of what I’m actually trying to do. In a way, that’s actually what you’re trying to resolve by tracking time. You’re trying to get yourself more focused. You’re trying to see how much time you’re actually spending working on the things that you need to be working on. It always made me wonder if it would be that distraction, it would be that extra thing that I have to do that’s preventing me from getting my work done.
BRAD: Yeah. It is a little annoying when you have to switch. If you switch between tasks a lot, it’s a little annoying. But, really, should you be switching tasks that rapidly?
PIPPIN: I was just going to ask that. Do you think it makes you switch between tasks less rapidly?
BRAD: I think it does because I hate having to stop and start a new timer every time. Maybe sometimes I’ll just say, “No, I’m going to keep the timer going. I’m going to keep on with this task, and then I’ll go to the next one.” I think it has definitely some positive impacts.
I also realized, looking at the time, that I spent a lot of time planning the Miami trip. We did a company meet up in Miami just around WordCamp Miami.
PIPPIN: Yeah. That was, what, last week or the week before that?
BRAD: Yeah, it was the week before last, yeah. Yeah, the end of — kind of the last week in May. It was just so great to meet. We all met for the first time. Not one of us had met another one in the team in person.
PIPPIN: That’s awesome.
BRAD: It was almost overwhelming at first.
PIPPIN: How many people are on the team now?
BRAD: There’s five, plus myself, so five full-time, plus myself. It was a bit overwhelming that you’re walking in the door and, like, having to meet because, when I got there, there was already four, yeah, four other guys there. I walked in the door with my wife and my eight-month-old son. I’m shaking guys’ hands for the first time and meeting their spouses and girlfriends. It’s just like, “Whoa! This is crazy.”
PIPPIN: Yeah. Nuts.
BRAD: The roof – the roof is leaking from the shower upstairs.
PIPPIN: Oh, no. House issues?
PIPPIN: This was an Airbnb you rented, right?
BRAD: Yeah, it was an Airbnb in Miami Beach, and it was a villa that could accommodate, I don’t know, it said 16 or something. But that was sharing beds and stuff, and it only had five bedrooms, but we had really a need for six, so I rented out the pool house as well, but the pool house was pretty sketchy. I ended up having to send one of the guys to a hotel for the duration of the stay.
PIPPIN: That’s unfortunate.
BRAD: That was kind of a bummer. But, in the end, after the shower was leaking, the roof was leaking, the place was a little dirty, it wasn’t as advertised. But, in the end, we all had a great time, and we enjoyed the pool outside and everything.
PIPPIN: Yeah, you make it work.
PIPPIN: You’re there to hang out as a team, get to know each other, not to just spend time in a nice house.
BRAD: Yeah. Yeah, exactly. It was a little bit of a bummer, and a little embarrassing for me because I was the one who booked it and stuff.
BRAD: And I’m meeting these people for the first time, and the house isn’t very good. It wasn’t ideal, but we persevered and figured it out. Yeah, that was really cool to meet everyone and just enjoy the weather for a while.
PIPPIN: Yeah, I bet it was awesome.
BRAD: The guys from the U.K. were just melting. They’re not used to that heat.
PIPPIN: Let’s see. You have, what, two from the U.K., one from New Zealand, and one from Australia?
BRAD: California – one from California, yeah. Three from the U.K., one from New Zealand, but he’s actually originally from the U.K., and one from California. The California guy was like, “Heh, it’s not that warm.”
PIPPIN: Right. Southern California, I assume?
BRAD: Yeah, Long Beach, so he was fine. I love the heat, so it didn’t bother me at all. But, yeah, the other guys were dealing, were coping, but I think they enjoyed it nonetheless.
PIPPIN: What else? Now that you’ve been back from Miami and you’ve had a week or two to get back into things, what’s keeping you busy?
BRAD: Well, we’re in the final stages of testing for Migrate DB Pro 1.5 and the multisite tools add-on, so that should be out in the next couple weeks. The free version will be out this week. It will have WP-CLI integration, so you’ll be able to run WP Migrate DB Export. You’ll be able to export your database with things found and replaced as a file.
PIPPIN: Wow. Beautiful.
BRAD: Without using a UI, so all on the command line. We’ll have docs and everything for that when we do it.
Another thing that I’ve been digging into a little bit is, we’ve had an issue with SiteGround’s staging site. They have a staging platform or whatever. The way it works is they do find and replaces on everything. They have a module in Apache that does find and replace on all output, so replacing the URL, the production site URL with a staging URL.
BRAD: We return serialized data in our output.
PIPPIN: It’s breaking your serialized arrays.
BRAD: It’s been breaking that, so what we’re going to do with that situation is that we’re going to scramble our serialized data. We’re just going to use a function of PHP called ROT13, I think it is (R-O-T-13).
PIPPIN: I think I’ve used it once.
BRAD: Yeah, and I think it just changes the letters. It adds 13 to the ASCII code or something like that, and so it just changes all the letters.
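The reason a blind find-and-replace breaks serialized output is that PHP's serialized strings carry a byte-length prefix (`s:<length>:"<value>";`), so swapping in a longer URL leaves the prefix wrong. ROT13, available in PHP as `str_rot13`, rotates each letter 13 places through the alphabet and leaves digits and punctuation alone, which is enough to hide the URL from the filter. The following Python sketch illustrates both points; the URLs and helper names are assumptions for the example, and it only models a single serialized string.

```python
import codecs

# Hypothetical production and staging URLs.
PRODUCTION = "http://example.com"
STAGING = "http://staging.example.com"


def php_serialize_string(value):
    """Mimic PHP's serialize() for one string: s:<byte length>:"<value>";"""
    return 's:%d:"%s";' % (len(value.encode("utf-8")), value)


def length_matches(serialized):
    """Check that the declared byte length matches the actual payload."""
    head, rest = serialized.split(':"', 1)
    declared = int(head.split(":")[1])
    payload = rest[:-2]  # drop the trailing '";'
    return declared == len(payload.encode("utf-8"))


def staging_filter(body):
    """Stand-in for a host-level filter that rewrites URLs in all output."""
    return body.replace(PRODUCTION, STAGING)


def scramble(body):
    """ROT13 the body so the filter no longer recognizes the URL."""
    return codecs.encode(body, "rot_13")


def unscramble(body):
    """ROT13 is its own inverse, so applying it again restores the text."""
    return codecs.encode(body, "rot_13")
```

Passing the plain serialized string through `staging_filter` yields a length mismatch, while `unscramble(staging_filter(scramble(...)))` returns the original string intact.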
PIPPIN: Is there a reason you guys used serialized data instead of JSON?
BRAD: Good question.
PIPPIN: If you had used JSON, you wouldn’t have that problem, right?
BRAD: Absolutely. You’re right. Although, we would still have a problem with the URL changing, you know, if that actually caused a problem.
PIPPIN: Right, but it wouldn’t break your data integrity.
BRAD: It wouldn’t break the data. The problem with JSON is that it’s UTF-8 only, so if there’s any data in those requests that’s not UTF-8, then it breaks. The JSON won’t–
PIPPIN: I always forget about that.
BRAD: Yeah, it won’t even encode the JSON, I don’t think. I think you’ll just get an empty string back or something. Yeah, you have to be careful of that.
BRAD: I’d love to be able to use JSON for that stuff because it would really save us a lot of headaches that we have with serialized stuff.
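The UTF-8 constraint can be illustrated with a small Python sketch. Python's failure mode differs from PHP's (it raises an exception rather than returning a falsy value), so this only shows the general idea that bytes which are not valid UTF-8 cannot be turned into JSON text without transcoding first. The helper name is made up for the example.

```python
import json


def to_json_or_none(raw):
    """Encode raw bytes as a JSON document, or return None if they are not UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes in another encoding (Latin-1 here) are rejected.
        return None
    return json.dumps({"value": text})
```

A caller that has to carry arbitrary database content would need to detect and convert the encoding, or escape the bytes, before handing them to a JSON encoder.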
PIPPIN: Well, cool.
BRAD: But anyways, maybe we should move on.
PIPPIN: I’m sure those updates are going to be awesome. I know I can’t wait for them. We’re getting ready to use Migrate DB again for our EDD site. We’re pulling it over to a staging site, and I was going to use that to update our staging from our live site.
PIPPIN: Can’t wait.
PIPPIN: Yeah, let’s move on.
BRAD: Let me know how that goes.
PIPPIN: I will, definitely.
PIPPIN: I expect it will work flawlessly.
BRAD: Cool. Do you want to introduce this topic?
PIPPIN: Sure. All right, so a few months ago, around when WordPress 4.2 came out, I think we actually covered it before 4.2 came out. We talked a little bit about the taxonomy term splitting that was happening in WordPress 4.2. This has been an ongoing project for WordPress core from a long time ago.
BRAD: Yeah, I think the original post Nacin put out there was 2013.
PIPPIN: Yeah. It looks like July 2013, which I think that’s pre-4.0 or just after 4.0. The taxonomy tables in WordPress have had a problem for a long time. Now, this problem is pretty much solved now, but it was a problem for a long time. It was that it had shared taxonomy terms.
What that really meant was that if you have two taxonomies, let’s say genres and categories, and you have a term called “book” in both of those, or let’s say we’re talking about music, and so you have a category of “rock,” and you have a genre of “rock.” Both of those could actually be the same term in the WP terms table shared across both taxonomies. This was a problem. This was a major problem.
BRAD: Yeah, because if you update that–
PIPPIN: It gets updated in both taxonomies.
BRAD: Both taxonomies.
BRAD: That becomes a real big problem if it’s, like, let’s say you had a taxonomy that was music genres and another one that was textures, and both had “rock” in it. You updated one to “rock-n-roll,” right? Then your texture would be called “rock-n-roll,” not “rock.” That’s an example where it would totally fail.
PIPPIN: Another place where it was probably super prevalent were plugins that use maybe like private taxonomies or they’re storing data in a taxonomy and the plugin goes and updates a term. Well, if that term, even though it’s a more “private” taxonomy, if that term existed in another taxonomy, you may be breaking that there. This was a problem. It was a big problem.
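The shared-term problem is easier to see with the two tables side by side. This Python sketch uses plain dictionaries loosely modeled on `wp_terms` and `wp_term_taxonomy`; the IDs, taxonomy names, and helper functions are made up for illustration.

```python
# One row in the terms table...
terms = {7: {"name": "Rock", "slug": "rock"}}

# ...referenced by two taxonomies, which makes it a shared term.
term_taxonomy = [
    {"term_taxonomy_id": 1, "term_id": 7, "taxonomy": "genre"},
    {"term_taxonomy_id": 2, "term_id": 7, "taxonomy": "texture"},
]


def term_name(taxonomy):
    """Look up the display name of the term used by `taxonomy`."""
    row = next(r for r in term_taxonomy if r["taxonomy"] == taxonomy)
    return terms[row["term_id"]]["name"]


def rename_term(term_id, new_name):
    """Rename a term the old way: edit the single shared row in place."""
    terms[term_id]["name"] = new_name


# Renaming the music genre...
rename_term(7, "Rock-n-Roll")
# ...also renames the texture, because both point at term 7.
```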
In WordPress 4.1, the first step was done to resolve the problem. There’s a column in the terms table, I believe, called Term Slug. It’s just a slug of the term. If your genre was called “Book” with a capital B, your slug would be “book,” all lower case, usually.
Well, that used to have a unique on it, so you could not have two slugs that were the same. Instead, you would have books-2, books-3, books-4, etc. if you ended up having multiple terms, or they would just be the same term.
In WordPress 4.1, the unique was removed, allowing us to have slugs with the same term, but different rows. This was a great improvement because it prevented shared terms from getting created in the future because, previously, if you tried to create a term with a slug that already existed, it just used the ID from the existing term. Now, this no longer happens because the term slug is no longer unique – a big improvement.
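The effect of dropping the uniqueness constraint can be sketched as below. This is a simplified Python model under stated assumptions: it ignores the numeric-suffix case mentioned above, and the `slug_is_unique` flag and function name are inventions for the example rather than anything in WordPress.

```python
def insert_term(terms, term_taxonomy, name, taxonomy, slug_is_unique):
    """Add `name` to `taxonomy` and return the term ID that was used."""
    slug = name.lower()
    existing = [tid for tid, term in terms.items() if term["slug"] == slug]
    if slug_is_unique and existing:
        # Old behavior: the slug already exists, so reuse that row.
        # This is how a shared term comes into being.
        term_id = existing[0]
    else:
        # New behavior: duplicate slugs are allowed, so each taxonomy
        # gets its own row and its own term ID.
        term_id = max(terms, default=0) + 1
        terms[term_id] = {"name": name, "slug": slug}
    term_taxonomy.append({"term_id": term_id, "taxonomy": taxonomy})
    return term_id
```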
Now, in WordPress 4.2, the core developers tackled the problem of, well, how do we fix the existing data? So we’ve prevented this from happening to future data, but how do we go back and fix old data? Well, this was a two-step process.
The first one, this is where things get a little crazy. They had to change it so that if you update a term in one taxonomy, it doesn’t affect the term in the other taxonomy. That change was introduced in WordPress 4.2. Now, if you say wp_update_term, I think is the function, it would no longer affect both terms. This is a pretty important change because it meant suddenly — well, it meant a couple of things.
First, it meant that if you update a term that was a shared term, what’s going to happen is it’s going to create a new term, so it’s going to say, “Here’s the old one. Let’s make sure that term still exists. And now let’s create a new one based upon the update that we’ve just made.”
PIPPIN: And so now we have two terms.
PIPPIN: This causes a bit of a problem if you have another plugin or you have something else that is looking at that term and suddenly the term ID has changed.
PIPPIN: You disconnect it.
BRAD: Or it references the old term, which is not the right one.
PIPPIN: Right. Right, yeah, which is a problem. Brad, why don’t you walk us through what they did in 4.2 to help plugins here. Know that this is a problem that affects some pretty large plugins. Just to give a few names: Jetpack, WordPress SEO by Yoast, also Yoast’s Google XML Site Maps, Ninja Forms, Advanced Custom Fields, Paid Memberships Pro, WordPress Download Manager. These are all plugins that are potentially affected by this.
BRAD: Yeah. The article that we’ll link up in the show notes says that @mboynes. I don’t even know who. Who is that? Do you know? Anyway, somebody, some developer scanned the top 100 most popular plugins on .org and only found 11 that were vulnerable or had problems that needed to be dealt with due to this change. You know, that’s pretty good, 11 out of 100 of the top.
PIPPIN: To be fair, 11 out of the 100 most popular include plugins that are installed on millions and millions of websites.
BRAD: Right, but I mean that’s–
PIPPIN: Right, it’s not like a massive number of plugins, but it is a massive number of websites.
BRAD: Yeah, and those are very advanced plugins too. Those are very complex, mostly, those plugins. I don’t think that’s a coincidence, right? The ones that are more complex are more likely to have this problem because they’re doing a lot of things. This just happens to be one of them. The WordPress core developers, I don’t think they did anything special to mitigate the problem besides just letting these plugins know that they need to update their code, right?
PIPPIN: There is one thing they did.
PIPPIN: In this post that we’ll link up, which is called “Taxonomy term splitting in 4.2: a developer guide,” this is from Boone Gorges. Boone is one of the primary developers that worked on this entire issue to improve it.
They introduced a new action. Right now, in 4.2, if you call wp_update_term and it ends up splitting a term, so splitting a term simply means that there were two terms. There was a shared term. You updated it. Now there are two terms. We’ve split it in two so that they are no longer shared. Whenever that happens, there’s an action that’s fired.
BRAD: Aah, right.
PIPPIN: It’s called `split_shared_term`.
PIPPIN: What they did is, Boone showed an example to illustrate that if your plugin or a plugin does rely on terms like this and you run into an issue where splitting shared terms breaks some of your functionality because now your term ID is wrong, there’s an action that you can use to update it at the time the term is split.
And so, the example that they gave actually came from Jetpack where Jetpack had an option like featured post is one of the modules in Jetpack. If you tagged or set a post as featured, that used a taxonomy. Well, that taxonomy could potentially have terms shared with another taxonomy, which could mean that you could then split that term.
Any time a term is split, Jetpack goes in and looks at it and says, “Hey. If the old term is the term ID that we are using for our featured content, or if it’s identified as featured content and it comes from the post tag taxonomy, let’s go and update our settings to make sure that we have the new term ID so that we don’t lose our featured content.”
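The split-and-notify flow can be sketched as follows. This is a Python model, not the actual WordPress or Jetpack code. The listener list stands in for callbacks hooked to the `split_shared_term` action, and `plugin_options`, `featured_tag_id`, `update_term`, and `on_split` are hypothetical names chosen for illustration.

```python
# Toy model of splitting a shared term on update; this is not WordPress code.
terms = {1: {"name": "Featured", "slug": "featured"}}
term_taxonomy = [
    {"tt_id": 1, "term_id": 1, "taxonomy": "post_tag"},
    {"tt_id": 2, "term_id": 1, "taxonomy": "category"},
]
split_listeners = []                      # stand-in for hooked callbacks
plugin_options = {"featured_tag_id": 1}   # hypothetical plugin setting


def update_term(term_id, taxonomy, new_name):
    """Rename a term in one taxonomy, splitting it first if it is shared."""
    users = [r for r in term_taxonomy if r["term_id"] == term_id]
    row = next(r for r in users if r["taxonomy"] == taxonomy)
    if len(users) > 1:
        new_id = max(terms) + 1
        terms[new_id] = dict(terms[term_id])   # copy the old row
        row["term_id"] = new_id                # repoint this taxonomy only
        for listener in split_listeners:
            listener(term_id, new_id, row["tt_id"], taxonomy)
        term_id = new_id
    terms[term_id]["name"] = new_name
    return term_id


def on_split(old_id, new_id, tt_id, taxonomy):
    """Keep the stored ID in sync, along the lines of the Jetpack example."""
    if taxonomy == "post_tag" and plugin_options["featured_tag_id"] == old_id:
        plugin_options["featured_tag_id"] = new_id


split_listeners.append(on_split)
new_tag_id = update_term(1, "post_tag", "Highlights")
```

After the update, the category still points at the untouched original row, while the tag and the stored plugin setting both point at the new ID.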
PIPPIN: Okay. This is a very long and intensive process, but now that gets us to WordPress 4.2.
BRAD: Yep. These are just baby steps, really.
PIPPIN: Yeah, yeah.
BRAD: For what’s coming.
PIPPIN: Yeah, so this is where we are today. These are all really important changes that have been done. I want to point out that even though they’re little changes, doing things like splitting terms and suddenly having two terms and having two term IDs could have had catastrophic consequences if not done really well and if large developers like Yoast and Advanced Custom Fields, etc. were not notified ahead of time. As far as I am aware, there was no massive damage caused by any of this anywhere.
BRAD: Not yet.
PIPPIN: I have never seen a single example.
BRAD: Not yet because they’re only splitting terms when a term is updated right now.
PIPPIN: Right. But still, if you did have a plugin or there were sites where terms were updated all the time, you would see that issue.
PIPPIN: I have never heard of it, right.
BRAD: I don’t think terms are updated very often. I think it’s a very–
PIPPIN: Honestly, I don’t think they are either.
BRAD: –rare event, right?
PIPPIN: I think that is a great example of doing this the really smart way of saying, “Hey, look. This is a major issue. Let’s address it in small steps over time.”
BRAD: Yeah. Yeah, and time is important because it gives developers time to become aware of it and act, right?
PIPPIN: Absolutely. And, when you do it in small steps, you get the option. You get to look and see, “Okay, so we’re splitting terms now. Here’s where we saw things fail or have trouble. Now we solved that on a small scale. Let’s address it now before we cause massive problems.” What are they doing in 4.3 now?
BRAD: 4.3 is, well, we’re starting to work on it, right? Isn’t that it’s underway?
PIPPIN: Well, right. 4.3 is being worked on. But related to shared terms, what’s happening there?
BRAD: Yeah, so the plan is to do a split, so split all the shared terms when the upgrade happens. When you upgrade to 4.3, it’ll run a database or run some code and go through all your terms, find the ones that are shared and split those into as many extra rows as are needed so that you don’t have any shared terms anymore. Depending on the size of your database and how many terms you have and how many are shared, this could take a long time, especially if you’re running a multisite network site or a network of networks, which is a funny thing.
PIPPIN: But it’s a very real thing.
BRAD: Yeah, it is.
PIPPIN: It’s something that I’ve never played with, but I keep meaning to just turn it on just to see what it’s like. I’ve never done it.
BRAD: It’s like inception or something.
PIPPIN: Yeah, it really is. Can you have a network of networks of networks?
BRAD: I don’t know, man.
PIPPIN: I’m not sure if it goes three levels or more. There may be only two levels.
BRAD: That’s crazy.
PIPPIN: Yeah, ridiculous.
PIPPIN: This is happening in 4.3, or at least, I think, as far as we know it’s happening. That’s the plan.
BRAD: They’re planning to do it in 4.3. There’s a post that Boone put out that says, “That’s the plan. That’s what we’re doing.”
PIPPIN: I like the way that he put it. “Plugin authors, dress rehearsal is nearly over.”
BRAD: Yeah, exactly, because once they do this, it’ll really expose any issues.
PIPPIN: Right, because instead of just saying, “Hey, let’s update it when a term is updated,” we’re just going to fix them all immediately.
PIPPIN: And so, if a plugin hasn’t hooked into the `split_shared_term` action and is not updating appropriately, it’s going to break the plugin.
BRAD: Yeah. I think the biggest risk here is sites that a developer built. I built sites ages ago that are still running today, right, years ago, like three, four, five years ago. They’re still running as-is today, and I don’t even know if they’re updating WordPress, the clients that I used to have, or what. My theme from back then, maybe it used a term ID here or there, and maybe this upgrade will break it. I could see there might be some of that happening when this upgrade goes through, but I still think it’d be fairly rare, right? There won’t be that many instances.
PIPPIN: Right. I think it’s likely that we will definitely see something break somewhere, but I don’t think it’s going to be that prevalent. It’ll be far less major an issue than the UTF8M... what was it called?
BRAD: UTF8MB4, yeah.
PIPPIN: MB4 – there we go. That upgrade was far more significant in terms of problems it caused than this will be, I think.
BRAD: It’s still causing problems because that issue is a tricky one because it’s tough to go backwards from that. So, when you upgrade your site to UTF8MB4, it’s hard to go backwards back to UTF8. And so, if your live site is running whatever it is, MySQL 5.5.3 or greater, and it upgrades the database to UTF8MB4, and your staging site, for example, is running an older version of MySQL, then you’re in trouble. That’s going to be a problem.
But it’s interesting. This is just, again, baby steps. Even the 4.3 change will really be a baby step. I think the end goal is to get rid of one of those tables, right? Get rid of one of the term taxonomy tables. I believe they’re going to get rid of the terms table and then take the columns from that table and put it into the term taxonomy table.
PIPPIN: According to Nacin in the post that we mentioned from July 2013, the idea is to potentially drop `wp_terms`.
BRAD: Right, and that’s huge because you’re getting rid of a bunch of joins, right? You’re taking one join out of a lot of queries. That’s going to be–
PIPPIN: Just super cool.
BRAD: That’s going to be–
PIPPIN: For anybody who is not familiar with them, and this is something I only just started learning about recently... okay, let’s step back for a second and think about backwards compatibility. We have a table. We have our database, and we have tables inside of it. This table is used everywhere. Suddenly, people are talking about dropping the entire table.
Your immediate response should be, “Well, you’re going to break every single query calling `wp_terms`,” and your head should explode. The cool thing is that there’s actually a way to get around that by having what’s called an SQL view, and it’s really, really cool. I love that idea that we can remove all of this data over here and yet, anybody who is trying to access it, oh, it still works just fine.
BRAD: Right. Yeah, exactly. It’ll be interesting to see how they implement that or if they even go ahead with it because it was really just a proposal at this point.
PIPPIN: Right, this is not set in stone.
BRAD: Yeah, because I think — yeah, I can’t remember what version of MySQL started to support views and how well it is supported. It’s probably been supported for quite a long time though, I would guess.
PIPPIN: Yeah. There’s a line in Nacin’s blog post that I think is pretty cool. If you use a view table and so you use a view and you get rid of WP terms, do you know how much code it takes to make WordPress run without that table?
PIPPIN: It’s about 20 lines of code to delete, like remove the entire table and make everything work just fine.
BRAD: Right, right.
PIPPIN: Which is awesome.
BRAD: Yeah, that’s cool.
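As a rough illustration of the view idea, the sketch below uses Python's built-in `sqlite3` module instead of MySQL. The merged table layout is an assumption made for the example and is not the actual proposal. It shows a query written against the old `wp_terms` name still returning rows after the data has moved into a single merged table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical merged table: term columns live next to the taxonomy.
    CREATE TABLE wp_term_taxonomy (
        term_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        slug     TEXT NOT NULL,
        taxonomy TEXT NOT NULL
    );
    INSERT INTO wp_term_taxonomy (name, slug, taxonomy) VALUES
        ('Rock', 'rock', 'genre'),
        ('Rock', 'rock', 'texture');

    -- The removed table comes back as a view exposing the old columns.
    CREATE VIEW wp_terms AS
        SELECT term_id, name, slug FROM wp_term_taxonomy;
    """
)

# A legacy query written against wp_terms keeps working unchanged.
legacy_rows = conn.execute(
    "SELECT term_id, name FROM wp_terms WHERE slug = ? ORDER BY term_id",
    ("rock",),
).fetchall()
```

Only reads are modeled here. SQLite views are read-only, so how writes through the old table name would behave in MySQL is outside the scope of this sketch.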
PIPPIN: All right, so there’s a whole lot that’s happening with this, in baby steps, which I think is a really excellent way to go about it, both for WordPress core, but also for any plugin developer that’s building large plugins. And not just plugins, but any development, any project that you’ve worked on, I think this is a really great example of how you can solve painful issues over time in really smart manners.
Aside from just making things better, do you think there’s a bigger vision for this? What’s the real reason all of this needs to happen? I don’t know if we have a lot of time to get into it, but maybe just think about a couple of ideas. What does this mean?
BRAD: Well, one of the things that Nacin mentioned in his original roadmap was how it could potentially, like the table, the relationship table could morph into or morph from a post to taxonomy relationship to an object, to object relationship table. And so, you could build relationships between posts and users because right now it’s just posts and taxonomy, right? Post to taxonomy, posts and users, posts and posts: you could do all these kinds of relationships in WordPress core itself without the need for any external plugins or any additional custom tables, and it would all be performant is the big thing. People are doing this now, but mostly using post meta, so sticking an ID in a post meta, like the value column in the post meta table, which is really bad for performance.
PIPPIN: It’s … really quickly on a large scale.
BRAD: Yeah, it’s terrible. You have experience with this recently, didn’t you, with EDD reports or something?
PIPPIN: Yeah, it’s an issue. It actually goes back to what I was mentioning with our batch processing. A lot of the EDD data, due to an unfortunate decision early on, is stored in the `wp_posts` and `wp_postmeta` tables, which makes it difficult to do large-scale queries or calculations on data because everything gets so slow.
PIPPIN: It takes a lot of–
BRAD: The reason for that is because the value column in the post meta table is of type `longtext`, which is not a type you want to be running, for example, a sum aggregate function on.
PIPPIN: Right. You shouldn’t be doing queries on that or searching it or calculations, like a long text. You should just be dumping that data straight out. That’s it. And so, it’s caused a lot of issues in our reporting because we’re limited on what we can do with reports because if we try and query a ton of the post meta and to do calculations on it, it runs into problems.
That’s where our batch processing came into play because now, because of the fact that we can do everything in batches, even though it can take a while, if we have to break things into 10,000 little tiny queries, we can in order to extract the data. This is going to lead us into a lot better reporting in the future because we will take the same infrastructure that we have with the batch export and use it to build, say, batch reporting. When we go to build a graph, right now when you load the page, we just run a whole bunch of queries, and we build the graph dynamically.
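The batching idea can be sketched like this. It is a minimal illustration using Python and SQLite, not EDD's implementation. The `postmeta` table, the `payment_total` key, and the batch size are all hypothetical. Values are stored as text, read in small pages keyed on the primary key, and converted and summed in application code rather than with one large aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postmeta ("
    "meta_id INTEGER PRIMARY KEY, meta_key TEXT, meta_value TEXT)"
)
conn.executemany(
    "INSERT INTO postmeta (meta_key, meta_value) VALUES (?, ?)",
    [("payment_total", f"{n}.50") for n in range(1, 26)],   # 25 sample rows
)


def sum_meta_in_batches(conn, key, batch_size=10):
    """Sum text-typed meta values in small batches, paging by primary key."""
    total, last_id, batches = 0.0, 0, 0
    while True:
        rows = conn.execute(
            "SELECT meta_id, meta_value FROM postmeta "
            "WHERE meta_key = ? AND meta_id > ? ORDER BY meta_id LIMIT ?",
            (key, last_id, batch_size),
        ).fetchall()
        if not rows:
            return total, batches
        total += sum(float(value) for _, value in rows)   # convert in app code
        last_id = rows[-1][0]
        batches += 1


total, batches = sum_meta_in_batches(conn, "payment_total")
```

Paging on `meta_id` rather than using an offset keeps each query cheap no matter how far into the table the batch is, which is one reason this pattern tends to scale to many small queries.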
What could very likely happen in the future is, you’ll load a graph. You’ll say, okay, this is the data I want, this date range, these statuses, these parameters. We will then run a batch process to go build that data set. Then, once we have the data set, we present it. This leads us back to the terms and taxonomy stuff because of what you were starting to mention about post relationships.
BRAD: Yeah. It’s interesting what you just described because, early on, you made that decision largely because you didn’t want to create additional tables or columns in the standard WordPress schema because it’s not really the WordPress way of doing things. It doesn’t really fit into WordPress core. That’s a good reason not to do it.
PIPPIN: I think it is a good reason to a degree.
PIPPIN: It was also extremely naïve.
BRAD: Right, but–
PIPPIN: Which I found out after the fact.
BRAD: Right, but imagine if, in the future, WordPress has a proper mechanism for creating those relationships properly. You know, it’s going to alleviate a lot of those performance problems–
BRAD: –due to people just trying to fit things into the WordPress schema.
PIPPIN: Right. I love the idea of moving WordPress core in the direction of having a full database layer and API that is designed to fit data in, instead of– I guess what I mean by that, right now it’s so easy to go create a custom post type, drop some post meta in, and just be like, “Hey. Great. It works.”
BRAD: Yeah – for you.
PIPPIN: It’s too easy is the problem because it means that you do this, and everybody starts here. You do this, and you say, “Awesome. It worked,” without stepping back and thinking about the consequences of that, which is very common. Everybody goes through this. Honestly, you don’t usually have a reason to think about the consequences until you have experienced those consequences.
If WordPress core makes these changes to the database tables and makes it possible for us to build these better relationships with custom data, that’s going to be awesome.
BRAD: Right. I think another thing that Nacin mentioned was taxonomy meta data was the other thing. Taxonomy or a term right now really just has a title and a slug, I believe, and so you can’t really attribute any extra information to it. That’ll be cool if you can create additional bits of data that you can associate with….
PIPPIN: Yeah. I think there’s a discussion of even having a table for term meta. Kind of like we have post meta and user meta, we would have a term meta table.
PIPPIN: When you have those good relationships, it could be very, very valuable. Have you ever done custom term meta right now?
BRAD: I don’t think I have, no.
PIPPIN: It’s kind of a pain in the ass.
PIPPIN: You store it in an options table or your own custom database table.
PIPPIN: It’s actually kind of funny. I wrote a blog post on it two and a half, three years ago, for a tutorial on how to create custom term meta. The idea was like, if you just want to create one or two little things that relate to a specific term. For example, at the time when I wrote it, I was actually doing some contract work for a guy that wanted to automatically password protect any post filed in a particular category.
And so, I created “term meta” that contained the password for that. Then when you tried to view a post that was in that category, it would ask you for the password. Then you would enter the password. Basically, it worked the same way that the post password does, but it was for any post in that category.
I wrote this tutorial on it, and the idea was based around, like, let’s do something really basic. Let’s have one value for a couple of terms. Here’s how you can do it. It is the most commented post on my website, or very close to it. Every single week, there are new comments coming in from people trying to create meta data for, like, hundreds or thousands of terms, doing all of this advanced stuff. In a way, it’s kind of cool because it showed that there was a need. In another way, I kick myself every time a comment comes in for writing that damn tutorial.
BRAD: Right. Why? Because it’s just a bad way?
PIPPIN: Because it’s really not a good... the way that I presented it, which is just creating an option in the options table that’s something like term_{the ID of the term}_{unique key}, is not something that we want to encourage people to do.
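For readers who have not seen the pattern, here is a rough sketch of the option-per-term approach, written in Python with a dictionary standing in for the options table. The exact key format and the helper names are assumptions, not the code from the tutorial.

```python
# Toy stand-in for the options table; the key format is an assumption.
options = {}


def option_key(term_id, meta_key):
    """Build one option name per term and per piece of meta."""
    return f"term_{term_id}_{meta_key}"


def update_term_meta(term_id, meta_key, value):
    """Store a value under the option name for this term and key."""
    options[option_key(term_id, meta_key)] = value


def get_term_meta(term_id, meta_key, default=None):
    """Read a value back, falling back to a default when it is missing."""
    return options.get(option_key(term_id, meta_key), default)


update_term_meta(12, "password", "s3cret")
for term_id in range(100, 103):          # many terms means many option rows
    update_term_meta(term_id, "color", "red")
```

Every term and key pair becomes its own option entry, which is fine for a handful of terms and is the part that grows awkward with hundreds or thousands of them.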
BRAD: Right, right.
PIPPIN: And so I kick myself when I see people trying to do a whole bunch because I was helping to promote this. At some point I would like to go back and maybe revisit it and say, “Hey, okay. If this is something that we’re going to do, let’s actually go in. Let’s create a table, and let’s really do this well.”
PIPPIN: But it takes time.
BRAD: Yeah, exactly. Well, should we wrap it up?
PIPPIN: There’s one other thing I want to throw out real quick, which is, if you have a plugin that is affected by term splitting and you have or have not taken care of the issue of term splitting in your plugin, I would love to hear about it. If you had something break, I would love to hear about it. If everything worked great, I think we’d love to hear about it.
For anybody, if you have anything, or just any questions about it, shoot us an email, post a comment, go post a comment on the make blogs. We’ll have links to all of the posts on there. It’s a really cool and fascinating change in WordPress.
BRAD: Well, I guess I’ll say another thank you to DreamHost for sponsoring this episode.
PIPPIN: Absolutely. Thanks, guys.
BRAD: Should we give the WP Ninjas a shout out as well?
PIPPIN: Oh, yeah. WP Ninjas always need a shout out. They’re a bunch of awesome guys over there at WP Ninjas doing some really cool things, and they’ve been kind enough to permanently sponsor Apply Filters, and so their continued sponsorship, along with the sponsorship of DreamHost and anyone else is really appreciated because it’s what allows us to bring you the show.
BRAD: For sure.
PIPPIN: Thanks for chiming in, everyone.
BRAD: Thanks, everybody.