PHP-based LaTeX parser: where to begin?

The project: I want to build a LaTeX-to-MathML translator in PHP. Why? Because I'm a mathematician, and I want to publish math on my Drupal site. It doesn't have to translate all of LaTeX, since the basic document-level stuff is ably handled by the CMS and wouldn't be written in LaTeX to begin with; it just has to translate math written in LaTeX into math written in MathML. Although I feel as though I've done my due diligence, this doesn't seem to exist already. Maybe I'm wrong---if you know of something that would serve this purpose, by all means let me know, and thank you in advance. But assuming it doesn't exist, I guess I have to go write it myself.


Here's the thing, though: I've never done anything this ambitious. I don't really know where to begin. I've used PHP for years, but just to do the standard "build a CMS with PHP and MySQL"-type of stuff. I've never attempted anything as seemingly sophisticated as translation from one language to another.


I'm just dumb enough to consider doing it with regex---after all, LaTeX is a much more formal language than, say, HTML, and it doesn't allow for nearly the same kinds of pathological edge cases. But on the other hand, I'm just smart enough to realize this is probably a terrible idea: now I have two problems, and I sure don't want to end up like this guy.


So if that's not the way to go (right?), what is? How should I start thinking about this problem? Am I essentially writing a LaTeX compiler in PHP, and if so, what do I need to know to do that (like, should I just go read the Purple Dragon book first?)?


I'm both really excited and pretty intimidated by the prospect of this project, but hey, this is how we all learn to be programmers, right? If something we need doesn't exist, we go and build it, necessity is the mother of... you get the point. Tremendous thanks to everyone in advance for any and all guidance you can offer.


6 answers













What is wrong with any of these?




Don't write the parser yourself unless you want to do that as a learning experience. Just call existing LaTeX toolchains from PHP.


LaTeX2HTML is about as good as you're going to get, and here's an (old) description of a LaTeX-to-MathML converter from the maintainer of LaTeX2HTML.




I actually had a go at this last year. I got something working, though I wouldn't claim it had any elegance or charm to it, nor was it fully functional.


If you want to convert equations to MathML, rather than full LaTeX conversion, then you could use itex2MML. If you can load extensions into your PHP, it's possible to compile itex2MML with PHP-bindings and use it natively in scripts. The Makefile might need a bit of hacking to get all the configurations right.
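If loading an extension isn't an option, a rough alternative is to shell out to a converter binary from PHP. The sketch below assumes itex2MML is installed as a command-line program that reads LaTeX on stdin and writes MathML on stdout; check that against your own build before relying on it. The helper name `pipeThrough` is made up for this example, and it only uses PHP's standard `proc_open` (array form, PHP 7.4+).

```php
<?php
declare(strict_types=1);

/**
 * Run an external filter program: send $input to its stdin, return its stdout.
 * $command is an argv-style array, so no shell quoting is involved.
 * Hypothetical helper for illustration; not part of itex2MML or PHP.
 */
function pipeThrough(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];

    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start ' . $command[0]);
    }

    // Fine for formula-sized input. Large payloads would need
    // non-blocking reads to avoid filling a pipe buffer.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $out = stream_get_contents($pipes[1]);
    $err = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $status = proc_close($proc);
    if ($status !== 0) {
        throw new RuntimeException("Converter failed ($status): $err");
    }

    return $out;
}

// Assumption: an `itex2MML` binary is on the PATH and acts as a stdin/stdout filter.
// $mathml = pipeThrough(['itex2MML'], '$x^2 + y^2 = z^2$');
```

Passing the command as an array means the LaTeX source never goes through a shell, which matters because formulas are full of backslashes, dollar signs and braces.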






Alright, this answer was a mess.


Here's a cleaned-up version:


Since regexes clearly won't cut it as a translator for this kind of thing, you have two options, depending on your goals:


  1. You just want to be able to display LaTeX on your site one way or another.
    • If this is what you want, there is a simple solution out there for you that is easier than picking up an advanced book on compiler theory: some way to include LaTeX on your site directly, an existing translator, or something similar.
  2. You are a keener, and want to learn about compiler theory.
    • If this is the case, I cannot recommend the Purple Dragon Book (PDB) highly enough. It's a fascinating book, and you'll learn a lot from it. After the first two chapters, you will have learned enough about lexical analysis to complete this project. Best money I've spent on an educational resource to date!
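To give a feel for what the lexical analysis step looks like in PHP, here is a minimal tokenizer sketch for a small subset of LaTeX math. The token names (`cmd`, `num`, `ident`, and so on) and the function name `tokenize` are my own choices for illustration, not anything from the book or from an existing library. Note that using one small regex per token type is ordinary lexer practice; it is not the same as trying to translate the whole language with regex.

```php
<?php
declare(strict_types=1);

/**
 * Split a LaTeX math string into [type, text] tokens.
 * Covers only a small subset: control words (\frac), numbers,
 * single-letter identifiers, braces, ^, _ and a few operators.
 */
function tokenize(string $src): array
{
    // \G anchors each pattern at the current offset, so nothing is skipped silently.
    $rules = [
        'cmd'    => '/\G\\\\([A-Za-z]+)/',
        'num'    => '/\G(\d+(?:\.\d+)?)/',
        'ident'  => '/\G([A-Za-z])/',
        'lbrace' => '/\G(\{)/',
        'rbrace' => '/\G(\})/',
        'sup'    => '/\G(\^)/',
        'sub'    => '/\G(_)/',
        'op'     => '/\G([-+=()<>,])/',
    ];

    $tokens = [];
    $pos = 0;
    $len = strlen($src);

    while ($pos < $len) {
        // Whitespace carries no meaning in math mode here, so drop it.
        if (preg_match('/\G\s+/', $src, $m, 0, $pos)) {
            $pos += strlen($m[0]);
            continue;
        }
        foreach ($rules as $type => $re) {
            if (preg_match($re, $src, $m, 0, $pos)) {
                $tokens[] = [$type, $m[1]];
                $pos += strlen($m[0]);
                continue 2; // back to the outer while loop
            }
        }
        throw new InvalidArgumentException("Unexpected character at offset $pos");
    }

    return $tokens;
}

// tokenize('x^2 + 1') gives:
// [['ident','x'], ['sup','^'], ['num','2'], ['op','+'], ['num','1']]
```

A parser would then consume this token list and build a tree (for example, `\frac` followed by two brace groups), and the MathML output is generated from that tree. That second stage is where the later chapters on parsing come in.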



If you are okay with converting formulas to pictures, there are tons of solutions. If you want MathML specifically, there are several of those as well. However, you might consider jsMath, which uses JavaScript to render (a subset of) LaTeX in the browser. It's used by Sage and works well there.




Wikipedia uses a LaTeX-to-HTML (or image) translator written in OCaml. You can borrow some code from it, or just use it as is.



